
Shubh Shah Phones & Addresses

  • 1133 Buckbrush Dr, Folsom, CA 95630
  • Rancho Cordova, CA
  • Binghamton, NY
  • Johnson City, NY
  • Sacramento, CA

Work

Company: Intel Corporation May 2016 to Aug 2019 Position: Design Lead | Senior Engineering Manager - Graphics Processor Development

Education

School / High School: University of California, Davis 2014

Skills

SoC • Verilog • Debugging • Semiconductors • VLSI • Product Management • Program Management • Hardware Architecture • ASIC • Product Development • Leadership • Computer Architecture • Cross-Functional Team Leadership • Business Development • Logic Design • System on a Chip • Very Large Scale Integration • Strategic Planning • SystemVerilog • FPGA • Perl • IC • Intel • Integrated Circuits • Financial Modeling • Capital Markets • Product Design • Statistical Data Analysis • Team Leadership • Market Research • Marketing Strategy • Cost Management • Process Improvement • Supply Chain • Operations Management • Mergers and Acquisitions • Venture Capital • System Verification • Business Strategy • Engineering Management • Product Engineering • Electronics • VHDL • Functional Verification • Processors • RTL Design

Languages

English

Industries

Semiconductors

Resumes

GPU Core Subsystem

Location:
1315 Vineyard Ct, Folsom, CA
Industry:
Semiconductors
Work:
Intel Corporation May 2016 - Aug 2019
Design Lead | Senior Engineering Manager - Graphics Processor Development

AMD May 2016 - Aug 2019
GPU Core Subsystem - Head of Silicon Development

Intel Corporation Apr 2013 - May 2016
Technical Lead - GPU Shader Core

Intel Corporation Jan 2006 - Apr 2013
Silicon Hardware Design Engineer - Graphics Processor Development

Jan 2006 - Apr 2013
GPU Core Subsystem
Education:
University of California, Davis 2014
University of California, Davis - Graduate School of Management 2011 - 2014
Master of Business Administration (MBA), Marketing, Finance
Binghamton University 2003 - 2005
Master's, Electronics Engineering
L.D. College of Engineering 2003
Shree Sahajanand School
Skills:
SoC
Verilog
Debugging
Semiconductors
VLSI
Product Management
Program Management
Hardware Architecture
ASIC
Product Development
Leadership
Computer Architecture
Cross-Functional Team Leadership
Business Development
Logic Design
System on a Chip
Very Large Scale Integration
Strategic Planning
SystemVerilog
FPGA
Perl
IC
Intel
Integrated Circuits
Financial Modeling
Capital Markets
Product Design
Statistical Data Analysis
Team Leadership
Market Research
Marketing Strategy
Cost Management
Process Improvement
Supply Chain
Operations Management
Mergers and Acquisitions
Venture Capital
System Verification
Business Strategy
Engineering Management
Product Engineering
Electronics
VHDL
Functional Verification
Processors
RTL Design
Languages:
English

Publications

Us Patents

Instruction And Logic For Systolic Dot Product With Accumulate

US Patent:
20210303299, Sep 30, 2021
Filed:
Jun 15, 2021
Appl. No.:
17/304153
Inventors:
- Santa Clara CA, US
SUPRATIM PAL - Bangalore, IN
ASHUTOSH GARG - Folsom CA, US
CHANDRA S. GURRAM - Folsom CA, US
JORGE E. PARRA - El Dorado Hills CA, US
JUNJIE GU - Santa Clara CA, US
KONRAD TRIFUNOVIC - Mierzyn, PL
HONG BIN LIAO - Beijing, CN
MIKE B. MACPHERSON - Portland OR, US
SHUBH B. SHAH - Folsom CA, US
SHUBRA MARWAHA - Santa Clara CA, US
STEPHEN JUNKINS - Bend OR, US
TIMOTHY R. BAUER - Santa Clara CA, US
VARGHESE GEORGE - Folsom CA, US
WEIYU CHEN - San Jose CA, US
Assignee:
Intel Corporation - Santa Clara CA
International Classification:
G06F 9/30
G06T 1/20
G06F 9/38
Abstract:
Embodiments described herein provide an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes systolic dot product circuitry to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
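A minimal software model of the chained dot product described in this abstract may help. It assumes each "layer" handles a fixed number of element pairs (the hypothetical `width` parameter), sums its products, adds the partial sum received from the previous layer, and forwards the result; `acc` plays the role of the accumulate input. The function names are illustrative and do not describe Intel's actual hardware.

```python
from typing import List, Sequence


def systolic_dot_accumulate(a: Sequence[float], b: Sequence[float],
                            acc: float = 0.0, width: int = 4) -> float:
    """Toy model of a chained (systolic) dot product with accumulate.

    Each layer multiplies `width` element pairs, sums the products with an
    adder, adds the partial sum handed over by the previous layer, and passes
    the result on to the next layer. `acc` is the accumulator input.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    if width <= 0:
        raise ValueError("width must be positive")
    partial = acc  # value entering the first layer
    for start in range(0, len(a), width):
        products = [x * y for x, y in zip(a[start:start + width], b[start:start + width])]
        partial += sum(products)  # this layer's output feeds the next layer
    return partial


def simd_dot_accumulate(rows_a, rows_b, accs, width: int = 4) -> List[float]:
    """Run one independent systolic chain per SIMD lane (one row pair per lane)."""
    return [systolic_dot_accumulate(a, b, acc, width) for a, b, acc in zip(rows_a, rows_b, accs)]
```

For example, `systolic_dot_accumulate([1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1], acc=10)` chains two layers of four multipliers each and returns 130.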

Sharing Register File Usage Between Fused Processing Resources

US Patent:
20210089301, Mar 25, 2021
Filed:
Sep 25, 2019
Appl. No.:
16/582406
Inventors:
- Santa Clara CA, US
VARGHESE GEORGE - Folsom CA, US
JOYDEEP RAY - Folsom CA, US
ASHUTOSH GARG - Folsom CA, US
JORGE PARRA - El Dorado Hills CA, US
SHUBH SHAH - Folsom CA, US
SHUBRA MARWAHA - Folsom CA, US
Assignee:
Intel Corporation - Santa Clara CA
International Classification:
G06F 9/30
G06F 17/16
G06F 9/50
Abstract:
Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.
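The data-sharing idea in this abstract can be sketched in a few lines. The sketch assumes two processing resources each compute half of the rows of A @ B, and that the common operand B is written once into a shared register store that both resources read, rather than being copied into each resource's own registers. `SharedRegisters`, `matmul_rows`, and `fused_matmul` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SharedRegisters:
    """Hypothetical register storage visible to two fused processing resources."""
    regs: dict = field(default_factory=dict)
    writes: int = 0

    def write(self, name, value):
        self.regs[name] = value
        self.writes += 1

    def read(self, name):
        return self.regs[name]


def matmul_rows(a_rows, b):
    """Rows of A @ B for the rows assigned to one processing resource."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a_rows]


def fused_matmul(a, b):
    shared = SharedRegisters()
    shared.write("B", b)  # the shared matrix data is written once ...
    half = len(a) // 2
    top = matmul_rows(a[:half], shared.read("B"))     # ... read by resource 0
    bottom = matmul_rows(a[half:], shared.read("B"))  # ... and by resource 1
    return top + bottom, shared.writes
```

The returned write count (1) is the point of the design: both resources consume the same matrix data without a second register write.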

Sparse Matrix Multiplication Acceleration Mechanism

US Patent:
20210073318, Mar 11, 2021
Filed:
Sep 5, 2019
Appl. No.:
16/561715
Inventors:
- Santa Clara CA, US
MATHEW NEVIN - Fair Oaks CA, US
JORGE PARRA - El Dorado Hills CA, US
ASHUTOSH GARG - Folsom CA, US
SHUBRA MARWAHA - Santa Clara CA, US
SHUBH SHAH - Folsom CA, US
Assignee:
Intel Corporation - Santa Clara CA
International Classification:
G06F 17/16
G06F 7/487
G06F 9/30
G06F 13/16
Abstract:
An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.
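The zero-detection optimization in this abstract can be illustrated with a plain matrix multiply that checks each operand pair for zero before issuing a multiply-add and counts how many were skipped. This is a minimal sketch, assuming zero skipping is the only optimization; the patent's actual sparse-acceleration hardware may do more (for example, compressing the matrix data).

```python
def sparse_aware_matmul(a, b):
    """A @ B, skipping multiply-adds whose product is known to be zero.

    Returns (result, performed, skipped) so the savings are visible.
    """
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]
    performed = skipped = 0
    for i in range(n):
        for p in range(k):
            x = a[i][p]
            for j in range(m):
                y = b[p][j]
                if x == 0 or y == 0:
                    skipped += 1  # zero detected: multiply-add elided
                    continue
                out[i][j] += x * y
                performed += 1
    return out, performed, skipped
```

For `a = [[1, 0], [0, 2]]` and `b = [[3, 4], [5, 0]]`, only 3 of the 8 multiply-adds are performed and the result is `[[3, 4], [10, 0]]`.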

Software Scoreboard Information And Synchronization

US Patent:
20190362460, Nov 28, 2019
Filed:
Jun 11, 2019
Appl. No.:
16/437961
Inventors:
- Santa Clara CA, US
Supratim Pal - Bangalore, IN
Jorge E. Parra - El Dorado Hills CA, US
Chandra S. Gurram - Folsom CA, US
Ashwin J. Shivani - El Dorado Hills CA, US
Ashutosh Garg - Folsom CA, US
Brent A. Schwartz - Sacramento CA, US
Jorge F. Garcia Pabon - Folsom CA, US
Darin M. Starkey - Roseville CA, US
Shubh B. Shah - Folsom CA, US
Kaiyu Chen - San Jose CA, US
Konrad Trifunovic - Mierzyn, PL
Buqi Cheng - San Jose CA, US
Weiyu Chen - San Jose CA, US
Assignee:
Intel Corporation - Santa Clara CA
International Classification:
G06T 1/20
G06F 9/30
G06F 9/38
G06F 8/41
Abstract:
Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware. The compiler can then insert a software scoreboard sync immediate instruction into compiled program code to manage instruction dependencies and prevent data hazards from occurring.
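A compiler pass of the kind described here can be sketched as follows. It assumes that only certain opcodes (the hypothetical `LONG_LATENCY_OPS`) complete out of order, that each of them is tagged with a scoreboard token, and that a `Sync` instruction waiting on those tokens is inserted before any later instruction that reads (RAW) or overwrites (WAW) the pending register. The instruction format and names are illustrative, not Intel's ISA.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Assumption for the sketch: only these opcodes complete out of order.
LONG_LATENCY_OPS = {"load", "sample"}


@dataclass(frozen=True)
class Instr:
    op: str
    dst: str
    srcs: Tuple[str, ...]
    sbid: Optional[int] = None  # scoreboard token assigned by the compiler


@dataclass(frozen=True)
class Sync:
    tokens: Tuple[int, ...]  # wait until the writes tagged with these tokens land


def insert_scoreboard_syncs(program):
    out = []
    pending = {}  # register -> token of its in-flight long-latency write
    next_token = 0
    for ins in program:
        # RAW on a source or WAW on the destination must wait for the write.
        waits = sorted({pending[r] for r in (*ins.srcs, ins.dst) if r in pending})
        if waits:
            out.append(Sync(tuple(waits)))
            pending = {r: t for r, t in pending.items() if t not in waits}
        if ins.op in LONG_LATENCY_OPS:
            ins = replace(ins, sbid=next_token)
            pending[ins.dst] = next_token
            next_token += 1
        out.append(ins)
    return out
```

Because the dependency analysis happens at compile time, the hardware only needs to honor the tokens and sync instructions, which is the area saving the abstract describes.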

Instruction And Logic For Systolic Dot Product With Accumulate

US Patent:
20190324746, Oct 24, 2019
Filed:
Apr 19, 2018
Appl. No.:
15/957728
Inventors:
- Santa Clara CA, US
SUPRATIM PAL - Bangalore, IN
ASHUTOSH GARG - Folsom CA, US
CHANDRA S. GURRAM - Folsom CA, US
JORGE E. PARRA - El Dorado Hills CA, US
JUNJIE GU - Santa Clara CA, US
KONRAD TRIFUNOVIC - Mierzyn, PL
HONG BIN LIAO - Beijing, CN
MIKE B. MACPHERSON - Portland OR, US
SHUBH B. SHAH - Folsom CA, US
SHUBRA MARWAHA - Santa Clara CA, US
STEPHEN JUNKINS - Bend OR, US
TIMOTHY R. BAUER - Santa Clara CA, US
VARGHESE GEORGE - Folsom CA, US
WEIYU CHEN - San Jose CA, US
Assignee:
Intel Corporation - Santa Clara CA
International Classification:
G06F 9/30
G06F 9/38
G06T 1/20
Abstract:
Embodiments described herein provide an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.

Fusion Of Simd Processing Units

US Patent:
20190265973, Aug 29, 2019
Filed:
Feb 23, 2018
Appl. No.:
15/903283
Inventors:
- Santa Clara CA, US
Supratim Pal - Bangalore, IN
Ashutosh Garg - Folsom CA, US
Darin M. Starkey - Roseville CA, US
Jorge E. Parra - El Dorado Hills CA, US
Shubh B. Shah - Folsom CA, US
Wei-Yu Chen - San Jose CA, US
Vikranth Vemulapalli - Folsom CA, US
Narsim Krishna - Bangalore, IN
Brent A. Schwartz - Sacramento CA, US
Chandra S. Gurram - Folsom CA, US
Wei Pan - Santa Clara CA, US
Ashwin J. Shivani - El Dorado Hills CA, US
Assignee:
Intel Corporation - Santa Clara CA
International Classification:
G06F 9/30
G06F 9/38
G06T 1/20
Abstract:
Methods and apparatus relating to techniques for fusing SIMD processing units. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive an instruction set for execution on at least two graphics processing execution units, determine whether the instruction set requires data dependent addressing, and select between a synchronized execution environment for the at least two graphics processing units and an unsynchronized execution environment for the at least two graphics processing units based at least in part on the determination whether the instruction set requires data dependent addressing. Other embodiments are also disclosed and claimed.
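The selection step in this abstract amounts to a scan of the instruction set followed by a mode choice. The abstract does not say which outcome maps to which mode, so this sketch assumes that data-dependent addressing (the hypothetical `DATA_DEPENDENT_FORMS`) forces the unsynchronized environment, because fused execution units can no longer stay in lockstep when addresses depend on runtime values. Instructions are modeled as `(opcode, addressing_form)` pairs.

```python
from enum import Enum


class ExecMode(Enum):
    SYNCHRONIZED = "synchronized"
    UNSYNCHRONIZED = "unsynchronized"


# Hypothetical operand forms whose addresses depend on runtime data.
DATA_DEPENDENT_FORMS = {"indirect", "gather", "scatter"}


def requires_data_dependent_addressing(instructions) -> bool:
    """True if any (opcode, addressing_form) pair uses a data-dependent form."""
    return any(form in DATA_DEPENDENT_FORMS for _, form in instructions)


def select_mode(instructions) -> ExecMode:
    # Assumption: fused EUs run in lockstep only when every address is
    # known without inspecting runtime data.
    if requires_data_dependent_addressing(instructions):
        return ExecMode.UNSYNCHRONIZED
    return ExecMode.SYNCHRONIZED
```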

Reduced Power Implementation Of Computer Instructions

US Patent:
20160189327, Jun 30, 2016
Filed:
Dec 26, 2014
Appl. No.:
14/583300
Inventors:
- Santa Clara CA, US
SHUBH B. SHAH - Folsom CA, US
ASHUTOSH GARG - Folsom CA, US
JIN XU - El Dorado Hills CA, US
THOMAS A. PIAZZA - Granite Bay CA, US
JORGE F. GARCIA PABON - Folsom CA, US
MICHAEL K. DWYER - El Dorado Hills CA, US
International Classification:
G06T 1/20
G09G 5/00
G06T 15/80
Abstract:
Systems and methods may provide a graphics processor that may identify operating conditions under which certain floating point instructions may supply power to fewer hardware resources compared to when the instructions are executing under other operating conditions. The operating conditions may be determined by examining operands used in a given instruction, including the relative magnitudes of the operands and whether the operands may be taken as equal to certain defined values. The floating point instructions may include instructions for an addition operation, a multiplication operation, a compare operation, and/or a fused multiply-add operation.
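The operand checks mentioned in this abstract (special values such as 0 and 1, and relative magnitudes) can be sketched as a classifier that decides whether an add or multiply needs the full datapath. The sketch assumes IEEE-754 binary32 operands with round-to-nearest, treats Python floats as if they were binary32 values, and uses hypothetical path names ("bypass", "zero", "full_adder", "full_multiplier"); it does not model the patent's actual circuits.

```python
import math

# Assumption: binary32 datapath (24-bit significand), round to nearest.
SIGNIFICAND_BITS = 24


def fp_add_path(a: float, b: float) -> str:
    """Classify which hardware an add a + b would need (hypothetical categories)."""
    if not (math.isfinite(a) and math.isfinite(b)):
        return "full_adder"        # leave inf/NaN to the full datapath
    if a == 0.0 or b == 0.0:
        return "bypass"            # result is the other operand (up to the sign of zero)
    _, ea = math.frexp(a)
    _, eb = math.frexp(b)
    if abs(ea - eb) > SIGNIFICAND_BITS + 1:
        return "bypass"            # smaller operand is too small to change the rounded sum
    return "full_adder"


def fp_mul_path(a: float, b: float) -> str:
    """Classify which hardware a multiply a * b would need (hypothetical categories)."""
    if not (math.isfinite(a) and math.isfinite(b)):
        return "full_multiplier"
    if a == 0.0 or b == 0.0:
        return "zero"              # result is a signed zero
    if abs(a) == 1.0 or abs(b) == 1.0:
        return "bypass"            # result is the other operand, possibly negated
    return "full_multiplier"
```

The magnitude test works because, for normal binary32 values, an exponent gap above 25 keeps the smaller operand below half the spacing between representable numbers next to the larger one, so the rounded sum equals the larger operand.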

Method And Apparatus For A High Throughput Rasterizer

US Patent:
20160180585, Jun 23, 2016
Filed:
Dec 23, 2014
Appl. No.:
14/581701
Inventors:
- Santa Clara CA, US
THOMAS A. PIAZZA - New York NY, US
JORGE F. GARCIA PABON - Folsom CA, US
SHUBH B. SHAH - Folsom CA, US
International Classification:
G06T 17/10
G06K 9/46
Abstract:
An apparatus and method are described for a high throughput rasterizer. For example, one embodiment of an apparatus comprises: block selection logic to select a plurality of pixel blocks associated with edges of a primitive, the plurality of pixel blocks selected based on the pixel blocks having samples which are both inside and outside of the primitive; and edge determination logic to analyze samples of the plurality of pixel blocks selected by the block selection logic and responsively generate data identifying each edge of the primitive; and final mask determination logic to combine the data identifying each edge and generate a final mask representing the primitive.
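The three stages in this abstract (block selection, per-edge determination, final mask) can be sketched with linear edge functions over a triangle. This is a minimal model under several assumptions: counter-clockwise winding, one sample at each pixel centre, a hypothetical 4x4 block size, and no top-left tie-breaking rule. Blocks are classified by testing their corner samples, which is exact for fully inside or fully outside blocks because each edge function is linear; the remaining (possibly mixed) blocks are selected as edge blocks and evaluated per sample.

```python
from typing import Tuple

Point = Tuple[float, float]


def edge_fn(p0: Point, p1: Point, x: float, y: float) -> float:
    """>= 0 when (x, y) is on the inner side of edge p0->p1 (counter-clockwise triangle)."""
    return (p1[0] - p0[0]) * (y - p0[1]) - (p1[1] - p0[1]) * (x - p0[0])


def rasterize(tri: Tuple[Point, Point, Point], width: int, height: int, block: int = 4):
    edges = [(tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])]
    mask = [[False] * width for _ in range(height)]
    edge_blocks = []
    for by in range(0, height, block):
        for bx in range(0, width, block):
            x1, y1 = min(bx + block, width), min(by + block, height)
            # Samples sit at pixel centres; the corner samples bound the block.
            corners = [(x + 0.5, y + 0.5) for x in (bx, x1 - 1) for y in (by, y1 - 1)]
            if any(all(edge_fn(p0, p1, cx, cy) < 0 for cx, cy in corners) for p0, p1 in edges):
                continue  # every sample is outside one edge: nothing to draw
            if all(edge_fn(p0, p1, cx, cy) >= 0 for p0, p1 in edges for cx, cy in corners):
                for y in range(by, y1):
                    for x in range(bx, x1):
                        mask[y][x] = True  # fully covered block
                continue
            # Block selection: this block may contain inside and outside samples.
            edge_blocks.append((bx, by))
            for y in range(by, y1):
                for x in range(bx, x1):
                    # Edge determination: one inside/outside bit per edge ...
                    bits = [edge_fn(p0, p1, x + 0.5, y + 0.5) >= 0 for p0, p1 in edges]
                    # ... final mask: a sample is covered only if every edge bit is set.
                    mask[y][x] = all(bits)
    return mask, edge_blocks
```

For the triangle `((0, 0), (8, 0), (0, 8))` on an 8x8 grid, the top-left block is filled without per-sample work, the bottom-right block is skipped, and only the two blocks straddling the hypotenuse are evaluated sample by sample.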
Shubh B Shah from Folsom, CA, age ~42