Shubh B Shah from 1133 Buckbrush Dr, Folsom, CA 95630, age 42

Resumes

GPU Core Subsystem

Location:

1315 Vineyard Ct, Folsom, CA

Industry:

Semiconductors

Work:

Intel Corporation May 2016 - Aug 2019
Design Lead | Senior Engineering Manager - Graphics Processor Development
AMD May 2016 - Aug 2019
GPU Core Subsystem - Head of Silicon Development
Intel Corporation Apr 2013 - May 2016
Technical Lead - GPU Shader Core
Intel Corporation Jan 2006 - Apr 2013
Silicon Hardware Design Engineer - Graphics Processor Development
Jan 2006 - Apr 2013
GPU Core Subsystem

Education:

University of California, Davis 2014
University of California, Davis - Graduate School of Management 2011 - 2014
Master of Business Administration, Masters, Marketing, Finance
Binghamton University 2003 - 2005
Masters, Electronics Engineering
L D College of Engineering 2003
L.D. College of Engineering
Shree Sahajanand School

Skills:

SoC
Verilog
Debugging
Semiconductors
VLSI
Product Management
Program Management
Hardware Architecture
ASIC
Product Development
Leadership
Computer Architecture
Cross Functional Team Leadership
Business Development
Logic Design
System on a Chip
Very Large Scale Integration
Strategic Planning
SystemVerilog
FPGA
Perl
IC
Intel
Integrated Circuits
Financial Modeling
Capital Markets
Product Design
Statistical Data Analysis
Team Leadership
Market Research
Marketing Strategy
Cost Management
Process Improvement
Supply Chain
Operations Management
Mergers and Acquisitions
Venture Capital
System Verification
Business Strategy
Engineering Management
Product Engineering
Electronics
VHDL
Functional Verification
Processors
RTL Design

Languages:

English

Publications

US Patents

Instruction And Logic For Systolic Dot Product With Accumulate

US Patent:

20210303299, Sep 30, 2021

Filed:

Jun 15, 2021

Appl. No.:

17/304153

Inventors:

- Santa Clara CA, US
SUPRATIM PAL - Bangalore, IN
ASHUTOSH GARG - Folsom CA, US
CHANDRA S. GURRAM - Folsom CA, US
JORGE E. PARRA - El Dorado Hills CA, US
JUNJIE GU - Santa Clara CA, US
KONRAD TRIFUNOVIC - Mierzyn, PL
HONG BIN LIAO - Beijing, CN
MIKE B. MACPHERSON - Portland OR, US
SHUBH B. SHAH - Folsom CA, US
SHUBRA MARWAHA - Santa Clara CA, US
STEPHEN JUNKINS - Bend OR, US
TIMOTHY R. BAUER - Santa Clara CA, US
VARGHESE GEORGE - Folsom CA, US
WEIYU CHEN - San Jose CA, US

Assignee:

Intel Corporation - Santa Clara CA

International Classification:

G06F 9/30
G06T 1/20
G06F 9/38

Abstract:

Embodiments described herein provide for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes systolic dot product circuitry to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
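
The layered accumulation described in this abstract can be illustrated with a minimal software sketch. It is not the patented hardware: the function names, the chunking of operands into one chunk per layer, and the per-lane loop are all assumptions made for illustration. Each "layer" computes one dot product and hands its running total to the next layer.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """One set of multipliers and adders: multiply pairwise, then sum."""
    return sum(x * y for x, y in zip(a, b))


def systolic_dot_accumulate(
    acc: float,
    a_chunks: List[Sequence[float]],
    b_chunks: List[Sequence[float]],
) -> float:
    """Pass a running value through one layer per operand chunk.

    Layer k adds its own dot product to the value handed over by layer k-1
    and forwards the result to layer k+1 (hypothetical software model).
    """
    value = acc
    for a, b in zip(a_chunks, b_chunks):
        value = value + dot(a, b)
    return value


def simd_systolic(
    accs: List[float],
    a_chunks_per_lane: List[List[Sequence[float]]],
    b_chunks: List[Sequence[float]],
) -> List[float]:
    """Run the same layered computation independently in each SIMD lane."""
    return [
        systolic_dot_accumulate(acc, a_chunks, b_chunks)
        for acc, a_chunks in zip(accs, a_chunks_per_lane)
    ]
```

In this model every lane shares the same `b_chunks` operand while supplying its own accumulator and `a` operand, which is one plausible reading of "across one or more SIMD lanes"; the abstract itself does not specify the operand layout.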

Sharing Register File Usage Between Fused Processing Resources

US Patent:

20210089301, Mar 25, 2021

Filed:

Sep 25, 2019

Appl. No.:

16/582406

Inventors:

- Santa Clara CA, US
VARGHESE GEORGE - Folsom CA, US
JOYDEEP RAY - Folsom CA, US
ASHUTOSH GARG - Folsom CA, US
JORGE PARRA - El Dorado Hills CA, US
SHUBH SHAH - Folsom CA, US
SHUBRA MARWAHA - Folsom CA, US

Assignee:

Intel Corporation - Santa Clara CA

International Classification:

G06F 9/30
G06F 17/16
G06F 9/50

Abstract:

Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.

Sparse Matrix Multiplication Acceleration Mechanism

US Patent:

20210073318, Mar 11, 2021

Filed:

Sep 5, 2019

Appl. No.:

16/561715

Inventors:

- Santa Clara CA, US
MATHEW NEVIN - Fair Oaks CA, US
JORGE PARRA - El Dorado Hills CA, US
ASHUTOSH GARG - Folsom CA, US
SHUBRA MARWAHA - Santa Clara CA, US
SHUBH SHAH - Folsom CA, US

Assignee:

Intel Corporation - Santa Clara CA

International Classification:

G06F 17/16
G06F 7/487
G06F 9/30
G06F 13/16

Abstract:

An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.
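
As a rough software analogy for the zero-detection idea in this abstract, the sketch below skips any multiply-add whose operand is zero and counts how much work was avoided. The function name, the counters, and the choice to check zeros element by element are assumptions; the abstract does not say which optimizations the hardware applies.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def sparse_matmul(a: Matrix, b: Matrix) -> Tuple[Matrix, int, int]:
    """Multiply a (n x k) by b (k x m), skipping multiply-adds with a zero operand.

    Returns (product, multiply_adds_performed, multiply_adds_skipped).
    """
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    performed = skipped = 0
    for i in range(n):
        for p in range(k):
            a_ip = a[i][p]
            if a_ip == 0:
                # A zero in `a` cancels a whole row of multiply-adds at once.
                skipped += m
                continue
            for j in range(m):
                b_pj = b[p][j]
                if b_pj == 0:
                    skipped += 1
                    continue
                out[i][j] += a_ip * b_pj
                performed += 1
    return out, performed, skipped
```

For two 2 x 2 inputs with three zeros between them, for example `[[1, 0], [0, 2]]` and `[[3, 0], [4, 5]]`, this performs 3 of the 8 possible multiply-adds and skips the other 5.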

Software Scoreboard Information And Synchronization

US Patent:

20190362460, Nov 28, 2019

Filed:

Jun 11, 2019

Appl. No.:

16/437961

Inventors:

- Santa Clara CA, US
Supratim Pal - Bangalore, IN
Jorge E. Parra - El Dorado Hills CA, US
Chandra S. Gurram - Folsom CA, US
Ashwin J. Shivani - El Dorado Hills CA, US
Ashutosh Garg - Folsom CA, US
Brent A. Schwartz - Sacramento CA, US
Jorge F. Garcia Pabon - Folsom CA, US
Darin M. Starkey - Roseville CA, US
Shubh B. Shah - Folsom CA, US
Kaiyu Chen - San Jose CA, US
Konrad Trifunovic - Mierzyn, PL
Buqi Cheng - San Jose CA, US
Weiyu Chen - San Jose CA, US

Assignee:

Intel Corporation - Santa Clara CA

International Classification:

G06T 1/20
G06F 9/30
G06F 9/38
G06F 8/41

Abstract:

Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware. The compiler can then insert a software scoreboard sync immediate instruction into compiled program code to manage instruction dependencies and prevent data hazards from occurring.
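
The compiler-side dependency evaluation described in this abstract can be sketched as a small pass that attaches, to each instruction, the indices of the earlier instructions it must wait for. Everything here is hypothetical: the `Instr` type, the `wait_on` field, and the register naming are illustrative assumptions, and the actual encoding of scoreboard information is not given in the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Instr:
    op: str
    dst: str
    srcs: Tuple[str, ...] = ()
    # Hypothetical scoreboard field: indices of earlier instructions that
    # must complete before this one may issue.
    wait_on: Set[int] = field(default_factory=set)


def annotate(program: List[Instr]) -> List[Instr]:
    """Fill in wait_on for read-after-write, write-after-write and
    write-after-read hazards, as a compiler pass might."""
    last_writer: Dict[str, int] = {}
    readers: Dict[str, Set[int]] = {}  # readers since the last write
    for idx, ins in enumerate(program):
        deps: Set[int] = set()
        for src in ins.srcs:
            if src in last_writer:
                deps.add(last_writer[src])        # read-after-write
        if ins.dst in last_writer:
            deps.add(last_writer[ins.dst])        # write-after-write
        deps |= readers.get(ins.dst, set())       # write-after-read
        ins.wait_on = deps
        for src in ins.srcs:
            readers.setdefault(src, set()).add(idx)
        last_writer[ins.dst] = idx
        readers[ins.dst] = set()
    return program
```

With the annotations in place, a scheduler only has to check whether the listed instructions have finished, rather than comparing register operands itself, which is the kind of hardware simplification the abstract points to.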

Instruction And Logic For Systolic Dot Product With Accumulate

US Patent:

20190324746, Oct 24, 2019

Filed:

Apr 19, 2018

Appl. No.:

15/957728

Inventors:

Assignee:

Intel Corporation - Santa Clara CA

International Classification:

G06F 9/30
G06F 9/38
G06T 1/20

Abstract:

Embodiments described herein provide for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.

Fusion Of SIMD Processing Units

US Patent:

20190265973, Aug 29, 2019

Filed:

Feb 23, 2018

Appl. No.:

15/903283

Inventors:

- Santa Clara CA, US
Supratim Pal - Bangalore, IN
Ashutosh Garg - Folsom CA, US
Darin M. Starkey - Roseville CA, US
Jorge E. Parra - El Dorado Hills CA, US
Shubh B. Shah - Folsom CA, US
Wei-Yu Chen - San Jose CA, US
Vikranth Vemulapalli - Folsom CA, US
Narsim Krishna - Bangalore, IN
Brent A. Schwartz - Sacramento CA, US
Chandra S. Gurram - Folsom CA, US
Wei Pan - Santa Clara CA, US
Ashwin J. Shivani - El Dorado Hills CA, US

Assignee:

Intel Corporation - Santa Clara CA

International Classification:

G06F 9/30
G06F 9/38
G06T 1/20

Abstract:

Methods and apparatus relating to techniques for fusing SIMD processing units. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive an instruction set for execution on at least two graphics processing execution units, determine whether the instruction set requires data dependent addressing, and select between a synchronized execution environment for the at least two graphics processing units and an unsynchronized execution environment for the at least two graphics processing units based at least in part on the determination whether the instruction set requires data dependent addressing. Other embodiments are also disclosed and claimed.

Reduced Power Implementation Of Computer Instructions

US Patent:

20160189327, Jun 30, 2016

Filed:

Dec 26, 2014

Appl. No.:

14/583300

Inventors:

- Santa Clara CA, US
SHUBH B. SHAH - Folsom CA, US
ASHUTOSH GARG - Folsom CA, US
JIN XU - El Dorado Hills CA, US
THOMAS A. PIAZZA - Granite Bay CA, US
JORGE F. GARCIA PABON - Folsom CA, US
MICHAEL K. DWYER - El Dorado Hills CA, US

International Classification:

G06T 1/20
G09G 5/00
G06T 15/80

Abstract:

Systems and methods may provide a graphics processor that may identify operating conditions under which certain floating point instructions may utilize power to fewer hardware resources compared to when the instructions are executing under other operating conditions. The operating conditions may be determined by examining operands used in a given instruction, including the relative magnitudes of the operands and whether the operands may be taken as equal to certain defined values. The floating point instructions may include instructions for an addition operation, a multiplication operation, a compare operation, and/or a fused multiply-add operation.

Method And Apparatus For A High Throughput Rasterizer

US Patent:

20160180585, Jun 23, 2016

Filed:

Dec 23, 2014

Appl. No.:

14/581701

Inventors:

- Santa Clara CA, US
THOMAS A. PIAZZA - New York NY, US
JORGE F. GARCIA PABON - Folsom CA, US
SHUBH B. SHAH - Folsom CA, US

International Classification:

G06T 17/10
G06K 9/46

Abstract:

An apparatus and method are described for a high throughput rasterizer. For example, one embodiment of an apparatus comprises: block selection logic to select a plurality of pixel blocks associated with edges of a primitive, the plurality of pixel blocks selected based on the pixel blocks having samples which are both inside and outside of the primitive; and edge determination logic to analyze samples of the plurality of pixel blocks selected by the block selection logic and responsively generate data identifying each edge of the primitive; and final mask determination logic to combine the data identifying each edge and generate a final mask representing the primitive.
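
The three stages named in this abstract (block selection, per-edge determination, final mask) can be approximated in software with standard edge functions. This is a sketch under stated assumptions, not the patented design: it assumes a counter-clockwise triangle, one sample at each pixel center, a frame whose size is a multiple of the block size, and a simplified rule that counts samples lying exactly on an edge as inside. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def edge(a: Point, b: Point, px: float, py: float) -> float:
    """Signed area test: >= 0 when (px, py) is on the inner side of edge a->b."""
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])


def edge_masks(tri: Triangle, x0: int, y0: int, block: int = 4) -> List[int]:
    """One bitmask per edge; bit i is set when sample i of the block passes it."""
    masks = []
    for k in range(3):
        a, b = tri[k], tri[(k + 1) % 3]
        mask, bit = 0, 0
        for y in range(y0, y0 + block):
            for x in range(x0, x0 + block):
                if edge(a, b, x + 0.5, y + 0.5) >= 0:
                    mask |= 1 << bit
                bit += 1
        masks.append(mask)
    return masks


def final_mask(masks: List[int]) -> int:
    """A sample is covered only if it passes every edge."""
    out = masks[0]
    for m in masks[1:]:
        out &= m
    return out


def classify_blocks(
    tri: Triangle, width: int, height: int, block: int = 4
) -> Dict[Tuple[int, int], str]:
    """Label each block 'full', 'partial' (samples inside and outside) or 'empty'."""
    full = (1 << (block * block)) - 1
    result = {}
    for y0 in range(0, height, block):
        for x0 in range(0, width, block):
            m = final_mask(edge_masks(tri, x0, y0, block))
            result[(x0, y0)] = "full" if m == full else "partial" if m else "empty"
    return result
```

The "partial" blocks correspond to the blocks the abstract says are selected, since they are the only ones where per-sample edge analysis changes the outcome; fully covered and empty blocks could be resolved without it.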

Videos & Images

YouTube

Glimpse of Us - Joji | Cover by Shubh Shah (Acoust...

Glimpse of Us - Joji Cover by Shubh Shah (Acoustic ) I hope you enjoy ...

Duration:

1m 42s

Dil Ko Karaar Aaya - Cover by Shubh Shah

Dil Ko Karaar Aaya - Sidharth Shukla & Neha Sharma | Neha Kakkar & Yas...

Duration:

2m 50s

Shubh Shagun | | Full Episode 53 | New Show | ...

Shubh Shagun | | Full Episode 53 | New Show | Dangal TV #ShubhShagun...

Duration:

22m 4s

Phir Le Aaya Dil - Barfi| Arijit Singh, Pritam (Co...

Song: Phir Le Aaya Dil - Barfi | Arijit Singh, Pritam Cover by Shubh S...

Duration:

2m 18s

Until I Found Her - Stephen Sanchez | Cover by Shu...

Song: Until I Found You - Stephen Sanchez Cover by Shubh Shah I hope y...

Duration:

2m 32s

When You're Gone - Shawn Mendes (Cover by Shubh Sh...

When You're Gone - Shawn Mendes Cover by Shubh Shah I hope you enjoy i...

Duration:

1m 5s
