Sarthak Dash Phones & Addresses

  • Jersey City, NJ
  • New York, NY

Resumes

Sarthak Dash

Location:
Redmond, WA
Industry:
Research
Work:
IBM Jun 2016 - Jun 2016
Cognitive Software Engineer

Columbia University In the City of New York Sep 2013 - May 2014
Graduate Student

The D. E. Shaw Group Jul 2011 - Jul 2013
Software Developer

INSEAD Jan 2011 - Jun 2011
Research Assistant

Tata Institute of Fundamental Research Jun 2009 - Jul 2009
Visiting Student Research Scholar
Education:
Columbia University In the City of New York 2013 - 2015
Masters, Computer Science
Birla Institute of Technology and Science, Pilani 2006 - 2011
Master of Science, Bachelor of Engineering, Bachelors, Masters, Mathematics, Computer Science
Tata Institute of Fundamental Research 2009
St. Paul's School, Rourkela
Skills:
Research
Teaching
Algorithms
Coding Experience
Web Applications
Applied Mathematics
Mathematics
Matlab
Mathematical Analysis
Mathematical Modeling
Java
Analysis
C++
Programming
Machine Learning
Python
Data Analysis
Interests:
Children
Environment
Education
Science and Technology
Arts and Culture
Languages:
English
Hindi
Oriya
Certifications:
Quantum Mechanics and Quantum Computation
Cryptography I
Natural Language Processing
Machine Learning
Coursera

Publications

US Patents

Discovering Ranked Domain Relevant Terms Using Knowledge

US Patent:
20210326636, Oct 21, 2021
Filed:
Apr 16, 2020
Appl. No.:
16/850735
Inventors:
- Armonk NY, US
Ruchi Mahindru - Elmsford NY, US
Md Faisal Mahbub Chowdhury - Woodside NY, US
Yu Deng - Yorktown Heights NY, US
Alfio Massimiliano Gliozzo - Brooklyn NY, US
Sarthak Dash - Jersey City NJ, US
Nicolas Rodolfo Fauceglia - Brooklyn NY, US
Gaetano Rossiello - Brooklyn NY, US
International Classification:
G06K 9/62
G06N 5/02
G06N 20/00
G06F 40/40
G06F 40/30
G06F 40/205
Abstract:
One embodiment of the invention provides a method for terminology ranking for use in natural language processing. The method comprises receiving a list of terms extracted from a corpus, where the list comprises a ranking of the terms based on frequencies of the terms across the corpus. The method further comprises accessing a domain ontology associated with the corpus, and re-ranking the list based on the domain ontology. The resulting re-ranked list comprises a different ranking of the terms based on relevance of the terms using knowledge from the domain ontology. The method further comprises generating clusters of terms via a trained model adapted to the corpus, and boosting a rank of at least one term of the re-ranked list based on the clusters to increase a relevance of the at least one term using knowledge from the trained model.
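
The re-ranking flow described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the function names, the "ontology terms first" rule, and the cluster boost rule are all assumptions chosen for illustration, and the clusters are passed in as plain sets rather than produced by a trained model.

```python
from collections import Counter


def rank_by_frequency(tokens):
    """Order terms by descending corpus frequency (ties broken alphabetically)."""
    counts = Counter(tokens)
    return sorted(counts, key=lambda t: (-counts[t], t))


def rerank_with_ontology(ranked_terms, ontology_terms):
    """Assumed rule: terms present in the domain ontology move ahead of the rest,
    and the frequency order is preserved inside each group."""
    in_onto = [t for t in ranked_terms if t in ontology_terms]
    rest = [t for t in ranked_terms if t not in ontology_terms]
    return in_onto + rest


def boost_by_clusters(ranked_terms, clusters, ontology_terms):
    """Assumed rule: a non-ontology term that shares a cluster with an ontology
    term is promoted to sit directly after the ontology terms."""
    boosted = set()
    for cluster in clusters:
        if cluster & ontology_terms:
            boosted |= cluster - ontology_terms
    top = [t for t in ranked_terms if t in ontology_terms]
    mid = [t for t in ranked_terms if t in boosted]
    low = [t for t in ranked_terms
           if t not in ontology_terms and t not in boosted]
    return top + mid + low


# Hypothetical toy corpus, ontology, and clusters.
tokens = ["the", "the", "the", "server", "server",
          "kernel", "latency", "page", "page"]
ontology = {"server", "kernel"}
clusters = [{"kernel", "latency"}, {"the", "page"}]

by_freq = rank_by_frequency(tokens)
reranked = rerank_with_ontology(by_freq, ontology)
final = boost_by_clusters(reranked, clusters, ontology)
```

In this toy run, the frequent but generic term "the" drops below the ontology terms, and "latency" is boosted because it clusters with "kernel".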

Deep Symbolic Validation Of Information Extraction Systems

US Patent:
20200218968, Jul 9, 2020
Filed:
Jan 7, 2019
Appl. No.:
16/241569
Inventors:
- Armonk NY, US
Sarthak Dash - Jersey City NJ, US
Michael Robert Glass - Bayonne NJ, US
Mustafa Canim - Ossining NY, US
International Classification:
G06N 3/08
G06F 21/55
Abstract:
A system comprises a memory that stores computer-executable components; and a processor, operably coupled to the memory, that executes the computer-executable components. The system includes a receiving component that receives a corpus of data; a relation extraction component that generates noisy knowledge graphs from the corpus; and a training component that acquires global representations of entities and relations by training from output of the relation extraction component.

Noise Detection In Knowledge Graphs

US Patent:
20200097861, Mar 26, 2020
Filed:
Sep 25, 2018
Appl. No.:
16/141303
Inventors:
- Armonk NY, US
Oktie Hassanzadeh - Port Chester NY, US
Alfio Massimiliano Gliozzo - Brooklyn NY, US
Sarthak Dash - Jersey City NJ, US
International Classification:
G06N 99/00
G06F 17/30
Abstract:
Techniques regarding autonomous classification and/or identification of various types of noise comprised within a knowledge graph are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a knowledge extraction component, operatively coupled to the processor, that can classify a type of noise comprised within a knowledge graph. The type of noise can be generated by an information extraction process.

Greedy Active Learning For Reducing Labeled Data Imbalances

US Patent:
20180032900, Feb 1, 2018
Filed:
Jul 27, 2016
Appl. No.:
15/220895
Inventors:
- Armonk NY, US
Sarthak Dash - Jersey City NJ, US
Alfio M. Gliozzo - Brooklyn NY, US
International Classification:
G06N 99/00
G06F 17/27
G06F 17/24
G06N 7/00
G06F 17/30
Abstract:
A method, system and computer-usable medium are disclosed for reducing labeled data imbalances when training an active learning system. The ratio of instances having positive labels or negative labels in a collection of labeled instances associated with an input category used for learning is determined. A first instance for annotation is selected from a collection of unlabeled instances if a first threshold for negative instances, and a first threshold confidence level of being a positive instance of the input category, have been met. A second instance for annotation is selected if a second threshold for positive instances, and a second threshold confidence level of being a negative instance of the input category, have been met. The first and second instances are respectively annotated with a positive and negative label and added to the collection of labeled instances, which are then used for training.
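
A rough Python sketch of the selection step described in this abstract follows. The threshold values, the function name `select_for_annotation`, and the use of a single predicted probability per unlabeled instance are assumptions made for illustration; the abstract does not specify them.

```python
def select_for_annotation(labels, unlabeled_scores,
                          neg_threshold=0.6, pos_threshold=0.6,
                          confidence=0.8):
    """Pick one unlabeled instance that would reduce label imbalance.

    labels: list of booleans (True = positive) for already-labeled instances.
    unlabeled_scores: dict mapping instance id -> assumed P(positive).
    Returns (instance_id, proposed_label) or None if nothing qualifies.
    """
    if not labels:
        return None
    pos_frac = sum(labels) / len(labels)
    neg_frac = 1.0 - pos_frac

    # Too many negatives: look for a confidently positive instance.
    if neg_frac >= neg_threshold:
        candidates = {i: p for i, p in unlabeled_scores.items()
                      if p >= confidence}
        if candidates:
            return max(candidates, key=candidates.get), True

    # Too many positives: look for a confidently negative instance.
    if pos_frac >= pos_threshold:
        candidates = {i: p for i, p in unlabeled_scores.items()
                      if 1.0 - p >= confidence}
        if candidates:
            return min(candidates, key=candidates.get), False

    return None


# Hypothetical scores from some classifier.
scores = {"a": 0.9, "b": 0.1, "c": 0.85}
pick_when_negative_heavy = select_for_annotation(
    [False, False, False, True], scores)
pick_when_positive_heavy = select_for_annotation(
    [True, True, True, False], scores)
pick_when_balanced = select_for_annotation([True, False], scores)
```

The selected instance and its label would then be added to the labeled collection before retraining, as the abstract describes.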

Greedy Active Learning For Reducing User Interaction

US Patent:
20180032901, Feb 1, 2018
Filed:
Jul 27, 2016
Appl. No.:
15/220902
Inventors:
- Armonk NY, US
Sarthak Dash - Jersey City NJ, US
Alfio M. Gliozzo - Brooklyn NY, US
International Classification:
G06N 99/00
G06F 17/27
G06F 17/24
G06F 17/30
Abstract:
A method, system and computer-usable medium are disclosed for reducing user interaction when training an active learning system. Source input containing unlabeled instances and an input category are received. A Latent Semantic Analysis (LSA) similarity score, and a search engine score, are generated for each unlabeled instance, which in turn are used with the input category to rank the unlabeled instances. If a first threshold for negative instances has been met, a first unlabeled instance, having the highest ranking, is selected for annotation from the ranked collection of unlabeled instances and provided to a user for annotation with a positive label. If a second threshold for positive instances has been met, then a second unlabeled instance, having the lowest ranking, is selected for annotation from the ranked collection of unannotated instances and automatically annotated with a negative label. The annotated instances are then used to train an active learning system.
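
The ranking step in this abstract can be sketched in Python as follows. The abstract does not say how the LSA similarity score and the search engine score are combined, so the weighted sum, the `weight` parameter, and the function names here are assumptions; the scores themselves are taken as precomputed inputs.

```python
def rank_instances(lsa_scores, search_scores, weight=0.5):
    """Rank instance ids by an assumed weighted sum of the two scores,
    highest first (ties broken by id)."""
    combined = {
        k: weight * lsa_scores[k] + (1.0 - weight) * search_scores[k]
        for k in lsa_scores
    }
    return sorted(combined, key=lambda k: (-combined[k], k))


def pick_candidates(ranked):
    """Return (top, bottom): the top-ranked instance would go to a user for
    annotation, the bottom-ranked one would be auto-labeled negative."""
    return ranked[0], ranked[-1]


# Hypothetical precomputed scores for three unlabeled instances.
lsa = {"x": 0.9, "y": 0.1, "z": 0.5}
search = {"x": 0.7, "y": 0.3, "z": 0.5}

ranked = rank_instances(lsa, search)
for_user, auto_negative = pick_candidates(ranked)
```

Only the top-ranked instance requires a human decision in this sketch, which is the sense in which the approach reduces user interaction.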