Inventors:
Girish Maskeri Rama - Bangalore, IN
Kenneth Heafield - Pittsburgh PA, US
Santonu Sarkar - Bangalore, IN
Assignee:
Infosys Limited - Bangalore
International Classification:
G06F 9/44
US Classification:
717/122, 717/107, 717/108, 717/115, 717/116, 717/121
Abstract:
Topics in source code can be identified using Latent Dirichlet Allocation (LDA) by receiving source code, identifying domain-specific keywords from the source code, generating a keyword matrix, processing the keyword matrix and the source code using LDA, and outputting a list of topics. The list of topics is output as collections of domain-specific keywords. Probabilities of domain-specific keywords belonging to their respective topics can also be output. The keyword matrix comprises weighted sums of occurrences of domain-specific keywords in the source code.
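The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The location types ("name", "comment"), their weights, the stop-word list, the toy input files, and all function names are hypothetical, and the abstract does not say how LDA inference is performed, so a small collapsed Gibbs sampler is assumed here. Integer weights are used so that each weighted sum can be expanded into that many keyword tokens.

```python
import random
import re
from collections import Counter

# Hypothetical weights: a keyword in an identifier counts more than one in a comment.
LOCATION_WEIGHTS = {"name": 3, "comment": 1}
STOP_WORDS = {"get", "set", "user"}  # illustrative non-domain words

# Toy input (assumed): per file, identifiers and comment words already extracted.
FILES = {
    "LoanAccount.java": {"name": ["LoanAccount", "interestRate"],
                         "comment": ["loan", "interest"]},
    "Auth.java": {"name": ["loginUser", "passwordHash"],
                  "comment": ["login", "password"]},
}

def split_identifier(identifier):
    """Split camelCase or snake_case identifiers into lowercase words."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", identifier)
    return [part.lower() for part in parts]

def keyword_matrix(files):
    """Return (vocab, matrix); matrix[file][keyword] is a weighted occurrence sum."""
    rows = []
    for locations in files.values():
        counts = Counter()
        for location, tokens in locations.items():
            for token in tokens:
                for word in split_identifier(token):
                    if word not in STOP_WORDS:
                        counts[word] += LOCATION_WEIGHTS[location]
        rows.append(counts)
    vocab = sorted({word for counts in rows for word in counts})
    return vocab, [[counts[word] for word in vocab] for counts in rows]

def lda_gibbs(matrix, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns P(keyword | topic) per topic."""
    rng = random.Random(seed)
    num_words = len(matrix[0])
    # Expand each weighted sum into that many (file, keyword) tokens.
    tokens = [(d, w) for d, row in enumerate(matrix)
              for w, n in enumerate(row) for _ in range(n)]
    assign = [rng.randrange(num_topics) for _ in tokens]
    doc_topic = [[0] * num_topics for _ in matrix]
    topic_word = [[0] * num_words for _ in range(num_topics)]
    topic_total = [0] * num_topics

    def update(d, w, z, delta):
        doc_topic[d][z] += delta
        topic_word[z][w] += delta
        topic_total[z] += delta

    for (d, w), z in zip(tokens, assign):
        update(d, w, z, 1)
    for _ in range(iterations):
        for i, (d, w) in enumerate(tokens):
            update(d, w, assign[i], -1)  # remove the token, then resample its topic
            weights = [(doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                       / (topic_total[k] + num_words * beta)
                       for k in range(num_topics)]
            assign[i] = rng.choices(range(num_topics), weights=weights)[0]
            update(d, w, assign[i], 1)
    return [[(topic_word[k][w] + beta) / (topic_total[k] + num_words * beta)
             for w in range(num_words)] for k in range(num_topics)]

def top_keywords(vocab, word_probs, n=3):
    """List each topic as its n most probable keywords with their probabilities."""
    return [sorted(zip(vocab, probs), key=lambda pair: -pair[1])[:n]
            for probs in word_probs]

vocab, matrix = keyword_matrix(FILES)
word_probs = lda_gibbs(matrix, num_topics=2)
for number, topic in enumerate(top_keywords(vocab, word_probs)):
    print(number, [(word, round(prob, 2)) for word, prob in topic])
```

Each printed line is one topic, shown as a collection of keywords paired with the probability of the keyword under that topic, which mirrors the two outputs the abstract names. On a real code base the per-location weights and the number of topics would need tuning.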