Inventors:
Xin Allan Lu - Springboro OH
David James Miller - Dayton OH
John Richard Wassum - Springboro OH
Assignee:
Lexis-Nexis - Miamisburg OH
International Classification:
G06F 1730
Abstract:
A phrase recognition method breaks streams of text into text "chunks" and selects certain chunks as "phrases" useful for automated full text searching. The phrase recognition method uses a carefully assembled list of partition elements to partition the text into the chunks, and selects phrases from the chunks according to a small number of frequency based definitions. The method can also incorporate additional processes such as categorization of proper names to enhance phrase recognition. The method selects phrases quickly and efficiently, referring simply to the phrases themselves and the frequency with which they are encountered, rather than relying on complex, time-consuming, resource-consuming grammatical analysis, or on collocation schemes of limited applicability, or on heuristical text analysis of limited reliability or utility.