Inventors:
Quan G. Cung - Austin TX, US
Harry Roger Kolar - Scottsdale AZ, US
Kevin Eric Norsworthy - Austin TX, US
Julio Ortega - The Colony TX, US
Frederick J. Scheibl - Austin TX, US
Vasken Torossian - Round Rock TX, US
Ben Peter Yuhas - Baltimore MD, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 17/10
US Classification:
703/2, 382/224, 703/1, 703/6, 703/22
Abstract:
Attributes of a data set to be employed in generating a predictive model are analyzed based on entropy, chi-square, or a similar statistical measure. A target group of samples exhibiting one or more desired attributes is identified, and the remaining attribute values for the target group are then compared to the corresponding attribute values for the whole sample population. A subset of all available attributes is then selected from those attributes that exhibit the greatest relative difference or divergence between the target-group values and the whole-population values. This subset is employed to generate the predictive model. Both the efficiency of generating the predictive model and the accuracy of the resulting model are improved, since fewer attributes are employed and fewer computational resources are required.
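
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes categorical attributes and uses relative entropy (Kullback-Leibler divergence) as the statistical measure, which is only one of the options the abstract allows. The function names, the `is_target` predicate, and the sample records are all hypothetical.

```python
from collections import Counter
from math import log2


def value_distribution(samples, attribute):
    """Relative frequency of each value of `attribute` among `samples`."""
    counts = Counter(sample[attribute] for sample in samples)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def divergence(target, population, attribute):
    """Relative entropy (in bits) of the target group vs. the whole population."""
    p = value_distribution(target, attribute)
    q = value_distribution(population, attribute)
    # The target group is a subset of the population, so q[v] > 0 whenever p[v] > 0.
    return sum(pv * log2(pv / q[v]) for v, pv in p.items())


def select_attributes(population, is_target, candidates, k):
    """Rank candidate attributes by divergence and keep the top k."""
    target = [s for s in population if is_target(s)]
    scores = {a: divergence(target, population, a) for a in candidates}
    ranked = sorted(candidates, key=lambda a: scores[a], reverse=True)
    return ranked[:k], scores


# Hypothetical sample population; "responded" is the desired attribute.
population = [
    {"responded": True, "owns_home": "yes", "region": "north"},
    {"responded": True, "owns_home": "yes", "region": "south"},
    {"responded": False, "owns_home": "no", "region": "north"},
    {"responded": False, "owns_home": "no", "region": "south"},
    {"responded": False, "owns_home": "yes", "region": "north"},
    {"responded": False, "owns_home": "no", "region": "south"},
]

selected, scores = select_attributes(
    population,
    is_target=lambda s: s["responded"],
    candidates=["owns_home", "region"],
    k=1,
)
print(selected)  # ['owns_home']
```

In this toy data, every target sample owns a home while only half of the population does, so `owns_home` diverges by one bit. `region` is distributed identically in both groups and scores zero, so it would be dropped before model generation.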