Unsupervised Forward Selection (UFS) is a data reduction algorithm that selects from a data matrix a maximal linearly independent set of columns with a minimal amount of multiple correlation.
UFS was designed for use in the development of Quantitative Structure-Activity Relationship (QSAR) models, where the m by n data matrix contains the values of n variables (typically molecular properties) for m objects (typically compounds). QSAR data sets often contain redundancy (exact linear dependencies between subsets of the variables), and multicollinearity (high multiple correlations between subsets of the variables). Both of these features inhibit the development of QSAR models with the ability to generalise successfully to new objects. UFS produces a reduced data set that contains no redundancy and a minimal amount of multicollinearity.
UFS is a forward stepping algorithm that proceeds as follows. First, the two columns with the smallest squared pairwise correlation coefficient are chosen. Then, for each i > 2, column i is chosen from the remaining columns to have the smallest squared multiple correlation coefficient with columns 1 to i-1. The process halts when the number of columns selected reaches the rank of the data matrix; that is, when the squared multiple correlation coefficient of each remaining column with those already selected equals one. Thus the algorithm builds a basis for the column space of the data matrix, minimising the multicollinearity in the reduced data set at each stage.
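The stepping procedure described above can be sketched in Python with NumPy. This is only an illustrative sketch, not the distributed UFS program: the function name ufs, the tolerance parameter tol, the dropping of constant columns, and the use of a QR factorisation to obtain the squared multiple correlations are all assumptions made for this example. It also assumes that columns are mean-centred and scaled to unit length first, so that "rank" here means the rank of the centred data matrix.

```python
import numpy as np

def ufs(X, tol=1e-10):
    """Illustrative forward selection of columns (hypothetical helper).

    Returns the indices of the selected columns of X, in selection order.
    """
    X = np.asarray(X, dtype=float)
    # Centre each column and scale it to unit length (an assumption of this sketch).
    Z = X - X.mean(axis=0)
    norms = np.linalg.norm(Z, axis=0)
    # Constant columns have no defined correlation, so they are dropped here.
    keep = np.flatnonzero(norms > tol)
    Z = Z[:, keep] / norms[keep]
    n = Z.shape[1]
    if n < 2:
        return [int(k) for k in keep]

    # Squared pairwise correlations; the diagonal is excluded from the search.
    C2 = (Z.T @ Z) ** 2
    np.fill_diagonal(C2, np.inf)
    i, j = np.unravel_index(np.argmin(C2), C2.shape)
    if C2[i, j] >= 1.0 - tol:
        # Every pair is perfectly correlated: one column spans the space.
        return [int(keep[i])]

    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while remaining:
        # Orthonormal basis for the span of the columns selected so far.
        Q, _ = np.linalg.qr(Z[:, selected])
        # For centred, unit-length columns, the squared multiple correlation
        # with the selected set is the squared length of the projection.
        r2 = np.sum((Q.T @ Z[:, remaining]) ** 2, axis=0)
        best = int(np.argmin(r2))
        if r2[best] >= 1.0 - tol:
            # All remaining columns are linear combinations of those selected.
            break
        selected.append(remaining.pop(best))
    return [int(keep[k]) for k in selected]
```

Recomputing the QR factorisation at every step keeps the sketch short; an incremental orthogonalisation would avoid the repeated work on larger data matrices.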
In practice, the
For further details, including applications of UFS to QSAR model
building, see D. C. Whitley, M. G. Ford and D. J. Livingstone,
Unsupervised Forward Selection: A Method for Eliminating Redundant
Variables, Journal of Chemical Information and Computer Sciences, 2000,