http://www.vcclab.org

Virtual Computational Chemistry Laboratory

Welcome to the Partial Least Squares Regression (PLSR) module

The PLSR statistical analysis module builds models and predicts activity/property values using the Partial Least Squares (PLS) regression technique [1-3]. The method is based on a linear transformation from a large number of original descriptors to a small number of orthogonal factors (latent variables) that provide the optimal linear model in terms of predictivity (characterized by the Q2 value). A more detailed explanation of the method and algorithms is available.
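The idea of projecting many descriptors onto a few orthogonal latent variables and choosing their number by Q2 can be illustrated with a minimal sketch. It assumes a single response (PLS1), the textbook NIPALS algorithm, and leave-one-out cross-validation for Q2; the function names (pls1_fit, q2_loo, best_n_components) are hypothetical and this is not the code used by the PLSR module.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS. Returns centring terms, regression
    coefficients for centred X, and the score matrix T."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q, T = [], [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                      # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                         # scores (latent variable)
        tt = t @ t
        p = Xk.T @ t / tt                  # X loadings
        c = (yk @ t) / tt                  # y loading
        Xk = Xk - np.outer(t, p)           # deflate X, which keeps scores orthogonal
        yk = yk - c * t
        W.append(w); P.append(p); q.append(c); T.append(t)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1]), np.empty((X.shape[0], 0))
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef, np.array(T).T

def pls1_predict(model, X):
    x_mean, y_mean, coef, _ = model
    return y_mean + (X - x_mean) @ coef

def q2_loo(X, y, n_components):
    """Leave-one-out Q2 = 1 - PRESS / SS, with SS taken around the mean of y."""
    n = len(y)
    press = 0.0
    for i in range(n):
        train = np.arange(n) != i
        model = pls1_fit(X[train], y[train], n_components)
        pred = pls1_predict(model, X[i:i + 1])[0]
        press += (y[i] - pred) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def best_n_components(X, y, max_components):
    """Pick the number of latent variables with the highest Q2."""
    scores = [q2_loo(X, y, k) for k in range(1, max_components + 1)]
    best = int(np.argmax(scores))
    return best + 1, scores[best]

if __name__ == "__main__":
    # Synthetic data (an assumption): 20 descriptors driven by 2 hidden factors.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(30, 2))
    X = latent @ rng.normal(size=(2, 20)) + 0.05 * rng.normal(size=(30, 20))
    y = latent @ np.array([1.0, -2.0]) + 0.05 * rng.normal(size=30)
    k, q2 = best_n_components(X, y, 5)
    print("latent variables:", k, "Q2 = %.3f" % q2)
```

Deflating X after each factor is what makes the score vectors mutually orthogonal, and scanning the number of factors by Q2 rather than by the fit to the training set is what guards against overfitting.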

It is well known that Partial Least Squares (PLS) regression is quite sensitive to the noise created by an excess of irrelevant descriptors. To achieve the best model quality, a two-step descriptor selection procedure [4] is applied. The first step eliminates the low-variability (almost constant) descriptors, that is, those that differ from a constant value for only a few (2-3) compounds in the training set. Such descriptors cannot provide useful statistical information and simply help to fit these particular compounds, thus decreasing the predictivity. In the second step, the descriptor subset is optimized by Q2-guided descriptor selection using a genetic algorithm. Despite the stochastic nature of this technique, computational experiments demonstrate reasonable stability of the results.
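The two-step selection described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of [4]: the threshold of 3 deviating compounds, the GA settings (population size, uniform crossover, bit-flip mutation, elitism) and all function names are hypothetical, and leave-one-out Q2 is computed with ordinary least squares as a stand-in for PLS to keep the example short.

```python
import numpy as np

def drop_near_constant(X, max_deviating=3):
    """Step 1: keep only descriptors that differ from their most common
    value for more than `max_deviating` compounds."""
    keep = []
    for j in range(X.shape[1]):
        _, counts = np.unique(X[:, j], return_counts=True)
        if X.shape[0] - counts.max() > max_deviating:
            keep.append(j)
    return np.array(keep, dtype=int)

def q2_loo(X, y):
    """Leave-one-out Q2 with ordinary least squares (stand-in for PLS)."""
    n = len(y)
    press = 0.0
    for i in range(n):
        train = np.arange(n) != i
        A = np.column_stack([np.ones(n - 1), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        press += (y[i] - (coef[0] + X[i] @ coef[1:])) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def ga_select(X, y, score, pop_size=20, generations=15, p_mut=0.05, seed=0):
    """Step 2: evolve boolean descriptor masks, using `score` (Q2) as fitness."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    pop = rng.random((pop_size, m)) < 0.5          # random initial subsets

    def fitness(mask):
        return score(X[:, mask], y) if mask.any() else -np.inf

    best_mask, best_fit = None, -np.inf
    for _ in range(generations):
        fits = np.array([fitness(ind) for ind in pop])
        order = np.argsort(fits)[::-1]
        if fits[order[0]] > best_fit:
            best_fit, best_mask = fits[order[0]], pop[order[0]].copy()
        parents = pop[order[: pop_size // 2]]      # truncation selection
        children = []
        while len(children) < pop_size - 1:
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(m) < 0.5, a, b)   # uniform crossover
            child ^= rng.random(m) < p_mut                # bit-flip mutation
            children.append(child)
        pop = np.array([best_mask] + children)     # elitism: keep the best mask
    return best_mask, best_fit

if __name__ == "__main__":
    # Synthetic data (an assumption): y depends on descriptors 0 and 1 only.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 10))
    X[:, 8] = 0.0
    X[:2, 8] = 1.0                                 # differs for 2 compounds only
    X[:, 9] = 5.0                                  # strictly constant
    y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=40)
    keep = drop_near_constant(X)
    mask, q2 = ga_select(X[:, keep], y, q2_loo)
    print("kept after step 1:", keep)
    print("selected by GA:", keep[mask], "Q2 = %.3f" % q2)
```

Because the search is stochastic, different seeds may return different subsets; rerunning with several seeds and comparing the selected descriptors is a simple way to check the stability of the result.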

The same code base is successfully used in the software that implements the Molecular Field Topology Analysis (MFTA) technique, which we proposed [5] for QSAR studies of organic compounds.

This software was developed by E.V. Radchenko, V.A. Palyulin and N.S. Zefirov, Department of Chemistry, Moscow State University, Moscow 119992, Russia. The data input format is described here.


References
  1. Martens H., Naes T. Multivariate Calibration. Chichester etc.: Wiley, 1989.
  2. Höskuldsson A. PLS regression methods. J. Chemometrics, 1988, 2(3), 211-228.
  3. Eriksson L., Johansson E., Kettaneh-Wold N., Wold S. Multi- and megavariate data analysis: Principles and applications. Umeå: Umetrics, 2001.
  4. Palyulin V.A., Radchenko E.V., Baranova O.D., Oliferenko A.A., Zefirov N.S. MFTA: Recent Extensions of Molecular Field Topology Analysis. In: EuroQSAR2002 Designing Drugs and Crop Protectants: processes, problems and solutions. Blackwell Publishing, Bournemouth, UK, 2003, 188-190.
  5. Palyulin V.A., Radchenko E.V., Zefirov N.S. Molecular Field Topology Analysis method in QSAR studies of organic compounds. J. Chem. Inf. Comp. Sci., 2000, 40(3), 659-667.


Copyright 2001 -- 2011 http://www.vcclab.org. All rights reserved.