Virtual Computational Chemistry Laboratory




PARTITION

Keyword of Integer Type

Specifies how to split the initial training data set into learning and validation sets.

no validation {0} -- use all data in the learning set. The neural network fits all data as closely as possible.
validation {1} -- the data entries selected by the keyword VALIDATION are used to monitor the performance of the neural networks. The statistical coefficients are calculated for two early stopping points (the first, S1, corresponds to the minimum RMSE for the validation set, and the second, S2, to the minimum RMSE for the whole set) and for the minimum RMSE achieved by the neural network.
random sets {2} -- the same procedure as validation, but the validation set is selected by chance for each neural network in the ensemble (see Figure). Thus, each network has its own learning and validation sets. This allows LOO coefficients to be estimated for the whole initial training set.
EPA {3} -- performs weighted selection of data entries for the learning and validation sets, as described in Tetko & Villa, 1997. Apart from the weighting procedure, the data in both sets are selected by chance, as with the random sets option.
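
The two early stopping points can be illustrated with a short sketch. This is not the applet's code: the function name and the idea of passing per-epoch RMSE curves as plain lists are assumptions made for illustration only. Given one RMSE value per training epoch for the validation set and for the whole set, S1 and S2 are simply the epochs at which each curve reaches its minimum.

```python
def early_stopping_points(rmse_validation, rmse_whole):
    """Return (S1, S2) as epoch indices.

    S1 is the epoch with the minimum RMSE on the validation set,
    S2 is the epoch with the minimum RMSE on the whole set.
    Hypothetical helper: names and inputs are illustrative assumptions.
    """
    if not rmse_validation or len(rmse_validation) != len(rmse_whole):
        raise ValueError("RMSE curves must be non-empty and of equal length")
    epochs = range(len(rmse_validation))
    s1 = min(epochs, key=lambda e: rmse_validation[e])
    s2 = min(epochs, key=lambda e: rmse_whole[e])
    return s1, s2


# Made-up RMSE curves for five epochs (illustrative values only).
rmse_val = [0.90, 0.60, 0.45, 0.50, 0.58]
rmse_all = [0.95, 0.62, 0.48, 0.44, 0.47]
s1, s2 = early_stopping_points(rmse_val, rmse_all)
print("S1 epoch:", s1, "S2 epoch:", s2)
```

With these made-up curves the validation error starts rising one epoch before the whole-set error does, so S1 falls earlier than S2.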

We recommend using the random sets or EPA options. In our opinion and experience, these options provide the most reliable estimate of the generalization ability of neural networks and do not suffer from overfitting/overtraining problems, as discussed in our publications. The other two options are provided mainly for comparison.
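
A minimal sketch of the random sets idea follows, assuming a fixed validation fraction and a seeded generator; neither is stated on this page, and the function name is hypothetical. Each network in the ensemble draws its own validation set by chance, and the remaining entries form its learning set. Over many networks, most entries end up in some validation sets, which is what makes an out-of-sample estimate for the whole initial training set possible.

```python
import random


def random_partitions(n_entries, n_networks, validation_fraction=0.5, seed=0):
    """Draw an independent learning/validation split for each network.

    validation_fraction and seed are illustrative assumptions; the
    applet's actual split ratio is not given in this document.
    """
    n_val = int(round(n_entries * validation_fraction))
    if not 0 < n_val < n_entries:
        raise ValueError("both sets must contain at least one entry")
    rng = random.Random(seed)
    partitions = []
    for _ in range(n_networks):
        validation = sorted(rng.sample(range(n_entries), n_val))
        in_validation = set(validation)
        learning = [i for i in range(n_entries) if i not in in_validation]
        partitions.append((learning, validation))
    return partitions


# Count how often each entry is held out across a small ensemble.
parts = random_partitions(n_entries=10, n_networks=20)
held_out = [0] * 10
for _, validation in parts:
    for i in validation:
        held_out[i] += 1
print(held_out)
```

The counts show how many networks never saw a given entry during learning; predictions from only those networks could then be averaged for that entry.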
 

The default and suggested value is random sets {2}.


Copyright 2001 -- 2023 https://vcclab.org. All rights reserved.