What's the tool?


This tool splits a dataset into a training set and a test set by picking a subset of diverse molecules. Distances between molecules are derived from the Dice similarity ('DiceSimilarity') of their ECFP6 fingerprints, so the picked subset favors structurally dissimilar molecules.
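
The idea of a diversity-based split can be illustrated with a minimal, dependency-free sketch. It is not the tool's actual implementation. It assumes that fingerprints are represented as sets of on-bit indices (toy values here, not real ECFP6 output), that distance is taken as 1 minus the Dice similarity, and that the diverse subset is chosen with a greedy MaxMin strategy, which the text above does not specify. The names `dice_similarity`, `dice_distance`, and `pick_diverse` are hypothetical.

```python
def dice_similarity(fp_a, fp_b):
    """Dice similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # two empty fingerprints are treated as identical
    return 2.0 * len(fp_a & fp_b) / (len(fp_a) + len(fp_b))


def dice_distance(fp_a, fp_b):
    """Distance used for picking: 1 - Dice similarity (assumption)."""
    return 1.0 - dice_similarity(fp_a, fp_b)


def pick_diverse(fingerprints, n_pick, first=0):
    """Greedy MaxMin pick: repeatedly add the item farthest from the picked set."""
    n_pick = min(n_pick, len(fingerprints))
    if n_pick <= 0:
        return []
    picked = [first]
    # min_dist[i] is the distance from item i to its nearest picked item.
    min_dist = [dice_distance(fp, fingerprints[first]) for fp in fingerprints]
    while len(picked) < n_pick:
        candidates = [i for i in range(len(fingerprints)) if i not in picked]
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, fp in enumerate(fingerprints):
            min_dist[i] = min(min_dist[i], dice_distance(fp, fingerprints[nxt]))
    return picked


# Toy fingerprints (hypothetical on-bit indices, not real ECFP6 output).
toy_fps = [
    {1, 2, 3, 4},    # molecule 0
    {1, 2, 3, 5},    # molecule 1, close to molecule 0
    {10, 11, 12},    # molecule 2, shares no bits with molecule 0
    {1, 2, 10, 11},  # molecule 3, in between
]
train_idx = pick_diverse(toy_fps, n_pick=2)
test_idx = [i for i in range(len(toy_fps)) if i not in train_idx]
```

In this toy case the picker starts from molecule 0 and then adds molecule 2, the one sharing no bits with it, so the near-duplicate molecule 1 ends up in the test set.
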
| Step 1: Upload data file
Example


| Step 2: Set parameters
Set the training set size for the data
The value should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the training set.


Set the random state for the data
Seed for the random number generator. Reusing the same seed on the same input makes the split reproducible.
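
How these two parameters might be interpreted can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the function names are hypothetical, the endpoints 0.0 and 1.0 are assumed to be accepted, rounding to the nearest integer is an assumption, and using the seed to draw a starting molecule is only one plausible use of the random state.

```python
import random


def training_set_count(n_molecules, train_size):
    """Convert a proportion into a number of training molecules."""
    if not 0.0 <= train_size <= 1.0:
        raise ValueError("train_size must be between 0.0 and 1.0")
    # Rounding to the nearest integer is an assumption of this sketch.
    return round(n_molecules * train_size)


def seeded_start_index(n_molecules, random_state):
    """Draw a starting molecule index from a seeded generator (hypothetical)."""
    rng = random.Random(random_state)
    return rng.randrange(n_molecules)


n_train = training_set_count(10, 0.8)  # 10 molecules, 80% for training
start_a = seeded_start_index(10, 42)
start_b = seeded_start_index(10, 42)   # same seed, same draw
```

With 10 molecules and a training set size of 0.8, the sketch assigns 8 molecules to the training set, and both seeded draws return the same index.
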