Description:
This script trains a supervised classifier using OTUs (or other continuous input sample x observation data) as predictors, and a mapping file column containing discrete values as the class labels.
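The description above pairs a samples x observations predictor matrix with one label per sample taken from a mapping file column. The Python sketch below shows that pairing. It is not the script's own parser. It assumes a classic tab-separated OTU table (a `#OTU ID` header row, OTUs as rows, samples as columns, an optional trailing `Consensus Lineage` column) and a tab-separated mapping file whose first column is `#SampleID`. The helper names (`load_otu_table`, `load_labels`, `build_dataset`) are hypothetical.

```python
def load_otu_table(text):
    """Parse a tab-separated OTU table (rows = OTUs, columns = samples).

    Assumed layout: optional '#' comment lines, then a '#OTU ID' header
    listing sample IDs, then one row of counts per OTU. A trailing
    'Consensus Lineage' column, if present, is ignored.
    """
    sample_ids, counts = None, {}
    for line in text.strip().splitlines():
        fields = line.split("\t")
        if line.startswith("#OTU ID"):
            sample_ids = fields[1:]
            if sample_ids and sample_ids[-1] == "Consensus Lineage":
                sample_ids = sample_ids[:-1]
            continue
        if line.startswith("#") or sample_ids is None:
            continue  # other comment lines, or anything before the header
        counts[fields[0]] = [float(v) for v in fields[1:1 + len(sample_ids)]]
    return sample_ids, counts


def load_labels(text, column):
    """Map each sample ID to its value in the given mapping file column."""
    rows = [line.split("\t") for line in text.strip().splitlines()]
    col = rows[0].index(column)  # header row starts with '#SampleID'
    return {r[0]: r[col] for r in rows[1:] if not r[0].startswith("#")}


def build_dataset(otu_text, map_text, column):
    """Return (sample_ids, otu_ids, X, y) with one row of X per sample."""
    sample_ids, counts = load_otu_table(otu_text)
    labels = load_labels(map_text, column)
    index = {s: i for i, s in enumerate(sample_ids)}
    otu_ids = sorted(counts)
    # Keep only samples that have a label in the mapping file.
    kept = [s for s in sample_ids if s in labels]
    # Transpose: the OTU table is OTU x sample, the classifier wants sample x OTU.
    X = [[counts[o][index[s]] for o in otu_ids] for s in kept]
    y = [labels[s] for s in kept]
    return kept, otu_ids, X, y
```

Samples missing from the mapping file are dropped here because they have no class label to learn from.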
Outputs:
- predictions.txt: the labels predicted by the classifier for the given samples. Each sample is predicted by a model that was trained without it.
- probabilities.txt: the label probabilities for each of the given samples (if available).
- summary.txt: a summary of the results, including the expected generalization error of the classifier.
- features.txt: a list of discriminative OTUs with their associated importance scores (if available).
- params.txt: a list of any non-default parameters used in training the model.
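The note that each sample is predicted by a model trained without it describes held-out prediction, which is what makes the error in summary.txt an estimate of generalization error. Random forests typically get such predictions from out-of-bag samples. The Python sketch below instead uses explicit leave-one-out with a nearest-centroid classifier as a simpler stand-in, so it illustrates the idea only and is not the script's method. All function names are hypothetical.

```python
import math
from collections import defaultdict


def nearest_centroid_fit(X, y):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [a + b for a, b in zip(sums[label], row)]
        counts[label] += 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def nearest_centroid_predict(centroids, row):
    """Predict the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], row))


def leave_one_out_predictions(X, y):
    """Predict each sample with a model fitted on all the other samples."""
    preds = []
    for i in range(len(X)):
        # Hold sample i out of training. A class with a single sample
        # disappears from the training set and cannot be predicted for it.
        model = nearest_centroid_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(nearest_centroid_predict(model, X[i]))
    return preds


def error_rate(y_true, y_pred):
    """Fraction of held-out predictions that disagree with the true labels."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)
```

Because no sample ever contributes to the model that predicts it, `error_rate` over these predictions is not inflated by memorization of the training data.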
It is strongly recommended that you remove low-depth samples and rare OTUs before running this script. This can drastically reduce the run time and, in many cases, will not hurt performance. It is also recommended that you rarefy the OTU table to control for sampling effort before running this script. For example, to rarefy at a depth of 200 and then remove OTUs present in fewer than 10 samples, run:
single_rarefaction.py -i otu_table_filtered.txt -d 200 -o otu_table_rarefied200.txt
filter_otu_table.py -i otu_table_rarefied200.txt -s 10
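The two preprocessing steps above can be sketched in Python to show what they do to the data. This is an illustration under stated assumptions, not the implementation of single_rarefaction.py or filter_otu_table.py: the table is assumed to be a dict of sample ID to integer counts in a shared OTU order, rarefaction draws reads without replacement, and samples with fewer reads than the depth are dropped (a choice made for this sketch). The function names are hypothetical.

```python
import random


def rarefy_sample(counts, depth, rng):
    """Subsample one sample's OTU counts to `depth` reads without replacement."""
    if sum(counts) < depth:
        return None  # too shallow to rarefy; this sketch drops such samples
    # One entry per read, labeled with the index of the OTU it belongs to.
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    out = [0] * len(counts)
    for i in rng.sample(pool, depth):
        out[i] += 1
    return out


def rarefy_table(table, depth, seed=0):
    """Rarefy every sample; `table` maps sample ID -> list of integer counts."""
    rng = random.Random(seed)  # fixed seed so the subsampling is reproducible
    result = {}
    for sample_id, counts in table.items():
        rarefied = rarefy_sample(counts, depth, rng)
        if rarefied is not None:
            result[sample_id] = rarefied
    return result


def filter_rare_otus(table, otu_ids, min_samples):
    """Keep only OTUs observed (count > 0) in at least `min_samples` samples."""
    samples = list(table)
    keep = [j for j in range(len(otu_ids))
            if sum(1 for s in samples if table[s][j] > 0) >= min_samples]
    new_table = {s: [table[s][j] for j in keep] for s in samples}
    return new_table, [otu_ids[j] for j in keep]
```

Rarefying first and filtering second mirrors the order of the commands above: an OTU's prevalence is counted only after every sample has the same depth.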
Run this script with --show_params to see how to set any model-specific parameters. For an overview of the application of supervised classification to microbiota, see PubMed ID 21039646.
This script requires that R be installed and in the search path. To install R, visit http://www.r-project.org/. Once R is installed, start R and execute the command install.packages("randomForest"), then type q() to exit.
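Before a long run, it can help to confirm that R and the randomForest package are reachable. The Python sketch below is a hypothetical pre-flight check, not part of this script. It assumes that `Rscript`, which ships with standard R installations, is on the same search path as R, and it treats a non-zero exit status from `library(randomForest)` as "package missing".

```python
import shutil
import subprocess


def r_random_forest_available(timeout=60):
    """Return True if Rscript is on PATH and can load the randomForest package."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return False  # R (or at least Rscript) is not in the search path
    try:
        result = subprocess.run(
            [rscript, "-e", "library(randomForest)"],
            capture_output=True,  # keep R's startup messages off the console
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # library() raises an error, and Rscript exits non-zero, if the package is missing.
    return result.returncode == 0
```

If this returns False, install R and run the install.packages command described above.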
Usage: supervised_learning.py [options]
Input Arguments:
[REQUIRED]
[OPTIONAL]
Output:
Outputs a ranking of features (e.g. OTUs) by importance, an estimation of the generalization error of the classifier, and the predicted class labels and posterior class probabilities according to the classifier.
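For random forests, class probabilities are commonly reported as the fraction of trees voting for each class (this is what the randomForest R package returns for probability predictions), and features are ranked by an importance score. The Python sketch below shows only those two final steps. It takes per-tree votes and importance scores as given rather than computing them, and whether this script derives its probabilities exactly this way is an assumption. The function names are hypothetical.

```python
from collections import Counter


def vote_probabilities(tree_votes):
    """Turn the per-tree class votes for one sample into class proportions."""
    counts = Counter(tree_votes)
    n = len(tree_votes)
    return {label: c / n for label, c in counts.items()}


def predicted_label(probs):
    """Pick the most probable class; ties go to the alphabetically first label."""
    return max(sorted(probs), key=probs.get)


def rank_features(importances):
    """Sort (feature, score) pairs from most to least important."""
    return sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
```

Usage: four trees voting `["A", "A", "B", "A"]` give probabilities of 0.75 for A and 0.25 for B, so the predicted label is A.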
Simple example of random forests classifier:
supervised_learning.py -i otutable.txt -m map.txt -c 'Individual' -o ml
Simple example, filter OTU table first:
single_rarefaction.py -i otu_table_filtered.txt -d 200 -o otu_table_rarefied200.txt
filter_otu_table.py -i otu_table_rarefied200.txt -s 10
supervised_learning.py -i otutable_filtered_rarefied200.txt -m map.txt -c 'Individual' -o ml
Getting a sample params file for the random forests classifier:
supervised_learning.py -i otutable.txt -m map.txt -c 'Individual' -o ml --show_params
Running with a user-specified params file for the random forests classifier:
supervised_learning.py -i otutable.txt -m map.txt -c 'Individual' -o ml -p params.txt