Description:
This filter removes sequences and OTUs from samples that either do or don’t match specified metadata, for instance to isolate samples from a specific set of studies or body sites. The script identifies the samples matching the specified metadata criteria and outputs a filtered mapping file and OTU table containing only those samples.
Usage: filter_by_metadata.py [options]
Input Arguments:
Note
[REQUIRED]
[OPTIONAL]
Output:
The result is a filtered OTU table and mapping file meeting the desired criteria.
Examples:
The following command takes the OTU table produced by make_otu_table.py, the original Fasting_Map.txt, and a metadata filter that keeps only samples with Control in the Treatment field. The resulting data are written to otu_table.txt.filtered.xls and Fasting_Map.txt.filtered.xls:
filter_by_metadata.py -i otu_table.txt -m Fasting_Map.txt -s 'Treatment:Control'
Some variations (not so useful on this dataset, but more useful on larger datasets) are:
Keeping both Control and Fast in the Treatment field (i.e. keeping everything):
filter_by_metadata.py -i otu_table.txt -m Fasting_Map.txt -s 'Treatment:Control,Fast'
Excluding Fast in the Treatment field (equivalent to the first example). Here “*” keeps everything, and !Fast then eliminates the Fast group:
filter_by_metadata.py -i otu_table.txt -m Fasting_Map.txt -s 'Treatment:*,!Fast'
Keeping only samples with both Control in the Treatment field and 20061218 in the DOB field:
filter_by_metadata.py -i otu_table.txt -m Fasting_Map.txt -s 'Treatment:Control;DOB:20061218'
Keeping only samples with Control in the Treatment field and OTUs with counts of at least 5 across samples:
filter_by_metadata.py -i otu_table.txt -m Fasting_Map.txt -s 'Treatment:Control' -n 5
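The selection rules used in the examples above can be illustrated with a minimal Python sketch. This is not the script's actual implementation: the function names (parse_states, sample_matches, select_samples, filter_otus), the in-memory data layout, and the sample rows are all hypothetical. It assumes that ";" separates fields that must all match, "," separates accepted values, "*" accepts any value, and a leading "!" excludes a value. It also assumes that the minimum count is the total over the samples that remain after filtering.

```python
def parse_states(state_string):
    """Parse e.g. 'Treatment:*,!Fast;DOB:20061218' into per-field rules.

    Returns {field: (keep_all, include_set, exclude_set)}.
    """
    criteria = {}
    for clause in state_string.split(";"):
        field, _, values = clause.partition(":")
        keep_all, include, exclude = False, set(), set()
        for value in values.split(","):
            value = value.strip()
            if value == "*":
                keep_all = True              # accept any value in this field
            elif value.startswith("!"):
                exclude.add(value[1:])       # explicitly rejected value
            elif value:
                include.add(value)           # explicitly accepted value
        criteria[field.strip()] = (keep_all, include, exclude)
    return criteria


def sample_matches(metadata, criteria):
    """True if one sample's metadata satisfies every field's rule."""
    for field, (keep_all, include, exclude) in criteria.items():
        value = metadata.get(field)
        if value in exclude:
            return False
        if not keep_all and value not in include:
            return False
    return True


def select_samples(samples, state_string):
    """Return the sorted IDs of samples matching the state string."""
    criteria = parse_states(state_string)
    return sorted(sid for sid, md in samples.items()
                  if sample_matches(md, criteria))


def filter_otus(otu_counts, kept_samples, min_count=0):
    """Restrict each OTU to the kept samples; drop OTUs whose total is too low.

    Assumption: the threshold applies to the sum over the kept samples.
    """
    filtered = {}
    for otu_id, per_sample in otu_counts.items():
        kept = {s: per_sample.get(s, 0) for s in kept_samples}
        if sum(kept.values()) >= min_count:
            filtered[otu_id] = kept
    return filtered


# Hypothetical rows for illustration only (not the real Fasting_Map.txt).
SAMPLES = {
    "S1": {"Treatment": "Control", "DOB": "20061218"},
    "S2": {"Treatment": "Control", "DOB": "20070314"},
    "S3": {"Treatment": "Fast", "DOB": "20061218"},
}
OTU_COUNTS = {
    "OTU_a": {"S1": 3, "S2": 2, "S3": 9},
    "OTU_b": {"S1": 1, "S2": 0, "S3": 40},
}

if __name__ == "__main__":
    kept = select_samples(SAMPLES, "Treatment:Control")
    print(kept)
    print(filter_otus(OTU_COUNTS, kept, min_count=5))
```

Under these assumptions, a state string containing only negations (for example 'Treatment:!Fast' without the "*") would select nothing, which is why the sketch treats "*" as the explicit "keep everything" marker described above.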
Note that the filtered mapping file automatically excludes any column whose value is the same for all of the remaining samples. It also excludes any column (other than SampleID) whose value is different for every remaining sample. This makes the file more useful for downstream analyses with the coloring tools.
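The column-pruning rule can be sketched as follows. This is an illustrative reading of the rule, not the script's own code: the function name prune_columns, the list-of-dicts representation, and the example rows are hypothetical. A column is kept only if it takes more than one value but fewer distinct values than there are samples. SampleID is always kept.

```python
def prune_columns(rows, id_column="SampleID"):
    """Drop mapping-file columns that are constant or unique per sample.

    rows: list of dicts, one per remaining sample, all with the same keys.
    """
    if not rows:
        return []
    n_samples = len(rows)
    keep = []
    for column in rows[0]:
        n_distinct = len({row[column] for row in rows})
        if column == id_column:
            keep.append(column)            # the identifier is always retained
        elif 1 < n_distinct < n_samples:
            keep.append(column)            # informative: neither constant nor unique
    return [{column: row[column] for column in keep} for row in rows]


# Hypothetical rows left after filtering on 'Treatment:Control'.
ROWS = [
    {"SampleID": "S1", "Treatment": "Control", "DOB": "20061218", "Description": "mouse 1"},
    {"SampleID": "S2", "Treatment": "Control", "DOB": "20061218", "Description": "mouse 2"},
    {"SampleID": "S3", "Treatment": "Control", "DOB": "20070314", "Description": "mouse 3"},
]

if __name__ == "__main__":
    for row in prune_columns(ROWS):
        print(row)
```

In this example, Treatment is dropped because it is the same for every row, Description is dropped because it is different for every row, and DOB is kept because it groups the samples.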