This script compares two taxa summary files by computing the correlation coefficient between pairs of samples. This is useful, for example, if you want to compare the taxonomic composition of mock communities whose taxonomy was assigned using different taxonomy assigners, in order to see whether the results are correlated. Another example use case is comparing the taxonomic composition of several mock community replicate samples to a single expected, or known, sample community.
This script is also useful for sorting and filling taxa summary files so that each sample has the same taxa listed in the same order (with missing taxa reporting an abundance of zero). The sorted and filled taxa summary files can then be passed to a script, such as plot_taxa_summary.py, to visually compare the differences using the same taxa coloring scheme.
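The sorting and filling step described above can be sketched as follows. This is a minimal illustration, not the script's own code: the dictionary layout ({sample ID: {taxon: abundance}}), the function name sort_and_fill, and the sample and taxon values are all assumptions made for the example.

```python
def sort_and_fill(summary_a, summary_b):
    """Expand two taxa summaries to the same sorted list of taxa.

    Each summary is a hypothetical dict: {sample_id: {taxon: abundance}}.
    Taxa missing from a sample are reported with an abundance of zero.
    """
    # Union of every taxon seen in either summary, in sorted order.
    taxa = sorted(
        {
            taxon
            for summary in (summary_a, summary_b)
            for abundances in summary.values()
            for taxon in abundances
        }
    )

    def fill(summary):
        return {
            sample_id: [abundances.get(taxon, 0.0) for taxon in taxa]
            for sample_id, abundances in summary.items()
        }

    return taxa, fill(summary_a), fill(summary_b)


# Illustrative data only; not taken from the tutorial files.
ts1 = {"PC.354": {"Bacteroidetes": 0.6, "Firmicutes": 0.4}}
ts2 = {"PC.354": {"Firmicutes": 0.7, "Proteobacteria": 0.3}}
taxa, filled1, filled2 = sort_and_fill(ts1, ts2)
```

After this step both samples list Bacteroidetes, Firmicutes, and Proteobacteria in the same order, so a plotting script can assign each taxon the same color in both plots.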
For more information and examples pertaining to this script, please refer to the accompanying tutorial, which can be found at http://qiime.org/tutorials/taxa_summary_comparison.html.
Usage: compare_taxa_summaries.py [options]
The script will always output at least three files to the specified output directory. Two files will be the sorted and filled versions of the input taxa summary files, which can then be used in plot_taxa_summary.py to visualize the differences in taxonomic composition. These files will be named based on the basename of the input files. If the input files’ basenames are the same, the output files will have ‘0’ and ‘1’ appended to their names to keep the filenames unique. The first input taxa summary file will have ‘0’ in its filename and the second input taxa summary file will have ‘1’ in its filename.
The third output file will contain the results of the overall comparison of the input taxa summary files using the specified sample pairings. The correlation coefficient, parametric p-value, nonparametric p-value, and a confidence interval for the correlation coefficient will be included.
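One way to picture the overall comparison is to concatenate the abundance vectors of all paired samples and compute a single correlation coefficient over the two resulting vectors. The sketch below does this with a hand-written Pearson coefficient. Treat the pooling strategy as an assumption about how such a comparison could work; the helper names (pearson, overall_correlation) and the data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def overall_correlation(filled_a, filled_b, pairs):
    """Pool the abundances of every sample pair, then correlate once.

    filled_a / filled_b: {sample_id: [abundance, ...]} with taxa already
    sorted and filled so that positions line up between the two files.
    pairs: list of (sample_id_in_a, sample_id_in_b) tuples.
    """
    x, y = [], []
    for id_a, id_b in pairs:
        x.extend(filled_a[id_a])
        y.extend(filled_b[id_b])
    return pearson(x, y)


# Illustrative data only.
filled_a = {"PC.354": [0.6, 0.4, 0.0], "PC.355": [0.5, 0.3, 0.2]}
filled_b = {"PC.354": [0.5, 0.5, 0.0], "PC.355": [0.4, 0.3, 0.3]}
pairs = [("PC.354", "PC.354"), ("PC.355", "PC.355")]
r = overall_correlation(filled_a, filled_b, pairs)
```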
If --perform_detailed_comparisons is specified, the fourth output file is a tab-separated file containing the correlation coefficients that were computed between each of the paired samples. Each line will contain the sample IDs of the samples that were compared, followed by the correlation coefficient that was computed, followed by the parametric and nonparametric p-values (uncorrected and Bonferroni-corrected) and a confidence interval for the correlation coefficient.
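For reference, the standard Bonferroni correction multiplies each p-value by the number of tests performed. The short sketch below shows that arithmetic. The function name and the example p-values are hypothetical, and capping the result at 1.0 is a common convention that is assumed here rather than confirmed for this script.

```python
def bonferroni(p_values):
    """Bonferroni-correct a list of p-values from the same family of tests."""
    num_tests = len(p_values)
    # Multiply by the number of tests; cap at 1.0 so results stay valid
    # probabilities (an assumed convention).
    return [min(1.0, p * num_tests) for p in p_values]


# Three per-sample-pair p-values (made up for illustration).
uncorrected = [0.01, 0.04, 0.5]
corrected = bonferroni(uncorrected)
```

Because the correction scales with the number of sample pairs, a per-pair result that looks significant on its own may no longer be significant once many pairs are compared.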
The output files will contain comments at the top explaining the types of tests that were performed.
Paired sample comparison:
Compare all samples that have matching sample IDs between the two input taxa summary files using the Pearson correlation coefficient. The first input taxa summary file is from the overview tutorial, using the RDP classifier with a confidence level of 0.60 and the gg_otus_4feb2011 97% representative set. The second input taxa summary file was generated the same way, except for using a confidence level of 0.80.
compare_taxa_summaries.py -i ts_rdp_0.60.txt,ts_rdp_0.80.txt -m paired -o taxa_comp
Paired sample comparison with sample ID map:
Compare samples based on the mappings in the sample ID map using the Spearman correlation coefficient. The second input taxa summary file is simply the original ts_rdp_0.60.txt file with all sample IDs containing ‘PC.’ renamed to ‘S.’.
compare_taxa_summaries.py -i ts_rdp_0.80.txt,ts_rdp_0.60_renamed.txt -m paired -o taxa_comp_using_sample_id_map -s sample_id_map.txt -c spearman
Detailed paired sample comparison:
Compare all samples that have matching sample IDs between the two input taxa summary files using the Pearson correlation coefficient. Additionally, compute the correlation coefficient between each pair of samples individually.
compare_taxa_summaries.py -i ts_rdp_0.60.txt,ts_rdp_0.80.txt -m paired -o taxa_comp_detailed --perform_detailed_comparisons
One-tailed test:
Compare all samples that have matching sample IDs between the two input taxa summary files using the Pearson correlation coefficient. Perform a one-tailed (negative association) test of significance for both parametric and nonparametric tests. Additionally, compute a 90% confidence interval for the correlation coefficient. Note that the confidence interval will still be two-sided.
compare_taxa_summaries.py -i ts_rdp_0.60.txt,ts_rdp_0.80.txt -m paired -o taxa_comp_one_tailed -t low -l 0.90
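To illustrate what a two-sided confidence interval for a correlation coefficient looks like at different confidence levels, the sketch below uses the Fisher z-transformation, a common textbook approach. This page does not state which method the script uses, so the method, the function name fisher_ci, and the example values are all assumptions.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Two-sided confidence interval for a Pearson r via Fisher's z.

    r: observed correlation coefficient, strictly between -1 and 1.
    n: number of paired observations (must exceed 3).
    level: confidence level, e.g. 0.90 for a 90% interval.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)                # Fisher transformation of r
    std_err = 1.0 / math.sqrt(n - 3)
    # Two-sided critical value of the standard normal distribution.
    crit = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return math.tanh(z - crit * std_err), math.tanh(z + crit * std_err)


# Hypothetical values: r = 0.5 computed from 30 paired abundances.
low_90, high_90 = fisher_ci(0.5, 30, level=0.90)
low_95, high_95 = fisher_ci(0.5, 30, level=0.95)
```

The 90% interval is narrower than the 95% interval, and in both cases the interval remains two-sided regardless of the tail type chosen for the significance tests.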