News and Announcements » |
This tutorial covers how to perform Procrustes Analysis using QIIME to compare UniFrac PCoA plots generated by two different processing pipelines. Procrustes analysis takes as input two coordinate matrices with corresponding points (in QIIME, these are generated by running principal_coordinates.py on a distance matrix generated by beta_diversity.py), and transforming the second coordinate set by rotating, scaling, and translating it to minimize the distances between corresponding points in the two shapes. This is done with transform_coordinate_matrices.py. The results can then be visualized using QIIME by running make_emperor.py and using the -c/–compare_plots option. Both sets of coordinates will be plotted in the resulting figure, with bars connecting the corresponding points from each data set.
For example, Figure 1 is a plot that was generated by comparing two different sets of reads, one set generated on 454 and the other generated on Illumina, from the same collection of samples. In this scenario, Procrustes analysis allows us to determine whether we would derive the same beta diversity conclusions, regardless of which technology was used to sequence the amplicons. This tutorial illustrates the steps used to generate this plot, beginning with closed reference OTU picking on both the 454 and the Illumina data. The data sets used in this tutorial are a subset of the data that was used in Supplementary Figure 1 of Moving Pictures of the Human Microbiome.
p-values are generated using a Monte Carlo simulation: sample identifiers are shuffled in one of the PC matrices, and the M2 value is re-computed --random_trials times. The proportion of M2 values that are equal to or lower than the actual M2 value is the Monte Carlo p-value.
This tutorial works through several steps in QIIME, culminating in a Procrustes analysis of a subset of the data presented in Moving Pictures of the Human Microbiome. This analysis compares beta diversity PCoA plots generated from 454 and Illumina sequences to test the hypothesis that the same beta diversity conclusions are consistent regardless of sequencing technology. For the sake of runtime, this tutorial is based on approximately 60 samples from each sequencing technologies, and a small subset of the total number of reads for each of these samples.
Do some initial set up for the tutorial. This version of the tutorial is based on running the QIIME 1.4.0 EC2 instance. You can always find the latest QIIME AMIs here.
cd
wget https://s3.amazonaws.com/s3-qiime_tutorial_files/moving_pictures_procrustes_demo.tgz
tar -xzf moving_pictures_procrustes_demo.tgz
screen
cd moving_pictures_procrustes_demo
Through-out this tutorial we make use of a reference sequence collection, tree, and taxonomy derived from the Greengenes database. As these files may be store in different locations on your system, we’ll define them as environment variables using the paths as they would be if you’re running in a QIIME virtual machine (e.g., on AWS or with the Virtual Box). We’ll then reference the environment variables through-out this tutorial when they are used. If you’re not working on either of these systems, you’ll have to modify these paths. Run the following:
export QIIME_DIR=$HOME/qiime_software
export GG_DIR=$QIIME_DIR/gg_13_5_otus-release
export reference_seqs=$GG_DIR/rep_set/97_otus.fasta
export reference_tree=$GG_DIR/trees/97_otus.tree
export reference_tax=$GG_DIR/taxonomy/97_otu_taxonomy.txt
Pick OTUs on Illumina data and generate an OTU table (including taxonomic assignment of samples):
pick_closed_reference_otus.py -i ./subsampled_illumina_seqs.fna -o ./illumina_ucrC/ -r $reference_seqs -t $reference_tax -aO8 -p ./otu_params.txt
Determine the number of sequences per sample and related statistics. You’ll want to choose an even sampling depth for the beta diversity analysis from these data. In this tutorial we choose the smallest number of sequences per sample (239).
biom summarize-table -i ./illumina_ucrC/uclust_ref_picked_otus/otu_table.biom -o ./illumina_ucrC/uclust_ref_picked_otus/otu_table_summary.txt
Compute UniFrac distances between samples, run principal coordinates analysis, and build 3D PCoA plots:
beta_diversity_through_plots.py -i ./illumina_ucrC/otu_table.biom -e 239 -o ./illumina_ucrC/bdiv_even239/ -t $reference_tree -m ./illumina_map.txt -aO8 -p ./bdiv_params.txt
Repeat the above steps on the 454 data:
pick_closed_reference_otus.py -i ./subsampled_454_seqs.fna -o ./454_ucrC/ -r $reference_seqs -t $reference_tax -aO8 -p ./otu_params.txt
biom summarize-table -i ./454_ucrC/otu_table.biom -o ./454_ucrC/otu_table_summary.txt
beta_diversity_through_plots.py -i ./454_ucrC/otu_table.biom -e 135 -o ./454_ucrC/bdiv_even135/ -t $reference_tree -m ./454_map.txt -aO8 -p ./bdiv_params.txt
Perform Procrustes analysis:
transform_coordinate_matrices.py -i ./illumina_ucrC/bdiv_even239/unweighted_unifrac_pc.txt,./454_ucrC/bdiv_even135/unweighted_unifrac_pc.txt -s ./procrustes_sid_map.txt -r 100 -o ./454_v_illumina/
Generate Procrustes plot, including an explicit time axis:
make_emperor.py -c -i ./454_v_illumina/ -o ./454_v_illumina/plots/ -m ./procrustes_metadata_map.txt --custom_axes days_since_epoch
There will now be several results of interest. For the Procrustes analysis you can find the statistical results in ./454_v_illumina/procrustes_results.txt and you can view the Procrustes plot by opening ./454_v_illumina/plots/index.html in a web browser.
In the cases described here, we always have the same samples in our two principal coordinate matrices. If that is not the case for your study, you’ll need to pass a sample id mapping file (which is different from a QIIME metadata mapping file). For a description of this file format, see Sample id map.