QIIME Overview Tutorial¶

Introduction¶

This tutorial explains how to use the QIIME (Quantitative Insights Into Microbial Ecology) Pipeline to process data from high-throughput 16S rRNA sequencing studies. The purpose of this pipeline is to provide a start-to-finish workflow, beginning with multiplexed sequence reads and finishing with taxonomic and phylogenetic profiles and comparisons of the samples in the study. With this information in hand, it is possible to determine biological and environmental factors that alter microbial community ecology in your experiment.

As an example, we will use data from a study of the response of mouse gut microbial communities to fasting (Crawford et al., 2009). To make this tutorial run quickly on a personal computer, we will use a subset of the data generated from 5 animals kept on the control ad libitum fed diet, and 4 animals fasted for 24 hours before sacrifice. At the end of our tutorial, we will be able to compare the community structure of control vs. fasted animals. In particular, we will be able to compare taxonomic profiles for each sample type, differences in diversity metrics within the samples and between the groups, and perform comparative clustering analysis to look for overall differences in the samples.

To process our data, we will perform the following steps, each of which is described in more detail in the Data Analysis Steps:

Filter the sequence reads for quality and assign multiplexed reads to starting samples by nucleotide barcode.
Pick Operational Taxonomic Units (OTUs) based on sequence similarity within the reads, and pick a representative sequence from each OTU.
Assign each OTU to a taxonomic identity using reference databases.
Align the OTU sequences and create a phylogenetic tree.
Calculate diversity metrics for each sample and compare the types of communities, using the taxonomic and phylogenetic assignments.
Generate UPGMA and PCoA plots to visually depict the differences between the samples, and dynamically work with these graphs to generate publication quality figures.

Essential Files¶

All the files you will need for this tutorial are here (http://bmf.colorado.edu/QIIME/qiime_tutorial-v1.2.1.zip). Descriptions of these files are below.

Sequences (.fna)¶

This is the 454-machine generated FASTA file. Using the Amplicon processing software on the 454 FLX standard, each region of the PTP plate will yield a fasta file of form 1.TCA.454Reads.fna, where “1” is replaced with the appropriate region number. For the purposes of this tutorial, we will use the fasta file Fasting_Example.fna.

Quality Scores (.qual)¶

This is the 454-machine generated quality score file, which contains a score for each base in each sequence included in the FASTA file. Like the fasta file mentioned above, the Amplicon processing software will generate one of these files for each region of the PTP plate, named 1.TCA.454Reads.qual, etc. For the purposes of this tutorial, we will use the quality scores file Fasting_Example.qual.

Mapping File (Tab-delimited .txt)¶

The mapping file is generated by the user. This file contains all of the information about the samples necessary to perform the data analysis. At a minimum, the mapping file should contain the name of each sample, the barcode sequence used for each sample, the linker/primer sequence used to amplify the sample, and a Description column. In general, you should also include in the mapping file any metadata that relates to the samples (for instance, health status or sampling site) and any additional information relating to specific samples that may be useful to have at hand when considering outliers (for example, what medications a patient was taking at time of sampling). Full format specifications can be found in the Documentation.

You are highly encouraged to validate your mapping file using check_id_map.py before attempting to analyze your data. This tool will check for errors, and make suggestions for other aspects of the file to be edited (errors and warnings are output to a log file, and suggested changes to invalid characters are output to a corrected_mapping.txt file). For the purposes of this tutorial, we will use the mapping file Fasting_Map.txt. The contents of the mapping file are shown here - as you can see, a nucleotide barcode sequence is provided for each of the 9 samples, as well as metadata related to treatment group and date of birth, and general run descriptions about the project. Fasting_Map.txt file contents:

Note

#SampleID BarcodeSequence LinkerPrimerSequence Treatment DOB Description
#Example mapping file for the QIIME analysis package. These 9 samples are from a study of the effects of
#exercise and diet on mouse cardiac physiology (Crawford, et al, PNAS, 2009).
PC.354 AGCACGAGCCTA YATGCTGCCTCCCGTAGGAGT Control 20061218 Control_mouse__I.D._354
PC.355 AACTCGTCGATG YATGCTGCCTCCCGTAGGAGT Control 20061218 Control_mouse__I.D._355
PC.356 ACAGACCACTCA YATGCTGCCTCCCGTAGGAGT Control 20061126 Control_mouse__I.D._356
PC.481 ACCAGCGACTAG YATGCTGCCTCCCGTAGGAGT Control 20070314 Control_mouse__I.D._481
PC.593 AGCAGCACTTGT YATGCTGCCTCCCGTAGGAGT Control 20071210 Control_mouse__I.D._593
PC.607 AACTGTGCGTAC YATGCTGCCTCCCGTAGGAGT Fast 20071112 Fasting_mouse__I.D._607
PC.634 ACAGAGTCGGCT YATGCTGCCTCCCGTAGGAGT Fast 20080116 Fasting_mouse__I.D._634
PC.635 ACCGCAGAGTCA YATGCTGCCTCCCGTAGGAGT Fast 20080116 Fasting_mouse__I.D._635
PC.636 ACGGTGAGTGTC YATGCTGCCTCCCGTAGGAGT Fast 20080116 Fasting_mouse__I.D._636
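
To make the layout above concrete, the following Python sketch reads a tab-delimited mapping file and runs a few basic sanity checks (unique sample IDs, unique barcodes, equal barcode lengths). It is not part of QIIME and does not replace check_id_map.py; the function names parse_mapping and basic_checks are hypothetical, and the two data rows are copied from Fasting_Map.txt.

```python
# Illustrative sketch only; not part of QIIME. Function names are hypothetical.

def parse_mapping(lines):
    """Return (header, rows) from the lines of a tab-delimited mapping file."""
    header, rows = None, []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#SampleID"):
            header = line.lstrip("#").split("\t")
        elif line.startswith("#"):
            continue  # free-text comment line
        else:
            if header is None:
                raise ValueError("data row found before the #SampleID header")
            rows.append(dict(zip(header, line.split("\t"))))
    return header, rows


def basic_checks(rows):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    ids = [r["SampleID"] for r in rows]
    barcodes = [r["BarcodeSequence"] for r in rows]
    if len(set(ids)) != len(ids):
        problems.append("duplicate sample IDs")
    if len(set(barcodes)) != len(barcodes):
        problems.append("duplicate barcodes")
    if len({len(b) for b in barcodes}) > 1:
        problems.append("barcodes differ in length")
    return problems


example = [
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tTreatment\tDOB\tDescription",
    "#Example mapping file for the QIIME analysis package.",
    "PC.354\tAGCACGAGCCTA\tYATGCTGCCTCCCGTAGGAGT\tControl\t20061218\tControl_mouse__I.D._354",
    "PC.607\tAACTGTGCGTAC\tYATGCTGCCTCCCGTAGGAGT\tFast\t20071112\tFasting_mouse__I.D._607",
]
header, rows = parse_mapping(example)
print([r["SampleID"] for r in rows], basic_checks(rows))
```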

Flowgram File (.sff) - (Optional)¶

This is the 454-machine generated file which stores the sequencing trace data. This is the largest file returned from a 454 run. The sffinfo command in the 454 software package can be used to generate sequence and quality files from sff file(s) as follows:

To generate a fasta file

sffinfo -s NAME_OF_SFF_FILES > OUTPUT_NAME.fna

To generate a quality score file

sffinfo -q NAME_OF_SFF_FILES > OUTPUT_NAME.qual

Data Analysis Steps¶

In this walkthrough, white text on a black background denotes the command-line invocation of scripts. You can find full usage information for each script by passing the -h option (help) and/or by reading the full description in the Documentation. First, assemble the sequences (.fna), quality scores (.qual), and metadata mapping file into a directory. Execute all tutorial commands from within the qiime_tutorial directory, which is available from the download link given under Essential Files.

Pre-processing 454 Data¶

Filter the reads based on quality, and assign multiplexed reads to starting sample by nucleotide barcode.

Check Mapping File¶

Before beginning the pipeline, you should ensure that your mapping file is formatted correctly with the check_id_map.py script.

check_id_map.py -m Fasting_Map.txt -o mapping_output/

If verbose (-v) is enabled, this utility will print to STDOUT a message indicating whether or not problems were found in the mapping file. Errors and warnings will be output to a log file, which will be present in the specified (-o) output directory. Errors will cause fatal problems with subsequent scripts and must be corrected before moving forward. Warnings will not cause fatal problems, but you are encouraged to fix them, as they often indicate typos in your mapping file, invalid characters, or other unintended errors that will impact downstream analysis. A corrected_mapping.txt file will also be created in the output directory; it contains a copy of the mapping file with invalid characters replaced by underscores, or a message indicating that no invalid characters were found.

Assign Samples to Multiplex Reads¶

The next task is to assign the multiplex reads to samples based on their nucleotide barcode. This step also performs quality filtering based on the characteristics of each sequence, removing any low quality or ambiguous reads. The script for this step is split_libraries.py. A full description of the parameters for this script is given in the Documentation. For this tutorial, we will use default parameters (minimum quality score = 25, minimum/maximum length = 200/1000, no ambiguous bases allowed and no mismatches allowed in the primer sequence):

split_libraries.py -m Fasting_Map.txt -f Fasting_Example.fna -q Fasting_Example.qual -o split_library_output

This invocation will create three files in the new directory split_library_output/:

split_library_log.txt : This file contains the summary of splitting, including the number of reads detected for each sample and a brief summary of any reads that were removed due to quality considerations.
histograms.txt : This tab delimited file shows the number of reads at regular size intervals before and after splitting the library.
seqs.fna : This is a fasta formatted file where each sequence is renamed according to the sample it came from. The header line also contains the name of the read in the input fasta file and information on any barcode errors that were corrected.

A few lines from the seqs.fna file are shown below:

Note

>PC.634_1 FLP3FBN01ELBSX orig_bc=ACAGAGTCGGCT new_bc=ACAGAGTCGGCT bc_diffs=0
CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTTACCCTCTCAGGCCGGCTACGCATCATCGCC....
>PC.634_2 FLP3FBN01EG8AX orig_bc=ACAGAGTCGGCT new_bc=ACAGAGTCGGCT bc_diffs=0
TTGGACCGTGTCTCAGTTCCAATGTGGGGGCCTTCCTCTCAGAACCCCTATCCATCGAAGGCTT....
>PC.354_3 FLP3FBN01EEWKD orig_bc=AGCACGAGCCTA new_bc=AGCACGAGCCTA bc_diffs=0
TTGGGCCGTGTCTCAGTCCCAATGTGGCCGATCAGTCTCTTAACTCGGCTATGCATCATTGCCTT....
>PC.481_4 FLP3FBN01DEHK3 orig_bc=ACCAGCGACTAG new_bc=ACCAGCGACTAG bc_diffs=0
CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTCAACCTCTCAGTCCGGCTACTGATCGTCGACT....
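
Because each header in seqs.fna starts with the sample ID followed by an underscore and a running number, it is easy to tally reads per sample yourself. The sketch below is a minimal illustration, not QIIME code; the function name reads_per_sample is hypothetical, and the input is the excerpt shown above with the sequences shortened.

```python
# Illustrative sketch only; not part of QIIME.
from collections import Counter


def reads_per_sample(fasta_lines):
    """Count reads per sample from seqs.fna-style header lines."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            seq_id = line[1:].split()[0]          # e.g. "PC.634_1"
            sample_id = seq_id.rsplit("_", 1)[0]  # e.g. "PC.634"
            counts[sample_id] += 1
    return counts


example = [
    ">PC.634_1 FLP3FBN01ELBSX orig_bc=ACAGAGTCGGCT new_bc=ACAGAGTCGGCT bc_diffs=0",
    "CTGGGCCGTGTCTCAGTCCCAATGTGGCCG",
    ">PC.634_2 FLP3FBN01EG8AX orig_bc=ACAGAGTCGGCT new_bc=ACAGAGTCGGCT bc_diffs=0",
    "TTGGACCGTGTCTCAGTTCCAATGTGGGGG",
    ">PC.354_3 FLP3FBN01EEWKD orig_bc=AGCACGAGCCTA new_bc=AGCACGAGCCTA bc_diffs=0",
    "TTGGGCCGTGTCTCAGTCCCAATGTGGCCG",
    ">PC.481_4 FLP3FBN01DEHK3 orig_bc=ACCAGCGACTAG new_bc=ACCAGCGACTAG bc_diffs=0",
    "CTGGGCCGTGTCTCAGTCCCAATGTGGCCG",
]
counts = reads_per_sample(example)
for sample_id in sorted(counts):
    print(sample_id, counts[sample_id])
```

The per-sample totals reported in split_library_log.txt are the authoritative numbers; a tally like this is only a quick cross-check.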

Workflow scripts and the parameters file¶

QIIME includes workflow scripts, which allow multiple tasks to be performed with one command. Within the QIIME directory there is a file qiime_parameters.txt, where the user can set parameters for specific steps within a workflow script. Do not edit the original file; instead, copy qiime_parameters.txt into your working directory under a new filename (e.g. custom_parameters.txt). If you are using the tutorial dataset, the parameters file custom_parameters.txt is included, with many parameters already set to values appropriate for the tutorial data. For more information on the qiime_parameters.txt file, please refer to the Documentation. In this tutorial, we will use the workflow scripts where appropriate, and in each section where a workflow is used we will discuss which options in custom_parameters.txt correspond to each step of the workflow. Users can run the workflow scripts in parallel by passing the “-a” option to each script; this requires a machine with more than one core (e.g. a dual- or quad-core laptop).

Pick Operational Taxonomic Units (OTUs) through making OTU table¶

Here we will be running the pick_otus_through_otu_table.py workflow, which consists of the following steps:

Pick OTUs (for more information, refer to pick_otus.py)
Pick a representative sequence set (for more information, refer to pick_rep_set.py)
Align the representative sequence set (for more information, refer to align_seqs.py)
Assign taxonomy (for more information, refer to assign_taxonomy.py)
Filter the alignment prior to tree building - remove positions which are all gaps, and specified as 0 in the lanemask (for more information, refer to filter_alignment.py)
Build a phylogenetic tree (for more information, refer to make_phylogeny.py)
Build an OTU table (for more information, refer to make_otu_table.py)

We will first go through each step and define the parameters in custom_parameters.txt and then at the end, we will run this workflow script.

Optionally, we can denoise the sequences based on clustering the flowgram sequences. For a single library/sff file we can simply use the workflow script pick_otus_through_otu_table.py, providing it with the sff file and the metadata mapping file. For multiple sff files, refer to the special purpose tutorial Denoising of 454 Data Sets.

Step 1. Pick OTUs based on Sequence Similarity within the Reads¶

At this step, all of the sequences from all of the samples will be clustered into Operational Taxonomic Units (OTUs) based on their sequence similarity. OTUs in QIIME are clusters of sequences, frequently intended to represent some degree of taxonomic relatedness. For example, when sequences are clustered at 97% sequence similarity with uclust, each resulting cluster is typically thought of as representing a species. This model and the current techniques for picking OTUs are known to be flawed, and determining exactly how OTUs should be defined, and what they represent, is an active area of research. Even so, OTU-picking identifies highly similar sequences across the samples and provides a platform for comparisons of community structure. The script pick_otus.py takes as input the fasta file output from Assign Samples to Multiplex Reads above, and returns a list of the OTUs detected together with the fasta headers of the sequences that belong to each OTU. To make the workflow invoke pick_otus.py using uclust to cluster and the default setting of 97% similarity determining an OTU, include the following settings in the custom_parameters.txt file:

Note

# OTU picker parameters
pick_otus:otu_picking_method uclust
pick_otus:clustering_algorithm furthest
pick_otus:max_cdhit_memory 400
pick_otus:refseqs_fp
pick_otus:blast_db
pick_otus:similarity 0.97
pick_otus:max_e_value 1e-10
pick_otus:prefix_prefilter_length
pick_otus:trie_prefilter
pick_otus:prefix_length
pick_otus:suffix_length
pick_otus:optimal_uclust
pick_otus:exact_uclust
pick_otus:user_sort
pick_otus:suppress_presort_by_abundance_uclust
pick_otus:suppress_new_clusters

Note that tabs/space separate fields, e.g.: pick_otus:similarity 0.97. Many of these parameters are blank, therefore default values are used; however, the user can supply variables when necessary. Once this step in the workflow is run, in the newly created directory wf_da/uclust_picked_otus/, there will be two files. One is seqs.log, which contains information about the invocation of the script. The OTUs will be recorded in the tab-delimited file seqs_otus.txt. The OTUs are arbitrarily named by a number, which is recorded in the first column. The subsequent columns in each line identify the sequence or sequences that belong in that OTU.
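
The "script:option value" layout of the parameters file is simple enough to read with a few lines of Python. The sketch below is only an illustration of the format described above and is not how QIIME itself loads the file; parse_parameters is a hypothetical name, and representing a blank value as None (meaning "use the script's default") is an assumption of this sketch.

```python
# Illustrative sketch only; not part of QIIME.

def parse_parameters(lines):
    """Return {script: {option: value or None}} from parameters-file lines."""
    params = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 1)  # split on the first run of tabs/spaces
        script, _, option = fields[0].partition(":")
        value = fields[1].strip() if len(fields) > 1 else None
        params.setdefault(script, {})[option] = value
    return params


example = [
    "# OTU picker parameters",
    "pick_otus:otu_picking_method uclust",
    "pick_otus:refseqs_fp",
    "pick_otus:similarity\t0.97",
    "#assign_taxonomy:e_value 0.001",
]
params = parse_parameters(example)
print(params["pick_otus"])
```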

Step 2. Pick Representative Sequences for each OTU¶

Since each OTU may be made up of many sequences, we will pick a representative sequence for that OTU for downstream analysis. This representative sequence will be used for taxonomic identification of the OTU and phylogenetic alignment. The script pick_rep_set.py uses the OTU file created above and extracts a representative sequence from the fasta file by one of several methods. To use the default method, where the most abundant sequence in the OTU is used as the representative sequence, set the parameters in custom_parameters.txt as follows:

Note

# Representative set picker parameters
pick_rep_set:rep_set_picking_method most_abundant
pick_rep_set:sort_by otu

In the wf_da/uclust_picked_otus/rep_set/ directory, the script has created two new files - the log file seqs_rep_set.log and the fasta file seqs_rep_set.fasta containing one representative sequence for each OTU. In this fasta file, the sequence has been renamed by the OTU, and the additional information on the header line reflects the sequence used as the representative:

Note

>0 PC.636_424
CTGGGCCGTATCTCAGTCCCAATGTGGCCGGTCGACCTCTC....
>1 PC.481_321
TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCGCCCTCTC....
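
The idea behind the most_abundant method can be sketched in a few lines of Python: read the tab-delimited OTU map (OTU number in the first column, member sequence IDs in the remaining columns), then, within each OTU, choose the sequence string that occurs most often. This is a simplified illustration under those assumptions, not the pick_rep_set.py implementation; the function names and the toy reads are hypothetical.

```python
# Illustrative sketch only; not part of QIIME. Toy data are hypothetical.
from collections import Counter


def parse_otu_map(lines):
    """Return {otu_id: [sequence ids]} from seqs_otus.txt-style lines."""
    otu_map = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            otu_map[fields[0]] = fields[1:]
    return otu_map


def pick_most_abundant(otu_map, seqs):
    """Return {otu_id: (representative sequence id, sequence)}."""
    rep_set = {}
    for otu_id, seq_ids in otu_map.items():
        counts = Counter(seqs[s] for s in seq_ids)
        best_seq, _ = counts.most_common(1)[0]
        # Report the first read in the OTU that carries the most common sequence.
        rep_id = next(s for s in seq_ids if seqs[s] == best_seq)
        rep_set[otu_id] = (rep_id, best_seq)
    return rep_set


otu_map = parse_otu_map(["0\tA_1\tA_2\tB_1", "1\tB_2"])
seqs = {"A_1": "ACGT", "A_2": "ACGA", "B_1": "ACGA", "B_2": "TTTT"}
rep_set = pick_most_abundant(otu_map, seqs)
for otu_id, (rep_id, seq) in rep_set.items():
    print(">%s %s" % (otu_id, rep_id))
    print(seq)
```

The printed records follow the same layout as seqs_rep_set.fasta above: the OTU number first, then the ID of the read chosen as representative.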

Step 3. Align OTU Sequences¶

Alignment of the sequences and phylogeny inference are necessary only if phylogenetic tools such as UniFrac will be subsequently invoked. Alignments can either be generated de novo using programs such as MUSCLE, or through assignment to an existing alignment with tools like PyNAST. For small studies such as this tutorial, either method is possible. However, for studies involving many sequences (roughly, more than 1000), the de novo aligners are very slow and assignment with PyNAST is preferred. Either alignment approach is accomplished with the script align_seqs.py. Since this is one of the most computationally intensive bottlenecks in the pipeline, large studies would benefit greatly from parallelization of this task (described in detail in the Documentation). When using PyNAST as an aligner, the user must supply a template alignment; if you followed the instructions (4.Getting_started_with_QIIME.txt) in the Virtual Machine, the greengenes files will be located in /home/qiime/. For the tutorial, we will use PyNAST as the alignment method, UCLUST as the pairwise alignment method, a minimum length of 150 and a minimum percent identity of 75.0, set in custom_parameters.txt as follows:

Note

# Multiple sequence alignment parameters
align_seqs:template_fp /home/qiime/core_set_aligned.fasta.imputed
align_seqs:alignment_method pynast
align_seqs:pairwise_alignment_method uclust
align_seqs:blast_db
align_seqs:min_length 150
align_seqs:min_percent_id 75.0

A log file and an alignment file are created in the directory wf_da/uclust_picked_otus/rep_set/pynast_aligned_seqs/.

Step 4. Assign Taxonomy¶

A primary goal of the QIIME pipeline is to assign high-throughput sequencing reads to taxonomic identities using established databases. This will give you information on the microbial lineages found in your samples. Using assign_taxonomy.py, you can compare your OTUs against a reference database of your choosing. For our example, we will set the assignment_method to the RDP classification system and a confidence of 0.8 in custom_parameters.txt. Note: the option “assign_taxonomy:e_value” is commented out, since it is not used for the rdp method and it will cause the parallel version of this workflow to fail.

Note

# Taxonomy assignment parameters
assign_taxonomy:id_to_taxonomy_fp
assign_taxonomy:reference_seqs_fp
assign_taxonomy:assignment_method rdp
assign_taxonomy:blast_db
assign_taxonomy:confidence 0.8
#assign_taxonomy:e_value 0.001

In the directory wf_da/uclust_picked_otus/rep_set/rdp_assigned_taxonomy, there will be a log file and a text file. The text file contains a line for each OTU considered, with the RDP taxonomy assignment and a numerical confidence of that assignment (1 is the highest possible confidence). For some OTUs, the assignment will be as specific as a bacterial species, while others may be assignable to nothing more specific than the bacterial domain. The first few lines of the text file are shown below. Note that the taxonomic assignments and confidence values from your own run may not match this output exactly, because the RDP classifier estimates confidence by random resampling:

Note

41 PC.356_347 Root;Bacteria 0.980
63 PC.635_130 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae" 0.960
353 PC.634_150 Root;Bacteria;Proteobacteria;Deltaproteobacteria 0.880
18 PC.355_1011 Root;Bacteria;Bacteroidetes;Bacteroidetes;Bacteroidales;Rikenellaceae;Alistipes 0.990

Step 5. Filter Alignment¶

Before building the tree, one must filter the alignment to remove columns composed only of gaps. Note that, depending on where you obtained the lanemask file, it will be named either lanemask_in_1s_and_0s.txt or lanemask_in_1s_and_0s. If you followed the instructions (4.Getting_started_with_QIIME.txt) in the Virtual Machine, the greengenes files will be located in /home/qiime/. We will also set the allowed gap fraction to 0.999999, remove_outliers to False and the threshold to 3.0 in custom_parameters.txt as follows:

Note

# Alignment filtering (prior to tree-building) parameters
filter_alignment:lane_mask_fp /home/qiime/lanemask_in_1s_and_0s.txt
filter_alignment:allowed_gap_frac 0.999999
filter_alignment:remove_outliers False
filter_alignment:threshold 3.0

A filtered alignment file is created in the directory wf_da/uclust_picked_otus/rep_set/pynast_aligned_seqs/.

Step 6. Make Phylogenetic Tree¶

The filtered alignment file produced in the directory wf_da/uclust_picked_otus/rep_set/pynast_aligned_seqs/ can be used to build a phylogenetic tree using a tree-building program. As an example, we can set the tree_method to fasttree and the root_method to tree_method_default in custom_parameters.txt.

Note

# Phylogenetic tree building parameters
make_phylogeny:tree_method fasttree
make_phylogeny:root_method tree_method_default

The Newick format tree file is written to seqs_rep_set.tre, which is located in the wf_da/uclust_picked_otus/rep_set/pynast_aligned_seqs/fasttree_phylogeny directory. This file can be viewed in tree visualization software, and is necessary for UniFrac diversity measurements (described below). For the following example, the FigTree program was used to visualize the phylogenetic tree obtained from seqs_rep_set.tre.

Step 7. Make OTU Table¶

Using these assignments and the OTU file created in Step 1. Pick OTUs based on Sequence Similarity within the Reads, we can make a readable matrix of OTU by Sample with meaningful taxonomic identifiers for each OTU. Currently there are no parameters in custom_parameters.txt for the user to define when making an OTU table.

The result of this step is seqs_otu_table.txt, which is located in the wf_da/uclust_picked_otus/rep_set/rdp_assigned_taxonomy/otu_table/ directory. The first few lines of seqs_otu_table.txt are shown below (OTUs 0-9), where the first column contains the OTU number, the last column contains the taxonomic assignment for the OTU, and the 9 columns in between correspond to our 9 samples. The value of each ij entry in the matrix is the number of times OTU i was found in the sequences for sample j.

Note

#Full OTU Counts
#OTU ID PC.354 PC.355 PC.356 PC.481 PC.593 PC.607 PC.634 PC.635 PC.636 Consensus Lineage
0 0 0 0 0 0 0 0 1 0 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
1 0 0 0 0 0 1 0 0 0 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
2 0 0 0 0 0 0 0 0 1 Root;Bacteria;Bacteroidetes;Bacteroidetes;Bacteroidales;Porphyromonadaceae;Parabacteroides
3 2 1 0 0 0 0 0 0 0 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae";"Lachnospiraceae Incertae Sedis"
4 1 0 0 0 0 0 0 0 0 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
5 0 0 0 0 0 0 0 0 1 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales
6 0 0 0 0 0 0 0 1 0 Root;Bacteria;Actinobacteria;Actinobacteria
7 0 0 2 0 0 0 0 0 1 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Ruminococcaceae"
8 1 1 0 2 4 0 0 0 0 Root;Bacteria;Firmicutes;"Bacilli";"Lactobacillales";Lactobacillaceae;Lactobacillus
9 0 0 2 0 0 0 0 0 0 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
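
The OTU table is a plain tab-delimited matrix, so it can be loaded with a short Python function for ad hoc checks such as per-sample totals. The sketch below assumes the layout shown above (a "#OTU ID" header line, one count column per sample, and the lineage in the last column); the function names are hypothetical, and the two rows are OTUs 3 and 8 from the excerpt with their lineages shortened.

```python
# Illustrative sketch only; not part of QIIME.
# Assumes the last column of every row holds the consensus lineage.

def parse_otu_table(lines):
    """Return (sample_ids, {otu_id: [counts]}, {otu_id: lineage})."""
    sample_ids, counts, lineages = [], {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#OTU ID"):
            sample_ids = line.split("\t")[1:-1]
        elif line and not line.startswith("#"):
            fields = line.split("\t")
            counts[fields[0]] = [int(float(x)) for x in fields[1:-1]]
            lineages[fields[0]] = fields[-1]
    return sample_ids, counts, lineages


def sample_totals(sample_ids, counts):
    """Sum the counts of all OTUs for each sample."""
    totals = dict.fromkeys(sample_ids, 0)
    for row in counts.values():
        for sample_id, n in zip(sample_ids, row):
            totals[sample_id] += n
    return totals


samples = ["PC.354", "PC.355", "PC.356", "PC.481", "PC.593",
           "PC.607", "PC.634", "PC.635", "PC.636"]
example = [
    "#Full OTU Counts",
    "#OTU ID\t" + "\t".join(samples) + "\tConsensus Lineage",
    "3\t2\t1\t0\t0\t0\t0\t0\t0\t0\tRoot;Bacteria;Firmicutes",
    "8\t1\t1\t0\t2\t4\t0\t0\t0\t0\tRoot;Bacteria;Firmicutes",
]
sample_ids, counts, lineages = parse_otu_table(example)
totals = sample_totals(sample_ids, counts)
print(totals)
```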

Running pick_otus_through_otu_table.py¶

Now that we have set the parameters necessary for this workflow script, the user can run the following command, where we define the input sequence file “-i” (from split_libraries.py), the parameter file to use “-p” and the output directory “-o”:

pick_otus_through_otu_table.py -i split_library_output/seqs.fna -p custom_parameters.txt -o wf_da

Make OTU Heatmap¶

The QIIME pipeline includes a very useful utility to generate images of the OTU table. The script is make_otu_heatmap_html.py:

make_otu_heatmap_html.py -i wf_da/uclust_picked_otus/rep_set/rdp_assigned_taxonomy/otu_table/seqs_otu_table.txt -o wf_da/uclust_picked_otus/rep_set/rdp_assigned_taxonomy/otu_table/OTU_Heatmap/

An html file is created in the directory “wf_da/uclust_picked_otus/rep_set/rdp_assigned_taxonomy/otu_table/OTU_Heatmap/”. You can open this file with any web browser, and will be prompted to enter a value for “Filter by Counts per OTU”. Only OTUs with total counts at or above this threshold will be displayed. The OTU heatmap displays raw OTU counts per sample, where the counts are colored based on the contribution of each OTU to the total OTU count present in that sample (blue: contributes low percentage of OTUs to sample; red: contributes high percentage of OTUs). Click the “Sample ID” button, and a graphic will be generated like the figure below. For each sample, you will see in a heatmap the number of times each OTU was found in that sample. You can mouse over any individual count to get more information on the OTU (including taxonomic assignment). Within the mouseover, there is a link for the terminal lineage assignment, so you can easily search Google for more information about that assignment.

Alternatively, you can click on one of the counts in the heatmap and a new pop-up window will appear. The pop-up window uses a Google Visualization API called Magic-Table. Depending on which table count you clicked on, the pop-up window will put the clicked-on count in the middle of the pop-up heatmap as shown below. For the following example, the table count with the red arrow mouseover is the same one being focused on using the Magic-Table.

On the original heatmap webpage, if you select the “Taxonomy” button instead, you will generate a heatmap keyed by taxon assignment, which allows you to conveniently look for organisms and lineages of interest in your study. Again, mousing over an individual count will show additional information for that OTU and sample.

Make OTU Network¶

An alternative to viewing the OTU table as a heatmap is to create an OTU network, using the following command:

make_otu_network.py -m Fasting_Map.txt -i wf_da/uclust_picked_otus/rep_set/rdp_assigned_taxonomy/otu_table/seqs_otu_table.txt -o wf_da/uclust_picked_otus/rep_set/rdp_assigned_taxonomy/otu_table/OTU_Network

To visualize the network, we use the Cytoscape program (which you can run by calling cytoscape from the command line – you may need to call this beginning either with a capital or lowercase ‘C’ depending on your version of Cytoscape), where each red circle represents a sample and each white square represents an OTU. The lines represent the OTUs present in a particular sample (blue for controls and green for fasting). For more information about opening the files in Cytoscape please refer here.

You can group OTUs by different taxonomic levels (division, class, family) with the script summarize_taxa.py. The input is the OTU table created above and the taxonomic level you need to group the OTUs. For the RDP taxonomy, the following taxonomic levels correspond to: 2 = Domain (Bacteria), 3 = Phylum (Actinobacteria), 4 = Class, and so on.

summarize_taxa.py -i wf_da/uclust_picked_otus/rep_set/rdp_assigned_taxonomy/otu_table/seqs_otu_table.txt -o wf_da/uclust_picked_otus/rep_set/rdp_assigned_taxonomy/otu_table/otu_table_Level3.txt -L 3 -r 0

The script will generate a new OTU table wf_da/uclust_picked_otus/rep_set/rdp_assigned_taxonomy/otu_table/otu_table_Level3.txt, where the value of each ij entry in the matrix is the count of the number of times all OTUs belonging to the taxon i (for example, Phylum Actinobacteria) were found in the sequences for sample j.

Note

#Full OTU Counts
Taxon PC.354 PC.355 PC.356 PC.481 PC.593 PC.607 PC.634 PC.635 PC.636
Root;Bacteria;Actinobacteria 0.0 0.0 0.0 1.0 0.0 2.0 3.0 1.0 1.0
Root;Bacteria;Bacteroidetes 7.0 38.0 15.0 19.0 30.0 40.0 86.0 54.0 90.0
Root;Bacteria;Deferribacteres 0.0 0.0 0.0 0.0 0.0 3.0 5.0 2.0 7.0
Root;Bacteria;Firmicutes 136.0 102.0 115.0 117.0 65.0 66.0 37.0 63.0 34.0
Root;Bacteria;Other 5.0 6.0 18.0 9.0 49.0 35.0 14.0 27.0 14.0
Root;Bacteria;Proteobacteria 0.0 0.0 0.0 0.0 5.0 3.0 2.0 0.0 1.0
Root;Bacteria;TM7 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0
Root;Bacteria;Verrucomicrobia 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
Root;Other 0.0 0.0 2.0 0.0 0.0 0.0 0.0 1.0 0.0
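
The grouping itself amounts to truncating each lineage to the requested level and summing the rows that share the truncated name. The Python sketch below illustrates this; it is not the summarize_taxa.py implementation. The rule of appending a single "Other" to lineages shorter than the requested level is inferred from the labels in the output above ("Root;Bacteria;Other", "Root;Other") and should be treated as an assumption, as should the function names and toy counts.

```python
# Illustrative sketch only; not part of QIIME. Toy counts are hypothetical.

def taxon_at_level(lineage, level):
    """Truncate a semicolon-separated lineage to `level` fields."""
    fields = [f.strip() for f in lineage.split(";")]
    if len(fields) >= level:
        return ";".join(fields[:level])
    # Assumption inferred from the example output: shorter lineages get "Other".
    return ";".join(fields + ["Other"])


def summarize_taxa(counts, lineages, level):
    """Sum per-sample counts over all OTUs that share a taxon at `level`."""
    summary = {}
    for otu_id, row in counts.items():
        taxon = taxon_at_level(lineages[otu_id], level)
        previous = summary.get(taxon, [0] * len(row))
        summary[taxon] = [a + b for a, b in zip(previous, row)]
    return summary


counts = {"3": [2, 1], "8": [1, 1], "41": [0, 2]}
lineages = {
    "3": "Root;Bacteria;Firmicutes;Clostridia",
    "8": "Root;Bacteria;Firmicutes;Bacilli",
    "41": "Root;Bacteria",
}
summary = summarize_taxa(counts, lineages, 3)
for taxon in sorted(summary):
    print(taxon, summary[taxon])
```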

Make Taxonomy Summary Charts¶

To visualize the summarized taxa, you can use the plot_taxa_summary.py script, which shows which taxa are present in each sample. To use this script, we need to set the taxonomy level label “-l”, an output directory “-o”, and the background color “-k” as white:

plot_taxa_summary.py -i wf_da/uclust_picked_otus/rep_set/rdp_assigned_taxonomy/otu_table/otu_table_Level3.txt -l Phylum -o wf_da/uclust_picked_otus/rep_set/rdp_assigned_taxonomy/otu_table/Taxa_Charts -k white

To view the resulting charts, open the area or bar chart html file located in the wf_da/uclust_picked_otus/rep_set/rdp_assigned_taxonomy/otu_table/Taxa_Charts/ folder. The following chart shows the taxa assignments for each sample as an area chart. Users can mouseover the plot to see which taxa are contributing to the percentage shown.

The following chart shows the taxa assignments for each sample as a bar chart.

Compute Alpha Diversity within the Samples and Generate Rarefaction Curves¶

Community ecologists typically describe the microbial diversity within their study. This diversity can be assessed within a sample (alpha diversity) or between a collection of samples (beta diversity). Here, we will determine the level of alpha diversity in our samples using a series of scripts from the QIIME pipeline. To perform this analysis, we will use the alpha_rarefaction.py workflow script. We will first set the parameters in custom_parameters.txt, then at the end, we will run the script. This script performs the following steps:

Generate rarefied OTU tables (for more information, refer to multiple_rarefactions.py)
Compute alpha diversity metrics for each rarefied OTU table (for more information, refer to alpha_diversity.py)
Collate alpha diversity results (for more information, refer to collate_alpha.py)
Generate alpha rarefaction plots (for more information, refer to make_rarefaction_plots.py)

Step 1. Rarify OTU Table¶

For this highly artificial example, all of the samples had sequence counts between 146 and 150, which is discussed in more detail in Step 1. Rarify OTU Table to Remove Sample Heterogeneity (Optional). In real datasets, the range will generally be much larger. In practice, rarefaction is most useful when most samples have the specified number of sequences, so your upper bound of rarefaction should be close to the minimum number of sequences found in a sample. For this workflow script, the min/max values are set by the script itself; users who would like to define their own values should perform each step individually. In custom_parameters.txt, the user can define the number of iterations at each sequences-per-sample level, for which we will use “num-reps 5”, and whether to include lineages, which we set to False:

Note

# Rarefaction parameters
multiple_rarefactions:num-reps 5
multiple_rarefactions:depth
multiple_rarefactions:lineages_included False

The directory wf_arare/rarefaction/ will contain many text files named rarefaction_##_#.txt; the first set of numbers represents the number of sequences sampled, and the last number represents the iteration number. If you opened one of these files, you would find an OTU table where, for each sample, the sum of the counts equals the number of sequences sampled.
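
Conceptually, rarefying one sample means drawing a fixed number of its sequences at random and recounting the OTUs. The sketch below shows this for a single sample, drawing without replacement; whether multiple_rarefactions.py samples in exactly this way is an assumption of the sketch, and the function name and toy counts are hypothetical.

```python
# Illustrative sketch only; not part of QIIME. Toy counts are hypothetical.
import random


def rarefy(counts, depth, rng=random):
    """Subsample `depth` sequences without replacement from one sample.

    counts: {otu_id: count}. Returns a new {otu_id: count}, or None when
    the sample holds fewer than `depth` sequences.
    """
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        return None
    rarefied = dict.fromkeys(counts, 0)
    for otu in rng.sample(pool, depth):
        rarefied[otu] += 1
    return rarefied


sample = {"otu_a": 5, "otu_b": 3, "otu_c": 0}
rarefied = rarefy(sample, 4, random.Random(0))  # seeded for repeatability
print(rarefied, sum(rarefied.values()))
```

Repeating the call with different seeds corresponds to the multiple iterations at each depth set by num-reps.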

Step 2. Compute Alpha Diversity¶

The rarefaction tables are the basis for calculating diversity metrics, which reflect the diversity within the sample based on taxon counts or phylogeny. The QIIME pipeline allows users to conveniently calculate more than two dozen different diversity metrics. The full list of available metrics is available here. Every metric has different strengths and limitations - technical discussion of each metric is readily available online and in ecology textbooks, but it is beyond the scope of this document. Here, we will calculate three metrics:

The Chao1 metric estimates the species richness.
The Observed Species metric is simply the count of unique OTUs found in the sample.
Phylogenetic Diversity (PD_whole_tree) is the only phylogenetic metric used in this script and requires a phylogenetic tree as an input.
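
The two count-based metrics can be computed directly from one sample's OTU counts. The sketch below uses the bias-corrected form of Chao1, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs seen exactly once and twice; the exact variant used by alpha_diversity.py may differ, so treat this as an illustration rather than a reimplementation. PD_whole_tree is omitted because it needs the phylogenetic tree. The function names and counts are hypothetical.

```python
# Illustrative sketch only; not part of QIIME. Counts are hypothetical.

def observed_species(counts):
    """Number of OTUs with at least one sequence in the sample."""
    return sum(1 for n in counts if n > 0)


def chao1_bias_corrected(counts):
    """Bias-corrected Chao1 richness estimate for one sample."""
    s_obs = observed_species(counts)
    singles = sum(1 for n in counts if n == 1)  # F1: OTUs seen once
    doubles = sum(1 for n in counts if n == 2)  # F2: OTUs seen twice
    return s_obs + singles * (singles - 1) / (2.0 * (doubles + 1))


sample_counts = [1, 1, 2, 3, 0]
print(observed_species(sample_counts))      # 4
print(chao1_bias_corrected(sample_counts))  # 4 + (2 * 1) / (2 * 2) = 4.5
```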

In the custom_parameters.txt file, the user can define a comma-delimited list of alpha diversity metrics to use, as follows:

Note

# Alpha diversity parameters
alpha_diversity:metrics chao1,observed_species,PD_whole_tree

This step produces several text files, located in the wf_arare/alpha_div/ directory.

Step 3. Collate Rarified OTU Tables¶

The output directory wf_arare/alpha_div/ will contain one text file alpha_rarefaction_##_# for every file input from wf_arare/rarefaction/, where the numbers represent the number of sequences sampled and the iteration, as before. Each of these tab-delimited files contains the calculated metrics for each sample. To collapse the individual files into a single combined table, the workflow uses the script collate_alpha.py. The user can define an “example_path” in the custom_parameters.txt file; for the tutorial, however, we will leave this blank.

Note

# Collate alpha
collate_alpha:example_path

In the newly created directory wf_arare/alpha_div_collated/, there will be one matrix for every diversity metric used in the alpha_diversity.py script. This matrix will contain the metric for every sample, arranged in ascending order from lowest number of sequences per sample to highest. A portion of the observed_species.txt file is shown below:

Note

                            Sequences per sample  iteration  PC.354  PC.355  PC.356  PC.481  PC.593
alpha_rarefaction_21_0.txt  21                    0          14.0    16.0    18.0    18.0    13.0
alpha_rarefaction_21_1.txt  21                    1          15.0    17.0    18.0    20.0    12.0
alpha_rarefaction_21_2.txt  21                    2          15.0    16.0    21.0    19.0    13.0
alpha_rarefaction_21_3.txt  21                    3          10.0    19.0    18.0    21.0    13.0
alpha_rarefaction_21_4.txt  21                    4          14.0    18.0    16.0    15.0    12.0
...
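A collated table like this is usually summarized by averaging over the iterations at each sampling depth. The sketch below assumes the file is tab-delimited with an empty first header cell, followed by the sequences-per-sample and iteration columns; it reuses the first three rows and first two samples of the excerpt above. The function name average_by_depth is made up for this example and is not part of QIIME.

```python
import csv
import io
from collections import defaultdict

# First three rows and two samples of the excerpt, written as tab-delimited text.
collated = (
    "\tsequences per sample\titeration\tPC.354\tPC.355\n"
    "alpha_rarefaction_21_0.txt\t21\t0\t14.0\t16.0\n"
    "alpha_rarefaction_21_1.txt\t21\t1\t15.0\t17.0\n"
    "alpha_rarefaction_21_2.txt\t21\t2\t15.0\t16.0\n"
)


def average_by_depth(handle):
    """Return {depth: {sample: mean metric over all iterations}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[3:]  # columns after file name, depth and iteration
    sums = defaultdict(lambda: [0.0] * len(samples))
    n_rows = defaultdict(int)
    for row in reader:
        depth = int(row[1])
        for j, value in enumerate(row[3:]):
            sums[depth][j] += float(value)
        n_rows[depth] += 1
    return {
        depth: dict(zip(samples, [s / n_rows[depth] for s in totals]))
        for depth, totals in sums.items()
    }


averages = average_by_depth(io.StringIO(collated))
print(averages)
```

Plotting these averages against the depth gives one point of a rarefaction curve per sample.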

Step 4. Generate Rarefaction Curves¶

The script make_rarefaction_plots.py takes a mapping file and any number of rarefaction files generated by collate_alpha.py and uses matplotlib to create rarefaction curves. Each curve represents a sample and can be colored by the sample metadata supplied in the mapping file. In the custom_parameters.txt file, the user can set the image format (e.g. png), resolution (e.g. 75), and background_color (e.g. white) as follows:

Note

# Make rarefaction plots parameters
make_rarefaction_plots:imagetype png
make_rarefaction_plots:resolution 75
make_rarefaction_plots:background_color white
make_rarefaction_plots:prefs_path

This step generates a wf_arare/alpha_rarefaction_plots/average_tables/ folder, which contains the rarefaction averages for each diversity metric, so the user can plot the rarefaction curves in another application, like MS Excel. The wf_arare/alpha_rarefaction_plots/average_plots/ folder contains the average plots for each metric and category and the wf_arare/alpha_rarefaction_plots/html_plots/ folder contains all the images used in the html page generated. To view the rarefaction plots the user can open the file wf_arare/alpha_rarefaction_plots/rarefaction_plots.html in a browser. Once the browser window is open, the user can select the metric and category for whichever rarefaction plots they would like to display. The user can also turn on/off lines in the plot by (un)checking the box next to each label in the legend. The user can click on the triangle next to each label in the legend to see all the samples that contribute to that category. Below each plot, the user will see the average data over all metrics for the specified category.

Running alpha_rarefaction.py¶

Now that we have set the parameters necessary for this workflow script, the user can run the following command, where we define the input OTU table “-i” and tree file “-t” (from pick_otus_through_otu_table.py), the user-defined mapping file “-m”, the parameter file to use “-p”, and the output directory “-o”:

alpha_rarefaction.py -i wf_da/uclust_picked_otus/rep_set/rdp_assigned_taxonomy/otu_table/seqs_otu_table.txt -m Fasting_Map.txt -o wf_arare/ -p custom_parameters.txt -t wf_da/uclust_picked_otus/rep_set/pynast_aligned_seqs/fasttree_phylogeny/seqs_rep_set.tre

Compute Beta Diversity and Generate 3D Principal Coordinate Analysis (PCoA) Plots¶

Here we will be running the beta_diversity_through_3d_plots.py workflow, which consists of the following steps:

Rarify OTU table (for more information, refer to single_rarefaction.py)
Compute Beta Diversity (for more information, refer to beta_diversity.py)
Generate Principal Coordinates (for more information, refer to principal_coordinates.py)
Make preferences file (for more information, refer to make_prefs_file.py)
Generate 3D PCoA plots (for more information, refer to make_3d_plots.py)

Step 1. Rarify OTU Table to Remove Sample Heterogeneity (Optional)¶

To remove sample heterogeneity, we can perform rarefaction on our OTU table. Rarefaction is an ecological approach that allows users to standardize the data obtained from samples with different sequencing efforts, and to compare the OTU richness of the samples using this standardized platform. For instance, if one of your samples yielded 10,000 sequence counts, and another yielded only 1,000 counts, the species diversity within those samples may be much more influenced by sequencing effort than underlying biology. The approach of rarefaction is to randomly sample the same number of sequences from each sample, and use this data to compare the communities at a given level of sampling effort.

To perform rarefaction, you need to set the boundaries for sampling and the step size between sampling intervals. You can find the number of sequences associated with each sample by looking in the split_library_log.txt file generated in Assign Samples to Multiplex Reads above. The line from our tutorial is pasted here:

Note

Sample ct min/max/mean: 146 / 150 / 148.11

Since we are only removing sample heterogeneity from the OTU table, we will use the “-e” option, which only requires the depth of sampling. Rarefaction is most useful when most samples have the specified number of sequences, so your upper bound of rarefaction should be close to the minimum number of sequences found in a sample. For this case, we will set the depth to 146.
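If you prefer not to read the depth off the log by eye, the summary line can be parsed with a few lines of Python. This is only a convenience sketch that assumes the line has exactly the format shown above; parse_sample_counts is a hypothetical helper, not a QIIME function.

```python
import re

log_line = "Sample ct min/max/mean: 146 / 150 / 148.11"


def parse_sample_counts(line):
    """Extract (min, max, mean) from the sample count summary line."""
    match = re.search(r"min/max/mean:\s*(\d+)\s*/\s*(\d+)\s*/\s*([\d.]+)", line)
    if match is None:
        raise ValueError("line does not look like a sample count summary")
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


minimum, maximum, mean = parse_sample_counts(log_line)

# Use the smallest sample as the even sampling depth, as in the tutorial.
depth = minimum
print(depth)  # 146
```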

Step 2. Compute Beta Diversity¶

Beta-diversity metrics assess the differences between microbial communities. In general, these metrics are calculated to study diversity along an environmental gradient (pH or temperature) or different disease states (lean vs. obese). The basic output of this comparison is a square matrix where a “distance” is calculated between every pair of samples, reflecting the dissimilarity between the samples. The data in this distance matrix can be visualized with analyses such as Principal Coordinate Analysis (PCoA) and UPGMA clustering. Like alpha diversity, there are many possible metrics which can be calculated with the QIIME pipeline - the full list of options can be found here. For our example, we will calculate weighted and unweighted UniFrac, which are phylogenetic measures used extensively in recent microbial community sequencing projects, by defining the metric parameter in the custom_parameters.txt file, as follows:

Note

# Beta diversity parameters
beta_diversity:metrics weighted_unifrac,unweighted_unifrac

The resulting distance matrices (wf_bdiv_even146/unweighted_unifrac_seqs_otu_table.txt and wf_bdiv_even146/weighted_unifrac_seqs_otu_table.txt) are the basis for two methods of visualization and sample comparison: PCoA and UPGMA.
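To show the shape of such a distance matrix, the sketch below builds one from a tiny OTU table. UniFrac requires a phylogenetic tree, so this example deliberately substitutes the Bray-Curtis dissimilarity, a simple non-phylogenetic measure, as a stand-in; the sample names and counts are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two vectors of OTU counts."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1.0 - 2.0 * shared / (sum(a) + sum(b))


def distance_matrix(table):
    """Return (sample names, square matrix of pairwise dissimilarities)."""
    names = sorted(table)
    matrix = [[bray_curtis(table[r], table[c]) for c in names] for r in names]
    return names, matrix


# Hypothetical OTU counts (one list of counts per sample, same OTU order).
table = {
    "S1": [10, 0, 5],
    "S2": [10, 0, 5],
    "S3": [0, 12, 3],
}
names, dm = distance_matrix(table)
for name, row in zip(names, dm):
    print(name, row)
```

The matrix is symmetric with zeros on the diagonal: identical samples (S1 and S2) are at distance 0, and samples that share few sequences (S1 and S3) are far apart.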

Step 3. Generate Principal Coordinates¶

Principal Coordinate Analysis (PCoA) is a technique that helps to extract and visualize a few highly informative gradients of variation from complex, multidimensional data. This is a complex transformation that maps the distance matrix to a new set of orthogonal axes such that a maximum amount of variation is explained by the first principal coordinate, the second largest amount of variation is explained by the second principal coordinate, etc. The principal coordinates can be plotted in two or three dimensions to provide an intuitive visualization of the data structure and look at differences between the samples, and look for similarities by sample category. The transformation is accomplished with the script principal_coordinates.py. Since this script only takes an input/output file, there are no parameters for the user to set in custom_parameters.txt.

The files wf_bdiv_even146/unweighted_unifrac_pc.txt and wf_bdiv_even146/weighted_unifrac_pc.txt list every sample in the first column, and the subsequent columns contain the value for the sample against the noted principal coordinate. At the bottom of each Principal Coordinate column, you will find the eigenvalue and percent of variation explained by the coordinate. To determine which axes are useful for your project, you can generate a “scree plot” by plotting the eigenvalues of each principal coordinate in descending order.
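The numbers behind a scree plot can be computed directly from the eigenvalues. The sketch below sorts them in descending order and reports the percent and cumulative percent of variation explained. The eigenvalues are hypothetical, and the sketch assumes they are all non-negative; scree_table is an illustrative name only.

```python
def scree_table(eigenvalues):
    """Return (rank, eigenvalue, percent, cumulative percent) per axis."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    rows = []
    cumulative = 0.0
    for rank, value in enumerate(ordered, start=1):
        percent = 100.0 * value / total
        cumulative += percent
        rows.append((rank, value, percent, cumulative))
    return rows


# Hypothetical eigenvalues read from the bottom of a *_pc.txt file.
eigenvalues = [0.5, 2.0, 1.0, 0.5]
rows = scree_table(eigenvalues)
for rank, value, percent, cumulative in rows:
    print("PC%d  eigenvalue=%.2f  %.1f%%  cumulative=%.1f%%"
          % (rank, value, percent, cumulative))
```

Axes after the point where the cumulative percentage levels off usually add little to the interpretation.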

Step 4. Make Preferences File¶

In order to generate the PCoA plots, we want to generate a preferences file, which defines the colors for each of the samples or for a particular category within a mapping column. For more information on making a preferences file, please refer to make_prefs_file.py. In the custom_parameters.txt file, the user can set the background color to be used for the 3D PCoA plot (either black or white), the mapping header categories to plot (uses ALL if left blank) and the Monte Carlo distance to use (this is for make_distance_histograms.py, which we will run in a few steps).

Note

# Make prefs file parameters
make_prefs_file:background_color black
make_prefs_file:mapping_headers_to_use Treatment,DOB
make_prefs_file:monte_carlo_dists 10

Step 5. Generate 3D PCoA Plots¶

To plot the coordinates, you can use the QIIME scripts make_2d_plots.py and make_3d_plots.py. The two-dimensional plot will be rendered as an html file which can be opened with a standard web browser, while the three-dimensional plot will be a kinemage file which requires additional software to render and manipulate. Both scripts follow the same usage convention, detailed in make_3d_plots.py. Since the coloring was set for the preferences file parameters, we only need to set the custom_axes in the custom_parameters.txt, although we can leave it blank, as follows:

Note

# Make 3D plot parameters
make_3d_plots:custom_axes

The html files are created in wf_bdiv_even146/unweighted_unifrac_3d... and wf_bdiv_even146/weighted_unifrac_3d... directories. In the custom_parameters.txt, we specified that the samples should be colored by the value of the “Treatment” and “DOB” columns under the make_prefs_file parameters. For the “Treatment” column, all samples with the same “Treatment” will get the same color. For our tutorial, the five control samples are all blue and the four fasting samples are all green. This lets you easily visualize “clustering” by metadata category. The 3d visualization software allows you to rotate the axes to see the data from different perspectives. By default, the script will plot the first three dimensions in your file. Other combinations can be viewed using the “Views:Choose viewing axes” option in the KiNG viewer (may require the installation of kinemage software). The first 10 components can be viewed using “Views:Paralleled coordinates” option or typing “/”.

Running beta_diversity_through_3d_plots.py¶

Now that we have set the parameters necessary for this workflow script, the user can run the following command, where we define the input OTU table “-i” and tree file “-t” (from pick_otus_through_otu_table.py), the parameter file to use “-p”, the user-defined mapping file “-m”, the output directory “-o”, and the sequences-per-sample depth “-e”, which we set to 146:

beta_diversity_through_3d_plots.py -i wf_da/uclust_picked_otus/rep_set/rdp_assigned_taxonomy/otu_table/seqs_otu_table.txt -m Fasting_Map.txt -o wf_bdiv_even146/ -p custom_parameters.txt -t wf_da/uclust_picked_otus/rep_set/pynast_aligned_seqs/fasttree_phylogeny/seqs_rep_set.tre -e 146

Generate 2D PCoA Plots¶

To plot the coordinates for the unweighted unifrac principal coordinates in 2D, you can use the QIIME script make_2d_plots.py. Here we will use the same preferences file generated from the beta_diversity_through_3d_plots.py, set the background color “-k” to white and output the results to wf_bdiv_even146/unweighted_unifrac_2d:

make_2d_plots.py -i wf_bdiv_even146/unweighted_unifrac_pc.txt -m Fasting_Map.txt -o wf_bdiv_even146/unweighted_unifrac_2d -k white -p wf_bdiv_even146/prefs.txt

The html file created in directory wf_bdiv_even146/unweighted_unifrac_2d shows a plot for each combination of the first three principal coordinates. Since we specified Treatment and DOB to use for coloring the samples, each sample is colored according to the category to which it belongs. You can get the name for each sample by holding your mouse over the data point.

Generate Distance Histograms¶

Distance Histograms are a way to compare different categories and see which tend to have larger/smaller distances than others. For example, in the hand study, you may want to compare the distances between hands to the distances between individuals. Here we will use the distance matrix and prefs file generated by beta_diversity_through_3d_plots.py, the mapping file, an output directory wf_bdiv_even146/Distance_Histograms and write the output as html, as follows:

make_distance_histograms.py -d wf_bdiv_even146/unweighted_unifrac_seqs_otu_table_even146.txt -m Fasting_Map.txt -o wf_bdiv_even146/Distance_Histograms -p wf_bdiv_even146/prefs.txt --html_output

For each of these groups of distances a histogram is made. The output is an HTML file (wf_bdiv_even146/Distance_Histograms/QIIME_Distance_Histograms.html) where you can look at all the distance histograms individually, and compare them with one another. Within the webpage, the user can mouseover and/or select the checkboxes in the right panel to turn on/off the different distances within/between categories. For this example, we are comparing the distances between the samples in the Control versus themselves, along with samples from Fasting versus the Control.
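The grouping of distances that feeds each histogram can be sketched as follows: every pair of samples is assigned to a within-category or between-category group according to the mapping file. This is an illustration only and not the implementation in make_distance_histograms.py; the sample names, category labels and distances are hypothetical.

```python
from itertools import combinations


def group_distances(names, dm, category):
    """Group pairwise distances by the (sorted) pair of sample categories."""
    groups = {}
    for i, j in combinations(range(len(names)), 2):
        key = tuple(sorted((category[names[i]], category[names[j]])))
        groups.setdefault(key, []).append(dm[i][j])
    return groups


# Hypothetical samples, treatments and symmetric distance matrix.
names = ["A", "B", "C", "D"]
category = {"A": "Control", "B": "Control", "C": "Fasting", "D": "Fasting"}
dm = [
    [0.0, 0.2, 0.7, 0.8],
    [0.2, 0.0, 0.6, 0.9],
    [0.7, 0.6, 0.0, 0.3],
    [0.8, 0.9, 0.3, 0.0],
]

groups = group_distances(names, dm, category)
for key, values in sorted(groups.items()):
    print(key, values)
```

A histogram is then drawn for each group, so that within-category distances can be compared with between-category distances.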

UPGMA Clustering and Jackknifing Support¶

The steps performed by this workflow are:

Compute beta diversity distance matrix from OTU table (and tree, if applicable) (for more information, refer to beta_diversity.py)
Build UPGMA tree from full distance matrix (for more information, refer to upgma_cluster.py)
Build rarefied OTU tables (for more information, refer to multiple_rarefactions.py)
Compute distance matrices for rarefied OTU tables (for more information, refer to beta_diversity.py)
Build UPGMA trees from rarefied OTU table distance matrices (for more information, refer to upgma_cluster.py)
Compare the UPGMA trees built from the rarefied OTU table distance matrices to the full UPGMA tree, and write a support file and a newick tree with support values as node labels (for more information, refer to tree_compare.py)

Steps 1 and 2. UPGMA Clustering¶

Unweighted Pair Group Method with Arithmetic mean (UPGMA) is a type of hierarchical clustering method using average linkage and can be used to visualize the distance matrix produced by beta_diversity.py.

The output is a file that can be opened with tree viewing software, such as FigTree.

This tree shows the relationship among the 9 samples, and reveals that the 4 samples from the guts of fasting mice cluster together (PC.6xx, fasting data is in Fasting_Map.txt).
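For readers who want to see how average linkage produces such a tree, here is a compact sketch of UPGMA that returns a newick string. It is written for clarity rather than speed, is not the code used by upgma_cluster.py, and uses a hypothetical three-sample distance matrix.

```python
def upgma(names, dm):
    """Average-linkage (UPGMA) clustering; returns a newick string."""
    # Each cluster id maps to (newick label, number of tips, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances between current clusters, keyed by (smaller id, larger id).
    dist = {(i, j): dm[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (a, b), d = min(dist.items(), key=lambda item: item[1])
        label_a, size_a, height_a = clusters.pop(a)
        label_b, size_b, height_b = clusters.pop(b)
        height = d / 2.0
        label = "(%s:%g,%s:%g)" % (label_a, height - height_a,
                                   label_b, height - height_b)
        # Distance from the new cluster to each remaining cluster is the
        # size-weighted mean of the distances from its two children.
        new_dist = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            new_dist[(c, next_id)] = (
                (size_a * d_ac + size_b * d_bc) / (size_a + size_b))
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        clusters[next_id] = (label, size_a + size_b, height)
        next_id += 1
    label, _, _ = next(iter(clusters.values()))
    return label + ";"


# Hypothetical distances: A and B are close, C is far from both.
names = ["A", "B", "C"]
dm = [
    [0.0, 2.0, 6.0],
    [2.0, 0.0, 6.0],
    [6.0, 6.0, 0.0],
]
tree = upgma(names, dm)
print(tree)  # (C:3,(A:1,B:1):2);
```

The resulting string can be saved to a file and opened in tree viewing software in the same way as the output of the workflow.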

Steps 3, 4 and 5. Perform Jackknifing Support¶

To measure the robustness of this result to sequencing effort, we perform a jackknifing analysis, wherein a smaller number of sequences are chosen at random from each sample, and the resulting UPGMA tree from this subset of data is compared with the tree representing the entire available data set. This process is repeated with many random subsets of data, and the tree nodes which prove more consistent across jackknifed datasets are deemed more robust.

First the jackknifed OTU tables must be generated, by subsampling the full available data set. In this tutorial, each sample contains between 146 and 150 sequences, as shown in the split_library_log.txt file:

Note

Sample ct min/max/mean: 146 / 150 / 148.11

To ensure that a random subset of sequences is selected from each sample, we chose to select 110 sequences from each sample (75% of the smallest sample, though this value is only a guideline), which is designated by the “-e” option when running the workflow script (see below). In the custom_parameters.txt file, we set the number of jackknife replicates as follows:

Note

# Multiple Rarefactions
multiple_rarefactions_even_depth:num-reps 20

This generates 20 subsets of the available data, each subset a simulation of a smaller sequencing effort (110 sequences in each sample, as defined below).

We then calculate the distance matrix for each jackknifed dataset, using beta_diversity.py as before, but now in batch mode, which results in 20 distance matrix files written to the wf_jack/unweighted_unifrac/rare_dm/ and wf_jack/weighted_unifrac/rare_dm/ directories. Each of those is then used as the basis for UPGMA clustering, using upgma_cluster.py in batch mode and written to the wf_jack/unweighted_unifrac/rare_upgma/ and wf_jack/weighted_unifrac/rare_upgma/ directories.

Step 6. Compare Jackknifed Trees to Cluster Tree¶

UPGMA clustering of the 20 distance matrix files results in 20 UPGMA sample clusterings, each based on a random sub-sample of the available sequence data. These are then compared to the UPGMA result using all available data.

Three files are written to wf_jack/unweighted_unifrac/upgma_cmp/ and wf_jack/weighted_unifrac/upgma_cmp/:

master_tree.tre, which is virtually identical to jackknife_named_nodes.tre but each internal node of the UPGMA clustering is assigned a unique name

jackknife_named_nodes.tre

jackknife_support.txt explains how frequently a given internal node had the same set of descendant samples in the jackknifed UPGMA clusters as it does in the UPGMA cluster using the full available data. A value of 0.5 indicates that half of the jackknifed data sets support that node, while 1.0 indicates perfect support.
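The support value described above can be sketched in a few lines: for each internal node of the full-data tree, count the fraction of jackknifed trees that contain a node with exactly the same set of descendant samples. The node names, sample ids and trees below are hypothetical, and the sketch is not the code used by tree_compare.py.

```python
def jackknife_support(master_clades, replicate_clades):
    """Fraction of jackknifed trees that contain each master-tree clade.

    master_clades: {node name: frozenset of descendant sample ids}
    replicate_clades: one set of frozensets per jackknifed tree
    """
    n_trees = float(len(replicate_clades))
    return {
        node: sum(1 for clades in replicate_clades if tips in clades) / n_trees
        for node, tips in master_clades.items()
    }


# Hypothetical master tree with two named internal nodes.
master_clades = {
    "node0": frozenset(["A", "B"]),
    "node1": frozenset(["C", "D"]),
}
# Hypothetical clades found in four jackknifed trees.
replicate_clades = [
    {frozenset(["A", "B"]), frozenset(["C", "D"])},
    {frozenset(["A", "B"]), frozenset(["C", "D"])},
    {frozenset(["A", "B"]), frozenset(["B", "C"])},
    {frozenset(["A", "B"]), frozenset(["A", "D"])},
]

support = jackknife_support(master_clades, replicate_clades)
print(support)
```

Here node0 is recovered in every jackknifed tree (support 1.0), while node1 is recovered in only half of them (support 0.5).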

Running jackknifed_upgma.py¶

Now that we have set the parameters necessary for this workflow script, the user can run the following command, where we define the input OTU table “-i” and tree file “-t” (from pick_otus_through_otu_table.py), the parameter file to use “-p”, the output directory “-o”, the user-defined mapping file “-m”, and the number of sequences per sample “-e” (here, 110):

jackknifed_beta_diversity.py -i wf_da/uclust_picked_otus/rep_set/rdp_assigned_taxonomy/otu_table/seqs_otu_table.txt -o wf_jack -p custom_parameters.txt -e 110 -t wf_da/uclust_picked_otus/rep_set/pynast_aligned_seqs/fasttree_phylogeny/seqs_rep_set.tre -m Fasting_Map.txt

Generate Bootstrapped Tree¶

As an example, we can visualize the bootstrapped tree for unweighted UniFrac with make_bootstrapped_tree.py, as follows:

make_bootstrapped_tree.py -m wf_jack/unweighted_unifrac/upgma_cmp/master_tree.tre -s wf_jack/unweighted_unifrac/upgma_cmp/jackknife_support.txt -o wf_jack/unweighted_unifrac/upgma_cmp/jackknife_named_nodes.pdf

The resulting pdf shows the tree with internal nodes colored red for 75-100% support, yellow for 50-75%, green for 25-50%, and blue for < 25% support. Although UPGMA shows that PC.354 and PC.593 cluster together and that PC.481 clusters with PC.6xx, we cannot have high confidence in that result. However, there is excellent jackknife support for all fasted samples (PC.6xx) clustering together, separate from the non-fasted (PC.35x) samples.
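The color scheme can be expressed as a small lookup function. The ranges come from the description above, but the tutorial does not say which color a value exactly on a boundary (for example 0.75) receives, so assigning it to the higher range is an assumption of this sketch; support_color is a hypothetical name.

```python
def support_color(support):
    """Map a jackknife support fraction (0.0 to 1.0) to a node color."""
    if not 0.0 <= support <= 1.0:
        raise ValueError("support must be between 0.0 and 1.0")
    # Assumption: boundary values fall into the higher support range.
    if support >= 0.75:
        return "red"
    if support >= 0.50:
        return "yellow"
    if support >= 0.25:
        return "green"
    return "blue"


for value in (1.0, 0.6, 0.3, 0.1):
    print(value, support_color(value))
```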

Running Workflow Scripts in Parallel¶

Users can run the workflow scripts in parallel by passing the “-a” option to each of the scripts. In the custom_parameters.txt file, the user can customize the number of jobs to start (jobs_to_start), whether to keep the temporary files generated (retain_temp_files), and the number of seconds to sleep (seconds_to_sleep). If running on a dual-core computer, you can set the number of jobs to start as 2, as follows:

Note

# Parallel options
parallel:jobs_to_start 2
parallel:retain_temp_files False
parallel:seconds_to_sleep 1
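All of the parameter blocks shown in this tutorial follow the same pattern: a script name, a colon, an option name and an optional value, with comment lines starting with “#”. The sketch below reads that pattern into a nested dictionary, which can be handy for checking a custom_parameters.txt file before a long run. It is not QIIME's own parser, and parse_parameters is a hypothetical helper.

```python
def parse_parameters(lines):
    """Parse 'script:option value' lines into {script: {option: value}}."""
    params = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 1)
        script, _, option = fields[0].partition(":")
        # Options left blank in the file are stored as None.
        value = fields[1].strip() if len(fields) > 1 else None
        params.setdefault(script, {})[option] = value
    return params


text = """# Parallel options
parallel:jobs_to_start 2
parallel:retain_temp_files False
parallel:seconds_to_sleep 1

# Make 3D plot parameters
make_3d_plots:custom_axes
"""

params = parse_parameters(text.splitlines())
print(params)
```

Values are kept as strings here; converting "2" to an integer or "False" to a boolean is left to the caller.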

Running the QIIME Tutorial Shell Scripts¶

Now that we have gone through the whole tutorial and customized the custom_parameters.txt file, we can use the Terminal to run the shell scripts, which contain all the commands that you ran in this tutorial. To run the shell scripts, you may need to allow all users to execute them, using the following commands:

chmod a+x ./qiime_tutorial_commands_serial.sh
chmod a+x ./qiime_tutorial_commands_parallel.sh

To run the QIIME tutorial in serial:

./qiime_tutorial_commands_serial.sh

To run the QIIME tutorial in parallel:

./qiime_tutorial_commands_parallel.sh

References¶

Crawford, P. A., Crowley, J. R., Sambandam, N., Muegge, B. D., Costello, E. K., Hamady, M., et al. (2009). Regulation of myocardial ketone body metabolism by the gut microbiota during nutrient deprivation. Proc Natl Acad Sci U S A, 106(27), 11276-11281.
