sampledoc

Introduction

Who Should Use this Software?

This software is designed for scientists investigating microbial diversity within and between samples using SSU (16S or 18S) rRNA gene sequence surveys, where the gene sequences are generated by PCR and (mostly) sequenced with 454 pyrosequencing. To analyze the diversity of sequences from multiple samples in a single pyrosequencer run, barcodes are assigned to samples and the amplicons are pooled for sequencing. This software recognizes the barcodes and sorts sequences back to their original samples, then applies a comprehensive set of analysis tools to assess the diversity within samples and to compare samples to one another. The analysis also incorporates data provided by the user, so that known properties of samples or sets of samples can be related to their microbial constituents.
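The barcode-sorting step described above can be illustrated with a minimal Python sketch. It is not the software's own implementation: the function name `demultiplex`, the input layout, and the assumptions that barcodes have equal length, sit at the start of each read, and must match exactly are all made up for illustration.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample):
    """Group reads by the sample whose barcode they start with.

    reads: iterable of (read_id, sequence) pairs.
    barcode_to_sample: dict mapping barcode -> sample ID (hypothetical layout).
    Returns (per_sample, unassigned_read_ids).
    """
    # Assumption of this sketch: all barcodes share one length.
    lengths = {len(barcode) for barcode in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes equal-length barcodes")
    barcode_len = lengths.pop()

    per_sample = defaultdict(list)
    unassigned = []
    for read_id, seq in reads:
        barcode, remainder = seq[:barcode_len], seq[barcode_len:]
        sample = barcode_to_sample.get(barcode)  # exact match only
        if sample is None:
            unassigned.append(read_id)
        else:
            # Store the read with its barcode stripped off.
            per_sample[sample].append((read_id, remainder))
    return dict(per_sample), unassigned
```

A real pipeline would typically also tolerate sequencing errors in the barcode and apply quality filters; this sketch omits both.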

This software is designed for use by scientists familiar with the basic approach to the analysis of microbial diversity. Familiarity with the concepts of OTUs, phylogenetic trees, diversity estimates, and methods of community comparison such as UniFrac and PCoA is necessary for interpreting the results. Beyond that background, the software is designed for wide use, is user-friendly, and does not require specialized skills. It is suitable for students, professionals, or anyone wishing to analyze a microbial diversity dataset.

Microbial Community Analysis

QIIME is designed to generate lists of OTUs from very large datasets (mostly those generated by pyrosequencing), and to perform phylogenetic and non-phylogenetic OTU-based analyses. It performs sequence quality checks and chimera checking, chooses OTUs, performs sequence-, UniFrac-, and OTU-based clustering of the samples from which the sequences were obtained, assigns sequences to microbial lineages, displays heatmaps of common and rare OTUs in samples, performs rarefaction analysis to assess depth of coverage, builds very large trees for tree-based community comparisons such as UniFrac, and applies a network-based analysis.
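To make the idea of rarefaction concrete, here is a minimal Python sketch, assuming a sample is given as a dictionary of OTU counts. The function names and the input format are hypothetical and do not reflect QIIME's own scripts. The sketch subsamples reads without replacement at several depths and counts the distinct OTUs observed at each depth.

```python
import random

def rarefy_observed_otus(otu_counts, depth, seed=0):
    """Draw `depth` reads without replacement and count distinct OTUs seen."""
    # Expand the count table into one entry per read.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return len(set(rng.sample(pool, depth)))

def rarefaction_curve(otu_counts, depths, seed=0):
    """Return (depth, observed OTUs) pairs for the requested depths."""
    return [(d, rarefy_observed_otus(otu_counts, d, seed)) for d in depths]
```

If the curve flattens as depth increases, additional sequencing is finding few new OTUs, which suggests the sample has been covered reasonably well. In practice the subsampling is usually repeated several times per depth and averaged.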

Sequencing Technologies

QIIME is in principle compatible with all platforms: 454, Illumina, and Sanger sequencing. Different technologies require different strategies for building the tree. For short reads, or for heterogeneous reads that do not overlap, it is more effective to insert the reads into a tree built from full-length sequences. This is because trees built from reads shorter than 200 nucleotides are often highly inaccurate (though still often sufficient for tree-based community comparisons such as UniFrac), and trees generally cannot be built de novo from sequences that do not overlap at all.

For combining heterogeneous datasets, the best approach is usually to run split_libraries.py on each dataset separately, combine the FASTA files, make an OTU table using BLAST (rather than cd-hit), and then use the combined OTU table and a combined sample mapping file for downstream analysis.

Another important consideration is error rate: pyrosequencing reads that are not denoised (see Quince et al., 2009) can lead to very large numbers of artifactual OTUs, even at relatively high levels. However, this noise problem affects alpha diversity estimates (estimates of diversity within a sample) far more than beta diversity estimates (diversity across samples, e.g. clustering with UniFrac). See Reeder & Knight (2009) for a commentary on this issue.
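The different sensitivity of alpha and beta diversity to noise can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the counts are invented, the artifacts are modeled as singleton OTUs unique to each sample, observed OTU count stands in for alpha diversity, and Bray-Curtis dissimilarity is swapped in for UniFrac because it needs no tree.

```python
def observed_otus(counts):
    """Alpha diversity stand-in: number of OTUs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)

def bray_curtis(a, b):
    """Beta diversity stand-in: Bray-Curtis dissimilarity of two count tables."""
    otus = set(a) | set(b)
    numerator = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    denominator = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    return numerator / denominator

def add_singletons(counts, n, prefix):
    """Return a copy of `counts` with n artifactual single-read OTUs added."""
    noisy = dict(counts)
    for i in range(n):
        noisy[f"{prefix}_{i}"] = 1
    return noisy

# Invented toy counts, 1000 reads per sample.
sample_a = {"otu1": 500, "otu2": 300, "otu3": 200}
sample_b = {"otu1": 200, "otu2": 300, "otu3": 500}

# Model sequencing noise as 20 spurious singleton OTUs per sample.
noisy_a = add_singletons(sample_a, 20, "noise_a")
noisy_b = add_singletons(sample_b, 20, "noise_b")

print(observed_otus(sample_a), observed_otus(noisy_a))
print(bray_curtis(sample_a, sample_b), bray_curtis(noisy_a, noisy_b))
```

In this toy case the observed OTU count rises from 3 to 23, while the Bray-Curtis dissimilarity moves only from 0.30 to about 0.31, because each artifactual OTU carries a single read and abundance-weighted comparisons are dominated by the abundant OTUs.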

