
add_qiime_labels.py – Takes a directory, a metadata mapping file, and the name of the mapping-file column that holds the fasta file name for each SampleID, then combines all files with valid fasta extensions into a single fasta file with valid QIIME fasta labels.

Description:

A metadata mapping file with SampleIDs and fasta file names (just the file name itself, not the full or relative filepath) is used to generate a combined fasta file with valid QIIME labels based upon the SampleIDs specified in the mapping file.

See: http://qiime.org/documentation/file_formats.html#metadata-mapping-files for details about the metadata file format.

Example mapping file:

#SampleID	BarcodeSequence	LinkerPrimerSequence	InputFileName	Description
Sample.1	AAAACCCCGGGG	CTACATAATCGGRATT	seqs1.fna	sample.1
Sample.2	TTTTGGGGAAAA	CTACATAATCGGRATT	seqs2.fna	sample.2
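A minimal sketch of how a mapping file like the one above could be turned into a file-name-to-SampleID lookup. It assumes a tab-delimited file whose header line starts with `#SampleID` and whose later comment lines start with `#`. The function name `read_filename_map` is hypothetical and is not part of QIIME's API.

```python
import io
from typing import Dict, TextIO


def read_filename_map(mapping: TextIO, filename_column: str) -> Dict[str, str]:
    """Return {fasta file name: SampleID} from a QIIME-style mapping file.

    Hypothetical helper. Assumes tab-separated columns, a header line that
    begins with '#SampleID', and optional later comment lines starting with '#'.
    """
    col = -1
    result: Dict[str, str] = {}
    for raw in mapping:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if col < 0:
            if not line.startswith("#SampleID"):
                raise ValueError("mapping file must start with a #SampleID header")
            header = [name.lstrip("#") for name in fields]
            col = header.index(filename_column)  # raises ValueError if absent
            continue
        if line.startswith("#"):
            continue  # comment line after the header
        result[fields[col]] = fields[0]  # file name -> SampleID
    return result


example = io.StringIO(
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tInputFileName\tDescription\n"
    "Sample.1\tAAAACCCCGGGG\tCTACATAATCGGRATT\tseqs1.fna\tsample.1\n"
    "Sample.2\tTTTTGGGGAAAA\tCTACATAATCGGRATT\tseqs2.fna\tsample.2\n"
)
print(read_filename_map(example, "InputFileName"))
# {'seqs1.fna': 'Sample.1', 'seqs2.fna': 'Sample.2'}
```

Only the file name is stored as the key, matching the requirement that the column holds bare file names rather than paths.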

This script handles situations where fasta data arrive already demultiplexed, with one fasta file per sample. It only alters each fasta label, adding a QIIME-compatible label at the beginning.

Example: with the metadata mapping file above and a specified directory containing the files seqs1.fna and seqs2.fna, the first record in seqs1.fna might look like this:

>FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_
AACAGATTAGACCAGATTAAGCCGAGATTTACCCGA

In the combined output fasta file, the same record would be written like this:

>Sample.1_0 FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_
AACAGATTAGACCAGATTAAGCCGAGATTTACCCGA

No changes are made to the sequences.
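The relabeling shown in the example amounts to prepending `<SampleID>_<number>` to the original header text and leaving the sequence line untouched. A minimal sketch of that single step; `qiime_label` is a hypothetical helper name, not a QIIME function.

```python
def qiime_label(header_line: str, sample_id: str, count: int) -> str:
    """Prepend '<SampleID>_<count>' to a fasta header.

    `header_line` is the original header, including its leading '>'.
    The original header text is kept, minus surrounding whitespace.
    """
    if not header_line.startswith(">"):
        raise ValueError("fasta header lines must start with '>'")
    original = header_line[1:].strip()
    return f">{sample_id}_{count} {original}"


old = ">FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_"
print(qiime_label(old, "Sample.1", 0))
# >Sample.1_0 FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_
```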

Usage: add_qiime_labels.py [options]

Input Arguments:


[REQUIRED]

-m, --mapping_fp
Path to the metadata mapping file that associates SampleIDs with fasta file names.
-i, --fasta_dir
Directory of fasta files to combine and label.
-c, --filename_column
Name of the column in the metadata mapping file that contains the fasta file names.

[OPTIONAL]

-o, --output_dir
Output directory in which the combined fasta file is written. [default: .]
-n, --count_start
Number at which to start enumerating sequence labels. [default: 0]
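The `-n`/`--count_start` value is the first number used in the `<SampleID>_<number>` labels. A minimal sketch, assuming a single counter that keeps increasing across all sequences from all input files rather than restarting for each sample; `enumerate_labels` is a hypothetical name.

```python
from typing import Iterable, Iterator


def enumerate_labels(sample_ids: Iterable[str], count_start: int = 0) -> Iterator[str]:
    """Yield '<SampleID>_<n>' for each sequence, with n starting at count_start.

    Assumes one shared counter across every sequence from every file.
    """
    for n, sample_id in enumerate(sample_ids, start=count_start):
        yield f"{sample_id}_{n}"


# Two sequences from seqs1.fna (Sample.1) followed by one from seqs2.fna (Sample.2):
print(list(enumerate_labels(["Sample.1", "Sample.1", "Sample.2"], count_start=1000000)))
# ['Sample.1_1000000', 'Sample.1_1000001', 'Sample.2_1000002']
```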

Output:

A combined_seqs.fasta file will be created in the output directory, with the sequences assigned to the SampleID given in the metadata mapping file.
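Putting the pieces together, a hypothetical end-to-end sketch of the behavior described on this page. Several details are assumptions, not taken from this page: the accepted extensions (`.fna`, `.fasta`, `.fa`), sorted file order, one shared counter across files, silently skipping files not named in the mapping file, and joining multi-line sequences onto one line (the residues themselves are not changed). `parse_fasta` and `combine_fasta` are illustrative names, not QIIME functions.

```python
import os
from typing import Dict, Iterator, Tuple

# Assumed set of "valid fasta extensions"; the real script's list may differ.
VALID_FASTA_EXTENSIONS = {".fna", ".fasta", ".fa"}


def parse_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:].strip(), []
            elif line.strip():
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def combine_fasta(filename_to_sample: Dict[str, str], fasta_dir: str,
                  output_dir: str = ".", count_start: int = 0) -> str:
    """Write combined_seqs.fasta with '<SampleID>_<n>' prepended to each header.

    Files are read in sorted order and one counter runs across all of them
    (both are assumptions of this sketch). Returns the output path.
    """
    out_path = os.path.join(output_dir, "combined_seqs.fasta")
    count = count_start
    with open(out_path, "w") as out:
        for name in sorted(os.listdir(fasta_dir)):
            if os.path.splitext(name)[1].lower() not in VALID_FASTA_EXTENSIONS:
                continue  # not a fasta file by extension
            if name not in filename_to_sample:
                continue  # assumption: files absent from the mapping are skipped
            sample_id = filename_to_sample[name]
            for header, seq in parse_fasta(os.path.join(fasta_dir, name)):
                out.write(f">{sample_id}_{count} {header}\n{seq}\n")
                count += 1
    return out_path
```

Because the output file is opened before the directory is listed, running with the output directory equal to the input directory makes combined_seqs.fasta appear in the listing, but it is skipped since its name is not in the mapping.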

Example:

Specify fasta_dir as the input directory of fasta files, use the metadata mapping file example_mapping.txt with InputFileName as the column of fasta file names, start enumerating at 1000000, and write the output to the directory combined_fasta:

add_qiime_labels.py -i fasta_dir -m example_mapping.txt -c InputFileName -n 1000000 -o combined_fasta
