Description:
A metadata mapping file with SampleIDs and fasta file names (just the file name itself, not the full or relative filepath) is used to generate a combined fasta file with valid QIIME labels based upon the SampleIDs specified in the mapping file.
See: http://qiime.org/documentation/file_formats.html#metadata-mapping-files for details about the metadata file format.
Example mapping file (tab-separated):
#SampleID	BarcodeSequence	LinkerPrimerSequence	InputFileName	Description
Sample.1	AAAACCCCGGGG	CTACATAATCGGRATT	seqs1.fna	sample.1
Sample.2	TTTTGGGGAAAA	CTACATAATCGGRATT	seqs2.fna	sample.2
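The lookup from fasta file name to SampleID can be illustrated with a short Python sketch. This is not the script's actual code: the function name filename_to_sample_id is hypothetical, and the sketch assumes a tab-separated mapping file whose header line starts with #SampleID.

```python
import csv
import io


def filename_to_sample_id(mapping_lines, filename_column="InputFileName"):
    """Build a {fasta file name: SampleID} dict from mapping file lines.

    Hypothetical helper for illustration; assumes tab-separated fields
    and a header line beginning with '#SampleID'.
    """
    reader = csv.reader(mapping_lines, delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")  # '#SampleID' -> 'SampleID'
    id_idx = header.index("SampleID")
    fn_idx = header.index(filename_column)

    lookup = {}
    for row in reader:
        # Skip blank lines and any further comment lines.
        if not row or row[0].startswith("#"):
            continue
        lookup[row[fn_idx]] = row[id_idx]
    return lookup


# The example mapping file from above, written with explicit tabs.
mapping_text = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tInputFileName\tDescription\n"
    "Sample.1\tAAAACCCCGGGG\tCTACATAATCGGRATT\tseqs1.fna\tsample.1\n"
    "Sample.2\tTTTTGGGGAAAA\tCTACATAATCGGRATT\tseqs2.fna\tsample.2\n"
)
lookup = filename_to_sample_id(io.StringIO(mapping_text))
print(lookup)
```

Keying the dictionary by bare file name reflects the requirement that the mapping file lists only the file name, not a full or relative path.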
This script handles the case where fasta data arrives already demultiplexed, with one fasta file per sample. It only alters each fasta label, adding a QIIME-compatible label at the beginning.
Example: With the metadata mapping file above and a specified directory containing the files seqs1.fna and seqs2.fna, the first record in seqs1.fna might look like this:
>FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_
AACAGATTAGACCAGATTAAGCCGAGATTTACCCGA
and in the combined output fasta file it would be written like this:
>Sample.1_0 FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_
AACAGATTAGACCAGATTAAGCCGAGATTTACCCGA
No changes are made to the sequences.
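The relabeling described above can be sketched in a few lines of Python. This is a minimal illustration, not the script's actual implementation: the function name relabel_fasta and its start parameter are hypothetical, and the sketch assumes each label is rewritten as SampleID, an underscore, and a running integer, followed by the original label.

```python
def relabel_fasta(lines, sample_id, start=0):
    """Yield fasta lines with each label prefixed by '<sample_id>_<n> '.

    Hypothetical helper for illustration; sequence lines pass through
    unchanged and the counter advances once per record.
    """
    count = start
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Prepend the QIIME-style label and keep the original label text.
            yield ">{}_{} {}".format(sample_id, count, line[1:])
            count += 1
        else:
            yield line


example = [
    ">FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_",
    "AACAGATTAGACCAGATTAAGCCGAGATTTACCCGA",
]
relabeled = list(relabel_fasta(example, "Sample.1"))
print("\n".join(relabeled))
```

With the default start of 0, the first record is labeled Sample.1_0, matching the example above; passing a different start value mimics beginning the enumeration at another number.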
Usage: add_qiime_labels.py [options]
Input Arguments:
Note
[REQUIRED]
[OPTIONAL]
Output:
A combined_seqs.fasta file will be created in the output directory, with the sequences assigned to the SampleID given in the metadata mapping file.
Example:
Specify fasta_dir as the input directory of fasta files, use the metadata mapping file example_mapping.txt with InputFileName as the column holding the fasta file names, start enumerating at 1000000, and write the output to the directory combined_fasta:
add_qiime_labels.py -i fasta_dir -m example_mapping.txt -c InputFileName -n 1000000 -o combined_fasta