Description:
A tab-separated text file of SampleIDs and fasta file names (the file name only, not the full or relative filepath) is used to generate a combined fasta file with valid QIIME labels based on the SampleIDs specified in the mapping file.
Example mapping file:
Sample.1	fasta_dir/seqs1.fna
Sample.2	fasta_dir/seqs2.fna
This script handles situations where fasta data arrives already demultiplexed, with one fasta file per sample. It alters each fasta label by adding a QIIME-compatible label at the beginning. For example, >FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_ could become >control.sample_1 FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_
Note that limited checking is done on the mapping file. The only tests are that every fasta file name is unique, and that SampleIDs are MIMARKS compliant (alphanumeric and period characters only). Duplicate SampleIDs are allowed, so care should be taken that there are no typos.
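The two checks named in the note above can be approximated outside the script before a run. The following is a minimal sketch, not the script's own code: the function name check_mapping is hypothetical, and the pattern assumes "MIMARKS compliant" means exactly what the note states (letters, digits, and periods only).

```python
import re

# Assumption: a valid SampleID contains only letters, digits, and periods.
VALID_SAMPLE_ID = re.compile(r"[A-Za-z0-9.]+")


def check_mapping(lines):
    """Parse 'SampleID<TAB>fasta file name' lines.

    Returns (pairs, errors). Duplicate SampleIDs are deliberately
    not reported, mirroring the behaviour described in the note.
    """
    pairs, errors, seen_files = [], [], set()
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            errors.append(f"line {line_no}: expected 2 tab-separated fields")
            continue
        sample_id, fasta_name = fields[0].strip(), fields[1].strip()
        if not VALID_SAMPLE_ID.fullmatch(sample_id):
            errors.append(f"line {line_no}: invalid SampleID {sample_id!r}")
        if fasta_name in seen_files:
            errors.append(f"line {line_no}: duplicate fasta file {fasta_name!r}")
        seen_files.add(fasta_name)
        pairs.append((sample_id, fasta_name))
    return pairs, errors
```

Because duplicate SampleIDs pass silently, a misspelled SampleID simply creates an extra sample, so it is worth reading the returned pairs as well as the error list.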
No changes are made to the sequences.
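To illustrate the label-only change, here is a minimal sketch of the relabeling step. It is not the script's implementation: relabel_fasta and its parameters are hypothetical names, and the SampleID_N label format is inferred from the control.sample_1 example in the description.

```python
def relabel_fasta(lines, sample_id, count_start=0):
    """Yield fasta lines with 'SampleID_N ' prepended to each label.

    Assumption: the new label is the SampleID, an underscore, and a
    running number, followed by the original label text.
    """
    count = count_start
    for line in lines:
        if line.startswith(">"):
            yield f">{sample_id}_{count} {line[1:]}"
            count += 1
        else:
            # Sequence lines pass through untouched.
            yield line


record = [
    ">FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_\n",
    "ACGT\n",
]
relabeled = list(relabel_fasta(record, "control.sample", count_start=1))
print("".join(relabeled), end="")
```

With count_start=1 the first label becomes >control.sample_1 followed by the original label, and the sequence line is emitted unchanged.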
Usage: add_qiime_labels.py [options]
Input Arguments:
Note
[REQUIRED]
[OPTIONAL]
Output:
A combined_seqs.fasta file will be created in the output directory, with the sequences assigned to Sample.1 and Sample.2.
Example:
Specify fasta_dir as the input directory of fasta files, use example_mapping.txt as the SampleID to fasta file mapping file, start the enumeration that follows each SampleID at 1000000, and write the output to the directory combined_fasta:
add_qiime_labels.py -i fasta_dir -m example_mapping.txt -n 1000000 -o combined_fasta
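Since duplicate SampleIDs are accepted without complaint, one way to catch a typo after the run is to tally sequences per SampleID in the combined file. This is a sketch under assumptions: count_by_sample is a hypothetical helper, and it assumes each label starts with SampleID_N as in the description's example.

```python
from collections import Counter


def count_by_sample(fasta_lines):
    """Count sequences per SampleID from labels of the form '>SampleID_N ...'."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue
        tokens = line[1:].split()
        if not tokens:
            continue  # empty label, nothing to parse
        # Drop the trailing '_N' enumeration to recover the SampleID.
        sample_id = tokens[0].rsplit("_", 1)[0]
        counts[sample_id] += 1
    return counts


# Usage (path taken from the Output section above; adjust if it differs):
# with open("combined_fasta/combined_seqs.fasta") as handle:
#     for sample_id, n in sorted(count_by_sample(handle).items()):
#         print(sample_id, n)
```

An unexpected SampleID in the tally, or one with far fewer sequences than expected, usually points back to a misspelled row in the mapping file.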