
validate_demultiplexed_fasta.py – Checks a fasta file to verify if it has been properly demultiplexed, i.e., it is in QIIME compatible format.

Description:

Checks that the file is a valid fasta file, contains no gaps ('.' or '-' characters), contains only valid nucleotide characters, has no duplicated fasta labels, uses SampleIDs that match those in the provided mapping file, has fasta labels in the SampleID_X format normally generated by QIIME demultiplexing, and does not contain the BarcodeSequence or LinkerPrimerSequence values within the fasta sequences. Optionally, this script can also verify that the SampleIDs in the fasta sequences are present in the tip IDs of a provided newick tree file, test that all sequences have equal length, and test that all SampleIDs in the mapping file are represented in the fasta file labels.
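
The per-record checks described above can be illustrated with a minimal Python sketch. This is not the script's own implementation: the names parse_fasta and check_records are hypothetical, the set of valid characters (IUPAC DNA codes) is an assumption, and the label pattern simply assumes that SampleID_X means a SampleID followed by an underscore and an integer. The barcode and linker-primer check is not covered here.

```python
import re

# Assumption: IUPAC DNA codes count as valid; the script's actual set may differ.
VALID_CHARS = set("ACGTURYKMSWBDHVN")
GAP_CHARS = set(".-")
# Assumption: a demultiplexed label looks like "<SampleID>_<integer>".
LABEL_RE = re.compile(r"^(?P<sample_id>.+)_(?P<index>\d+)$")


def parse_fasta(lines):
    """Yield (label, sequence) pairs from an iterable of fasta lines."""
    label, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            # Keep only the first whitespace-delimited field of the header.
            fields = line[1:].split()
            label, chunks = (fields[0] if fields else ""), []
        elif label is not None:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


def check_records(records, sample_ids):
    """Count how many records show each kind of problem."""
    problems = {"gaps": 0, "invalid_chars": 0, "duplicate_labels": 0,
                "bad_label_format": 0, "unknown_sample_id": 0}
    seen = set()
    for label, seq in records:
        if label in seen:
            problems["duplicate_labels"] += 1
        seen.add(label)
        match = LABEL_RE.match(label)
        if match is None:
            problems["bad_label_format"] += 1
        elif match.group("sample_id") not in sample_ids:
            problems["unknown_sample_id"] += 1
        chars = set(seq.upper())
        if chars & GAP_CHARS:
            problems["gaps"] += 1
        # Gap characters are reported separately, so exclude them here.
        if chars - VALID_CHARS - GAP_CHARS:
            problems["invalid_chars"] += 1
    return problems
```

A file would pass this sketch when every count returned by check_records is zero.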

Usage: validate_demultiplexed_fasta.py [options]

Input Arguments:


[REQUIRED]

-m, --mapping_fp
Name of the mapping file. NOTE: Must contain a header line indicating SampleID in the first column, BarcodeSequence in the second, and LinkerPrimerSequence in the third. If no barcode or linker-primer sequence is present, leave the data fields empty.
-i, --input_fasta_fp
Path to the input fasta file
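
As a rough illustration of the mapping file requirement above, the following Python sketch reads a tab-delimited mapping file and checks the first three header columns. The function name read_mapping is hypothetical, and tolerating a leading '#' on the header line (as in '#SampleID') and skipping other '#' lines are assumptions about the file layout, not a description of the script's own parser.

```python
import csv
import io

EXPECTED = ["SampleID", "BarcodeSequence", "LinkerPrimerSequence"]


def read_mapping(handle):
    """Return {sample_id: (barcode, linker_primer)} from a mapping file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])
    cols = [c.strip() for c in header[:3]]
    if cols:
        # Assumption: the header may start with '#', as in '#SampleID'.
        cols[0] = cols[0].lstrip("#")
    if cols != EXPECTED:
        raise ValueError("unexpected header columns: %r" % (header[:3],))
    mapping = {}
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        # Pad short rows so empty barcode/primer fields are tolerated.
        row = row + [""] * (3 - len(row))
        mapping[row[0]] = (row[1], row[2])
    return mapping
```

The keys of the returned dictionary are the SampleIDs that the fasta labels would be compared against.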

[OPTIONAL]

-o, --output_dir
Directory prefix for output files [default: .]
-t, --tree_fp
Path to the tree file; needed for options -s and -e, which test whether sequence IDs are a subset of or an exact match to the tree tips [default: None]
-s, --tree_subset
Determine if sequence IDs are a subset of the tree tips. A newick tree must be passed with the -t option. [default: False]
-e, --tree_exact_match
Determine if sequence IDs are an exact match to the tree tips. A newick tree must be passed with the -t option. [default: False]
-l, --same_seq_lens
Determine if sequences are all the same length. [default: False]
-a, --all_ids_found
Determine if all SampleIDs provided in the mapping file are represented in the fasta file labels. [default: False]
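
The optional checks above reduce to simple set and length comparisons. The Python sketch below shows the distinction between the subset and exact-match tests, along with the equal-length and all-IDs-found tests. All four function names are hypothetical, and the sketch assumes the tree tip IDs have already been extracted from the newick file into a plain list; it does not parse newick itself.

```python
def tree_subset(seq_ids, tip_ids):
    """True if every sequence ID also appears among the tree tips."""
    return set(seq_ids) <= set(tip_ids)


def tree_exact_match(seq_ids, tip_ids):
    """True if the sequence IDs and tree tips are the same set."""
    return set(seq_ids) == set(tip_ids)


def same_seq_lens(seqs):
    """True if all sequences share one length (vacuously true if empty)."""
    return len({len(s) for s in seqs}) <= 1


def all_ids_found(mapping_ids, fasta_sample_ids):
    """True if every SampleID in the mapping file occurs in the fasta labels."""
    return set(mapping_ids) <= set(fasta_sample_ids)
```

Note that an exact match implies a subset, but not the reverse: a tree with extra tips passes the subset test and fails the exact-match test.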

Output:

Example:

validate_demultiplexed_fasta.py -i seqs.fasta -m Mapping_File.txt
