validate_demultiplexed_fasta.py – Checks a fasta file to verify that it has been properly demultiplexed, i.e., that it is in a QIIME-compatible format.
The script checks that the file is a valid fasta file, that the sequences contain no gaps (‘.’ or ‘-’ characters) and only valid nucleotide characters, that no fasta label is duplicated, that SampleIDs match those in the provided mapping file, that fasta labels follow the SampleID_X format normally generated by QIIME demultiplexing, and that the BarcodeSequence/LinkerPrimerSequence values are not found in the fasta sequences. Optionally, the script can also verify that the SampleIDs in the fasta sequences are present in the tip IDs of a provided newick tree file, test for equal sequence lengths across all sequences, and test that all SampleIDs in the mapping file are represented in the fasta file labels.
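The per-sequence checks described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the script's actual implementation: the function names `parse_fasta` and `check_fasta`, the report keys, and the set of accepted nucleotide characters (IUPAC codes) are all assumptions, and the barcode/primer checks are left out.

```python
# Hypothetical sketch of the label and sequence checks; not QIIME's own code.
VALID_NUCLEOTIDES = set("ACGTURYMKSWHBVDN")  # assumed: IUPAC nucleotide codes
GAP_CHARS = set(".-")


def parse_fasta(lines):
    """Yield (label, sequence) pairs from an iterable of fasta lines."""
    label, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:], []
        else:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


def check_fasta(lines, sample_ids):
    """Count the records that fail each check."""
    report = {
        "duplicate_labels": 0,
        "bad_label_format": 0,
        "unknown_sample_id": 0,
        "gaps": 0,
        "invalid_chars": 0,
    }
    seen = set()
    for label, seq in parse_fasta(lines):
        seq_id = label.split()[0]  # text before the first whitespace
        if seq_id in seen:
            report["duplicate_labels"] += 1
        seen.add(seq_id)

        # Assumed reading of SampleID_X: split on the last underscore.
        sample_id, sep, suffix = seq_id.rpartition("_")
        if not sep or not sample_id or not suffix:
            report["bad_label_format"] += 1
        elif sample_id not in sample_ids:
            report["unknown_sample_id"] += 1

        chars = set(seq.upper())
        if chars & GAP_CHARS:
            report["gaps"] += 1
        if chars - VALID_NUCLEOTIDES - GAP_CHARS:
            report["invalid_chars"] += 1
    return report
```

Calling `check_fasta(open("seqs.fasta"), {"S1", "S2"})` would return a dictionary of counts, one per check, where the SampleID set would normally be read from the first column of the mapping file.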
Usage: validate_demultiplexed_fasta.py [options]
- -m, --mapping_fp
- Name of the mapping file. NOTE: It must contain a header line with SampleID in the first column, BarcodeSequence in the second, and LinkerPrimerSequence in the third. If no barcode or linker/primer sequence is present, leave the data fields empty.
- -i, --input_fasta_fp
- Path to the input fasta file
- -o, --output_dir
- Directory prefix for output files [default: .]
- -t, --tree_fp
- Path to the tree file; needed to test whether sequence IDs are a subset of, or an exact match to, the tree tips (options -s and -e) [default: None]
- -s, --tree_subset
- Determine if sequence IDs are a subset of the tree tips; a newick tree must be passed with the -t option. [default: False]
- -e, --tree_exact_match
- Determine if sequence IDs are an exact match to the tree tips; a newick tree must be passed with the -t option. [default: False]
- -l, --same_seq_lens
- Determine if sequences are all the same length. [default: False]
- -a, --all_ids_found
- Determine if all SampleIDs provided in the mapping file are represented in the fasta file labels. [default: False]
- -b, --suppress_barcode_checks
- Suppress barcode checks [default: False]
- -p, --suppress_primer_checks
- Suppress primer checks [default: False]
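The optional comparisons behind -s, -e, and -a amount to set operations on ID collections. The following is a rough sketch under that assumption, not the script's actual code; the function name `compare_ids` and its arguments are hypothetical, and the IDs are assumed to have been extracted from the fasta labels, tree tips, and mapping file beforehand.

```python
# Hypothetical sketch of the -s, -e and -a comparisons as set operations.
def compare_ids(seq_ids, tree_tips, fasta_sample_ids, mapping_sample_ids):
    """Return the outcome of each optional ID comparison."""
    seq_ids = set(seq_ids)
    tree_tips = set(tree_tips)
    return {
        # -s: every sequence ID appears among the tree tips
        "tree_subset": seq_ids <= tree_tips,
        # -e: sequence IDs and tree tips are the same set
        "tree_exact_match": seq_ids == tree_tips,
        # -a: every SampleID in the mapping file appears in the fasta labels
        "all_ids_found": set(mapping_sample_ids) <= set(fasta_sample_ids),
    }
```

An exact match implies a subset, so a file that passes -e would also pass -s under this reading.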
validate_demultiplexed_fasta.py -i seqs.fasta -m Mapping_File.txt