This script can be applied to remove sequences from a fasta or fastq file based on input criteria.


Usage: [options]

Input Arguments:



-f, --input_fasta_fp
Path to the input fasta file
-o, --output_fasta_fp
The output fasta filepath


-m, --otu_map
An OTU map whose sequence ids are the ones that should be retained
-s, --seq_id_fp
A list of sequence identifiers (or tab-delimited lines with a seq identifier in the first field) which should be retained
-b, --biom_fp
A biom file where otu identifiers should be retained
-a, --subject_fasta_fp
A fasta file where the seq ids should be retained.
-p, --seq_id_prefix
Keep seqs where seq_id starts with this prefix
--sample_id_fp
Keep seqs where seq_id starts with a sample id listed in this file
-n, --negate
Discard passed seq ids rather than keep passed seq ids [default: False]
--mapping_fp
Mapping file path (for use with --valid_states) [default: None]
--valid_states
Description of sample ids to retain (for use with --mapping_fp) [default: None]
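
The way an id list (-s) combines with -n can be sketched in a few lines of Python. This is a minimal illustration and not the script's actual implementation: the helper names read_seq_ids, parse_fasta, and filter_fasta_records are hypothetical, and the sketch assumes that the sequence id is the first whitespace-delimited token of each fasta header.

```python
def read_seq_ids(lines):
    """Collect ids from a list file; only the first tab-delimited field is used."""
    ids = set()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            ids.add(line.split("\t")[0])
    return ids


def parse_fasta(lines):
    """Yield (header, sequence) pairs from fasta-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta_records(records, ids, negate=False):
    """Keep records whose id is in ids; with negate=True, discard them instead."""
    for header, seq in records:
        # Assumption: the id is the first whitespace-delimited token of the header.
        seq_id = header.split()[0] if header.strip() else ""
        if (seq_id in ids) != negate:
            yield header, seq


fasta = [">s1_1 desc", "ACGT", ">s2_5", "GG", "TT", ">s1_9", "AAAA"]
ids = read_seq_ids(["s1_1\tnote", "", "s1_9"])
kept = list(filter_fasta_records(parse_fasta(fasta), ids))
dropped = list(filter_fasta_records(parse_fasta(fasta), ids, negate=True))
```

Here kept holds the two listed records, while dropped (the -n case) holds only the record that was not listed.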


OTU map-based filtering:

Keep all sequences that show up in an OTU map. -f inseqs.fasta -o otu_map_filtered_seqs.fasta -m otu_map.txt
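
As a rough sketch of what "show up in an OTU map" means, the Python below collects the sequence ids from OTU map lines. It assumes that each line is tab-delimited, with the OTU identifier in the first field and the member sequence ids in the remaining fields. The function name seq_ids_from_otu_map and the sample data are hypothetical and not taken from the script.

```python
def seq_ids_from_otu_map(lines):
    """Return the set of sequence ids listed in any OTU of the map."""
    ids = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumption: fields[0] is the OTU id; everything after it is a sequence id.
        ids.update(field for field in fields[1:] if field)
    return ids


otu_map = ["otu0\ts1_1\ts2_5", "otu1\ts1_9", "otu2"]
ids_to_keep = seq_ids_from_otu_map(otu_map)
```

The resulting set can then be used as the list of ids to retain, in the same way as a list passed via -s.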

Chimeric sequence filtering:

Discard all sequences that show up in chimera checking output. NOTE: It is very important to pass -n here, as this tells the script to negate the request, that is, to discard all sequences listed via -s rather than keep them. This is necessary to remove the identified chimeras from inseqs.fasta. -f inseqs.fasta -o non_chimeric_seqs.fasta -s chimeric_seqs.txt -n

Sequence list filtering:

Keep all sequences from a fasta file that are listed in a text file. -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_keep.txt

biom-based filtering:

Keep all sequences that are listed as observations in a biom file. -f inseqs.fastq -o biom_filtered_seqs.fastq -b otu_table.biom

fastq filtering:

Keep all sequences from a fastq file that are listed in a text file (note: file name must end with .fastq to support fastq filtering). -f inseqs.fastq -o list_filtered_seqs.fastq -s seqs_to_keep.txt
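
The same keep-by-id idea carries over to fastq records. The sketch below is illustrative only and is not the script's own code: it assumes simple four-line fastq records (header, sequence, "+" line, qualities), and the helper names parse_fastq and filter_fastq_records are hypothetical.

```python
def parse_fastq(lines):
    """Yield (header, sequence, qualities) from four-line fastq records."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError("malformed fastq record: %r" % header)
            yield header[1:], seq, qual
            record = []
    if record:
        raise ValueError("truncated fastq record at end of input")


def filter_fastq_records(records, ids, negate=False):
    """Keep records whose id is in ids; with negate=True, discard them instead."""
    for header, seq, qual in records:
        # Assumption: the id is the first whitespace-delimited token of the header.
        seq_id = header.split()[0] if header.strip() else ""
        if (seq_id in ids) != negate:
            yield header, seq, qual


fastq = ["@s1_1 x", "ACGT", "+", "IIII", "@s2_5", "GG", "+", "!!"]
kept_fastq = list(filter_fastq_records(parse_fastq(fastq), {"s2_5"}))
```

Each kept record retains its quality string, so it can be written back out as a valid fastq entry.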

sample id list filtering:

Keep all sequences from a fasta file where the sample id portion of the sequence identifier is listed in a text file (sequence identifiers in fasta file must be in post-split libraries format: sampleID_seqID). -f sl_inseqs.fasta -o sample_id_list_filtered_seqs.fasta --sample_id_fp map.txt
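
To make the sampleID_seqID convention concrete, the sketch below extracts the sample id portion of each sequence identifier and keeps only the identifiers whose sample id is listed. It assumes that the sample id is everything before the last underscore. The function names and the sample ids shown are hypothetical, chosen only for illustration.

```python
def sample_id_of(seq_id):
    """Return the sample id portion of a sampleID_seqID identifier."""
    # Assumption: the sample id is everything before the last underscore.
    return seq_id.rsplit("_", 1)[0]


def keep_by_sample_id(seq_ids, sample_ids):
    """Return the sequence ids whose sample id is in sample_ids, in input order."""
    return [s for s in seq_ids if sample_id_of(s) in sample_ids]


seq_ids = ["PC.634_1", "PC.634_2", "PC.355_3", "PC.481_4"]
kept_ids = keep_by_sample_id(seq_ids, {"PC.634", "PC.481"})
```

Splitting at the last underscore rather than the first means that sample ids which themselves contain underscores are still handled under this assumption.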
