split_sequence_file_on_sample_ids.py – Split a single post-split_libraries.py fasta (or post-split_libraries_fastq.py fastq) file into per-sample files.

Description:

Split a single post-split_libraries.py fasta (or post-split_libraries_fastq.py fastq) file into per-sample files. This script requires that the sequence identifiers are in post-split_libraries.py format (i.e., SampleID_SeqID). One file is created for each unique SampleID.
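
To illustrate the SampleID_SeqID convention, here is a minimal Python sketch of how a sample ID could be recovered from a sequence header. The helper name sample_id_from_header is hypothetical, and splitting on the last underscore is an assumption made so that sample IDs which themselves contain underscores stay intact; it is not necessarily how the script does it.

```python
def sample_id_from_header(header):
    """Return the SampleID part of a 'SampleID_SeqID' identifier.

    Hypothetical helper for illustration only.
    """
    # Drop the leading '>' (fasta) or '@' (fastq) marker, if present.
    if header[:1] in (">", "@"):
        header = header[1:]
    fields = header.split()
    if not fields:
        raise ValueError("empty sequence header")
    # The identifier is the first whitespace-delimited field;
    # anything after it is treated as a comment.
    seq_id = fields[0]
    # Assumption: split on the LAST underscore, so that sample IDs
    # containing underscores are preserved.
    sample_id, sep, _ = seq_id.rpartition("_")
    if not sep:
        raise ValueError("identifier is not in SampleID_SeqID format: %r" % seq_id)
    return sample_id
```

Under this assumption, a header such as ">PC.634_1 some comment" maps to the sample ID PC.634.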

Usage: split_sequence_file_on_sample_ids.py [options]

Input Arguments:

[REQUIRED]

-i, --input_seqs_fp
The input fasta (or fastq) file to split
-o, --output_dir
The output directory [default: None]

[OPTIONAL]

--buffer_size
The number of sequences to read into memory before writing to file (you usually won’t need to change this) [default: 500]
--file_type
Type of the input file: either fasta or fastq
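
The role of --buffer_size can be pictured with the following Python sketch, which holds up to buffer_size records in memory and then writes them out grouped by sample. This is an illustrative sketch for fasta input only, not the script's actual implementation: the function names read_fasta and split_fasta, the "<SampleID>.fasta" output naming, and the split on the last underscore are all assumptions.

```python
import os
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(lines, output_dir, buffer_size=500):
    """Write one fasta file per sample; return the sorted sample IDs."""
    os.makedirs(output_dir, exist_ok=True)
    buffered = defaultdict(list)  # sample ID -> formatted records
    seen = set()                  # samples whose file already exists
    pending = 0                   # records currently held in memory

    def flush():
        for sample_id, records in buffered.items():
            # Assumed naming scheme: <SampleID>.fasta
            path = os.path.join(output_dir, sample_id + ".fasta")
            # Truncate on first write, append on later flushes.
            mode = "a" if sample_id in seen else "w"
            with open(path, mode) as out:
                out.writelines(records)
            seen.add(sample_id)
        buffered.clear()

    for header, seq in read_fasta(lines):
        fields = header.split()
        if not fields:
            raise ValueError("empty sequence header")
        # Assumption: SampleID is everything before the last underscore.
        sample_id = fields[0].rpartition("_")[0]
        buffered[sample_id].append(">%s\n%s\n" % (header, seq))
        pending += 1
        if pending >= buffer_size:
            flush()
            pending = 0
    flush()  # write whatever is left in the buffer
    return sorted(seen)
```

In this sketch, a larger buffer means fewer file opens at the cost of more memory, which is consistent with the note that the default rarely needs changing.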

Output:

This script will produce an output directory with as many files as samples.

Split seqs.fna into one fasta file per sample and store the resulting fasta files in ‘out’

split_sequence_file_on_sample_ids.py -i seqs.fna -o out/

Split seqs.fastq into one fastq file per sample and store the resulting fastq files in ‘out_fastq’

split_sequence_file_on_sample_ids.py -i seqs.fastq --file_type fastq -o out_fastq/
