join_paired_ends.py – Joins paired-end Illumina reads.

Description:

This script joins forward and reverse Illumina reads using the chosen method. It can optionally write an updated index reads file containing only the index reads for the surviving joined paired-end reads. If you choose to write an updated index file, make sure that the order and header format of the index reads match the order and header format of the reads in the files being joined (this is the default for reads generated on Illumina instruments).

Two methods are currently available for joining paired-end data:

  1. fastq-join - Erik Aronesty, 2011. ea-utils : “Command-line tools for processing biological sequencing data” (http://code.google.com/p/ea-utils)
  2. SeqPrep - (https://github.com/jstjohn/SeqPrep)

Usage: join_paired_ends.py [options]

Input Arguments:

[REQUIRED]

-f, --forward_reads_fp
Path to input forward reads in FASTQ format.
-r, --reverse_reads_fp
Path to input reverse reads in FASTQ format.
-o, --output_dir
Directory in which to store result files.

[OPTIONAL]

-m, --pe_join_method
Method to use for joining paired-ends. Valid choices are: fastq-join, SeqPrep [default: fastq-join]
-b, --index_reads_fp
Path to the barcode / index reads in FASTQ format. Will be filtered based on surviving joined pairs.
-j, --min_overlap
Applies to both fastq-join and SeqPrep methods. Minimum overlap, in base pairs, required to join pairs. If not set, the program default is used. Must be an integer. [default: None]
-p, --perc_max_diff
Only applies to fastq-join method, otherwise ignored. Maximum allowed % differences within the region of overlap. If not set, the program default is used. Must be an integer from 1 to 100. [default: None]
-y, --max_ascii_score
Only applies to SeqPrep method, otherwise ignored. Maximum quality score / ASCII code allowed to appear within joined pairs output. For more information, please see: http://en.wikipedia.org/wiki/FASTQ_format. [default: J]
-n, --min_frac_match
Only applies to SeqPrep method, otherwise ignored. Minimum fraction of matching bases required to join reads. Must be a float between 0 and 1. If not set, the program default is used. [default: None]
-g, --max_good_mismatch
Only applies to SeqPrep method, otherwise ignored. Maximum mismatched high-quality bases allowed when joining reads. Must be a float between 0 and 1. If not set, the program default is used. [default: None]
-6, --phred_64
Only applies to SeqPrep method, otherwise ignored. Set if input reads are in phred+64 format. Output will always be phred+33. [default: False]
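
The relationship between the two quality encodings mentioned above can be illustrated with a minimal Python sketch. It is not taken from the script or from SeqPrep; the function names are hypothetical, and it assumes plain phred+64 input in which no character encodes a negative score.

```python
def phred64_to_phred33(quality):
    """Re-encode a phred+64 quality string as phred+33 (illustrative helper)."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # phred+64: ASCII code minus 64 is the score
        if score < 0:
            # Assumption: negative scores are treated as invalid input here.
            raise ValueError("%r is not a valid phred+64 character" % ch)
        converted.append(chr(score + 33))  # phred+33: score plus 33
    return "".join(converted)


def phred33_score(ch):
    """Return the quality score encoded by one phred+33 character."""
    return ord(ch) - 33


# 'h' (ASCII 104) is score 40 in phred+64, which is 'I' (ASCII 73) in phred+33.
print(phred64_to_phred33("hhhh"))
# The default -y value 'J' (ASCII 74) corresponds to score 41 in phred+33.
print(phred33_score("J"))
```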

Output:

Every joining method returns a joined / merged / assembled paired-end FASTQ file. Depending on the method chosen, additional files may be written to the user-specified output directory.

  1. fastq-join will output fastq-formatted files as:
    • “*.join”: assembled / joined reads output
    • “*.un1”: unassembled / unjoined reads1 output
    • “*.un2”: unassembled / unjoined reads2 output
  2. SeqPrep will output fastq-formatted gzipped files as:
    • “*_assembled.gz”: assembled / joined reads output
    • “*_unassembled_R1.gz”: unassembled / unjoined reads1 output
    • “*_unassembled_R2.gz”: unassembled / unjoined reads2 output
  3. If a barcode / index file is provided via the ‘-b’ option, an updated barcodes file will be output as:
    • “..._barcodes.fastq”: This barcode / index file must be used together with the joined paired-ends file as input to split_libraries_fastq.py. Apart from reads that are missing because their pairs failed to merge, the index reads and joined reads must be in the same order.
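
The naming patterns listed above can be used to sort the contents of an output directory by role. The following Python sketch is illustrative only: the helper name and the example file names are hypothetical, the patterns are copied from the list above, and actual file names may differ between versions.

```python
from fnmatch import fnmatchcase

# Patterns copied from the output list above, keyed by joining method.
OUTPUT_PATTERNS = {
    "fastq-join": {
        "joined": "*.join",
        "unjoined_reads1": "*.un1",
        "unjoined_reads2": "*.un2",
    },
    "SeqPrep": {
        "joined": "*_assembled.gz",
        "unjoined_reads1": "*_unassembled_R1.gz",
        "unjoined_reads2": "*_unassembled_R2.gz",
    },
}


def classify_outputs(filenames, method):
    """Map each output role to the first file name matching its pattern."""
    found = {}
    for name in filenames:
        for role, pattern in OUTPUT_PATTERNS[method].items():
            # fnmatchcase keeps matching case-sensitive on every platform.
            if role not in found and fnmatchcase(name, pattern):
                found[role] = name
    return found


# Hypothetical file names, for illustration only.
print(classify_outputs(["run.join", "run.un1", "run.un2"], "fastq-join"))
```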

Join paired-ends with ‘fastq-join’:

This is the default method to join paired-end Illumina data:

join_paired_ends.py -f $PWD/forward_reads.fastq -r $PWD/reverse_reads.fastq -o $PWD/fastq-join_joined

Join paired-ends with ‘SeqPrep’:

Produces output similar to ‘fastq-join’, but returns the data in gzipped format.

join_paired_ends.py -m SeqPrep -f $PWD/forward_reads.fastq -r $PWD/reverse_reads.fastq -o $PWD/SeqPrep_joined
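
When the script is driven from Python, the argument list can be assembled programmatically. The sketch below is a hypothetical helper, not part of the script: it uses only the flags documented in the option list and checks the documented value ranges before building the command. Rejecting a method-specific option used with the other method is a deliberately stricter choice than the script's documented behaviour, which is to ignore such options.

```python
VALID_METHODS = ("fastq-join", "SeqPrep")


def build_join_command(forward_fp, reverse_fp, output_dir, method="fastq-join",
                       index_fp=None, min_overlap=None, perc_max_diff=None,
                       min_frac_match=None):
    """Return an argument list for join_paired_ends.py (hypothetical helper)."""
    if method not in VALID_METHODS:
        raise ValueError("method must be one of %s" % ", ".join(VALID_METHODS))
    cmd = ["join_paired_ends.py", "-f", forward_fp, "-r", reverse_fp,
           "-o", output_dir, "-m", method]
    if index_fp is not None:
        cmd += ["-b", index_fp]
    if min_overlap is not None:
        cmd += ["-j", str(int(min_overlap))]
    if perc_max_diff is not None:
        # -p only applies to fastq-join and must be an integer from 1 to 100.
        if method != "fastq-join":
            raise ValueError("perc_max_diff only applies to fastq-join")
        if not isinstance(perc_max_diff, int) or not 1 <= perc_max_diff <= 100:
            raise ValueError("perc_max_diff must be an integer from 1 to 100")
        cmd += ["-p", str(perc_max_diff)]
    if min_frac_match is not None:
        # -n only applies to SeqPrep and must be a float between 0 and 1.
        if method != "SeqPrep":
            raise ValueError("min_frac_match only applies to SeqPrep")
        if not 0 <= min_frac_match <= 1:
            raise ValueError("min_frac_match must be between 0 and 1")
        cmd += ["-n", str(min_frac_match)]
    return cmd


cmd = build_join_command("forward_reads.fastq", "reverse_reads.fastq",
                         "SeqPrep_joined", method="SeqPrep")
print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) on a system where join_paired_ends.py is on the PATH.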

Update the index / barcode reads file to match the surviving joined pairs:

This is required if you will be using split_libraries_fastq.py.

join_paired_ends.py -f $PWD/forward_reads.fastq -r $PWD/reverse_reads.fastq -b $PWD/barcodes.fastq -o $PWD/fastq-join_joined
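
The idea of filtering index reads down to the surviving joined pairs can be sketched in Python as follows. This is an illustration, not the script's own implementation: the function names are hypothetical, records are assumed to span exactly four lines, and reads are assumed to be matched on the first whitespace-delimited token of the header line.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def read_id(header):
    """Assumption: the read ID is the first token after the leading '@'."""
    return header[1:].split()[0]


def filter_index_reads(joined_handle, index_handle):
    """Yield index records whose read ID survives in the joined reads."""
    surviving = {read_id(h) for h, _, _ in read_fastq(joined_handle)}
    for header, seq, qual in read_fastq(index_handle):
        if read_id(header) in surviving:
            yield header, seq, qual  # original index order is preserved


# Tiny in-memory example: read r2 failed to join, so its index read is dropped.
joined = io.StringIO("@r1 1:N:0\nACGT\n+\nIIII\n@r3 1:N:0\nTTGA\n+\nIIII\n")
index = io.StringIO("@r1 2:N:0\nAAAA\n+\nIIII\n@r2 2:N:0\nCCCC\n+\nIIII\n"
                    "@r3 2:N:0\nGGGG\n+\nIIII\n")
for header, seq, qual in filter_index_reads(joined, index):
    print(header, seq)
```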
