Clean reads

General information

0.2.2 Version. clean_reads cleans NGS (Sanger, 454, Illumina and solid) reads. It can trim:

  • bad quality regions
  • adaptors
  • vectors
  • regular expresssions

It also filters out the reads that do not meet a minimum quality criteria based on the sequence length and the mean quality. It can run in parallel.

Ho to use

To submit clean_reads jobs to the queue system execute the command

send_clean_reads

It will ask few questions to build the script and submit it to the queue.

Performance

clean_reads can be executed in parallel and scales well up to 8 cores. For 12 cores the performance is very poor. In the table 1 we show the results of the benchmark. They have been executed in a 12 cores node with E5645 Xeon processors.

Execution time in seconds as a function of the number of cores
cores 1 4 8 12
Time (s) 1600 422 246 238
Speedup 1 3.8 6.5 6.7
Performance (%) 100 95 81 56

The used command has been

clean_reads -i in.fastq -o out.fastq -p illumina -f fastq -g fastq -a a.fna -d UniVec -n 20 --qual_threshold=20 --only_3_end False -m 60 -t 12

More information

clean_reads web page.