IDBA-UD | IZO-SGI Cálculo Científico

General information

IDBA-UD 1.1.1 is an iterative de Bruijn graph de novo assembler for short-read sequencing data with highly uneven sequencing depth. It is an extension of the IDBA algorithm. Like IDBA, IDBA-UD iterates from a small k to a large k. In each iteration, short and low-depth contigs are removed iteratively, with the cutoff threshold raised from low to high, to reduce errors in both low-depth and high-depth regions. Paired-end reads are aligned to contigs and assembled locally to generate some of the k-mers that are missing in low-depth regions. With these techniques, IDBA-UD can iterate the k value of the de Bruijn graph up to a very large value with fewer gaps and fewer branches, forming long contigs in both low-depth and high-depth regions.
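As an illustration of the iteration from a small k to a large k, the following minimal sketch lists the k values that would be visited for a given minimum, maximum and step. The function name `k_schedule` is hypothetical, the maximum k of 100 is an assumption made only for this example, and the exact schedule IDBA-UD follows internally may differ.

```python
def k_schedule(mink, maxk, step):
    """List the k values from mink up to maxk (inclusive) in increments of step.

    Illustrative only: this is not IDBA-UD code, just the arithmetic of
    "iterate from a small k to a large k".
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if mink > maxk:
        raise ValueError("mink must not exceed maxk")
    return list(range(mink, maxk + 1, step))


# maxk=100 is an assumed value for the example.
coarse = k_schedule(40, 100, 20)
fine = k_schedule(20, 100, 10)
print(len(coarse), coarse)
print(len(fine), fine)
```

Under these assumptions, a smaller step means more iterations over the graph, which is worth keeping in mind when comparing run times obtained with different step values.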

How to use

To submit jobs to the queue, use the command

send_idba-ud

which asks a few questions to configure the calculation.

Performance

IDBA-UD runs in parallel and scales well up to at least 8 cores; beyond that, no appreciable improvement was measured. The benchmark was run with --mink 40 --step 20. For reasons we have not identified, this calculation shows a noticeable qualitative jump between 1 and 2 cores, so the table also reports speedup and efficiency taking 2 cores as the baseline. With a step of 10, multi-core performance worsens, as the second table shows.

Cores	Time (s)	Speedup (vs. 1 core)	Efficiency (%, vs. 1 core)	Speedup (vs. 2 cores)	Efficiency (%, vs. 2 cores)
1	480	1	100
2	296	1.6	81	1.0	100
4	188	2.6	64	1.6	79
8	84	5.7	71	3.5	88
12	92	5.2	43	3.2	54
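The speedup and efficiency columns follow from the measured times. As a minimal sketch, assuming speedup is the baseline time divided by the measured time and efficiency is the speedup divided by the ratio of cores to baseline cores, they can be recomputed as shown here. The function name `scaling` is hypothetical; the times are those of the table above.

```python
def scaling(times, base_cores=1):
    """Return {cores: (speedup, efficiency_percent)} relative to base_cores.

    times maps a core count to the wall-clock time in seconds.
    Core counts below the baseline are skipped.
    """
    base_time = times[base_cores]
    result = {}
    for cores in sorted(times):
        if cores < base_cores:
            continue
        speedup = base_time / times[cores]
        # Ideal speedup relative to the baseline is cores / base_cores.
        efficiency = 100.0 * speedup * base_cores / cores
        result[cores] = (speedup, efficiency)
    return result


# Times (s) from the first benchmark.
times = {1: 480, 2: 296, 4: 188, 8: 84, 12: 92}

for base in (1, 2):
    print(f"baseline: {base} core(s)")
    for cores, (speedup, efficiency) in scaling(times, base).items():
        print(f"{cores:>3} cores  speedup {speedup:.1f}  efficiency {efficiency:.0f}%")
```

Using 2 cores as the baseline removes the effect of the irregular step from 1 to 2 cores, which is why the efficiency at 8 cores looks better in that view.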

The second benchmark was run on a larger file, with 10 million bases, and the options --mink 20 --step 10 --min_support 2. The behavior is more regular than in the previous benchmark, and parallelization is good up to 4 cores (85% efficiency), dropping off markedly from 8 cores onward.

Cores	Time (s)	Speedup	Efficiency (%)
1	13050	1	100
2	6675	2.0	98
4	3849	3.4	85
8	3113	4.2	52
16	2337	5.6	35
20	2409	5.4	27

More information

IDBA-UD web page.