DL_POLY

General information

4.02 version of the MD program for macromolecules, polymers, ionic systems, solutions and other molecular systems. Developed at the Daresbury Laboratory. In Pendulo the 2.2 version remains. There is already the DL_POLY_CLASSIC version which currently is not been developed.

How to submit to the queue

The program is installed in all the architectures, Arina and Pendulo (DL_POLY 2.2). To execute it include in the scripts:

/software/bin/DL_POLY/DL_POLY.Z

The program will exekute in GPGPUs if it starts in these kind of nodes. Besides, they can be selected by using the gpu label within [intlink id=”244″ type=”post”]the queue system[/intlink].

The GUI is also installed. To execute it use:

/software/bin/DL_POLY/gui

Some utilities has been installed in the /software/bin/DL_POLY/ directory.

Benchmark

We show a small benchmarks performed with dl_ploly_4.02. We stady the parallelization as well as the performance of the GPGPUs.

System 1 cores 4 cores 8 cores 16 cores 32 cores 64 cores
Itanium 1.6 GHz 1500 419 248 149 92 61
Opteron 1230 503 264 166 74
Xeon 2.27 GHz 807 227 126 67 37 25

We show in the firs benchamrk that DL_POLY scales very well and that the xeon nodes are the fastest ones, so we recomend them for large jobs.

System 1 cores 2 cores 4 cores 8 cores 16 cores 32 cores
Itanium 1.6 GHz 2137 303 165 93 47
Opteron 1592 482 177 134 55
Xeon 2.27 GHz 848 180 92 48 28
1 GPGPU 125 114 104 102
2 GPGPU 77 72 69
4 GPGPU 53 50
8 GPGPU 37
System 1 cores 2 cores 4 cores 8 cores 16 cores 32 cores 64 cores
Xeon 2.27 GHz 2918 774 411 223 122 71
1 GPGPU 362 333 338 337
2 GPGPU 240 222 220
4 GPGPU 145 142
8 GPGPU 97

We show that the GPGPUs speedup the calculation but each time we double the number of GPGPUs the speed up is multiplied but only 1.5. Because of this for large number of GPGPUs or cores is better to use the paralelization over cores. For example, one node has 8 cores and 2 GPGPUS. The 2 GPGPUs need 220 s while 8 cores need 411 s. Still 4 GPGPUs are faster than 16 cores but 32 cores with 71 s are faster than 8 GPGPUs that need 97 s. Therefore, the GPGPUS can speedup jobs in PCs or single nodes, but for jobs that require higher parallelization the cores parallelization is more effective.

DL_POLY is designed for big systems and the use up to thousand of cores. According to the documentation:

The DL_POLY_4 parallel performance and efficiency are considered very-good-to-excellent as long as (i) all CPU cores are loaded
with no less than 500 particles each and (ii) the major linked cells algorithm has no dimension less than 4.

More information

DL_POLY web page.

DL_POLY user guide (pdf).

DL_POLY GUI user guide (pdf).