
DL_POLY

General information

Version 4.02 of the MD program for macromolecules, polymers, ionic systems, solutions and other molecular systems, developed at the Daresbury Laboratory. On Pendulo the 2.2 version remains. There is also the DL_POLY_CLASSIC version, which is no longer under active development.

How to submit to the queue

The program is installed on all the architectures, Arina and Pendulo (DL_POLY 2.2). To execute it, include in your scripts:

/software/bin/DL_POLY/DL_POLY.Z

The program will execute on GPGPUs if it starts on that kind of node. These nodes can be selected by using the gpu label within [intlink id=”244″ type=”post”]the queue system[/intlink].
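
For instance, a minimal Torque script sketch for a GPGPU run (the resource requests are placeholders to adapt to your job, and we assume the DL_POLY.Z wrapper takes care of the MPI startup, since the documentation above invokes it directly):

#!/bin/bash
#PBS -l nodes=1:ppn=8:gpu    # the gpu property selects GPGPU nodes
#PBS -l walltime=04:00:00    # placeholder walltime
#PBS -l mem=4gb              # placeholder memory request

cd $PBS_O_WORKDIR            # run from the directory the job was submitted from
/software/bin/DL_POLY/DL_POLY.Z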

The GUI is also installed. To execute it use:

/software/bin/DL_POLY/gui

Some utilities have been installed in the /software/bin/DL_POLY/ directory.

Benchmark

We show some small benchmarks performed with DL_POLY 4.02. We study the parallel scaling as well as the performance of the GPGPUs. All execution times are in seconds.

System          | 1 core | 4 cores | 8 cores | 16 cores | 32 cores | 64 cores
Itanium 1.6 GHz | 1500   | 419     | 248     | 149      | 92       | 61
Opteron         | 1230   | 503     | 264     | 166      | 74       |
Xeon 2.27 GHz   | 807    | 227     | 126     | 67       | 37       | 25

The first benchmark shows that DL_POLY scales very well and that the Xeon nodes are the fastest, so we recommend them for large jobs.

System          | 1 core | 4 cores | 8 cores | 16 cores | 32 cores
Itanium 1.6 GHz | 2137   | 303     | 165     | 93       | 47
Opteron         | 1592   | 482     | 177     | 134      | 55
Xeon 2.27 GHz   | 848    | 180     | 92      | 48       | 28
1 GPGPU         |        | 125     | 114     | 104      | 102
2 GPGPU         |        |         | 77      | 72       | 69
4 GPGPU         |        |         |         | 53       | 50
8 GPGPU         |        |         |         |          | 37

System        | 1 core | 4 cores | 8 cores | 16 cores | 32 cores | 64 cores
Xeon 2.27 GHz | 2918   | 774     | 411     | 223      | 122      | 71
1 GPGPU       |        | 362     | 333     | 338      | 337      |
2 GPGPU       |        |         | 240     | 222      | 220      |
4 GPGPU       |        |         |         | 145      | 142      |
8 GPGPU       |        |         |         |          | 97       |

We show that the GPGPUs speed up the calculation, but each time we double the number of GPGPUs the speedup only grows by a factor of about 1.5. Because of this, for large numbers of GPGPUs or cores it is better to use the parallelization over cores. For example, one node has 8 cores and 2 GPGPUs; the 2 GPGPUs need 220 s while 8 cores need 411 s. Likewise, 4 GPGPUs are still faster than 16 cores, but 64 cores (71 s) are faster than 8 GPGPUs (97 s). Therefore, the GPGPUs can speed up jobs on PCs or single nodes, but for jobs that require higher parallelization, the parallelization over cores is more effective.

DL_POLY is designed for big systems and can make use of up to thousands of cores. According to the documentation:

The DL_POLY_4 parallel performance and efficiency are considered very-good-to-excellent as long as (i) all CPU cores are loaded
with no less than 500 particles each and (ii) the major linked cells algorithm has no dimension less than 4.
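
For example, rule (i) implies that a simulation with 32,000 particles should be spread over at most 32,000 / 500 = 64 cores.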

More information

DL_POLY web page.

DL_POLY user guide (pdf).

DL_POLY GUI user guide (pdf).

Espresso

General information

opEn-Source Package for Research in Electronic Structure, Simulation, and Optimization

ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials (both norm-conserving and ultrasoft).

Version 6.1 is available. The home page of the code is at the DEMOCRITOS National Simulation Center of the Italian INFM.

Quantum ESPRESSO builds upon newly-restructured electronic-structure codes (PWscf, PHONON, CP90, FPMD, Wannier) that have been developed and tested by some of the original authors of novel electronic-structure algorithms – from Car-Parrinello molecular dynamics to density-functional perturbation theory – and applied in the last twenty years by some of the leading materials modeling groups worldwide. Innovation and efficiency are still its main focus.

How to use

See the [intlink id=”4795″ type=”post”]how to send Espresso[/intlink] section.

Monitoring

  • remote_vi: shows the *.out file of an Espresso job.
  • myjobs: shows the CPU and memory (SIZE) usage of a running job.

Benchmark

We show various benchmark results for pw.x and ph.x on the machines of our service. The Xeon nodes perform best and scale well up to 32 cores. Notice that the communication network of the Xeon nodes is better.

Table 1: Execution times for pw.x (version 4.2.1).

System       | 8 cores | 16 cores | 32 cores
Xeon         | 1405    | 709      | 378
Itanium2     | 2614    | 1368     | 858
Opteron 2.4  | 4320    | 2020     | 1174
Core2duo 2.1 |         |          |
Table 2: Execution times for ph.x (version 4.2.1).

System       | 8 cores | 16 cores | 32 cores
Xeon         | 2504    | 1348     | 809
Itanium2     | 2968    | 1934     | 1391
Opteron 2.4  | 6240    | 3501     | 2033
Core2duo 2.1 |         |          |

More information

ESPRESSO Web page.

Online documentation.

ESPRESSO Wiki.

send_espresso


To submit Espresso calculations to the queue system, the send_espresso script is available. Executing it without arguments, send_espresso [Enter], prints the syntax of the command:

send_espresso Input Executable Nodes Procs_per_node Time Mem ["Other queue options"]

  • Input: name of the Espresso input file, without extension.
  • Executable: name of the Espresso program you want to use (pw.x, ph.x, cp.x, …).
  • Nodes: number of nodes.
  • Procs_per_node: number of processors per node.
  • Time: the walltime (in hh:mm:ss format) or the queue name.
  • Mem: memory in GB (without the unit).
  • ["Other queue options"]: see the examples below.

Examples

Example1: send_espresso job1 pw.x 1 4 04:00:00 1
Example2: send_espresso job2 cp.x 2 4 192:00:00 8 "-W depend=afterany:1234"
Example3: send_espresso job5 pw.x 4 8 192:00:00 8 "-m bea -M email@address.com"

Traditional way

The executables can be found in /software/Espresso. For instance, to execute pw.x in a queue script use:

source /software/Espresso/compilervars.sh
/software/Espresso/bin/pw.x -npool ncores < input_file > output_file
In the -npool ncores option, substitute ncores with the number of cores of the job.
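
Putting it together, a minimal queue script sketch for pw.x (the Torque resource requests are placeholders; the launch line is the one documented above, with the core count taken from the node file Torque provides):

#!/bin/bash
#PBS -l nodes=2:ppn=4        # placeholder resource request
#PBS -l walltime=04:00:00    # placeholder walltime
#PBS -l mem=1gb              # placeholder memory request

cd $PBS_O_WORKDIR                         # run from the submission directory
source /software/Espresso/compilervars.sh
NCORES=$(wc -l < $PBS_NODEFILE)           # total cores assigned to the job
/software/Espresso/bin/pw.x -npool $NCORES < input_file > output_file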

How to send Turbomole

send_turbo

To submit Turbomole calculations to the queue system, send_turbo is available. Executing send_turbo without arguments prints the syntax of the command and some examples:

send_turbo "EXEC and Options" JOBNAME TIME[or QUEUE] PROCS[property]  MEM [``Other queue options'' ]

  • EXEC: name of the Turbomole program you want to use, together with its options.
  • JOBNAME: name of the Turbomole control file (usually control).
  • TIME[or QUEUE]: the walltime (in hh:mm:ss format) or the queue name.
  • PROCS[property]: the number of processors (optionally with a node-type property, as in the syntax above).
  • MEM: memory in GB (without the unit).
  • ["Other queue options"]: see the examples below.

Examples

To run Turbomole (jobex) with the control input file in 8 cores and 1 GB of RAM execute:

send_turbo jobex control 04:00:00 8 1

To run Turbomole (jobex -ri) with the control input file in 16 cores, 8 GB of RAM and after 1234 job has finished execute:

send_turbo "jobex -ri" control 192:00:00 16 8 "-W depend=afterany:1234"

Turbomole

Presently TURBOMOLE is one of the fastest and most stable codes available for standard quantum chemical applications. Unlike many other programs, the main focus in the development of TURBOMOLE has not been to implement all new methods and functionals, but to provide a fast and stable code which is able to treat molecules of industrial relevance at reasonable time and memory requirements.

General information

TURBOMOLE is used by academic and industrial researchers, in research areas ranging from homogeneous and heterogeneous catalysis, inorganic and organic chemistry to various types of spectroscopy, and biochemistry. The philosophy behind the development of the code was, and still is, its usefulness for applications.
It provides:
  • all standard and state-of-the-art methods for ground state calculations (Hartree-Fock, DFT, MP2, CCSD(T))
  • excited state calculations at different levels (full RPA, TDDFT, CIS(D), CC2, ADC(2), …)
  • geometry optimizations, transition state searches, molecular dynamics calculations
  • various properties and spectra (IR, UV/Vis, Raman, CD)
  • fast and reliable code; approximations like RI are used to speed up the calculations without introducing uncontrollable or unknown errors
  • parallel version for almost all kinds of jobs
  • free graphical user interface

How to use it

The program is installed on Guinness at /software/TURBOMOLE. We have created the send_turbo script to make it easy to submit Turbomole calculations to the queue. See [intlink id=”4755″ type=”post”]How to send Turbomole[/intlink].

TmoleX is also available to help with input creation and analysis of the results. TmoleX is a free download that you can install on your PC, and it is also available on Guinness. To use TmoleX execute:

TmoleX

To cleanly stop a job after the current iteration, for example the 1234.arina job, use the command:

turbomole_stop 1234

Remember to delete the “stop” file in the directory if you want to resubmit the calculation.
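
For example, assuming the stop file sits in the job's working directory next to the control file:

cd /path/to/job_directory   # the directory containing the control file
rm stop                     # remove the stop file before resubmitting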

More information

Turbomole web page.

Turbomole Manual

Turbomole Tutorial