Archive

Archive for the ‘HPC’ Category

The new most powerful supercomputer in the world is once again Chinese

June 20th, 2016

The Sunway TaihuLight supercomputer is the new most powerful supercomputer in the world. This Chinese supercomputer has pushed the also-Chinese Tianhe-2 into second place on the top500.org list of the world's most powerful supercomputers, after Tianhe-2 had held first place for 6 consecutive lists (the list is updated every six months).

Sunway TaihuLight has a theoretical peak performance of 125 PetaFLOPS and a measured performance of 93 PetaFLOPS on the HPL (High Performance Linpack) benchmark, which represents an efficiency of 74%. With this it almost triples the performance of its predecessor Tianhe-2 (34 PetaFLOPS at 62% efficiency), and with a higher efficiency as well.
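
HPL efficiency is simply the measured performance divided by the theoretical peak; as a quick sanity check with bc (Tianhe-2's 54.9 PFLOPS peak is taken from the posts below):

echo 'scale=3; 93/125' | bc     # .744 -> ~74% for Sunway TaihuLight
echo 'scale=3; 34/54.9' | bc    # .619 -> ~62% for Tianhe-2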

One of the most important milestones of Sunway TaihuLight is that it is built with 100% Chinese technology, including the SW26010 processor, which gives China technological independence from the USA. Sunway TaihuLight does not use coprocessors; instead it has 40,960 nodes with 4 processors per node, each processor having 64 compute cores plus a management core that also takes part in the computation, for a total of more than 10.5 million cores. However, because of its memory access and its communication network, Sunway TaihuLight performs more poorly on the more recent HPCG (High Performance Conjugate Gradients) benchmark, where even Tianhe-2 beats it, which may mean it is not as effective across a broader spectrum of applications. Nevertheless, there is software that has been developed in parallel with the hardware project and that runs very efficiently on Sunway TaihuLight.
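
The total core count follows directly from those figures; checking with bc:

echo '40960 * 4 * (64 + 1)' | bc    # 10,649,600 cores, i.e. more than 10.5 million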

China overtakes the USA in the Top500

The June 2016 list is the first time in the history of the Top500 that the USA is not the country with the largest number of installed systems: it has 165 systems, surpassed by China with 167 supercomputers. China is also the country that accumulates the most performance in the whole list, thanks especially to the two supercomputers it holds at numbers 1 and 2.

 

General, HPC

The use of HPC by PayPal

June 8th, 2016

In this video PayPal shows us how it needs HPC to handle all the online payments and transactions it processes and to manage the information of all its users.

General, HPC


Massively Parallel Systems Summer School (PUMPS-2016)

May 20th, 2016

Introduction

Barcelona Computing Week 2016, July 11-15, at BSC/UPC, Barcelona.

The BSC/UPC has been named a GPU Center of Excellence by NVIDIA. The Programming and tUning Massively Parallel Systems Summer School (PUMPS) offers researchers and graduate students a unique opportunity to enrich their skills with cutting-edge techniques and hands-on experience in developing applications for many-core processors with massively parallel computing resources like GPU accelerators.

Participants will have access to a multi-node cluster of GPUs, and will learn to program and optimize applications in languages such as CUDA and OmpSs. Teaching assistants will be available to help with assignments.

 

Important information

  • Applications due: May 31
  • Notification of acceptance: June 10
  • Summer school dates: July 11-15, 2016
  • Location: Barcelona Supercomputing Center / Computer Architecture Dept. at Universitat Politecnica de Catalunya, Barcelona, Spain. Room TBD.

Lecturers

  • Distinguished Lecturers:
    • Wen-mei Hwu, University of Illinois at Urbana-Champaign.
    • David Kirk, NVIDIA Fellow, former Chief Scientist, NVIDIA Corporation.
  • Invited Lecturer: Juan Gómez Luna (Universidad de Cordoba).
  • BSC/UPC Lecturers:
    • Xavier Martorell
    • Xabier Teruel
  • Teaching Assistants:
    • Abdul Dakkak, Carl Pearson, Simon Garcia de Gonzalo, Marc Jorda, Pau Farre, Javier Bueno, Aimar Rodriguez.

The list of topics

  • CUDA Algorithmic Optimization Strategies.
  • Dealing with Sparse and Dynamic data.
  • Efficiency in Large Data Traversal.
  • Reducing Output Interference.
  • Controlling Load Imbalance and Divergence.
  • Acceleration of Collective Operations.
  • Dynamic Parallelism and HyperQ.
  • Debugging and Profiling CUDA Code.
  • Multi-GPU Execution.
  • Architecture Trends and Implications.
  • FORTRAN Interoperability and CUDA Libraries.
  • Introduction to OmpSs and to the Paraver analysis tool.
  • OmpSs: Leveraging GPU/CUDA Programming.
  • Hands-on Labs: CUDA Optimizations on Scientific Codes. OmpSs Programming and Tuning.

More information


Announcements, HPC

A brief review of GPU computing and its evolution at Nvidia

May 17th, 2016

Author: Luis Fer Coca, student at IEFPS Elorrieta-Erreka Mari on an internship at the IZO-SGI.

 

The GPU, or Graphics Processing Unit, is a coprocessor dedicated to graphics processing, which offloads from the CPU the work of rendering the graphics shown on screen.

In 2007, however, GPU-accelerated computing emerged, which we can define as the use of a graphics processing unit in combination with the CPU to accelerate the analyses and calculations of research, business, consumer and engineering applications. Since then, accelerator GPUs have come to be installed in the data centers of government laboratories, universities, large companies and SMEs all over the world.

GPUs are no longer used and sold only as graphics cards for enjoying the latest video games at the best quality, but also in the HPC sector, in high-performance computers that execute billions of operations, where GPUs carry out certain operations faster than a CPU of the same class.

The GPU versus the CPU

A simple way to understand the difference between the CPU and the GPU is to compare how they process tasks. A CPU consists of a few general-purpose cores, while a GPU consists of thousands of smaller, more efficient cores designed to execute multiple mathematical operations simultaneously.

GPUs in the Top500

[Chart: accelerators and coprocessors in the Top500]

As we can see in the Top500 chart, among the systems with coprocessors, NVIDIA and AMD accelerator GPUs currently dominate the Top500, ahead of the Intel Xeon Phi and the new accelerator from the Japanese chip maker PEZY Computing. The chart also shows the growth this kind of technology is experiencing in the HPC world.

Evolution of Nvidia GPUs

The keys that have made NVIDIA's computational acceleration platform the most popular one for scientific computing are that it combines accelerator GPUs, the CUDA parallel processing model and a broad ecosystem of developers, software vendors and HPC system manufacturers, together with having been one of the pioneers.

Recently, NVIDIA presented the new Tesla P100, officially the most powerful GPU on the market, aimed at the HPC sector. It belongs to the new Pascal generation of chips, which succeeds the Maxwell generation, and is based on the new 16 nm FinFET manufacturing process, which has made it possible to double the transistor density while using 70% less energy at the same time. It is a more powerful, but also more efficient, GPU.

The table shows the evolution of some representative Nvidia GPUs.

Architecture                  Fermi        Kepler       Kepler       Pascal
GPU                           M2050        K20          K80          P100
CUDA cores                    448          2496         4992         3584
Clock (MHz)                   1150         706          562          1328
RAM (GB)                      3            5            24           16
Single-precision performance  1.03 TFLOPS  3.52 TFLOPS  8.74 TFLOPS  10.6 TFLOPS
Double-precision performance  515 GFLOPS   1.17 TFLOPS  2.91 TFLOPS  5.3 TFLOPS
Transistors (billions)        3.1          7.1          14.2         15.3
Power consumption             225 W        225 W        300 W        300 W
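
As a rough cross-check of the single-precision figures, peak throughput is approximately cores x clock (in GHz) x 2, i.e. one fused multiply-add per core per cycle. For the K20, with bc:

echo 'scale=2; 2496 * 0.706 * 2 / 1000' | bc    # ~3.52 TFLOPS

For the K80 and P100 the table values appear to correspond to their boost clocks rather than the base clocks listed.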

Nvidia GPUs in the Computing Service

The Computing Service of the UPV/EHU has been acquiring this technology since 2010 to offer researchers a small production and test environment in this field; it currently has 4 Fermi C2050 GPUs, 4 Fermi C2070 GPUs and two Kepler K20s. The acquisition of Kepler K80 GPUs is also planned.

Conclusion

Coprocessing technologies, GPUs among them, continue to develop, enabling progress in scientific computing and preserving its position as a very useful tool for research and innovation in countless fields.

References

http://www.nvidia.es/object/tesla-high-performance-computing-es.html

http://wccftech.com/nvidia-pascal-gpu-gtc-2016/

http://www.omicrono.com/2016/04/nvidia-tesla-100/

http://www.almatech.es/nvidia-incorpora-gpu-a-la-arquitectura-kepler/


General, HPC

Installing Tensorflow 0.7 in Red Hat Enterprise Linux Server 6.4 with GPUs

April 7th, 2016

Red Hat Enterprise Linux Server 6.4 is by now a quite old OS, and it is not possible to install the precompiled TensorFlow packages with pip and the like, so we had to compile TensorFlow ourselves.

The instructions we follow are based on this document:

https://www.tensorflow.org/versions/r0.7/get_started/os_setup.html#installation-for-linux

We want to install it to run on GPUs, so first we need to register at NVIDIA

https://developer.nvidia.com/cudnn

to download the cuDNN libraries. Download them and copy the include files and libraries to the corresponding CUDA version directories.
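
For example, with the cuDNN v4 archive for CUDA 7.5 and our CUDA installation under /software/cuda-7.5.18, the copy would look roughly like this (the exact archive name depends on the cuDNN version you download):

tar xzf cudnn-7.5-linux-x64-v4.tgz
cp cuda/include/cudnn.h /software/cuda-7.5.18/include/
cp -P cuda/lib64/libcudnn* /software/cuda-7.5.18/lib64/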

In order to compile TensorFlow from source, we first need to build Bazel, the build tool it uses.

1.- Installing Bazel

The first problem is that the gcc/g++ compiler in RHELS 6.4 is old (4.4.7), and at least 4.8 is required. We installed Red Hat devtoolset-3, which provides gcc/g++ 4.9.2. We also need a Java JDK 8 or later.

Now, we can download bazel:

git clone https://github.com/bazelbuild/bazel.git

And we set up our environment to compile bazel:

export JAVA_HOME=/software/jdk1.8.0_20
export PATH=/opt/rh/devtoolset-3/root/usr/bin:/software/anaconda2/bin:/software/jdk1.8.0_20/bin:$PATH
export LD_LIBRARY_PATH=/opt/rh/devtoolset-3/root/usr/lib64:/opt/rh/devtoolset-3/root/usr/lib:/software/anaconda2/lib64:/software/anaconda2/lib:$LD_LIBRARY_PATH

Then, we have to modify the bazel/tools/cpp/CROSSTOOL file to choose the commands from devtoolset-3 instead of the default ones, in the toolchain with toolchain_identifier: "local_linux":

toolchain {
abi_version: "local"
abi_libc_version: "local"
builtin_sysroot: ""
compiler: "compiler"
host_system_name: "local"
needsPic: true
supports_gold_linker: false
supports_incremental_linker: false
supports_fission: false
supports_interface_shared_objects: false
supports_normalizing_ar: false
supports_start_end_lib: false
supports_thin_archives: false
target_libc: "local"
target_cpu: "local"
target_system_name: "local"
toolchain_identifier: "local_linux"
tool_path { name: "ar" path: "/opt/rh/devtoolset-3/root/usr/bin/ar" }
tool_path { name: "compat-ld" path: "/opt/rh/devtoolset-3/root/usr/bin/ld" }
tool_path { name: "cpp" path: "/opt/rh/devtoolset-3/root/usr/bin/cpp" }
tool_path { name: "dwp" path: "/opt/rh/devtoolset-3/root/usr/bin/dwp" }
tool_path { name: "gcc" path: "/opt/rh/devtoolset-3/root/usr/bin/gcc" }
cxx_flag: "-std=c++0x"
linker_flag: "-lstdc++"
linker_flag: "-B/opt/rh/devtoolset-3/root/usr/bin/"

# TODO(bazel-team): In theory, the path here ought to exactly match the path
# used by gcc. That works because bazel currently doesn't track files at
# absolute locations and has no remote execution, yet. However, this will need
# to be fixed, maybe with auto-detection?
cxx_builtin_include_directory: "/opt/rh/devtoolset-3/root/usr/lib/gcc/"
cxx_builtin_include_directory: "/opt/rh/devtoolset-3/root/usr/include"
tool_path { name: "gcov" path: "/opt/rh/devtoolset-3/root/usr/bin/gcov" }

# C(++) compiles invoke the compiler (as that is the one knowing where
# to find libraries), but we provide LD so other rules can invoke the linker.
tool_path { name: "ld" path: "/opt/rh/devtoolset-3/root/usr/bin/ld" }

tool_path { name: "nm" path: "/opt/rh/devtoolset-3/root/usr/bin/nm" }
tool_path { name: "objcopy" path: "/opt/rh/devtoolset-3/root/usr/bin/objcopy" }
objcopy_embed_flag: "-I"
objcopy_embed_flag: "binary"
tool_path { name: "objdump" path: "/opt/rh/devtoolset-3/root/usr/bin/objdump" }
tool_path { name: "strip" path: "/opt/rh/devtoolset-3/root/usr/bin/strip" }
compilation_mode_flags {
mode: DBG
# Enable debug symbols.
compiler_flag: "-g"
}
compilation_mode_flags {
mode: OPT
# No debug symbols.
# Maybe we should enable https://gcc.gnu.org/wiki/DebugFission for opt or even generally?
# However, that can't happen here, as it requires special handling in Bazel.
compiler_flag: "-g0"

# Conservative choice for -O
# -O3 can increase binary size and even slow down the resulting binaries.
# Profile first and / or use FDO if you need better performance than this.
compiler_flag: "-O2"

# Disable assertions
compiler_flag: "-DNDEBUG"

# Removal of unused code and data at link time (can this increase binary size in some cases?).
compiler_flag: "-ffunction-sections"
compiler_flag: "-fdata-sections"
}
linking_mode_flags { mode: DYNAMIC }
}

Now, we can compile it with the command:

./compile.sh

It will create a bazel binary, which we will now use to compile TensorFlow.

 

2.- Tensorflow

Download Tensorflow

git clone --recurse-submodules https://github.com/tensorflow/tensorflow

We set up the environment to compile TensorFlow with devtoolset-3 and with CUDA:

export JAVA_HOME=/software/jdk1.8.0_20

export PATH=/software/jdk1.8.0_20/bin:/opt/rh/devtoolset-3/root/usr/bin:/software/anaconda2/bin:/software/cuda-7.5.18/bin:$PATH

export LD_LIBRARY_PATH=/opt/rh/devtoolset-3/root/usr/lib64:/opt/rh/devtoolset-3/root/usr/lib:/software/cuda-7.5.18/lib64:/software/anaconda2/lib64:/software/anaconda2/lib:$LD_LIBRARY_PATH

We run configure in the tensorflow directory to set up our CUDA environment:

cd tensorflow

Fix the google/protobuf/BUILD file changing:

LINK_OPTS = ["-lpthread"]

to

LINK_OPTS = ["-lpthread", "-lrt", "-lm"]

and configure it

./configure
Please specify the location of python. [Default is /software/anaconda2/bin/python]:
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 7.5
Please specify the location where CUDA 7.5 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /software/cuda-7.5.18
Please specify the Cudnn version you want to use. [Leave empty to use system default]:
Please specify the location where cuDNN library is installed. Refer to README.md for more details. [Default is /software/cuda-7.5.18]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 3.5

 

As we did for Bazel, we need to fix the CROSSTOOL file third_party/gpus/crosstool/CROSSTOOL:

 

toolchain {
abi_version: "local"
abi_libc_version: "local"
builtin_sysroot: ""
compiler: "compiler"
host_system_name: "local"
needsPic: true
supports_gold_linker: false
supports_incremental_linker: false
supports_fission: false
supports_interface_shared_objects: false
supports_normalizing_ar: false
supports_start_end_lib: false
supports_thin_archives: false
target_libc: "local"
target_cpu: "local"
target_system_name: "local"
toolchain_identifier: "local_linux"

tool_path { name: "ar" path: "/opt/rh/devtoolset-3/root/usr/bin/ar" }
tool_path { name: "compat-ld" path: "/opt/rh/devtoolset-3/root/usr/bin/ld" }
tool_path { name: "cpp" path: "/opt/rh/devtoolset-3/root/usr/bin/cpp" }
tool_path { name: "dwp" path: "/opt/rh/devtoolset-3/root/usr/bin/dwp" }
# As part of the TensorFlow release, we place some cuda-related compilation
# files in third_party/gpus/crosstool/clang/bin, and this relative
# path, combined with the rest of our Bazel configuration causes our
# compilation to use those files.
tool_path { name: "gcc" path: "clang/bin/crosstool_wrapper_driver_is_not_gcc" }
# Use "-std=c++11" for nvcc. For consistency, force both the host compiler
# and the device compiler to use "-std=c++11".
cxx_flag: "-std=c++11"
linker_flag: "-lstdc++"
linker_flag: "-B/opt/rh/devtoolset-3/root/usr/bin/"

# TODO(bazel-team): In theory, the path here ought to exactly match the path
# used by gcc. That works because bazel currently doesn't track files at
# absolute locations and has no remote execution, yet. However, this will need
# to be fixed, maybe with auto-detection?
cxx_builtin_include_directory: "/opt/rh/devtoolset-3/root/usr/lib/gcc/"
cxx_builtin_include_directory: "/usr/local/include"
cxx_builtin_include_directory: "/usr/include"
cxx_builtin_include_directory: "/opt/rh/devtoolset-3/root/usr/include"
tool_path { name: "gcov" path: "/opt/rh/devtoolset-3/root/usr/bin/gcov" }

# C(++) compiles invoke the compiler (as that is the one knowing where
# to find libraries), but we provide LD so other rules can invoke the linker.
tool_path { name: "ld" path: "/opt/rh/devtoolset-3/root/usr/bin/ld" }

tool_path { name: "nm" path: "/opt/rh/devtoolset-3/root/usr/bin/nm" }
tool_path { name: "objcopy" path: "/opt/rh/devtoolset-3/root/usr/bin/objcopy" }
objcopy_embed_flag: "-I"
objcopy_embed_flag: "binary"
tool_path { name: "objdump" path: "/opt/rh/devtoolset-3/root/usr/bin/objdump" }
tool_path { name: "strip" path: "/opt/rh/devtoolset-3/root/usr/bin/strip" }

# Anticipated future default.
unfiltered_cxx_flag: "-no-canonical-prefixes"

# Make C++ compilation deterministic. Use linkstamping instead of these
# compiler symbols.
unfiltered_cxx_flag: "-Wno-builtin-macro-redefined"
unfiltered_cxx_flag: "-D__DATE__=\"redacted\""
unfiltered_cxx_flag: "-D__TIMESTAMP__=\"redacted\""
unfiltered_cxx_flag: "-D__TIME__=\"redacted\""

# Security hardening on by default.
# Conservative choice; -D_FORTIFY_SOURCE=2 may be unsafe in some cases.
# We need to undef it before redefining it as some distributions now have
# it enabled by default.
compiler_flag: "-U_FORTIFY_SOURCE"
compiler_flag: "-D_FORTIFY_SOURCE=1"
compiler_flag: "-fstack-protector"
compiler_flag: "-fPIE"
linker_flag: "-pie"
linker_flag: "-Wl,-z,relro,-z,now"

# Enable coloring even if there's no attached terminal. Bazel removes the
# escape sequences if --nocolor is specified. This isn't supported by gcc
# on Ubuntu 14.04.
# compiler_flag: "-fcolor-diagnostics"

# All warnings are enabled. Maybe enable -Werror as well?
compiler_flag: "-Wall"
# Enable a few more warnings that aren't part of -Wall.
compiler_flag: "-Wunused-but-set-parameter"
# But disable some that are problematic.
compiler_flag: "-Wno-free-nonheap-object" # has false positives

# Keep stack frames for debugging, even in opt mode.
compiler_flag: "-fno-omit-frame-pointer"

# Anticipated future default.
linker_flag: "-no-canonical-prefixes"
unfiltered_cxx_flag: "-fno-canonical-system-headers"
# Have gcc return the exit code from ld.
linker_flag: "-pass-exit-codes"
# Stamp the binary with a unique identifier.
linker_flag: "-Wl,--build-id=md5"
linker_flag: "-Wl,--hash-style=gnu"
# Gold linker only? Can we enable this by default?
# linker_flag: "-Wl,--warn-execstack"
# linker_flag: "-Wl,--detect-odr-violations"

compilation_mode_flags {
mode: DBG
# Enable debug symbols.
compiler_flag: "-g"
}
}

Similarly, we also need to fix third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc to choose the devtoolset-3 tools.
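
In stock TensorFlow 0.7 this wrapper is a Python script whose compiler paths point at /usr/bin/gcc; assuming that default, a minimal sketch of the change is a sed rewrite:

sed -i 's|/usr/bin/gcc|/opt/rh/devtoolset-3/root/usr/bin/gcc|g' third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc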

Now, we build the build_pip_package target:

bazel build -c opt --config=cuda --genrule_strategy=standalone --verbose_failures //tensorflow/tools/pip_package:build_pip_package

I got an error, "ImportError: No module named argparse", so I also had to change the first line of third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc to:

#!/usr/bin/env /software/anaconda2/bin/python2

Then, we create the python wheel package:

tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

In my case, I had to run

/root/.cache/bazel/_bazel_root/28150a65056607dabfb056aa305868ed/tensorflow/bazel-out/local_linux-opt/bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

And finally we can install it using pip:

pip install /tmp/tensorflow_pkg/tensorflow-0.7.1-py2-none-any.whl
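
As a quick, optional sanity check that the wheel imports and sees the GPU (TensorFlow logs the device mapping, including the GPU devices, when a session is created):

python -c 'import tensorflow as tf; tf.Session()'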

We also tried to compile the tutorials_example_trainer as shown in the tensorflow webpage:

bazel build -c opt --config=cuda --genrule_strategy=standalone //tensorflow/cc:tutorials_example_trainer

And successfully ran the test:

tutorials_example_trainer --use_gpu

 

We would like to thank Kike from IXA-taldea for sharing with us his guide to compiling TensorFlow on CPUs.


General, HPC


Tianhe-2 continues as the most powerful supercomputer in the world for the 5th time

August 19th, 2015

According to the well-known top500.org list of the most powerful supercomputers in the world, the growth of supercomputing power has been slowing down in recent years. Today, June 2015, the world's most powerful computer is Tianhe-2 (Milky Way-2). This supercomputer is installed at the National University of Defense Technology (NUDT) in China and is the technological successor of Tianhe-1A, installed at the National Supercomputer Center in Tianjin, which was already No. 1 on the top500 list in November 2010.

[Chart: performance of the #500 supercomputer in the top500]

Before describing this colossus, we must mention that this is the 5th consecutive time it has taken first place in the semiannual top500.org list, i.e. two and a half years, equalling the milestone set by the Japanese Earth Simulator (2002-2004). This is an atypical situation: computing evolves very fast, and it is unusual that no faster supercomputer has been built in two years. But this is not the only worrying fact. In the last 4 lists only 4 new systems have entered the top 10 positions, and the age of these top-10 systems is unprecedented. In addition, the performance of the 500th system, the last one on the list, which used to grow in a smoother and more continuous way, increased by 90% a year over the 1994-2008 period but by only 55% a year in the last 6 years. In this latest list the aggregate performance of all 500 systems has also started to stall; it had kept the earlier pace for a few more years thanks to the push of the top-10 computers. We hope this trend will be broken soon thanks to the CORAL project, through which the US government will fund, with 235 million dollars, the construction of two pre-exascale (exaFLOPS = 1000 PFLOPS) systems for around 2017, plus another 100 million dollars for technological development. In addition, President Obama has launched a 5-year presidential initiative to boost supercomputing.

Tianhe-2 Supercomputer

Tianhe-2 consists of 16,000 nodes, each of which has 2 Intel Xeon Ivy Bridge processors and 3 Xeon Phi coprocessors to accelerate the calculations. This makes a total of 3,120,000 computing cores: 384,000 Xeon Ivy Bridge cores and 2,736,000 cores in the Phi coprocessors. Having 3 coprocessors per node is a novelty; usually there are one or two. These cores give Tianhe-2 a theoretical peak performance for mathematical operations of 54.9 PFLOPS (Peta = 10^15, i.e. 1,000,000,000,000,000 floating-point operations per second), and with the LINPACK benchmark the real performance reaches 33.9 PFLOPS, which almost doubles the previous record of the Titan supercomputer located at the Oak Ridge National Laboratory (ORNL) in Tennessee.
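
The total core count follows directly from the per-node figures (2 x 12 Xeon cores plus 3 x 57 Phi cores per node, see the table below); checking with bc:

echo '16000 * (2*12 + 3*57)' | bc    # 3,120,000 cores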

The interconnection network between the nodes, TH Express-2, is a proprietary design first deployed in Tianhe-2; it aims to avoid communication bottlenecks by providing a bidirectional bandwidth of 16 GB/s, low latency and a fat-tree topology. Tianhe-2 uses the Kylin operating system, based on Linux, optimized for high-performance computing and also developed by NUDT. Being based on a standard Linux gives Kylin great flexibility to run many codes without reprogramming them specifically.

Tianhe-2 consumes 17.8 MW, which is roughly equivalent to the consumption of 27,000 households. Nevertheless, it is energetically a very efficient supercomputer given the high number of FLOPS per watt it achieves: in the green500.org list of the most energy-efficient computers in the world, Tianhe-2 holds the 32nd position with 1,902 MFLOPS/W.
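
That efficiency figure follows from the numbers above, since PFLOPS per MW equals GFLOPS per W; checking with bc:

echo 'scale=3; 33.9/17.8' | bc    # 1.904 GFLOPS/W, i.e. roughly the 1,902 MFLOPS/W measured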

According to NUDT, Tianhe-2 will be dedicated to simulations, analysis and national security.

Key figures

Model                Own design
Number of cores      3,120,000 (384,000 Xeon Ivy Bridge cores and 2,736,000 Phi cores)
Processor            Intel Xeon E5-2692 with 12 cores at 2.2 GHz
Coprocessor          Intel Xeon Phi 31S1P with 57 cores at 1.1 GHz
Interconnect         TH Express-2
Operating system     Kylin
Theoretical FLOPS    54.9 PetaFLOPS
Linpack FLOPS        33.9 PetaFLOPS
Electric power       17.8 MW
FLOPS/W              1.9 GigaFLOPS/W


HPC
