May 02

GPU Workshop program released

The agenda for the upcoming workshop is below. Seating is limited, so please register in advance.

8:30-8:45 Prof. Dr. Wolfgang E. Nagel (ZIH) – Welcome
8:45-9:15 Dominik Friedrich (Bull) – Bullx Supercomputer HRSK-II, Phase 1
9:15-9:45 NVIDIA – GPU Computing with NVIDIA’s Kepler Architecture

9:45-10:15 Coffee Break

10:15-11:00 Thomas William (ZIH) – IUmd – Performance Analysis of a Molecular Dynamics Code
11:00-11:45 Dr. Michael Bussmann (HZDR) – Petaflop Plasma Physics: Pursuing Particle Paths

11:45-13:00 Lunch Break

13:00-13:30 Prof. Dr. Torsten Hoefler (ETH Zuerich) – A Study of the Alternative: Modeling Communication in Cache-Coherent SMP Systems – A Case-Study with Xeon Phi
13:30-14:00 Dr. Yannis Kaleidzidis (MPI-CBG) – Integration of OpenCL into Pluk-based integro-differential equation fitting software FitModelPDE2 and its application for learning microscopic kinetic characteristic of endosomal network by quantitative analysis of snap-shot microscopy images
14:00-14:30 Robert Hoppe (TU Dresden) – Making Ptychography a real-time microscopy technique

14:30-15:00 Coffee Break

15:00-15:30 Prof. Dr. Dirk Pleiter (FZ Juelich) – NVIDIA Application Lab at Juelich
15:30-16:00 Robert Dietrich (ZIH) – GPU Performance Analysis – current tools and tool interfaces
16:00-16:30 Dr. Peter Gottschling (SimuNova & IWR) – The CUDA MTL4: Productive Scientific Programming on GPGPUs

Abstracts

Dominik Friedrich (Bull) – Bullx Supercomputer HRSK-II, Phase 1

An architectural overview of the new Bullx Supercomputer HRSK-II, phase 1 plus an outlook on phase 2.

NVIDIA – GPU Computing with NVIDIA’s Kepler Architecture

Computational researchers, scientists and engineers are rapidly shifting to computing solutions running on GPUs, as this offers significant advantages in performance and energy efficiency. The talk will give an overview of the latest Kepler GPU architecture, including new features like Hyper-Q, Dynamic Parallelism and GPUDirect RDMA. In addition, NVIDIA’s parallel computing platform and the different approaches to using and programming GPUs will be presented.

Thomas William (ZIH) – IUmd – Performance Analysis of a Molecular Dynamics Code

IUmd is a highly versatile molecular dynamics application for the simulation of certain physical properties of neutron stars and white dwarfs. It can be compiled as a serial, OpenMP, MPI or hybrid program. The combination of available code blocks, compiler flags and runtime parameters for a given architecture creates a vast parameter space to be evaluated. The talk will start with the serial code analysis, which provides the best candidates for parallel parameter sweeps using different MPI/OpenMP settings. The performance behaviour is then analysed thoroughly using PAPI counters and the Vampir toolchain. The resulting changes in the OpenMP part of the code, which yield higher parallel efficiency, are then compared to new GPGPU code versions that are currently in development. The talk will show first results obtained using PGI CUDA Fortran and OpenACC (CAPS/PGI).

Dr. Michael Bussmann (HZDR) – Petaflop Plasma Physics: Pursuing Particle Paths

We present first results on radiative signatures of the Kelvin-Helmholtz Instability in relativistic streams. Large-scale simulations enable us to follow particle trajectories through turbulent plasma flows by computing the radiation of the charged particles inside the plasma. This allows imaging the plasma particle flow with unprecedented resolution by detecting the far-field radiation coming from the plasma. Our results advance the understanding of the complex dynamics of plasmas, especially in situations where direct plasma probing is impossible.

Prof. Dr. Torsten Hoefler (ETH Zuerich) – A Study of the Alternative: Modeling Communication in Cache-Coherent SMP Systems – A Case-Study with Xeon Phi

Most multi-core and some many-core processors implement cache coherency protocols that heavily complicate the design of optimal parallel algorithms. Communication is performed implicitly by cache line transfers between cores, complicating the understanding of performance properties. We developed an intuitive performance model for cache-coherent architectures and demonstrate its use with the currently most scalable cache-coherent many-core architecture, Intel Xeon Phi. Using our model, we develop several optimal and optimized algorithms for complex parallel data exchanges. All algorithms that were developed with the model beat the performance of the highly-tuned vendor-specific Intel OpenMP and MPI libraries by up to a factor of 4.3. The model can be simplified to satisfy the tradeoff between complexity of algorithm design and accuracy. We expect that our model can serve as a vehicle for advanced algorithm design, similar to established network models such as LogP.

Dr. Yannis Kaleidzidis (MPI-CBG) – Integration of OpenCL into Pluk-based integro-differential equation fitting software FitModelPDE2 and its application for learning microscopic kinetic characteristic of endosomal network by quantitative analysis of snap-shot microscopy images

Endocytosis is the process by which eukaryotic cells internalize nutrients and signalling molecules. Endosomes are small intracellular organelles that form a dynamic network by exchanging and redistributing cargo in cascades of fusion and fission reactions. We previously developed an integro-differential model to derive integral cargo traffic properties from the dynamic characteristics of individual endosomes [1]. Fitting this model to experimental data allowed us to learn the microscopic kinetic characteristics of the endocytic network from a set of snap-shot microscopy images. Such inverse problems are, in general, computationally demanding, since at every step of the fitting procedure the integro-differential equation in partial derivatives has to be integrated. We developed the software FitModelPDE2 (based on Pluk, C++ and OpenCL) for fitting user-defined integro-PDE models to experimental data. The software generates OpenCL code on the fly for the most time-consuming part of the calculation and executes it on the GPU. The high efficiency of GPU computing allowed us to extend the model with “free-shape functions” for finding unknown dependencies in the endocytic network that accompany cargo progression.

Robert Hoppe (TU Dresden) – Making Ptychography a real-time microscopy technique

Ptychographic imaging is a scanning microscopy technique using coherent radiation. The object is scanned through a confined illumination, and a far-field diffraction pattern is recorded at each scan point. The far-field image set cannot be interpreted directly; instead, a computationally intensive algorithm must be applied to the data to retrieve images of the object and the illumination. Over the past few years ptychography has become a routinely used microscopy method, but until now on-the-fly data analysis was not possible. Porting the reconstruction algorithm to GPU-based computing structures reduced the reconstruction time by orders of magnitude. This makes quasi real-time reconstruction possible, turning ptychography into an online microscopy method.

Prof. Dr. Dirk Pleiter (FZ Juelich) – NVIDIA Application Lab at Juelich

In 2012 Forschungszentrum Jülich and NVIDIA established a new model for collaboration on enabling scientific applications for GPU computing and enhancing GPU-based HPC technologies and architectures. The lab's work is mainly driven by a variety of applications from different research areas, ranging from life sciences to radio astronomy. We will highlight a few of these applications to discuss the opportunities and challenges of using GPUs for large-scale scientific high-performance computing.

Robert Dietrich (ZIH) – GPU Performance Analysis – current tools and tool interfaces

Although hardware technology providers are constantly improving software support for their products, it is still fairly complex to program processors like GPGPUs efficiently. Furthermore, the heterogeneous composition of computing systems makes software development for them more challenging, as additional programming models for accelerators have to be applied. To cope with the complexity of software development for those systems, it is essential to use the capabilities of available tools as efficiently as possible. In the area of performance analysis there are several vendor-specific and third-party tools, each with individual strengths and weaknesses, which should be investigated.

Dr. Peter Gottschling (SimuNova & IWR) – The CUDA MTL4: Productive Scientific Programming on GPGPUs

The Matrix Template Library v4 has proven to provide high performance on different platforms while, maybe even more importantly, allowing for high productivity in the development process. The intuitive notation provides an easy entry point and quick programming progress, so scientists do not need to invest their time in deep technical details. The CUDA version of MTL4 is designed with the goal of enabling the same productivity on GPGPUs while allowing for maximal performance. MTL4 has the same interface on GPUs as on CPUs. Thus, all MTL4 applications can use CUDA acceleration without program modifications. Since not all operations are supported by CUDA yet, the library statically selects the appropriate processing platform. We will show several program examples and give some insight into the implementation.

Permanent link to this article: https://gcoe-dresden.de/gpu-workshop-program-released/
