Aug 22

GCoE Presentations | August/September 2016

In “GCoE Presentations” we announce upcoming presentations by our GCoE members, this time for August and September 2016. Our highlight will be GTC Europe in Amsterdam on 28-29 September.

GTC Europe, 28-29/09/2016, Amsterdam

Alpaka – One Programming Model for Parallel Kernel Acceleration of Heterogeneous Systems
Alexander Matthes
This session will dive deep into the Alpaka library for parallel kernel acceleration, which provides a uniform abstract C++ interface to a range of parallel programming models such as CUDA and OpenMP. The talk will show how Alpaka achieves platform and performance portability across various types of architectures by exploiting parallelism and memory hierarchies at all levels available in current hardware. See more on the GTC Europe website.

Hands-On Performance Analysis for OpenACC/CUDA/OpenCL/MPI/OpenMP Applications with Score-P and Vampir
Guido Juckeland
Participants will work with Score-P/Vampir to learn how to dive into the execution properties of CUDA and OpenACC applications. We’ll show how to use Score-P to generate a trace file and how to study it with Vampir. Additionally, we’ll use the newly established OpenACC tools interface to show how OpenACC applications can be studied for performance bottlenecks. See more on the GTC Europe website.

Further Presentations

20th ADBIS, 28-31/08/2016, Prague
Limitations of Intra-Operator Parallelism using Heterogeneous Computing Resources
Tomas Karnagel
The hardware landscape is changing from homogeneous multi-core systems towards wildly heterogeneous systems that combine different computing units, such as CPUs and GPUs. To utilize these heterogeneous environments, database query execution has to adapt to different architectures and computing behaviors. In this paper, we investigate the simple idea of partitioning an operator’s input data and processing all partitions in parallel, one partition per computing unit. For heterogeneous systems, data has to be partitioned according to the performance of the computing units. We define a way to calculate the partition sizes, analyze the parallel execution for two example database operators, and present limitations that could hinder significant performance improvements. The findings in this paper can help system developers assess the possibilities and limitations of intra-operator parallelism in heterogeneous environments, leading to more informed decisions on whether this approach is beneficial for a given workload and hardware environment.
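The abstract does not state the paper’s formula for the partition sizes. As a minimal sketch, assuming each computing unit has a known, constant throughput (rows per second) and ignoring data-transfer and startup costs, one natural choice is to size the partitions proportionally to throughput so that all units finish at about the same time. The function names `partition_sizes` and `estimated_runtime` are hypothetical.

```python
def partition_sizes(total_rows, throughputs):
    """Split total_rows into one partition per computing unit, proportional to throughput.

    throughputs: assumed rows-per-second rate of each unit (hypothetical inputs).
    Returns integer sizes that sum exactly to total_rows.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("throughputs must all be positive")
    total_tp = sum(throughputs)
    # Ideal fractional share: each unit then needs the same time for its partition.
    exact = [total_rows * t / total_tp for t in throughputs]
    sizes = [int(x) for x in exact]
    # Hand the rows lost to rounding down to the largest fractional remainders.
    leftover = total_rows - sum(sizes)
    order = sorted(range(len(exact)), key=lambda i: exact[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


def estimated_runtime(sizes, throughputs):
    # Parallel execution finishes only when the slowest unit has finished.
    return max(s / t for s, t in zip(sizes, throughputs))
```

For example, with one unit three times faster than the other, `partition_sizes(1000, [1.0, 3.0])` returns `[250, 750]`, and both units need 250 time units, instead of the 500 that an even split would cost the slower unit. Costs this sketch ignores, such as moving data to a GPU, are one plausible source of the limitations the paper discusses.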

VLDB PhD Workshop, 09/09/2016, New Delhi, India
Heterogeneity-Aware Query Optimization
Tomas Karnagel
The hardware landscape is changing from homogeneous systems towards multiple heterogeneous computing units within one system. For database systems, this is an opportunity to accelerate query processing if the heterogeneous resources can be utilized efficiently. To this end, we investigate novel query optimization concepts for heterogeneous resources, such as placement granularity, execution estimation, optimization granularity, and data handling. Finally, we combine these concepts in a specialized optimization stage during query optimization, together with a unique way of evaluating our optimizations in existing database systems.

European Materials Research Society (EMRS), 19-22/09/2016, Warsaw
Experimental-Scale Kinetic Lattice Monte-Carlo Studies on GPU
Jeffrey Kelling, Karl-Heinz Heinig, Sibylle Gemming
Micro- and nano-structured materials are crucial for future energy technologies. Key processes during production and lifetime are governed by self-organization in phase separation processes at the micro- and nanoscale. […] Simulations of these out-of-equilibrium, inhomogeneous real-world systems provide important insights and reveal potential for optimizing structures and process parameters. To this end, kinetic lattice Monte Carlo simulations can be used to model physical systems at experimental scales in an atomistic way, thereby sidestepping many caveats associated with the alternative phase-field simulations. In this contribution, we present two massively parallel implementations for large-scale simulations on GPUs: one is optimized to offer fast time-to-solution for experimental-scale simulations [Eur. Phys. J. Spec. Top. 210, 175 (2012)], the other enables highly efficient parameter studies or large sample sizes for large-scale simulations [Phys. Rev. E (2016), submitted]. Harnessing the compute power of modern (multi-)GPU installations leads to increased energy efficiency as well as reduced time-to-solution.
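To make the idea of kinetic lattice Monte Carlo for phase separation more concrete, here is a minimal sequential sketch of Kawasaki exchange dynamics with Metropolis acceptance on a two-dimensional binary mixture. It is a generic textbook scheme assumed for illustration, not the GPU implementations presented in the talk; the nearest-neighbour model, the coupling `J` and all function names are hypothetical choices.

```python
import math
import random


def local_energy(lat, L, x, y, J=1.0):
    """Energy of all bonds touching site (x, y) on a periodic L x L lattice (L >= 3).

    Sites hold +1 (species A) or -1 (species B); like neighbours lower the energy.
    """
    s = lat[x][y]
    nb = (lat[(x + 1) % L][y] + lat[(x - 1) % L][y]
          + lat[x][(y + 1) % L] + lat[x][(y - 1) % L])
    return -J * s * nb


def total_energy(lat, L, J=1.0):
    # Every bond is counted by both of its sites, hence the division by two.
    return sum(local_energy(lat, L, x, y, J) for x in range(L) for y in range(L)) / 2


def kawasaki_sweep(lat, L, beta, rng, J=1.0):
    """One sweep (L*L attempts) of random-sequential Kawasaki exchange dynamics."""
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))
    for _ in range(L * L):
        # Pick a random site and one of its nearest neighbours.
        x, y = rng.randrange(L), rng.randrange(L)
        dx, dy = rng.choice(moves)
        x2, y2 = (x + dx) % L, (y + dy) % L
        if lat[x][y] == lat[x2][y2]:
            continue  # exchanging identical atoms changes nothing
        # The shared bond keeps its value under the swap, so summing both
        # local energies still yields the correct energy difference.
        e_old = local_energy(lat, L, x, y, J) + local_energy(lat, L, x2, y2, J)
        lat[x][y], lat[x2][y2] = lat[x2][y2], lat[x][y]
        e_new = local_energy(lat, L, x, y, J) + local_energy(lat, L, x2, y2, J)
        d_e = e_new - e_old
        # Metropolis: accept with probability min(1, exp(-beta * dE)).
        if d_e > 0 and rng.random() >= math.exp(-beta * d_e):
            lat[x][y], lat[x2][y2] = lat[x2][y2], lat[x][y]  # reject: undo the swap
```

Exchanging neighbouring atoms instead of flipping them keeps the composition fixed, which is what makes this kind of dynamics a model for demixing. Because every attempt depends on the outcome of the previous one, this random sequential form does not parallelize directly, which is the difficulty GPU implementations have to overcome.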

Perspectives of GPU computing in Science, 26-28/09/2016, Rome
Pushing the Limits of Lattice Monte-Carlo Simulations using GPUs (Invited Talk)
Jeffrey Kelling
Lattice Monte-Carlo methods are used to study out-of- and towards-equilibrium systems, like surface growth, spin systems and even phase separation in solid mixtures using kinetic Metropolis lattice Monte-Carlo (KLMC). Applications range from the study of universal scaling or aging behaviors to concrete systems, where coarsening of nanocomposites or self-organization of functional nanostructures is relevant, for example spinodal decomposition in solar cell absorber layers. In these systems, scaling needs to be followed for long times to allow structures to grow over orders of magnitude, which requires large-scale simulations. For the evolution of nanostructures, atomistic simulations at experimental spatiotemporal scales are often desired.
This talk will give an overview of a variety of lattice Monte-Carlo algorithms that have been found or made suitable for implementation on GPUs: Stochastic cellular automata can be implemented very efficiently [1-3] and are suitable for many systems. The efficient implementation of random sequential dynamics is more challenging. Solutions will be presented for a dimer lattice gas mapped to surface growth [4,5] and for KLMC [6]. The latter was also extended to implement dynamics driven by ion-beam mixing, which triggers long-range interactions. However, these implementations hinge on the fact that only a very small number of states need to be encoded at each lattice site. A more flexible implementation, employing a variation of multisurface coding to enable vectorization, will be presented for simulations of restricted solid-on-solid and Potts models with random sequential dynamics [7].
[1] Block, B., Virnau, P., Preis, T.: Comp. Phys. Comm. 181(9), 1549 (2010)
[2] Lulli, M., Bernaschi, M., Parisi, G.: Comp. Phys. Comm. 196, 290 (2015)
[3] Kelling, J., Ódor, G., Gemming, S.: 2016 IEEE Int. Conf. Intell. Eng. Syst., arXiv:1606.00310 (2016)
[4] Kelling, J., Ódor, G.: Phys. Rev. E 84, 061150 (2011)
[5] Ódor, G., Kelling, J., Gemming, S.: Phys. Rev. E 89, 032146 (2014)
[6] Kelling, J., Ódor, G., Nagy, M. F., Schulz, H., Heinig, K.: Eur. Phys. J. Spec. Top. 210, 175 (2012)
[7] Kelling, J., Ódor, G., Gemming, S.: arXiv:1605.02620 (2016)
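A widely used way to let many lattice sites be updated at once, as GPU implementations need, is the checkerboard decomposition, sketched here sequentially for the Ising model. This is an illustrative assumption, not one of the algorithms from the talk: sites of one colour have no nearest neighbours of the same colour, so all of them can receive a Metropolis update simultaneously. Note that this fixes the update order instead of following random sequential dynamics, which is why the talk treats the latter separately. Function names are hypothetical.

```python
import math
import random


def checkerboard_half_sweep(spins, L, beta, colour, rng, J=1.0):
    """Metropolis update of every site with (x + y) % 2 == colour on a periodic L x L lattice.

    Each update reads only spins of the other colour, so the iterations are
    independent of each other and could run concurrently (one thread per site).
    """
    if L % 2:
        raise ValueError("L must be even for a consistent periodic checkerboard")
    for x in range(L):
        # Start at the first y with (x + y) % 2 == colour, then step by two.
        for y in range((x + colour) % 2, L, 2):
            nb = (spins[(x + 1) % L][y] + spins[(x - 1) % L][y]
                  + spins[x][(y + 1) % L] + spins[x][(y - 1) % L])
            d_e = 2.0 * J * spins[x][y] * nb  # energy change if this spin flips
            if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
                spins[x][y] = -spins[x][y]


def checkerboard_sweep(spins, L, beta, rng, J=1.0):
    # One full sweep: update the "black" sublattice, then the "white" one.
    checkerboard_half_sweep(spins, L, beta, 0, rng, J)
    checkerboard_half_sweep(spins, L, beta, 1, rng, J)
```

In a GPU version each thread would handle one site of the current colour and would need its own random-number stream; this sketch simply shares one `random.Random`. With `beta=0.0` every proposed flip is accepted, so one full sweep negates each spin exactly once, which makes a convenient sanity check.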

12th International Workshop on Boolean Problems (IWSBP), 22-23/09/2016, Freiberg
Multi-GPU Approximation Methods for Silent Data Corruption of AN Codes
Matthias Werner, Till Kolditz, Tomas Karnagel, Dirk Habich, Wolfgang Lehner
Multi-bit flip rates are assumed to increase dramatically with future transistor technologies, especially in computer DRAM. The silent data corruption probability of a code is determined by its distance distribution, whose computation has exponential complexity for non-linear codes such as AN codes. We provide exact and approximation algorithms for computing the distance distribution of AN codes on GPUs.
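As a small illustration of why the exact computation is expensive, the sketch below enumerates all codewords A·x of an AN code with k-bit data words and counts, for each Hamming distance d, the unordered codeword pairs that differ in exactly d bits. A pattern of d bit flips that happens to turn one codeword into another goes undetected, which is why this histogram matters for silent data corruption. The brute force needs on the order of 4^k pair checks. The sampling estimator is a generic Monte Carlo approach assumed for illustration, not necessarily the paper’s approximation method, and all names are hypothetical.

```python
import random
from collections import Counter


def hamming_weight(v):
    return bin(v).count("1")


def distance_distribution(A, k):
    """Exact histogram {d: number of unordered codeword pairs at Hamming distance d}.

    Codewords are A * x for all k-bit data words x; the double loop performs
    about 2**(2k) / 2 pair checks, i.e. exponential in k.
    """
    words = [A * x for x in range(1 << k)]
    hist = Counter()
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            hist[hamming_weight(words[i] ^ words[j])] += 1
    return hist


def approx_distance_distribution(A, k, samples, seed=0):
    """Monte Carlo estimate of the same histogram from uniformly sampled pairs."""
    rng = random.Random(seed)
    n = 1 << k
    total_pairs = n * (n - 1) // 2
    hits = Counter()
    for _ in range(samples):
        x, y = rng.sample(range(n), 2)  # two distinct data words
        hits[hamming_weight((A * x) ^ (A * y))] += 1
    # Scale the sampled frequencies up to the total number of pairs.
    return {d: c * total_pairs / samples for d, c in hits.items()}
```

For A = 3 and k = 2 the codewords are 0, 3, 6 and 9 (0000, 0011, 0110, 1001), and `distance_distribution(3, 2)` finds five pairs at distance 2 and one pair (6 and 9) at distance 4, so the minimum distance is 2.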

Permanent link to this article: https://gcoe-dresden.de/gcoe-presentations-augustseptember-2016/