Mar 15

Our feet are wet, Eurohack day 2

Day 2 of the hackathon went by fast and was filled with progress. This blog post summarizes the events and findings that were presented in the daily scrum.

One team resolved a thread divergence on the device by reducing this conditional bifurcation:

if ((s > edge[0]) ^ (s > edge[2]))
    e = -(a + f - M[s]) * sgn / 2;
else
    e =  (a + f - M[s]) * sgn / 2;

by replacing it with:

e =  (a + f - M[s]) * sgn / 2;
if ((s > edge[0]) ^ (s > edge[2]))
    e = -e;

This modification removed the 99% thread divergence reported by nvvp (the NVIDIA Visual Profiler) and, according to the team, improved performance by 12%.
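To make the equivalence of the two variants above concrete, here is a minimal sketch that wraps each one in a function. The function names and the argument types (int for s and edge, double for the rest) are assumptions for illustration; the team's actual declarations were not shown. The macro only lets the same code compile as plain C++ or as device code under nvcc.

```cpp
// Compiles as plain C++; under nvcc the functions are also callable on the device.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Original shape: each branch carries its own full assignment.
// (Types are assumed for this sketch.)
HOST_DEVICE inline double e_branched(int s, const int* edge, double a, double f,
                                     const double* M, double sgn) {
    double e;
    if ((s > edge[0]) ^ (s > edge[2]))
        e = -(a + f - M[s]) * sgn / 2;
    else
        e = (a + f - M[s]) * sgn / 2;
    return e;
}

// Rewritten shape: the shared arithmetic runs once for every thread,
// and only a sign flip remains conditional.
HOST_DEVICE inline double e_negated(int s, const int* edge, double a, double f,
                                    const double* M, double sgn) {
    double e = (a + f - M[s]) * sgn / 2;
    if ((s > edge[0]) ^ (s > edge[2]))
        e = -e;
    return e;
}
```

Both functions return the same value for every input, since negating a floating point number is exact. The second form simply leaves much less work inside the conditional part.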

Another team observed very low occupancy in their GPU device code because it had to read an array of random numbers from global memory. They switched to generating the random numbers on the device instead of reading a precomputed array, which increased their occupancy from 14% to 83%. Related to this, there was an interesting discussion on our Eurohack Slack channel about good random number generators for GPUs. Here are libraries that were suggested besides cuRAND:

People were benchmarking these to find the ideal balance between performance and quality of random numbers, which is project dependent. Feel free to share your experiences with them in the comments section below.
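The idea of generating numbers on the fly rather than loading them can be sketched as follows: each thread keeps a tiny generator state in registers and advances it whenever it needs a value. This is only a stand-in using a plain xorshift32 step; it is not cuRAND and not what the team used (their generator was not named), and its statistical quality is not vetted for any particular project. ThreadRng, make_thread_rng and next_uniform are hypothetical names.

```cpp
#include <cstdint>

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Per-thread generator state; small enough to live in a register.
struct ThreadRng {
    std::uint32_t state;
};

// Derive a distinct non-zero seed from the thread index with a multiplicative hash.
// (Assumption: a simple seeding scheme for illustration, streams may be correlated.)
HOST_DEVICE inline ThreadRng make_thread_rng(std::uint32_t thread_id) {
    std::uint32_t s = (thread_id + 1u) * 2654435761u;
    if (s == 0u) s = 1u;  // xorshift must never start from zero
    return ThreadRng{s};
}

// Advance the state by one xorshift32 step and map the top 24 bits to [0, 1).
HOST_DEVICE inline float next_uniform(ThreadRng& rng) {
    std::uint32_t x = rng.state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    rng.state = x;
    return static_cast<float>(x >> 8) * (1.0f / 16777216.0f);
}
```

In a kernel, each thread would call make_thread_rng once with its global index and then next_uniform as often as needed, so no random numbers have to be fetched from global memory.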

Other teams reported that they spent the day making their data structures GPU-ready (think Array-of-Structures versus Structure-of-Arrays).
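For readers who have not met the two layouts, here is a small sketch of the difference. The particle type and its x, y, z fields are made up for illustration; none of the teams' actual data structures are shown here.

```cpp
#include <cstddef>
#include <vector>

// Array-of-Structures: the x, y, z of one particle sit next to each other,
// so thread i and thread i+1 reading x touch addresses 12 bytes apart.
struct ParticleAoS {
    float x, y, z;
};

// Structure-of-Arrays: all x values are contiguous, so consecutive threads
// reading x[i] touch consecutive addresses, which GPUs can coalesce.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    explicit ParticlesSoA(std::size_t n) : x(n), y(n), z(n) {}
    std::size_t size() const { return x.size(); }
};

// Convert an AoS container into the SoA layout.
inline ParticlesSoA to_soa(const std::vector<ParticleAoS>& aos) {
    ParticlesSoA soa(aos.size());
    for (std::size_t i = 0; i < aos.size(); ++i) {
        soa.x[i] = aos[i].x;
        soa.y[i] = aos[i].y;
        soa.z[i] = aos[i].z;
    }
    return soa;
}
```

On the device the three vectors would be three separate device arrays; the host containers here only keep the sketch self-contained.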

My team started to port parts of this blog post related to a CUDA-aware MPI implementation of a Jacobi solver, which is quite relevant for them. So far we have replaced one function that runs the Jacobi stencil on the device with a Thrust-based implementation using thrust::for_each. If people are interested in the code, please post a comment below and I can share it.
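This is not our actual code, but the general shape of a for_each-style Jacobi sweep looks roughly like the sketch below. It is a host-side analogue using std::for_each so that it runs without a GPU; the functor name, the row-major 2D layout and the four-point average are assumptions. With Thrust, the same functor would be handed to thrust::for_each over a thrust::counting_iterator range, with raw device pointers in place of the vector data.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Updates one grid point from its four neighbours; boundary points are left alone.
struct JacobiPoint {
    const float* a;   // current iterate, row-major, nx * ny entries
    float* a_new;     // next iterate, same layout
    int nx, ny;

    HOST_DEVICE void operator()(int idx) const {
        const int ix = idx % nx;
        const int iy = idx / nx;
        if (ix == 0 || ix == nx - 1 || iy == 0 || iy == ny - 1) return;
        a_new[idx] = 0.25f * (a[idx - 1] + a[idx + 1] + a[idx - nx] + a[idx + nx]);
    }
};

// One Jacobi sweep: apply the functor to every linear index of the grid.
inline void jacobi_sweep(const std::vector<float>& a, std::vector<float>& a_new,
                         int nx, int ny) {
    std::vector<int> indices(static_cast<std::size_t>(nx) * ny);
    std::iota(indices.begin(), indices.end(), 0);
    std::for_each(indices.begin(), indices.end(),
                  JacobiPoint{a.data(), a_new.data(), nx, ny});
}
```

The index vector only exists because the standard library has no counting iterator; on the device a counting iterator avoids materializing it.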

Permanent link to this article: https://gcoe-dresden.de/our-feet-are-wet-eurohack-day-2/