Mar 16


Treading in Water, Eurohack day 3

As the title suggests, day 3 felt like the middle of no man's land. The teams had started to get a grip on handling the GPU and exploiting its hardware. At the same time, teams typically start to get a feeling for how they need to change their project's source code to flexibly accommodate GPU hardware. Once again, the daily scrum contained a lot of interesting discussions.

It started off with a report on the use and misuse of atomicAdd operations on data in GPU global memory. A handful of mentors immediately made suggestions on how to do atomicAdds more efficiently. Here is their bag of tricks:

  • play with block/grid sizes and check the device utilization in the presence of atomicAdd operations (smaller block sizes might give you better performance through higher occupancy)
  • this GTC talk on replacing atomic operations was recommended
  • warp-aggregated atomics may also help
  • documentation on emulating a double-precision atomicAdd with atomicCAS
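To make the last two tips more concrete, here is a minimal sketch combining them. It is our own illustration, not code from the report or the mentors. atomicAddDouble follows the well-known atomicCAS loop for double-precision addition, which is only needed on GPUs below compute capability 6.0 (newer devices have a native atomicAdd for double). atomicAggInc is a warp-aggregated counter increment: the threads of a warp that reach it together elect one leader, which issues a single atomicAdd for the whole group. All function names (atomicAddDouble, atomicAggInc, filterPositive, runFilter) are hypothetical, and the sketch assumes CUDA 9 or newer for cooperative groups, so it would not build with the CUDA 8 toolkit used at the hackathon.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Hypothetical helper: double-precision atomic add emulated with a 64-bit
// atomicCAS loop. Devices of compute capability 6.0+ provide a native
// atomicAdd(double*, double), so this is only needed on older GPUs.
__device__ double atomicAddDouble(double* address, double val) {
    unsigned long long int* address_as_ull =
        reinterpret_cast<unsigned long long int*>(address);
    unsigned long long int old = *address_as_ull;
    unsigned long long int assumed;
    do {
        assumed = old;
        // Swap in (old + val) only if *address still holds the value we read;
        // otherwise atomicCAS returns the newer value and we retry.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}

// Hypothetical helper: warp-aggregated increment (assumes CUDA 9+ cooperative
// groups). Threads of a warp that reach this call together form a group; one
// leader issues a single atomicAdd for the whole group, and every thread
// still receives a unique slot index.
__device__ int atomicAggInc(int* counter) {
    cg::coalesced_group active = cg::coalesced_threads();
    int base = 0;
    if (active.thread_rank() == 0)
        base = atomicAdd(counter, static_cast<int>(active.size()));
    base = active.shfl(base, 0);  // broadcast the leader's old counter value
    return base + static_cast<int>(active.thread_rank());
}

// Hypothetical kernel: compact positive values into `out` and sum them.
__global__ void filterPositive(const double* in, double* out, int* count,
                               double* sum, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] > 0.0) {
        out[atomicAggInc(count)] = in[i];  // one atomic per warp group
        atomicAddDouble(sum, in[i]);       // still one CAS loop per thread
    }
}

// Runs the kernel on n synthetic values and copies the results back.
void runFilter(int n, int* count, double* sum) {
    std::vector<double> h_in(n);
    for (int i = 0; i < n; ++i) h_in[i] = (i % 2 == 0) ? i * 0.5 : -1.0;

    double *d_in, *d_out, *d_sum;
    int* d_count;
    cudaMalloc(&d_in, n * sizeof(double));
    cudaMalloc(&d_out, n * sizeof(double));
    cudaMalloc(&d_sum, sizeof(double));
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemset(d_sum, 0, sizeof(double));  // all-zero bits encode 0.0
    cudaMemset(d_count, 0, sizeof(int));

    filterPositive<<<(n + 255) / 256, 256>>>(d_in, d_out, d_count, d_sum, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(sum, d_sum, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_in); cudaFree(d_out); cudaFree(d_sum); cudaFree(d_count);
}

int main() {
    int count = 0;
    double sum = 0.0;
    runFilter(1000, &count, &sum);
    std::printf("kept %d values, sum = %.1f\n", count, sum);
    return 0;
}
```

With the synthetic input above (the positive values are 1, 2, ..., 499), the program should print "kept 499 values, sum = 124750.0". Note that the counter benefits from warp aggregation, while the sum still runs one CAS loop per thread on a single address, so it demonstrates correctness of the emulation rather than good performance under contention.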

There was some discussion about comparing CPU performance numbers with GPU speeds, especially on the Power8 systems we were using. One of the mentors suggested pinning the application to a single socket of the Power8 system if you want maximal performance (using e.g. numactl -m0 -N0 <yourapp> to bind compute threads and memory allocations to socket 0). Using GCC 5.4.0 might also not be a very good choice on Power8, although, to the best of our knowledge, it is the highest GCC version supported by CUDA 8. The discussion boiled down to the observation that the best-performing binaries are often only produced by the hardware vendor's compiler infrastructure. But these compilers frequently lack support for C++11 and beyond, or choke on the slightest sign of C++ templates.

And the last tip of the day: always consider using -Minfo with the PGI compiler to get more information on what it does with your code.

Permanent link to this article: https://gcoe-dresden.de/treading-in-water-eurohack-day-3/

1 comment

  1. Andreas Herten

    I like to use -Minfo=accel to limit the status output of the PGI compiler to only those parts relevant for OpenACC accelerations.
