Feb 24

Performance Analysis Tools for Parallel Codes

To analyze the performance of an algorithm on a single machine, you might use tools such as gprof, nvprof or perf. But what about code that runs on dozens or even thousands of heterogeneous CPU+GPU nodes in a cluster?

Besides gathering the performance metrics of each process, the communication between processes must be analyzed as well. The analysis tool itself should also scale with the number of processes. Finally, you want a graphical interface that gives convenient access to all this data, so that hotspots and inefficient program paths are easy to find.

Several parallel performance tools are available.
This post provides a hands-on for Score-P, Cube, Vampir and CASITA.

Download Hands-On OpenACC Support in Score-P and Vampir

More information on the tools:
Score-P is a highly scalable, easy-to-use tool for tracing, profiling and online analysis of applications on high-performance computing systems. Score-P stores trace files in the Open Trace Format Version 2 (OTF2), which other tools such as Vampir, Scalasca or Casita can read. Profiles are written in the open Cube4 file format, which you can analyze with the free visualization tool Cube. Score-P also offers a runtime API for interacting with the measurement while the application is being profiled. Periscope is a scalable automatic performance analysis tool that uses this online-access mode of Score-P.

Score-P Download page
Score-P Manual [PDF]
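
The Score-P workflow described above (instrument the code, run it, then open the profile in Cube or the trace in Vampir) can be sketched roughly as follows. This is a minimal sketch, not the official procedure. The helper `scorep_run_config`, the directory name `scorep_run1`, the source file `app.c` and the `mpirun` line are hypothetical. The environment variables and the output file names `profile.cubex` and `traces.otf2` are assumed to match your Score-P version and should be checked against the manual. The script only prints the commands; it does not run them.

```python
import os
from pathlib import Path


def scorep_run_config(experiment_dir, profiling=True, tracing=False):
    """Return (env, outputs) for one measurement run of an instrumented binary.

    Hypothetical helper: it only prepares Score-P environment variables and
    the result files we expect to find afterwards; it does not run anything.
    """
    env = dict(os.environ)
    # Directory in which Score-P stores the measurement results.
    env["SCOREP_EXPERIMENT_DIRECTORY"] = str(experiment_dir)
    # Profiling yields a Cube4 profile, tracing yields an OTF2 trace.
    env["SCOREP_ENABLE_PROFILING"] = "true" if profiling else "false"
    env["SCOREP_ENABLE_TRACING"] = "true" if tracing else "false"

    outputs = {}
    if profiling:
        outputs["profile"] = Path(experiment_dir) / "profile.cubex"  # open with Cube
    if tracing:
        outputs["trace"] = Path(experiment_dir) / "traces.otf2"  # open with Vampir
    return env, outputs


if __name__ == "__main__":
    env, outputs = scorep_run_config("scorep_run1", profiling=True, tracing=True)

    # Assumed build and launch lines for a hypothetical MPI program app.c.
    # Prefixing the compiler with `scorep` instruments the code at build time.
    build_cmd = ["scorep", "mpicc", "-o", "app", "app.c"]
    run_cmd = ["mpirun", "-np", "4", "./app"]

    print("build:", " ".join(build_cmd))
    for name in ("SCOREP_EXPERIMENT_DIRECTORY",
                 "SCOREP_ENABLE_PROFILING",
                 "SCOREP_ENABLE_TRACING"):
        print(f"export {name}={env[name]}")
    print("run:  ", " ".join(run_cmd))
    print("then:  cube", outputs["profile"])
    print("then:  vampir", outputs["trace"])
```

Profiling is usually the cheaper first step. Tracing records individual events and can produce much larger files, so it is often switched on only after the profile has shown where to look.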

Vampir is a sophisticated analysis framework that lets you visualize program behavior at any level of detail.
Link to Vampir Manual

Cube is a visualization tool for call-path profiles that are generated by Score-P and can be downloaded here:

Casita is a tool for identifying critical optimization targets in distributed heterogeneous applications. It performs an automatic analysis on Score-P trace files (the tool is also demonstrated in the Hands-On above).
https://github.com/rdietric/public_codes/raw/master/casita-1.4.1.tar.gz (BSD Open Source)
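
To give a feeling for what a "critical optimization target" can mean, here is a small sketch of critical-path analysis on a dependency graph of activities. This is an illustrative assumption, not a description of how Casita works internally. The function `critical_path`, the activity names and the durations are all made up. The idea is that only activities on the longest dependency chain determine the total runtime, so speeding up anything else does not help.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, deps):
    """Return (total_time, path) of the longest dependency chain.

    durations: {activity: seconds}
    deps:      {activity: [activities that must finish first]}
    Every activity named in deps must also appear in durations.
    """
    graph = {a: deps.get(a, ()) for a in durations}
    finish = {}  # earliest finish time of each activity
    prev = {}    # predecessor that determined the start time
    for a in TopologicalSorter(graph).static_order():
        # An activity starts when its slowest predecessor has finished.
        last = max(graph[a], key=lambda p: finish[p], default=None)
        start = finish[last] if last is not None else 0.0
        finish[a] = start + durations[a]
        prev[a] = last

    # Walk back from the activity that finishes last.
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    path.reverse()
    return total, path


# Hypothetical two-rank CPU+GPU run: rank 1 waits for data from rank 0.
durations = {
    "rank0:compute": 2.0,
    "rank0:gpu_kernel": 5.0,
    "rank1:compute": 3.0,
    "rank1:mpi_wait": 0.5,
    "rank1:postprocess": 1.0,
}
deps = {
    "rank0:gpu_kernel": ["rank0:compute"],
    "rank1:mpi_wait": ["rank0:gpu_kernel", "rank1:compute"],
    "rank1:postprocess": ["rank1:mpi_wait"],
}

total, path = critical_path(durations, deps)
print(total, path)
```

In this made-up example the GPU kernel on rank 0 lies on the critical path, while `rank1:compute` does not: making rank 1 compute faster would only lengthen its wait.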

Permanent link to this article: https://gcoe-dresden.de/performance-analysis-tools-for-parallel-codes/

