
charm - [charm] Charm++ performance issue on CSCS PizDaint

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: "Bignamini Christopher" <christopher.bignamini AT cscs.ch>
  • To: "charm AT lists.cs.illinois.edu" <charm AT lists.cs.illinois.edu>
  • Subject: [charm] Charm++ performance issue on CSCS PizDaint
  • Date: Tue, 8 Mar 2022 15:21:35 +0000

Hello,

 

My name is Christopher Bignamini (CSCS-ETH Zurich), and I am contacting you about an issue I have recently run into with NAMD/Charm++ on one of our systems.

We have a regression test suite that we use to measure system performance. It includes some CPU and GPU NAMD tests that measure the performance of our NAMD/Charm++ installation (Intel based). After a recent upgrade of the system (not of the Charm++/NAMD version), the CPU-only version of NAMD shows a performance degradation in a 16-node run configuration, while the same tests running on a smaller set of nodes (6) and/or with GPU acceleration do not show any issue.

By profiling the executable, I found a possible indication that the problem could be related to the interaction between Charm++ and our system network, particularly through the following function:

 

https://charm.cs.illinois.edu/doxygen/charm/machine-common-core_8C-source.shtml#l01818

 

The results of my profiling runs show markedly different time usage between a 6-node and a 16-node job (percentages refer to the share of samples):

6 nodes: 39.0% CmiGetNonLocalNodeQ
16 nodes: 51.6% CmiGetNonLocalNodeQ

while the share of time spent in USER functions (NAMD computation) is considerably reduced:

6 nodes: 19.2% USER
16 nodes: 9.5% USER
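
To put the two profiles side by side, here is a minimal sketch in Python that computes the relative change of each share when going from 6 to 16 nodes. It uses only the four sampling percentages quoted above; the dictionary layout and the `relative_change` helper are illustrative names and are not part of NAMD or Charm++.

```python
# Sampling shares (percent of samples) copied from the profiling results above.
profile = {
    "CmiGetNonLocalNodeQ": {"6 nodes": 39.0, "16 nodes": 51.6},
    "USER": {"6 nodes": 19.2, "16 nodes": 9.5},
}


def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# Relative change of each share between the 6-node and the 16-node run.
changes = {
    name: relative_change(shares["6 nodes"], shares["16 nodes"])
    for name, shares in profile.items()
}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}% relative change from 6 to 16 nodes")
```

Because these are shares of samples rather than absolute times, the result only indicates how the profile shifts between the two runs, not how much wall-clock time each function takes.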

I know that I am only providing general, high-level information, but I am still trying to understand what the problem is and I am a bit lost. Do you have any suggestions on where I could look in the Charm++ code to understand what is reducing its performance? Any help is appreciated!

 

Thank you very much in advance for your attention, and have a nice day!

Cheers
Christopher



