[charm] MPI Build of Charm++ on Linux Cluster


  • From: "Wang, Felix Y." <wang65 AT llnl.gov>
  • To: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: [charm] MPI Build of Charm++ on Linux Cluster
  • Date: Tue, 24 Jul 2012 15:49:48 -0700
  • Accept-language: en-US
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hello PPL,

I was wondering if you could give me a bit of insight as to why the MPI build of Charm++ on a Linux cluster ('cab' at LLNL) experiences large gaps in the communication phases, whereas on a Blue Gene/P machine ('udawn' at LLNL) there are no such gaps and the computation proceeds as expected. See the attached images for an illustration of what I mean by these gaps. The code being run in both cases is identical: a port of the shock hydrodynamics proxy application LULESH, which is mostly computation-heavy, with a few phases where communication across domains is necessary.

I have also tested the net build and the mpi-smp build on the same cluster, cab, and found that the gaps seen with the mpi build do not appear, although there are other problems, chiefly far more idle time than I would like. In the case of the mpi-smp build, every once in a while one of the PEs gets 'stuck', essentially blocking all ongoing communication, for a reason I have yet to understand.

Essentially, I'm wondering whether anyone has noticed this kind of behavior on other clusters before, and what is needed to get rid of it (e.g., some sort of configuration parameter). So far I have tested the 'mpi' version of my code port with different compilers (both icc and gcc) as well as different versions of MVAPICH (1 and 2). Could it be that these MPI implementations are simply not well supported by Charm++? What configuration does the PPL group typically use to test the MPI build of Charm++ during development?
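
For reference, the builds were configured along these lines with the standard Charm++ build script; the exact option spellings below are an approximation rather than my literal command history, and the compiler/MVAPICH selection comes from whatever modules are loaded in the environment:

    # MPI build (icc shown; drop 'icc' to fall back to gcc / the default mpicxx wrapper)
    ./build charm++ mpi-linux-x86_64 icc --with-production

    # MPI SMP build
    ./build charm++ mpi-linux-x86_64 smp icc --with-production

    # net build used for comparison
    ./build charm++ net-linux-x86_64 icc --with-production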

Thanks,

--- Felix

Attachment: Cab64_Timeline.png
Description: Cab64_Timeline.png

Attachment: Udawn64_Timeline.png
Description: Udawn64_Timeline.png



