Re: [charm] Scalability issues using large chare array


  • From: Phil Miller <mille121 AT illinois.edu>
  • To: Steve Petruzza <spetruzza AT sci.utah.edu>
  • Cc: charm <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] Scalability issues using large chare array
  • Date: Mon, 1 Aug 2016 12:04:11 -0500

Hi Steve,

I'm going to address your message in two separate parts, because they deal with very different issues.

On Mon, Aug 1, 2016 at 7:44 AM, Steve Petruzza <spetruzza AT sci.utah.edu> wrote:
If I run on 1024 cores I get the following at startup:

Charm++> Running on Gemini (GNI) with 1024 processes
Charm++> static SMSG
Charm++> SMSG memory: 5056.0KB
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 2048K
Charm++> Running in SMP mode: numNodes 1024,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.7.0-281-g8d5cdd9
Warning> using Isomalloc in SMP mode, you may need to run with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 64 unique compute nodes (16-way SMP).

Charm++> Warning: the number of SMP threads (32) is greater than the number of physical cores (16), so threads will sleep while idling. Use +CmiSpinOnIdle or +CmiSleepOnIdle to control this directly.

WARNING: +p1024 is a command line argument beginning with a '+' but was not parsed by the RTS.
If any of the above arguments were intended for the RTS you may need to recompile Charm++ with different options.

I’m running using:
aprun -n 1024 -N 16 ./charm_app +p1024 

and charm is built as: ./build charm++ gni-crayxe   smp  -j16  --with-production

If I add +ppn16 (or 15 or less) to the charm_app, the number of SMP threads multiplies by that factor, so I don’t know how to remove that warning (the number of SMP…).

On Cray systems, the -n argument to the aprun command indicates how many processes the system should launch. This has a few implications that all play into your observations:
- Because aprun is doing the process launching, rather than the charmrun utility we'd use on a commodity cluster, there is nothing to process the +p argument. It would be meaningless in this context.
- An 'smp' build of Charm++ runs at least two threads in each process: a communication thread and one or more worker threads. So aprun launches 16 processes on each 16-core node, and each of those processes has two threads (see the arithmetic sketched after this list).
- The +ppn argument sets the number of worker threads to spawn per process, and so multiplies the oversubscription, as you noted.
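
To make the oversubscription warning concrete, here is the arithmetic implied by the log you pasted (just a sketch of the thread counts, not output from any tool):

  aprun -n 1024 -N 16 ./charm_app      # 16 processes on each 16-core node
  threads per process = 1 worker + 1 comm thread = 2
  threads per node    = 16 processes * 2 threads = 32 > 16 physical cores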

What you want is for aprun to launch a smaller number of processes (e.g. 1 or 2 per node) and for the Charm RTS to spawn threads on each core within the bounds of those processes. Here's what that would look like:
One process per node:
aprun -n 64 -N 1 ./charm_app +ppn15
Two processes per node:
aprun -n 128 -N 2 ./charm_app +ppn7
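
If it helps to see the general rule, here is a small shell sketch that reproduces the two-processes-per-node case above; NODES, CORES_PER_NODE, and PROCS_PER_NODE are placeholder variables introduced only for illustration, not Charm++ or aprun options:

  NODES=64
  CORES_PER_NODE=16
  PROCS_PER_NODE=2
  # Leave one core per process for its communication thread.
  PPN=$(( CORES_PER_NODE / PROCS_PER_NODE - 1 ))
  aprun -n $(( NODES * PROCS_PER_NODE )) -N ${PROCS_PER_NODE} ./charm_app +ppn${PPN}

With these values the last line expands to the 'aprun -n 128 -N 2 ./charm_app +ppn7' command above; the guiding constraint is PROCS_PER_NODE * (PPN + 1) <= CORES_PER_NODE.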

Alternatively, if your application would rather use every core for computation, giving up the dedicated communication thread and the benefit of shared-memory communication within each node, you could build without the smp option and then run almost as you did initially:
aprun -n 1024 -N 16 ./charm_app
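
For reference, the corresponding non-smp build would presumably just drop the smp keyword from the build line you quoted:

  ./build charm++ gni-crayxe -j16 --with-production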

We're working on code to automate much of this process- and thread-launching tedium for a near-future release. I don't think we'll be able to soundly and automatically back down from explicitly commanded oversubscription, though.

Phil


