[charm] Scalability issues using large chare array


  • From: Steve Petruzza <spetruzza AT sci.utah.edu>
  • To: charm <charm AT lists.cs.illinois.edu>
  • Subject: [charm] Scalability issues using large chare array
  • Date: Mon, 1 Aug 2016 15:44:38 +0300

Hi all,

In my application the main chare creates a single chare array with thousands of elements; these chares eventually execute some work and communicate with each other (not all simultaneously).
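The structure is roughly like the sketch below (the names are simplified placeholders, not my actual code):

// Corresponding .ci interface (roughly):
//   mainmodule myapp {
//     mainchare Main { entry Main(CkArgMsg*); };
//     array [1D] Task { entry Task(); entry void start(); };
//   };
#include "myapp.decl.h"

// The main chare creates one large 1D chare array and broadcasts a start message.
class Main : public CBase_Main {
public:
  Main(CkArgMsg* m) {
    const int numTasks = 120000;                        // thousands of elements
    CProxy_Task tasks = CProxy_Task::ckNew(numTasks);   // single large chare array
    tasks.start();                                      // broadcast to all elements
    delete m;
  }
};

// Each array element does some work and exchanges messages with other elements.
class Task : public CBase_Task {
public:
  Task() {}
  Task(CkMigrateMessage*) {}
  void start() {
    // ... compute, then invoke entry methods on other elements via thisProxy ...
  }
};

#include "myapp.def.h"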
If I run on 1024 cores I get the following at startup:

Charm++> Running on Gemini (GNI) with 1024 processes
Charm++> static SMSG
Charm++> SMSG memory: 5056.0KB
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 2048K
Charm++> Running in SMP mode: numNodes 1024,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.7.0-281-g8d5cdd9
Warning> using Isomalloc in SMP mode, you may need to run with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 64 unique compute nodes (16-way SMP).

Charm++> Warning: the number of SMP threads (32) is greater than the number of physical cores (16), so threads will sleep while idling. Use +CmiSpinOnIdle or +CmiSleepOnIdle to control this directly.

WARNING: +p1024 is a command line argument beginning with a '+' but was not parsed by the RTS.
If any of the above arguments were intended for the RTS you may need to recompile Charm++ with different options.

I’m running using:
aprun -n 1024 -N 16 ./charm_app +p1024 

and Charm++ is built as:
./build charm++ gni-crayxe smp -j16 --with-production

If I add +ppn16 (or 15, or less) to the charm_app command line, the number of SMP threads gets multiplied by that factor, so I don’t know how to get rid of that warning about the number of SMP threads.
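My understanding from the Charm++ manual is that in SMP mode aprun should launch fewer processes (each with several worker threads via +ppn) and leave one core per process for the comm thread, so maybe the launch should look something like this instead (just a guess on my side, one process per 16-core node with 15 worker threads):

aprun -n 64 -N 1 -d 16 ./charm_app +ppn15

but I am not sure this is how -n/-N and +ppn are supposed to interact, so please correct me if that is wrong.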

By the way, if I print some runtime stats I see the following:

Charm Kernel Summary Statistics:
Proc 0: [11 created, 11 processed]
Proc 1: [0 created, 0 processed]
Proc 2: [0 created, 0 processed]
Proc 3: [0 created, 0 processed]

… and all the other PEs show [0 created, 0 processed].

Charm Kernel Detailed Statistics (R=requested P=processed):

         Create    Mesgs     Create    Mesgs     Create    Mesgs
         Chare     for       Group     for       Nodegroup for
PE   R/P Mesgs     Chares    Mesgs     Groups    Mesgs     Nodegroups
---- --- --------- --------- --------- --------- --------- ----------
   0  R         11         0        14         1         8      1024
      P         11      7732        14         2         8         0
   1  R          0         0         0         1         0         0
      P          0         0        14         2         0         1
   2  R          0         0         0         2         0         0
      P          0         0        14         3         0         0
   3  R          0         0         0         2         0         0
      P          0         0        14         3         0         0

… and all the other PEs look like PEs 1, 2, 3 above.

Is PE 0 processing all the messages? Why? This does not look scalable.
In fact, when I go above 120K chares the run crashes with a segfault (_pmiu_daemon(SIGCHLD): [NID 16939] [c5-0c2s5n1] [Mon Aug  1 03:12:58 2016] PE RANK 975 exit signal Segmentation fault).
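
To check where the array elements actually end up, I was thinking of printing the PE in each element’s constructor, something like this (hypothetical snippet, not from my actual code):

// Print where each array element is constructed, to see how elements are spread.
Task::Task() {
  CkPrintf("Task %d constructed on PE %d (node %d)\n",
           thisIndex, CkMyPe(), CkMyNode());
}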

Am I building or running improperly?
How can I make sure that the chares are spread across more nodes and processors, to avoid excessive memory allocation on just a few nodes?
Is there any strong coupling between the chare that creates a chare array and the nodes/processors where the array elements actually execute? If I create several smaller chare arrays in the main chare at different points in the execution, instead of one large array at the beginning, would that change anything?
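For example, would explicitly controlling the placement make a difference, either through CkArrayOptions with one of the built-in map groups or by inserting elements on chosen PEs? I am only guessing at the API from the manual here:

// Guess A: let a built-in round-robin map group decide where elements live.
CkArrayOptions opts(numTasks);
opts.setMap(CProxy_RRMap::ckNew());
CProxy_Task tasks = CProxy_Task::ckNew(opts);

// Guess B: create an empty array and insert each element on an explicit PE.
CProxy_Task tasks2 = CProxy_Task::ckNew();
for (int i = 0; i < numTasks; ++i)
  tasks2[i].insert(i % CkNumPes());   // element i goes to PE (i mod number of PEs)
tasks2.doneInserting();               // tell the runtime that insertion is finished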

Thank you,
Steve