Skip to Content.
Sympa Menu

charm - Re: [charm] Scalability issues using large chare array

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

List archive

Re: [charm] Scalability issues using large chare array


Chronological Thread 
  • From: Phil Miller <mille121 AT illinois.edu>
  • To: Steve Petruzza <spetruzza AT sci.utah.edu>
  • Cc: charm <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] Scalability issues using large chare array
  • Date: Mon, 1 Aug 2016 12:21:23 -0500

And here's the response to the other part of your message

On Mon, Aug 1, 2016 at 7:44 AM, Steve Petruzza <spetruzza AT sci.utah.edu> wrote:
In my application I have a single chare array in the main chare that creates thousands of chare tasks that eventually will execute some tasks and communicate between them (not all simultaneously).
 
By the way if I run some stats I see something like the following:

Charm Kernel Summary Statistics:
Proc 0: [11 created, 11 processed]
Proc 1: [0 created, 0 processed]
Proc 2: [0 created, 0 processed]
Proc 3: [0 created, 0 processed]

… all the others 0,0

Charm Kernel Detailed Statistics (R=requested P=processed):

         Create    Mesgs     Create    Mesgs     Create    Mesgs
         Chare     for       Group     for       Nodegroup for
PE   R/P Mesgs     Chares    Mesgs     Groups    Mesgs     Nodegroups
---- --- --------- --------- --------- --------- --------- ----------
   0  R         11         0        14         1         8      1024
      P         11      7732        14         2         8         0
   1  R          0         0         0         1         0         0
      P          0         0        14         2         0         1
   2  R          0         0         0         2         0         0
      P          0         0        14         3         0         0
   3  R          0         0         0         2         0         0
      P          0         0        14         3         0         0

… all the others like PE 1,2,3…

Is the chare 0 processing all the messages? Why? This does not look scalable. 
Infact when I go over 120K chares it crashes with segfault (_pmiu_daemon(SIGCHLD): [NID 16939] [c5-0c2s5n1] [Mon Aug  1 03:12:58 2016] PE RANK 975 exit signal Segmentation fault).

Am I building or running improperly?
How can I make sure that the chares are spread on more nodes and procs in order to avoid crazy memory allocation on a few nodes? 
Is there any strong coupling between the chare who creates a chare array and their actual execution nodes/procs? If I create more (smaller) chare arrays in the main chare at different execution times, instead of one large at the beginning, could it change anything?

To really understand what's going on, it would be helpful to see a bit of your code (particular, the outline of the .ci file(s) and the methods that are calling ckNew to instantiate additional objects). From what you've written already, here's what I'm comfortable saying.

A chare array is an enumerated collection of chare objects. Instances of a chare array inherently have their elements distributed over the many PEs in a parallel job. The baseline distribution of these objects can be changed from its default by providing an 'array map' from element index to home PE. The dynamic distribution of these objects is adjusted by the periodic dynamic load balancing strategies that users can request.

Non-array chare objects are created as 'seed' messages. The recipient of that message is either specified in the ckNew call, or defaults to the PE running the code making the call. These seed messages can be redistributed by 'seed balancing' strategies that users can choose to link. By default, no such strategy is enabled, because it would create overhead for the many applications that don't use it.

It's not clear whether the issue here is one of application design, or system usage. Perhaps the chare array elements should be called upon to work in parallel, without spawning ancillary chare objects beyond them. Or perhaps the chare objects are a sensible part of the design, and having them all created on PE 0 is a serial bottleneck that needs to be spread out.

Please follow up with more details, so we can help you further.

Phil



Archive powered by MHonArc 2.6.16.

Top of Page