charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

List archive

Re: [charm] Many individual chares vs chare array

From: Phil Miller <mille121 AT illinois.edu>
To: Jozsef Bakosi <jbakosi AT gmail.com>
Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
Subject: Re: [charm] Many individual chares vs chare array
Date: Fri, 10 Jul 2015 14:51:25 -0500
List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

As a point of interest, features to do aggregation of non-reduction messages are available in Charm++ through a library called TRAM, which was the topic of Lukasz Wesolowski's PhD thesis. Right now, the interface to that library is quite 'raw', but we are adding support for an entry method attribute that would automatically deliver messages through TRAM.

If you'd like to test this functionality out, have a look at the current TRAM documentation [1] or at the pending change that adds automation for TRAM [2].

Phil

[1] http://charm.cs.illinois.edu/manuals/html/libraries/5.html

[2] http://charm.cs.uiuc.edu/gerrit/454

On Fri, Jul 10, 2015 at 9:51 AM, Jozsef Bakosi <jbakosi AT gmail.com> wrote:

Follow-up question:

Does the aggregation happen only with reductions, via, e.g., contribute( CkCallback( CkReductionTarget( Host, hostfn ), hostinstance )), or also when the workers simply call the non-reduction-target member function, hostfn_noreduct()? The host .ci file in that case would be

chare Host {
entry [reductiontarget] void hostfn();
entry void hostfn_noreduct();
};

Again, I suspect the non-reduction call will not aggregate, but I might be wrong. Can you clarify?

On Thu, Jul 9, 2015 at 1:01 PM, Jozsef Bakosi <jbakosi AT gmail.com> wrote:
Thanks Phil, that's interesting. I guess that (at least partially) explains (I hope) the pretty unsatisfactory weak scaling behavior I'm getting with a simple particle (i.e., Monte Carlo) code.

Thanks for the clarification,
J

On Thu, Jul 9, 2015 at 12:35 PM, Phil Miller <mille121 AT illinois.edu> wrote:
You're exactly right: reductions locally combine the contributions of all chare array elements on each PE, and then within each process, and transmit a single message up a process tree to the root. At large machine scales, this isn't just faster; it's the difference between the code running at all and crashing due to message overload.

On Thu, Jul 9, 2015 at 1:10 PM, Jozsef Bakosi <jbakosi AT gmail.com> wrote:
Hi folks,

I suspect I know the answer to this question but I'd like some clarification on it.

What is the main difference between creating (a potentially large number of) individual chares that each call back to a single host proxy, and creating the workers instead as a chare array and using a reduction? I assume the latter will do some kind of message aggregation under the hood (i.e., using a tree): it collects messages (in the form of entry method arguments) from individual array elements and sends only aggregated messages to the single host. Is this correct? If so, I guess I should get better performance...

Thanks,
Jozsef

_______________________________________________
charm mailing list
charm AT cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/charm
