
charm - Re: [charm] [ppl] Many individual chares vs chare array

  • From: Jozsef Bakosi <jbakosi AT gmail.com>
  • To: Jonathan Lifflander <jliffl2 AT illinois.edu>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] [ppl] Many individual chares vs chare array
  • Date: Fri, 10 Jul 2015 14:24:47 -0600
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

OK, I'm not sure I was very clear in my question, but I think you are implicitly answering it. Let me rephrase it to see if I understand:

Only those calls from array elements to a single host that originate via a contribute call, i.e., an explicit reduction to a reduction target, get aggregated. Entry method calls that are simply direct calls from the workers to the host are not aggregated.
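
For concreteness, here are the two call styles I mean, as a rough sketch (using the hostfn/hostfn_noreduct/hostinstance names from my earlier email quoted below):

// (1) explicit reduction to a reduction target: contributions are combined
// per PE/process and travel up a tree to Host::hostfn
contribute( CkCallback( CkReductionTarget( Host, hostfn ), hostinstance ) );

// (2) plain entry method call: one point-to-point message per worker, no combining
hostinstance.hostfn_noreduct();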

Considering Phil's answer about TRAM (in the other email), this blurs the picture a bit, because it is possible to use TRAM to aggregate non-reduction calls from array elements (and that can even take the network topology into account). Another (future) way will be via the [aggregate] keyword.

Correct?

If so, it seems like I will now have to write a new reduction type with a custom reduction function that can aggregate std::vectors. Based on that, I have another question: in the manual, as well as in the barnes-charm example, I only see POD data types passed via CkReductionMsg to a custom reduction function. What I need to do, though, is aggregate vectors of different sizes: I'm basically estimating a histogram to which each worker chare contributes a different number of counters, and these are collected on a host into a final histogram. I believe this is similar to the histogram_group example. Actually, now that I take a closer look at that example, it is pretty much what I need, since there the HistogramMerger is a chare group (one instance per PE) collecting contributions from the Histogram chare array (potentially many more than one per PE).
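
To make that concrete, this is roughly the kind of custom reducer I have in mind (an untested sketch; it assumes the counters are ints that line up by bin index across contributions, so shorter contributions are just padded with zeros, and the mergeCounts/registerMergeCounts/collectHistogram names are mine, not from the manual or the examples):

#include <algorithm>
#include <vector>
#include "charm++.h"

// Set on every logical node by the initnode routine below.
CkReduction::reducerType mergeCountsType;

// Element-wise sum of int counter arrays of possibly different lengths.
static CkReductionMsg* mergeCounts( int nMsg, CkReductionMsg** msgs ) {
  std::size_t maxlen = 0;
  for (int i = 0; i < nMsg; ++i)
    maxlen = std::max( maxlen, msgs[i]->getSize() / sizeof(int) );
  std::vector<int> result( maxlen, 0 );
  for (int i = 0; i < nMsg; ++i) {
    const int* d = static_cast<const int*>( msgs[i]->getData() );
    const std::size_t len = msgs[i]->getSize() / sizeof(int);
    for (std::size_t j = 0; j < len; ++j) result[j] += d[j];
  }
  return CkReductionMsg::buildNew( static_cast<int>( maxlen * sizeof(int) ),
                                   result.data() );
}

// Declared as "initnode void registerMergeCounts();" in the .ci file.
void registerMergeCounts() {
  mergeCountsType = CkReduction::addReducer( mergeCounts );
}

Each worker would then contribute its counters with something like

contribute( counts.size() * sizeof(int), counts.data(), mergeCountsType,
            CkCallback( CkReductionTarget( Host, collectHistogram ), hostinstance ) );

where collectHistogram stands for whatever [reductiontarget] entry on Host receives the merged counters.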

Well, thanks for the help. You guys are awesome!

J

On Fri, Jul 10, 2015 at 1:45 PM, Jonathan Lifflander <jliffl2 AT illinois.edu> wrote:
The "reductiontarget" keyword enables the annotated entry method to be
the target of a reduction (the method that is called when the
reduction is finished). It does not change how aggregation happens
under the hood.

In order to get a reduction tree + aggregation, you need to use a chare array:

array [1D] { ... }

When you call contribute from the elements of the chare array, a
reduction tree is used along with local aggregation of individual
contributions inside the node.

Calling contribute from an array requires that all of its elements
contribute. If that is not the case, you will need to create an array section.
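
As a rough sketch (the Workers/Host/hostfn/hostProxy names are just
illustrative, and hostfn is assumed here to take the reduced int):

// .ci
array [1D] Workers {
  entry Workers();
  entry void compute();
};

// In Workers::compute(): each element contributes one int; the contributions
// are combined locally on each PE/process and a single combined message goes
// up the tree to the [reductiontarget] entry Host::hostfn.
int local = 42;  // placeholder for the element's local result
contribute(sizeof(int), &local, CkReduction::sum_int,
           CkCallback(CkReductionTarget(Host, hostfn), hostProxy));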

Thanks,

Jonathan

On 10 July 2015 at 09:51, Jozsef Bakosi <jbakosi AT gmail.com> wrote:
> Follow-up question:
>
> Does the aggregation happen only with reductions, via, e.g., contribute(
> CkCallback( CkReductionTarget( Host, hostfn ), hostinstance )), or also with
> simply calling the non-reductiontarget member function, hostfn_noreduct(),
> from the workers? The host .ci file in that case would be
>
> chare Host {
>       entry [reductiontarget] void hostfn();
>       entry void hostfn_noreduct();
> };
>
> Again, I suspect the non-reduction call will not aggregate, but I might be wrong.
> Can you clarify?
>
> On Thu, Jul 9, 2015 at 1:01 PM, Jozsef Bakosi <jbakosi AT gmail.com> wrote:
>>
>> Thanks Phil, that's interesting. I guess that (at least partially)
>> explains (I hope) the pretty unsatisfactory weak scaling behavior I'm
>> getting with a simple particle (i.e., Monte Carlo) code.
>>
>> Thanks for the clarification,
>> J
>>
>> On Thu, Jul 9, 2015 at 12:35 PM, Phil Miller <mille121 AT illinois.edu>
>> wrote:
>>>
>>> You're exactly right - reductions locally combine the contributions of
>>> all chare array elements on each PE, and then in each process, and transmit
>>> a single message up a process tree to the root. At large machine scales,
>>> this isn't just faster; it's the difference between the code running at all
>>> and crashing due to message overload.
>>>
>>> On Thu, Jul 9, 2015 at 1:10 PM, Jozsef Bakosi <jbakosi AT gmail.com> wrote:
>>>>
>>>> Hi folks,
>>>>
>>>> I suspect I know the answer to this question but I'd like some
>>>> clarification on it.
>>>>
>>>> What is the main difference between creating (a potentially large number
>>>> of) individual chares that call back to a single host proxy, versus
>>>> creating the workers as a chare array and using a reduction? I assume
>>>> the latter will do some kind of message aggregation under the hood (i.e.,
>>>> using a tree), collecting messages (in the form of entry method
>>>> arguments) from individual array elements and sending only aggregated
>>>> messages to the single host. Is this correct? If so, I guess I should get
>>>> better performance...
>>>>
>>>> Thanks,
>>>> Jozsef
>>>>



