charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

List archive

Re: [charm] [EXTERNAL] Re: Question on quiescence detection mechanism

From: Phil Miller <mille121 AT illinois.edu>
To: "Kolla, Hemanth NMN" <hnkolla AT sandia.gov>
Cc: "Kale, Laxmikant V" <kale AT illinois.edu>, "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
Subject: Re: [charm] [EXTERNAL] Re: Question on quiescence detection mechanism
Date: Wed, 27 Jul 2016 11:33:37 -0500

Hi Hemanth,

Depending on your purposes and your application's overall architecture, there are a few factors to consider when choosing between a reduction and QD.

The first is of course performance - whether the phases of execution separated by these synchronizing operations run faster with one or the other. There should not be a large difference between them (in their current implementations), but it may be worthwhile for us to measure in a microbenchmark.

A second, bigger-picture consideration is whether you intend to embed this code in a larger application that may have other work executing concurrently. In that case, if you use QD, it won't re-activate your code until all of that concurrent work has finished as well. This was an issue in the EpiSimdemics application, where the "other work executing concurrently" was actually replicas of the same code running different but related simulation scenarios. This concern is described briefly in section IV(B) of this paper from IPDPS 2014:
https://e-reports-ext.llnl.gov/pdf/768650.pdf

Note that because their use cases required an accounting that all messages sent were received (which a single reduction can't give you, in the general case of unknown message counts), they used the intermediate 'Completion Detection' library mechanism in Charm++ instead.
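
To make the scoping difference concrete, here is a minimal single-process sketch in plain C++. It is a toy model, not the Charm++ runtime or the Completion Detection library API: the names `ToyRuntime`, `produce`, `consume`, `moduleComplete`, and `globallyQuiescent` are all hypothetical, and the per-module counters are an assumption made for illustration. It only shows why a global check stays false while any concurrent module still has unconsumed messages, even when one module is finished.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of per-module message accounting (hypothetical names throughout).
struct Counters {
    long produced = 0;  // messages sent by this module
    long consumed = 0;  // messages of this module that have been handled
};

class ToyRuntime {
public:
    void produce(const std::string& module) { ++counts_[module].produced; }

    void consume(const std::string& module) {
        Counters& c = counts_[module];
        assert(c.consumed < c.produced);  // cannot consume what was never sent
        ++c.consumed;
    }

    // Scoped check: only this module's messages matter.
    bool moduleComplete(const std::string& module) const {
        auto it = counts_.find(module);
        return it == counts_.end() || it->second.produced == it->second.consumed;
    }

    // Global check: every module must be idle, which is the QD-like condition.
    bool globallyQuiescent() const {
        for (const auto& kv : counts_) {
            if (kv.second.produced != kv.second.consumed) return false;
        }
        return true;
    }

private:
    std::map<std::string, Counters> counts_;
};
```

With two replicas registered as separate modules, the scoped check for the first can succeed while the global check is still waiting on the second.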

More broadly, if the reduction is just a signalling mechanism that carries no data, you might consider whether any synchronization at all is necessary at that point in execution. It may instead be preferable to structure the control flow and message handling of each individual object to keep processing in the right order. While the cost of the synchronization operation is pretty low (milliseconds at petascale), local coordination avoids the cascading impact of noise, communication delays, load imbalance, and many other larger performance impediments. SDAG code in your objects is potentially helpful here. Jonathan can probably point you in the right direction on this account.
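
As a rough illustration of what local coordination can look like, here is a sketch in plain C++ rather than real SDAG code. The `Worker` class and its members are hypothetical, and the assumption is that every message carries the iteration number it belongs to and that each object knows how many inputs to expect per iteration. The object buffers early messages and advances on its own as soon as its inputs are complete, so no global synchronization is involved.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical worker object that orders its own work by iteration tag.
class Worker {
public:
    explicit Worker(int neighbors) : expected_(neighbors) {
        assert(neighbors > 0);
    }

    // Accept a message tagged with its iteration. Messages for future
    // iterations are buffered until their turn comes.
    void receive(int iteration, double value) {
        pending_[iteration].push_back(value);
        advance();
    }

    int currentIteration() const { return iter_; }
    const std::vector<double>& sums() const { return sums_; }

private:
    // Run every iteration whose inputs have all arrived, in order.
    void advance() {
        for (;;) {
            auto it = pending_.find(iter_);
            if (it == pending_.end() ||
                static_cast<int>(it->second.size()) < expected_) {
                return;  // still waiting on inputs for the current iteration
            }
            double sum = 0.0;
            for (double v : it->second) sum += v;
            sums_.push_back(sum);  // stand-in for the real per-iteration work
            pending_.erase(it);
            ++iter_;
        }
    }

    int expected_;                                // inputs needed per iteration
    int iter_ = 0;                                // iteration being waited on
    std::map<int, std::vector<double>> pending_;  // buffered inputs by iteration
    std::vector<double> sums_;                    // one result per finished iteration
};
```

A message for iteration 1 that arrives early simply waits in the buffer; the object moves to iteration 1 once both inputs for iteration 0 have been received, regardless of what other objects are doing.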

Phil

On Tue, Jul 26, 2016 at 10:14 PM, Kolla, Hemanth NMN <hnkolla AT sandia.gov> wrote:

Thanks for the explanation, Prof. Kale. I did get a clarification from Jonathan (luckily we are office neighbours), and your explanation adds more depth to the concept, which is helpful. In my application I initially had reductions whose target was the entire chare array (sort of like an all_reduce), but that may be overkill and may add unnecessary phases of idle time, so I'm exploring the possibility of replacing the global reductions with QD.

Best,

Hemanth.

On Jul 26, 2016, at 6:44 PM, Kale, Laxmikant V <kale AT illinois.edu> wrote:

CkStartQD can be called from anywhere, not just the main chare. It simply starts a distributed asynchronous algorithm (which runs concurrently with the application) for detecting this condition and, once it is detected, informs the application via the callback you selected. Only one call to CkStartQD should be made, so it is sometimes convenient to call it from the main chare. But you can call it, for example, from a particular element of a chare array. It doesn't need to be called after everyone has reached some state (but if you need that, start it after a reduction). The system is *always* doing the local bookkeeping needed to detect quiescence. Your triggering via CkStartQD simply asks it to run the distributed algorithm and notify you via the callback.

Now, if you have a situation where every chare has reached a certain point and then you want to be notified, a reduction might be the simplest thing to do (which is what it sounded like from your description). But I suspect you have messages in flight even after all chares have reached a certain point in their lifecycle, and so you need QD.
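
The distinction can be sketched with a toy single-process model in plain C++. This is not the Charm++ runtime and not its actual distributed QD algorithm: `ToySystem` and its members are hypothetical, and comparing two counters in one process is a deliberate simplification. It shows a case where every object has reached the synchronization point, so a data-free reduction would complete, while a message is still in flight, so the system is not quiescent.

```cpp
#include <cassert>
#include <cstddef>

// Toy model (hypothetical names): tracks arrivals and message counts.
struct ToySystem {
    std::size_t numObjects;
    std::size_t arrived = 0;  // objects that reached the sync point
    long created = 0;         // messages sent so far
    long processed = 0;       // messages handled so far

    explicit ToySystem(std::size_t n) : numObjects(n) {}

    void send() { ++created; }

    void deliver() {
        assert(processed < created);  // only sent messages can be handled
        ++processed;
    }

    void arrive() {
        assert(arrived < numObjects);
        ++arrived;
    }

    // What a data-free reduction can report: every object contributed.
    bool reductionComplete() const { return arrived == numObjects; }

    // What quiescence requires: nothing in flight or awaiting processing.
    bool quiescent() const { return created == processed; }
};

// Scenario: both objects reach the sync point while one message is in flight.
ToySystem lateMessageScenario() {
    ToySystem sys(2);
    sys.send();    // object 0 sends a message to object 1
    sys.arrive();  // object 0 reaches the sync point
    sys.arrive();  // object 1 reaches it before the message lands
    return sys;
}
```

In `lateMessageScenario()`, `reductionComplete()` is true but `quiescent()` is false until `deliver()` is called for the outstanding message.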

Hope that helps.

Laxmikant (Sanjay) Kale   http://charm.cs.uiuc.edu

Professor, Computer Science     kale AT illinois.edu

201 N. Goodwin Avenue           Ph:  (217) 244-0094

Urbana, IL  61801-2302

From: "Kolla, Hemanth NMN" <hnkolla AT sandia.gov>
Reply-To: "Kolla, Hemanth NMN" <hnkolla AT sandia.gov>
Date: Tuesday, July 26, 2016 at 7:01 PM
To: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
Subject: [charm] Question on quiescence detection mechanism

Hi,

I've a question on the quiescence detection mechanism, and its proper (well-defined) use.

Can a CkStartQD be placed inside an entry method of a chare array? If so, isn't it ill-defined? As I interpret it, a QD callback is invoked when quiescence, a state when no messages are in flight or pending processing and no entry methods are being executed on any processor, is detected. But if it is placed inside an entry method, then technically it can't be a quiescent state since the entry method needs to be executed to arrive at the callback.

The problem I have (or at least the way I've cast the problem) is that every chare object needs to detect quiescence after a particular sequence of entry method executions has occurred on each object. So naturally, I'm thinking of placing a CkStartQD call inside a serial entry method of the chare array. But I'm not sure whether a quiescent state can ever be reached, for the reason I described above. The few examples with QD I've seen in the charm examples all place a CkStartQD inside the main chare.

Any clarification would be helpful.

Thanks,

Hemanth.
