
charm - Re: [charm] mis-matched client callbacks in reduction messages

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system


  • From: Jozsef Bakosi <jbakosi AT lanl.gov>
  • To: Phil Miller <mille121 AT illinois.edu>
  • Cc: "charm AT lists.cs.illinois.edu" <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] mis-matched client callbacks in reduction messages
  • Date: Thu, 2 Nov 2017 11:24:11 -0600
  • Authentication-results: illinois.edu; spf=pass smtp.mailfrom=jbakosi AT lanl.gov

Hi Phil,

Sorry for the whining, but this error is giving me way too much trouble and I
don't think my understanding is getting better.

So I am successfully using shadow arrays and they do appear to work around this
problem. (So far I have only tried this, successfully, with groups.)

Since I have mainly been getting this problem with multiple reductions using the
randomized-queue build of Charm++, I wonder whether my requirement, that logic
involving SDAG and multiple reductions execute correctly (i.e., without this
error), makes sense even with randomized queues. I am thinking that randomized
queues will most likely fire off multiple reductions in a different (i.e.,
random) order, effectively taking the ordering out of my hands. Do you think
that's true? Aren't multiple reductions inherently incompatible with randomized
queues?

To make it more concrete, I have the following simplified scenario in
pseudo-code:

class ChareArray : public CBase_ChareArray {

  /*entry*/ void dt() {
    // compute some dt specific to this array element
    double dt = ...
    // allreduce:
    contribute( to all elements of ChareArray targeting advance(mindt),
                delivering the minimum of some dt to all elements )
  }

  /*entry*/ void advance(double mindt) {
    contribute( to some single chare collecting some diagnostics )
    if (continue time stepping)
      dt();
    else
      contribute( to some single chare eventually calling ckExit() )
  }

}

So during time stepping there are really two contribute calls and I'm pretty
sure these two generate the "mis-matched client callbacks in reduction
messages" error. (I don't think the logic gets to the contribute that will
eventually get to ckExit().)

When I start one of them from a bound/shadow array, I still get the error but
only with randomized queues. The order of contributions to the two reductions
(per single chare), I believe, is guaranteed here. But won't randomized queues
screw up the order? Can that even be done? Do I want too much?

Jozsef

On 10.29.2017 17:22, Jozsef Bakosi wrote:
> On 10.27.2017 11:38, Jozsef Bakosi wrote:
> > On 10.27.2017 11:02, Phil Miller wrote:
> > > We use an approach of creating bound 'shadow' arrays to act as
> > > independent reduction (sequencing) contexts to address this
> > > limitation. We've used this approach in a few places in our code,
> > > including the LiveViz in-situ visualization library and the collision
> > > detection library.
> > > In a little more detail, when constructing a chare array, it's
> > > possible to specify that it should be bound to another existing chare
> > > array. That means that elements of the same index will always live on
> > > the same PE. So, you can instantiate some auxiliary arrays, one per
> > > reduction stream, and bind them to your main computation arrays. Since
> > > elements with corresponding indices are guaranteed to be co-located,
> > > the main element can get a pointer to each auxiliary via a ckLocal()
> > > call, and then call aux->contribute(...) rather than implicitly
> > > this->contribute(). So, the setup code gets a bit more complicated,
> > > and the code actually invoking the reductions gets just a little more
> > > involved.
> > > Is that a clear description? Does that approach work for you?
> >
> > I think that would work and I do use bound arrays for a different purpose.
> >
> > So how would I have to use this? Here is what I think I need to do: I
> > have to identify all reductions that can happen in an order that is not
> > necessarily guaranteed to be always the same and fire them from bound
> > arrays instead (each from a different chare array)?
>
> Is there a way to tell which two reductions caused the "mis-matched client
> callbacks in reduction messages" error? I do get a traceback from one, but
> can I get one from the other one somehow so I know which reduction I have to
> initiate from a shadow array?
>
> Thanks,
> J
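
The "independent reduction (sequencing) contexts" idea quoted above can be illustrated with a small toy model in plain C++. This is only a sketch under stated assumptions, not Charm++ code and not a description of the runtime's internals: the class ReductionContext, the two helper functions, and the callback labels "advance" and "diagnostics" are all hypothetical. The assumption modeled is that, within one context, each element's n-th contribution is matched with every other element's n-th contribution, so all of them must name the same callback.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model of one reduction sequencing context (hypothetical, NOT Charm++).
// Each element numbers its own contributions 0, 1, 2, ...; contributions that
// share a number are assumed to belong to the same reduction and therefore
// must name the same callback.
class ReductionContext {
public:
    explicit ReductionContext(std::size_t numElements)
        : nextSeq_(numElements, 0) {}

    // Record a contribution from `element` naming `callback`.
    // Returns false if another callback was already recorded for that slot.
    bool contribute(std::size_t element, const std::string& callback) {
        const std::size_t seq = nextSeq_[element]++;
        auto result = callbackForSeq_.emplace(seq, callback);
        return result.second || result.first->second == callback;
    }

private:
    std::vector<std::size_t> nextSeq_;                   // per-element counter
    std::map<std::size_t, std::string> callbackForSeq_;  // first callback per slot
};

// Two elements issue the same two reductions in opposite orders in ONE context.
bool sharedContextMatches() {
    ReductionContext ctx(2);
    bool ok = true;
    ok = ctx.contribute(0, "advance") && ok;      // element 0: min-dt first
    ok = ctx.contribute(0, "diagnostics") && ok;
    ok = ctx.contribute(1, "diagnostics") && ok;  // element 1: diagnostics first
    ok = ctx.contribute(1, "advance") && ok;
    return ok;
}

// Same call orders, but each reduction stream has its own context, which is
// the role the bound shadow arrays play in the description quoted above.
bool separateContextsMatch() {
    ReductionContext dtCtx(2), diagCtx(2);
    bool ok = true;
    ok = dtCtx.contribute(0, "advance") && ok;
    ok = diagCtx.contribute(0, "diagnostics") && ok;
    ok = diagCtx.contribute(1, "diagnostics") && ok;
    ok = dtCtx.contribute(1, "advance") && ok;
    return ok;
}
```

In this model the shared context reports a mismatch as soon as two elements disagree on the order of their contributions, while one context per stream tolerates any interleaving. Whether randomized queues can actually produce such a disagreement in the scenario above is exactly the open question of this message; the model does not answer it.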


