
charm - Re: [charm] mis-matched client callbacks in reduction messages

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: "Kale, Laxmikant V" <kale AT illinois.edu>
  • To: Jozsef Bakosi <jbakosi AT lanl.gov>, "Miller, Philip B" <mille121 AT illinois.edu>
  • Cc: "charm AT lists.cs.illinois.edu" <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] mis-matched client callbacks in reduction messages
  • Date: Sat, 4 Nov 2017 02:09:52 +0000

A sanity check:

In your example, there is no chance that the contributions will get out of
order *unless* the “if (continue time stepping)” condition evaluates to
different values on different chares in the same iteration. Are you sure it
is identical on all chares?

(For example, if it is a floating-point comparison, it could evaluate differently
on different chares. If it is a convergence test, it's better to do a Boolean
reduction. More generally, if the expression being evaluated is not identical
across all participating chares, some chares would call dt() while others
don't in some iteration.)
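The difference between deciding locally and reducing first can be sketched as a toy model in plain C++. It makes no Charm++ calls, and the function names, residuals, and tolerance are hypothetical. Each vector entry stands in for one chare's local convergence measure. When each chare decides on its own, the chares can disagree, which is exactly the case where some call dt() and others do not. When the flags are combined first, as a Boolean reduction with a broadcast callback would do, every chare gets the same answer. In Charm++ that would use a logical reducer such as CkReduction::logical_or.

```cpp
#include <vector>

// Toy model, not Charm++ code: each entry of `residuals` stands for one chare's
// local convergence measure; `tol` is a hypothetical tolerance.

// Each chare decides on its own whether to keep stepping. Nothing forces the
// answers to agree, so some chares may call dt() while others do not.
std::vector<bool> decideLocally(const std::vector<double>& residuals, double tol) {
  std::vector<bool> keepStepping;
  for (double r : residuals) keepStepping.push_back(r > tol);
  return keepStepping;
}

// Reduce first (logical OR of "not yet converged"), then hand the single result
// back to every chare, as a Boolean reduction with a broadcast callback would.
std::vector<bool> decideAfterReduction(const std::vector<double>& residuals, double tol) {
  bool anyNotConverged = false;
  for (double r : residuals) anyNotConverged = anyNotConverged || (r > tol);
  return std::vector<bool>(residuals.size(), anyNotConverged);
}

// True if every chare reached the same decision.
bool unanimous(const std::vector<bool>& decisions) {
  for (bool d : decisions)
    if (d != decisions.front()) return false;
  return true;
}
```

With residuals {2e-6, 5e-7, 1.5e-6, 9e-7} and tol = 1e-6, decideLocally() returns mixed decisions, while decideAfterReduction() returns true for all four chares.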

Of course, this is probably a red herring, since you say that no chare reaches
the “else” clause leading to the CkExit() path. But it is worth confirming.

Also, if you had “thisProxy.dt();” (sending the invocation via the scheduler’s
queue), the randomized queue might enter the reasoning, but only if another
advance() call happens before dt() is processed. But I assume the next
iteration cannot start until the current one is finished.
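Sanjay's remark about thisProxy.dt() versus a plain dt() call can be illustrated with a toy single-element scheduler. This is plain C++, not the Charm++ runtime. The "other" message is a hypothetical second contributing entry method that is already waiting in the queue. A direct call runs inside the current entry method, so this element's contribution order is fixed. A send through the proxy is deferred, and a randomized queue may run the other pending message first.

```cpp
#include <cstddef>
#include <functional>
#include <random>
#include <string>
#include <vector>

// Toy model of one array element and its scheduler queue; not the Charm++ runtime.
struct ToyElement {
  std::vector<std::string> contributions;    // order of contribute() calls
  std::vector<std::function<void()>> queue;  // pending messages

  void contribute(const std::string& tag) { contributions.push_back(tag); }

  void dt() { contribute("min-dt"); }

  void advance(bool viaQueue) {
    contribute("diagnostics");
    if (viaQueue)
      queue.push_back([this] { dt(); });  // like thisProxy.dt(): deferred to the queue
    else
      dt();                               // plain dt(): runs right now
  }
};

// Queue one hypothetical contributing message ("other"), run advance(), then
// drain the queue by picking a random pending message each time (seeded by
// `seed`), as a stand-in for a randomized queue. Returns the contribution order.
std::vector<std::string> runOnce(bool viaQueue, unsigned seed) {
  ToyElement e;
  e.queue.push_back([&e] { e.contribute("other"); });
  e.advance(viaQueue);
  std::mt19937 rng(seed);
  while (!e.queue.empty()) {
    std::uniform_int_distribution<std::size_t> pick(0, e.queue.size() - 1);
    std::size_t i = pick(rng);
    std::function<void()> msg = e.queue[i];  // copy before erasing
    e.queue.erase(e.queue.begin() + static_cast<std::ptrdiff_t>(i));
    msg();
  }
  return e.contributions;
}
```

With the direct call, runOnce(false, seed) returns diagnostics, min-dt, other for every seed. With the queued send, runOnce(true, seed) puts either min-dt or other second, depending on the seed. As Sanjay notes, this only matters if some other contributing message can actually be pending when dt() is queued.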

-Sanjay

On 11/2/17, 12:24 PM, "Jozsef Bakosi" <jbakosi AT lanl.gov> wrote:

Hi Phil,

Sorry for the whining, but this error is giving me way too much trouble, and I
don't think my understanding is getting any better.

So I am successfully using shadow arrays, and they do appear to work around
this problem. (So far, though, I have only tried this successfully with
groups.)

Since I have mainly been getting this problem with multiple reductions using
the randomized-queue build of Charm++, I wonder whether it even makes sense to
require that logic involving SDAG and multiple reductions execute correctly
(i.e., without this error) with randomized queues. I am thinking that
randomized queues will most likely fire off multiple reductions in a different
(i.e., random) order, effectively taking the ordering out of my hands. Do you
think that's true? Aren't multiple reductions inherently incompatible with
randomized queues?

To make it more concrete, here is a simplified version of my scenario in
pseudocode:

class ChareArray : public CBase_ChareArray {
 public:
  // entry method
  void dt() {
    // compute some dt specific to this array element
    double mydt = ...;
    // allreduce: deliver the minimum dt to advance(mindt) on all elements
    // (advance is declared as a [reductiontarget] entry in the .ci file)
    contribute(sizeof(double), &mydt, CkReduction::min_double,
               CkCallback(CkReductionTarget(ChareArray, advance), thisProxy));
  }

  // entry method (reduction target)
  void advance(double mindt) {
    contribute( to some single chare collecting some diagnostics );
    if (continue time stepping)
      dt();
    else
      contribute( to some single chare eventually calling CkExit() );
  }
};

So during time stepping there are really two contribute calls, and I'm pretty
sure these two generate the "mis-matched client callbacks in reduction
messages" error. (I don't think the logic ever reaches the contribute that
eventually leads to CkExit().)
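A toy model of the rule behind the error may help frame the question. It assumes, consistent with Phil's description of arrays as reduction sequencing contexts, that the k-th contribute() of every element of one array joins reduction number k. It also assumes that all contributions to one reduction must name the same callback. This is a sketch, not Charm++'s implementation, and the names are hypothetical. Because the model records the callback for each reduction number, it also reports which two callbacks collided.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy model of one reduction sequencing context (e.g. one chare array); not Charm++.
struct Mismatch {
  int reductionNo;     // which reduction number disagreed
  std::string first;   // callback recorded first for that reduction
  std::string second;  // conflicting callback from another element
};

class ToyReductionContext {
 public:
  // The k-th contribution of `element` joins reduction k; its callback must
  // match whatever callback reduction k already has.
  std::optional<Mismatch> contribute(int element, const std::string& callback) {
    const int redNo = nextRedNo_[element]++;
    auto [it, inserted] = callbackOf_.emplace(redNo, callback);
    if (!inserted && it->second != callback)
      return Mismatch{redNo, it->second, callback};
    return std::nullopt;
  }

 private:
  std::map<int, int> nextRedNo_;           // element -> its next reduction number
  std::map<int, std::string> callbackOf_;  // reduction number -> callback
};

// Feed each element's contributions (in that element's order) into one
// context and return the first conflict found, if any.
std::optional<Mismatch> replay(const std::vector<std::vector<std::string>>& perElement) {
  ToyReductionContext ctx;
  for (std::size_t e = 0; e < perElement.size(); ++e)
    for (const std::string& callback : perElement[e])
      if (std::optional<Mismatch> m = ctx.contribute(static_cast<int>(e), callback))
        return m;
  return std::nullopt;
}
```

If both elements contribute to diagnostics and then to advance, replay() finds no conflict. If one element contributes in the opposite order, reduction 0 reports diagnostics against advance. In this model, what matters is not when the reductions complete but whether every element calls contribute() for them in the same order.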

When I start one of them from a bound/shadow array, I still get the error, but
only with randomized queues. I believe the order of contributions to the two
reductions (per chare) is guaranteed here. But won't randomized queues mess up
that order? Can this even be done? Am I asking too much?

Jozsef

On 10.29.2017 17:22, Jozsef Bakosi wrote:
> On 10.27.2017 11:38, Jozsef Bakosi wrote:
> > On 10.27.2017 11:02, Phil Miller wrote:
> > > We use an approach of creating bound 'shadow' arrays to act as
> > > independent reduction (sequencing) contexts to address this limitation.
> > > We've used this approach in a few places in our code, including the
> > > LiveViz in-situ visualization library and the collision detection
> > > library.
> > > In a little more detail, when constructing a chare array, it's possible
> > > to specify that it should be bound to another existing chare array.
> > > That means that elements of the same index will always live on the same
> > > PE. So, you can instantiate some auxiliary arrays, one per reduction
> > > stream, and bind them to your main computation arrays. Since elements
> > > with corresponding indices are guaranteed to be co-located, the main
> > > element can get a pointer to each auxiliary via a ckLocal() call, and
> > > then call aux->contribute(...) rather than implicitly
> > > this->contribute(). So, the setup code gets a bit more complicated, and
> > > the code actually invoking the reductions gets just a little more
> > > involved.
> > > Is that a clear description? Does that approach work for you?
> >
> > I think that would work, and I do use bound arrays for a different
> > purpose.
> >
> > So how would I have to use this? Here is what I think I need to do: I
> > have to identify all reductions that can happen in an order that is not
> > necessarily guaranteed to always be the same, and fire them from bound
> > arrays instead (each from a different chare array)?
>
> Is there a way to tell which two reductions caused the "mis-matched client
> callbacks in reduction messages" error? I do get a traceback from one, but
> can I somehow get one from the other as well, so I know which reduction I
> have to initiate from a shadow array?
>
> Thanks,
> J
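Phil's shadow-array approach can be modeled in the same toy fashion. Each reduction stream gets its own sequencing context, which is what calling aux->contribute(...) on a co-located bound element would provide. The sketch below is plain C++, not Charm++; in Charm++ the binding itself is set up through the array options, e.g. CkArrayOptions::bindTo(). The context names ("main", "shadow") and callback names are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model, not Charm++ code. Each sequencing context (the main array or a
// bound "shadow" array) numbers contributions per element, and reduction k of
// a context is consistent only if every element's k-th contribution there
// names the same callback.
using Contribution = std::pair<std::string, std::string>;  // (context, callback)

// Returns true if no context sees two different callbacks for one reduction number.
bool consistent(const std::vector<std::vector<Contribution>>& perElement) {
  // (context, reduction number) -> callback seen first
  std::map<std::pair<std::string, int>, std::string> seen;
  for (const auto& contributions : perElement) {
    std::map<std::string, int> nextRedNo;  // this element's counter per context
    for (const auto& [context, callback] : contributions) {
      auto key = std::make_pair(context, nextRedNo[context]++);
      auto [it, inserted] = seen.emplace(key, callback);
      if (!inserted && it->second != callback) return false;
    }
  }
  return true;
}
```

With both reductions in the "main" context, two elements that contribute in opposite orders are inconsistent. Moving the diagnostics reduction to a "shadow" context makes the same two orders consistent, because each context now numbers only its own contributions.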
