
charm - Re: [charm] Charm++ execution order question

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

List archive

Re: [charm] Charm++ execution order question


Chronological Thread 
  • From: Evghenii Gaburov <e-gaburov AT northwestern.edu>
  • To: Akhil langer <akhilanger AT gmail.com>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] Charm++ execution order question
  • Date: Sun, 25 Sep 2011 03:35:25 +0000
  • Accept-language: en-US
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

> It seems like the solution is either in Quiesence Detection or in
> All_to_All.
I already tried QD (via CkStartQD(..)), but with a large number of chares
(> 1024) the overhead becomes intolerable.

In this version,
contribute(0, 0, CkReduction::concat,
           CkCallback(CkIndex_myClass::do_work_complete(NULL), thisProxy)); /* barrier */
was replaced with
CkStartQD(CkCallback(CkIndex_myClass::do_work_complete(NULL), thisProxy));

and, of course, do_work(..) was called from the threaded method of the
Main chare.

I will try the all-to-all approach next, but it is a bit hard for me to
figure out the optimal way to do this in Charm++.
Any suggestions, advice, or examples would be highly appreciated.
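For the record, here is a minimal sketch of the counting scheme an all-to-all would enable (plain C++ with hypothetical names, no Charm++ runtime): first every chare learns how many messages to expect, then each recv() bumps a local counter and completion fires locally, with no global barrier or quiescence detection needed.

```cpp
#include <cassert>
#include <vector>

// Phase 1 (the all-to-all): column sums of the send matrix tell each
// receiver how many data messages to expect.  send_matrix[i][j] is the
// number of messages chare i will send to chare j.
std::vector<int> expected_counts(const std::vector<std::vector<int>>& send_matrix) {
    std::vector<int> expected(send_matrix.size(), 0);
    for (std::size_t i = 0; i < send_matrix.size(); ++i)
        for (std::size_t j = 0; j < send_matrix[i].size(); ++j)
            expected[j] += send_matrix[i][j];
    return expected;
}

// Phase 2: each chare counts arriving messages; do_work_complete() can run
// as soon as the local count reaches the expected value, regardless of the
// order in which the runtime delivers the messages.
struct RecvCounter {
    int expected;
    int received = 0;
    bool complete = false;
    explicit RecvCounter(int n) : expected(n) { complete = (received == expected); }
    void recv() { if (++received == expected) complete = true; }
};
```

The design point is that each chare's completion condition becomes purely local, so no ordering guarantee between point-to-point messages and the reduction is required.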

> AFAIK, message delivery order cannot be guaranteed in Charm++, and message
> delivery order won't solve the problem either. In your original code,
> do_work_complete will be called after all the chares have initiated their
> sends; whether the receiving chares have received those messages is not
> guaranteed. If you think about how the contribute call works, you will see
> that message delivery order will not solve the problem.
I was thinking that do_work_complete() would be executed only after all
scheduled recv(..) calls had been executed. If that is not the case, it may
explain why my code sometimes works and sometimes fails, especially with a
large number of processors (128) and chares (1024-4096):
do_work_complete() is executed before all the scheduled recv(..) calls
have been.
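A minimal sketch (plain C++, hypothetical names, no Charm++ runtime) of a two-condition gate that would close this race: the reduction callback only proves every chare initiated its sends, so completion must additionally wait for the locally expected number of recv() executions, in whichever order the two events arrive.

```cpp
#include <cassert>

// Completion is gated on two independent events: the barrier/reduction
// callback (all chares initiated their sends) AND the arrival of every
// message this chare expects.  Either event may happen first.
struct CompletionGate {
    int expected;            // number of recv() calls this chare must see
    int received = 0;
    bool barrier_done = false;
    bool completed = false;

    explicit CompletionGate(int n) : expected(n) {}

    void on_recv()    { ++received; maybe_complete(); }
    void on_barrier() { barrier_done = true; maybe_complete(); }

private:
    void maybe_complete() {
        if (barrier_done && received == expected && !completed)
            completed = true;   // this is where do_work_complete() would run
    }
};
```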

Thanks!




>
>
> On Sat, Sep 24, 2011 at 9:53 PM, Evghenii Gaburov
> <e-gaburov AT northwestern.edu>
> wrote:
> Hi,
>
> Thanks for response.
>
> > Charm++ does not guarantee message delivery order (charm uses UDP to
> > deliver messages; on top of UDP, charm has a built-in TCP-like protocol,
> > but that is not sufficient to guarantee in-order delivery).
> Is there a way to enforce this delivery order? I am using mpi-linux_x86_64.
>
> > In your program, can the contribute call be moved to the remote chares
> > recv method instead??
> Regretfully, it cannot be, because multiple chares send data to a given
> chare, and this chare does not know from how many remote chares data will
> arrive without doing an MPI_Alltoall equivalent.
>
> If I place the contribute inside the remote chare's recv method, I get an
> error:
>
> "Reason: Already sent a value to this callback!"
>
> This is probably because a given chare executed recv more than once and
> therefore called contribute more than once.
>
> Cheers,
> Evghenii
>
> >
> >
> > On Sat, Sep 24, 2011 at 8:59 PM, Evghenii Gaburov
> > <e-gaburov AT northwestern.edu>
> > wrote:
> > Hi All,
> >
> > I am confused about the order in which messages arrive. I read that
> > messages, by default, obey FIFO ordering.
> >
> > So, does the following code
> >
> > "
> > void myClass::do_work()
> > {
> >   /* do some work first */
> >   for (int i = 0; i < nsend; i++)
> >     thisProxy[remoteIndex[i]].recv(results[i]);  /* send data to remote chares */
> >
> >   /* barrier */
> >   contribute(0, 0, CkReduction::concat,
> >              CkCallback(CkIndex_myClass::do_work_complete(NULL), thisProxy));
> > }
> >
> > void myClass::recv(myResults remote_result) { /* store remote_result */ }
> > void myClass::do_work_complete() { /* process arrived remote_results */ }
> > "
> >
> > guarantee that the myClass::recv(..) methods will be executed first
> > (because they were invoked first), and that only afterwards the
> > reduction will call myClass::do_work_complete() (because it was invoked
> > second)? This order is required to make sure that do_work_complete is
> > invoked only when *all* remote data has arrived.
> >
> > Thanks!
> >
> > Cheers,
> > Evghenii
> > --
> > Evghenii Gaburov,
> > e-gaburov AT northwestern.edu
> > _______________________________________________
> > charm mailing list
> > charm AT cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/charm
> >
>
> --
> Evghenii Gaburov,
> e-gaburov AT northwestern.edu

--
Evghenii Gaburov,
e-gaburov AT northwestern.edu










