
charm - Re: [charm] Charm++ execution order question

charm AT lists.cs.illinois.edu



  • From: "Kale, Laxmikant V" <kale AT illinois.edu>
  • To: Evghenii Gaburov <e-gaburov AT northwestern.edu>, Akhil langer <akhilanger AT gmail.com>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] Charm++ execution order question
  • Date: Sun, 25 Sep 2011 14:43:43 +0000

Quiescence detection, per se, has an *extremely* low overhead: it runs
only when a processor is idle, and it is a spanning-tree-based algorithm. The
source of "overhead" might be elsewhere. In any case, you should fix the
order dependence in your code first. Having someone in PPL take a look at
the parallel code might help.
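
For reference, the callback form of quiescence detection is the simplest to
use. A minimal sketch, assuming a hypothetical main-chare proxy mainProxy
with an entry method allDone():

    // Ask the runtime to invoke Main::allDone() once no messages are in
    // flight or pending anywhere in the system.
    CkCallback cb(CkIndex_Main::allDone(), mainProxy);
    CkStartQD(cb);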

I'd not advise any all-to-all operation here, unless there is a strong
reason for it.

Structured dagger notation can help enforce ordering within an object, if
that helps. But the general technique is to explicitly control the order
dependence.
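
A minimal structured dagger sketch of that idea (illustrative only:
nExpected, i, do_local_work_and_sends(), and store_remote_result() are
assumed members of myClass, and nExpected, the number of recv messages to
wait for, must already be known to each element):

    // .ci fragment: run() does the local work and the sends, then blocks
    // until exactly nExpected recv messages have arrived, and only then
    // calls do_work_complete().
    array [1D] myClass {
      entry void recv(myResults r);
      entry void run() {
        serial { do_local_work_and_sends(); }
        for (i = 0; i < nExpected; ++i) {
          when recv(myResults r) serial { store_remote_result(r); }
        }
        serial { do_work_complete(); }
      };
    };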


--
Laxmikant (Sanjay) Kale    http://charm.cs.uiuc.edu/
Professor, Computer Science
kale AT illinois.edu
201 N. Goodwin Avenue Ph: (217) 244-0094
Urbana, IL 61801-2302 FAX: (217) 265-6582






On 9/24/11 10:35 PM, "Evghenii Gaburov" <e-gaburov AT northwestern.edu> wrote:

>> It seems like the solution is either in Quiescence Detection or in
>> All_to_All.
>I already tried QD (via CkStartQD(..)), but the overhead with a large
>number of chares (> 1024) becomes intolerable.
>
>In this version
>  contribute(0, 0, CkReduction::concat,
>             CkCallback(do_work_complete(), thisProxy));  /* barrier */
>was replaced with
>  CkStartQD(CkIndex_myClass::do_work_complete, thishandle);
>
>and, of course, do_work(..) was called from the threaded method of the
>Main chare.
>
>I will then try All_to_all, but it is a bit hard for me to figure out
>the optimal way to do this in Charm++.
>Any suggestions, advice, or examples would be highly appreciated.
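
One way to get the per-chare receive count without a personalized all-to-all
is a vector sum reduction over per-target send counts. A rough sketch with
hypothetical names (numElements, nExpected, expectCounts); nsend and
remoteIndex are as in the original do_work():

    // Each element contributes a vector whose slot j holds how many
    // messages it will send to element j.  The element-wise sum is
    // broadcast back, and element j reads slot j to learn how many
    // recv()s to expect before processing.
    std::vector<int> nsendTo(numElements, 0);
    for (int i = 0; i < nsend; i++)
      nsendTo[remoteIndex[i]]++;
    contribute(numElements * sizeof(int), &nsendTo[0], CkReduction::sum_int,
               CkCallback(CkIndex_myClass::expectCounts(NULL), thisProxy));

    // Reduction client, broadcast to every array element:
    void myClass::expectCounts(CkReductionMsg *m) {
      nExpected = ((int *) m->getData())[thisIndex];
      delete m;
    }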
>
>> AFAIK message delivery order cannot be guaranteed in charm, and message
>> delivery order won't solve the problem either. In your original code,
>> do_work_complete will be called after all the chares have initiated
>> their sends; whether the receiving chares have received those messages
>> is not guaranteed. If you think about how the contribute call works, you
>> will see that message delivery order will not solve the problem.
>I was thinking that do_work_complete() would be executed only after all
>scheduled recv(..) calls had been executed. If that is not the case, it may
>explain why my code sometimes works and sometimes fails, especially with
>large numbers of processors (128) and chares (1024-4096):
>do_work_complete() is executed before all scheduled recv(..) calls have
>been executed.
>
>Thanks!
>
>
>
>
>>
>>
>> On Sat, Sep 24, 2011 at 9:53 PM, Evghenii Gaburov <e-gaburov AT northwestern.edu> wrote:
>> Hi,
>>
>> Thanks for response.
>>
>> > Charm does not guarantee message delivery order (charm uses UDP to
>> > deliver messages; on top of UDP, charm has its built-in TCP-like
>> > protocol, but that is not sufficient to guarantee in-order delivery).
>> Is there a way to enforce this delivery order? I am using
>> mpi-linux_x86_64.
>>
>> > In your program, can the contribute call be moved to the remote
>> > chares' recv method instead?
>> Regretfully, it cannot be, because multiple chares send data to a given
>> chare, and this chare does not know from how many remote chares data will
>> arrive without doing an MPI_Alltoall equivalent.
>>
>> If I place contribute inside the remote chare's recv, I get an error
>>
>> "Reason: Already sent a value to this callback!"
>>
>> which is probably because a given chare executed recv more than once and
>> called contribute each time.
>>
>> Cheers,
>> Evghenii
>>
>> >
>> >
>> > On Sat, Sep 24, 2011 at 8:59 PM, Evghenii Gaburov <e-gaburov AT northwestern.edu> wrote:
>> > Hi All,
>> >
>> > I have some misunderstanding about the order in which messages
>> > arrive. I read that messages by default obey FIFO order.
>> >
>> > So, does the following code
>> >
>> > "
>> > void myClass::do_work()
>> > {
>> > /* do some work first */
>> > for (int i = 0; i < nsend; i++)
>> > myClass[remoteIndex[i]].recv(results[i]); /*
>>send data to remote chares */
>> > contribute(0, 0, CkReduction::concat, CkCallback(do_work_complete(),
>>thisProxy)); /* barrier */
>> > }
>> >
>> > void myClass::recv(myResults remote_result) { store_remote_result; }
>> > void myClass::do_work_complete() { process arrived
>>remote_results; }
>> > "
>> >
>> > guarantee that the myClass::recv(..) methods will be executed first
>> > (because they were called first), and only afterwards the reduction will
>> > call myClass::do_work_complete() (because it is called second)?
>> > This order is required to make sure that do_work_complete is invoked
>> > only when *all* remote data has arrived.
>> >
>> > Thanks!
>> >
>> > Cheers,
>> > Evghenii
>> > --
>> > Evghenii Gaburov,
>> > e-gaburov AT northwestern.edu
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>> --
>> Evghenii Gaburov,
>> e-gaburov AT northwestern.edu
>>
>>
>>
>>
>>
>>
>>
>
>--
>Evghenii Gaburov,
>e-gaburov AT northwestern.edu
>
>
>
>
>
>
>





