
charm - Re: [charm] Charm++ execution order question



  • From: Evghenii Gaburov <e-gaburov AT northwestern.edu>
  • To: "Kale, Laxmikant V" <kale AT illinois.edu>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] Charm++ execution order question
  • Date: Sun, 25 Sep 2011 16:38:32 +0000
  • Accept-language: en-US
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hi,

Thanks for all responses!

> Quiescence detection, per se, has an *extremely* low overhead. It runs
> only when a processor is idle, and is a spanning tree based algorithm. The
> source of "overhead" might be elsewhere. In any case, you should fix the
> order dependence in your code first. Having someone in PPL take a look at
> the parallel code might help.
I understand. So it seems that something else causes QD to run slowly with a
large number of chares; in any case, I am unable to use it as my solution
because of this issue.
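
(For reference, a minimal sketch of triggering QD with a callback; the
qdReached entry is purely illustrative and not verbatim from my code, which
uses the entry-index form of CkStartQD quoted further below:)

"
/* Sketch only: ask the runtime to detect quiescence and deliver a CkQdMsg
   to every element of this chare array once no messages are in flight. */
CkStartQD(CkCallback(CkIndex_myClass::qdReached(NULL), thisProxy));

void myClass::qdReached(CkQdMsg *msg)   /* illustrative entry method */
{
  delete msg;
  do_work_complete();   /* nothing left in flight, safe to proceed */
}
"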

> I'd not advise any all-to-all operation here, unless there is a strong
> reason for it.
Now I am at a loss, since it appears that an all-to-all is the only remaining
way to enforce the order. The all-to-all is done via
"
contribute(numChares*sizeof(int), &nSend_counter[0], CkReduction::sum_int,
           CkCallback(CkIndex_myClass::do_work_reduction(NULL), thisProxy));
"

However, I consider this overkill, because it communicates to *every* chare
how many remote elements *each* chare expects, and the reduction payload
scales linearly with numChares.

For my purpose it is enough for each chare to know only how many elements it
itself will receive. Is there a way, e.g. via a reduction, to communicate only
this information to each chare and thereby reduce the overhead?

I am probably still thinking as an MPI user, so is there a simpler way to do
this in Charm++?
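
(In MPI terms, what I am after is essentially a reduce-scatter of the count
array, so that each rank gets back only its own entry of the element-wise
sum. A minimal sketch of that MPI equivalent, purely to illustrate the
question; names are illustrative:)

"
#include <mpi.h>
#include <vector>

/* Illustration only: each rank contributes a per-destination count array of
   length nranks; MPI_Reduce_scatter_block sums the arrays element-wise and
   hands each rank just its own entry, i.e. the total number of elements it
   will receive. */
void exchange_recv_count(std::vector<int> &nSend_counter, int &nExpected,
                         MPI_Comm comm)
{
  MPI_Reduce_scatter_block(&nSend_counter[0], &nExpected,
                           1, MPI_INT, MPI_SUM, comm);
}
"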

Thanks!


"
void myClass::do_work()
{
/* do some work first */
for (int i = 0; i < nsend; i++)
{
nSend_counter[remoteIndex[i]] += results[i].size();
myClass [remoteIndex[i]].recv(results[i]); /*
send data to remote chares */
}

/* this is all-to-all reduction, each chares know how many elements any
other chares recevies */
contribute(numChares*sizeof(int), &nSend_counter[0], CkReduction::sum_int,
CkCallback(do_work_reduction(NULL), thisProxy));
}

void myClass::do_work_reduction(CkReductionMsg *msg)
{
nRecv_counter[thisIndex] -= ((int*)msg->getData())[thisIndex]; // this
gives total number of elements thisIndex receives
if (nRecv_counter[thisIndex] == 0)
contribute(CkCallback(do_work_complete(), thisProxy)); /* this
means that all recv have been processed */
}

void myClass::recv(myResults remote_result)
{
store_remote_result;
nRecv_counter[thisIndex] += remote_result.size();
if (nRecv_counter[thisIndex] == 0)
contribute(CkCallback(do_work_complete(), thisProxy)); /* both all
recv and do_work_reduction have been processed */
}
void myClass::do_work_complete()
{
assert(nRecv_counter[thisIndex] == 0;
process arrived remote_results;
}
"

Cheers,
Evghenii




> Structured dagger notation can help enforce ordering within an object, if
> that helps. But the general technique is to explicitly control order
> dependence.
>
>
> --
> Laxmikant (Sanjay) Kale http://charm.cs.uiuc.edu
> <http://charm.cs.uiuc.edu/>
> Professor, Computer Science
> kale AT illinois.edu
> 201 N. Goodwin Avenue Ph: (217) 244-0094
> Urbana, IL 61801-2302 FAX: (217) 265-6582
>
>
>
>
>
>
> On 9/24/11 10:35 PM, "Evghenii Gaburov"
> <e-gaburov AT northwestern.edu>
> wrote:
>
>>> It seems like the solution is either in Quiescence Detection or in
>>> All_to_All.
>> I already tried QD (via CkStartQD(..)), but the overhead with a large
>> number of chares (> 1024) becomes intolerable.
>>
>> In this version
>> contribute(0, 0, CkReduction::concat, CkCallback(do_work_complete(),
>> thisProxy)); /* barrier */
>> was replaced with
>> CkStartQD(CkIndex_myClass::do_work_complete, thishandle);
>>
>> and of course, the do_work(..) was called from the threaded method of the
>> Main chare.
>>
>> I will then try All_to_all, but it is a bit hard for me to figure out the
>> optimal way to do this in Charm++.
>> Any suggestions, advice or examples will be highly appreciated.
>>
>>> AFAIK message delivery order cannot be guaranteed in charm. And message
>>> delivery order won't solve the problem either. In your original code,
>>> do_work_complete will be called after all the chares have initiated
>>> their sends; whether the receiving chares have received those messages
>>> is not guaranteed. If you think about how the contribute call works, you
>>> will see that msg delivery order will not solve the problem.
>> I was thinking that do_work_complete() would be executed only after all
>> scheduled recv(..) have been executed. If that is not the case,
>> this may explain why my code sometimes works and sometimes fails,
>> especially on large #procs (128) and #chares (1024-4096):
>> do_work_complete() is executed before all scheduled recv(..) have been
>> executed.
>>
>> Thanks!
>>
>>
>>
>>
>>>
>>>
>>> On Sat, Sep 24, 2011 at 9:53 PM, Evghenii Gaburov
>>> <e-gaburov AT northwestern.edu>
>>> wrote:
>>> Hi,
>>>
>>> Thanks for response.
>>>
>>>> Charm does not guarantee message delivery order (charm uses UDP to
>>>> deliver messages - on top of UDP charm has its built-in TCP-like
>>>> protocol, but that is not sufficient to guarantee in-order delivery).
>>> Is there a way to enforce this delivery order? I am using
>>> mpi-linux_x86_64.
>>>
>>>> In your program, can the contribute call be moved to the remote
>>>> chares' recv method instead?
>>> Regretfully, it cannot be, because multiple chares send data to a given
>>> chare, and this chare does not know from how many remote chares the data
>>> will arrive, without doing an MPI_Alltoall equivalent.
>>>
>>> If I place the contribute inside the remote chare's recv method, I get an
>>> error
>>>
>>> "Reason: Already sent a value to this callback!"
>>>
>>> which is probably because a given chare executed recv more than once and
>>> called contribute each time.
>>>
>>> Cheers,
>>> Evghenii
>>>
>>>>
>>>>
>>>> On Sat, Sep 24, 2011 at 8:59 PM, Evghenii Gaburov
>>> <e-gaburov AT northwestern.edu>
>>> wrote:
>>>> Hi All,
>>>>
>>>> I have some misunderstanding about the order in which messages arrive.
>>>> I read that messages by default obey FIFO order.
>>>>
>>>> So, does the following code
>>>>
>>>> "
>>>> void myClass::do_work()
>>>> {
>>>> /* do some work first */
>>>> for (int i = 0; i < nsend; i++)
>>>> myClass[remoteIndex[i]].recv(results[i]); /*
>>> send data to remote chares */
>>>> contribute(0, 0, CkReduction::concat, CkCallback(do_work_complete(),
>>> thisProxy)); /* barrier */
>>>> }
>>>>
>>>> void myClass::recv(myResults remote_result) { store_remote_result; }
>>>> void myClass::do_work_complete() { process arrived
>>> remote_results; }
>>>> "
>>>>
>>>> guarantees that the myClass::recv(..) methods will be executed first
>>>> (because they were called first), and only afterwards will the reduction
>>>> call myClass::do_work_complete() (because it is called second).
>>>> This order is required to make sure that do_work_complete is only invoked
>>>> when *all* remote data has arrived.
>>>>
>>>> Thanks!
>>>>
>>>> Cheers,
>>>> Evghenii
>>>> --
>>>> Evghenii Gaburov,
>>>> e-gaburov AT northwestern.edu
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> charm mailing list
>>>> charm AT cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/charm
>>>>
>>>
>>> --
>>> Evghenii Gaburov,
>>> e-gaburov AT northwestern.edu
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> Evghenii Gaburov,
>> e-gaburov AT northwestern.edu
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> charm mailing list
>> charm AT cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/charm
>

--
Evghenii Gaburov,
e-gaburov AT northwestern.edu










