
charm - Re: [charm] [ppl] load balancer question (freeze/crash)

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system


  • From: Evghenii Gaburov <e-gaburov AT northwestern.edu>
  • To: Gengbin Zheng <zhenggb AT gmail.com>
  • Cc: Eric Bohm <ebohm AT illinois.edu>, "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] [ppl] load balancer question (freeze/crash)
  • Date: Tue, 4 Oct 2011 17:37:04 +0000
  • Accept-language: en-US
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

> Make sure you call the parent when you overload those two functions,
> something like the following:
>
> void ckAboutToMigrate() { CBase_LB_Test::ckAboutToMigrate(); }
> void ckJustMigrated() { CBase_LB_Test::ckJustMigrated(); }
Okay, that solves the problem.
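
For illustration, here is a minimal plain C++ sketch of why an empty override can matter. It assumes the base-class hooks do some bookkeeping that the runtime relies on; that is an assumption, and `MockBase`, `BrokenElement`, `FixedElement` and `simulateMigration` are hypothetical stand-ins, not the real Charm++ `CBase_*` classes.

```cpp
#include <cassert>

// Hypothetical stand-in for a framework base class (NOT the Charm++ API).
// Its hooks count departures and arrivals, mimicking runtime bookkeeping.
class MockBase {
public:
    virtual ~MockBase() = default;
    virtual void ckAboutToMigrate() { ++departures_; }
    virtual void ckJustMigrated() { ++arrivals_; }
    int departures() const { return departures_; }
    int arrivals() const { return arrivals_; }
private:
    int departures_ = 0;
    int arrivals_ = 0;
};

// Empty overrides: the base-class bookkeeping is silently skipped.
class BrokenElement : public MockBase {
public:
    void ckAboutToMigrate() override {}
    void ckJustMigrated() override {}
};

// Overrides that forward to the parent, as suggested in the quoted advice.
class FixedElement : public MockBase {
public:
    void ckAboutToMigrate() override { MockBase::ckAboutToMigrate(); }
    void ckJustMigrated() override { MockBase::ckJustMigrated(); }
};

// Hypothetical driver: calls both hooks the way a migration would.
inline void simulateMigration(MockBase& element) {
    element.ckAboutToMigrate();
    element.ckJustMigrated();
}
```

After `simulateMigration`, a `BrokenElement` still reports zero departures and arrivals, while a `FixedElement` reports one of each. If the runtime waits on that kind of bookkeeping, the empty overrides could plausibly leave it waiting.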


> For your production code, make sure you write pup functions that
> pack/unpack all class variables.
Even temporary variables that are reconstructed during the PUP process and,
in principle, do not require migration? Can it cause deadlocks if I do not
PUP them?
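
To make the question concrete, here is a plain C++ sketch of one pack/unpack routine that transfers only the primary state and rebuilds a derived member on the unpacking side. `MockPuper`, `Particle` and `migrate` are hypothetical names; `MockPuper` only mimics the shape of a bidirectional serializer (including an `isUnpacking()` query) and is not the Charm++ `PUP::er` class. Whether skipping a member is safe in real code depends on nothing reading it before it is rebuilt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pack/unpack stream (NOT the Charm++ PUP::er).
class MockPuper {
public:
    MockPuper(std::vector<double>& buf, bool unpacking)
        : buf_(buf), unpacking_(unpacking) {}
    bool isUnpacking() const { return unpacking_; }
    // One call site serves both directions: write when packing, read when unpacking.
    void operator|(double& x) {
        if (unpacking_) x = buf_.at(pos_++);
        else buf_.push_back(x);
    }
private:
    std::vector<double>& buf_;
    bool unpacking_;
    std::size_t pos_ = 0;
};

struct Particle {
    double mass = 0.0;
    double velocity = 0.0;
    double kinetic = 0.0;  // derived value: rebuilt after unpacking, never transferred

    void rebuild() { kinetic = 0.5 * mass * velocity * velocity; }

    void pup(MockPuper& p) {
        p | mass;
        p | velocity;
        if (p.isUnpacking()) rebuild();  // restore the derived member on arrival
    }
};

// Packs `src` into a buffer and unpacks it into a fresh object, as a migration would.
inline Particle migrate(Particle src) {
    std::vector<double> buf;
    MockPuper packer(buf, false);
    src.pup(packer);
    Particle dst;
    MockPuper unpacker(buf, true);
    dst.pup(unpacker);
    return dst;
}
```

In this sketch the buffer holds two values per particle rather than three, and the copy produced by `migrate` still has a valid `kinetic` because `pup` rebuilds it. A member that is neither transferred nor rebuilt would simply keep its default value.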

> Also look at possible race conditions in the code. For example, after
> calling AtSync() (assuming you are using periodic load balancing), the
> caller should not send new messages. It should wait for the
> ResumeFromSync() call.
Okay, I will double check that.
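
One way to check for this kind of race is to route every send through a guard that defers messages while a sync is pending. The sketch below is plain C++ under that assumption; `GuardedSender`, `enterSync` and `deferredCount` are hypothetical names, and the class only models the ordering rule, not a real Charm++ chare or its messaging.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element (plain C++, NOT a Charm++ chare) that refuses to
// send between entering the sync point and being resumed.
class GuardedSender {
public:
    // Stand-in for the point where the element calls AtSync().
    void enterSync() { inSync_ = true; }

    // Stand-in for the runtime's resume callback: flush deferred messages.
    void resumeFromSync() {
        inSync_ = false;
        for (auto& msg : deferred_) sent_.push_back(std::move(msg));
        deferred_.clear();
    }

    // Sends immediately outside a sync window, otherwise defers.
    void send(std::string msg) {
        if (inSync_) deferred_.push_back(std::move(msg));
        else sent_.push_back(std::move(msg));
    }

    const std::vector<std::string>& sent() const { return sent_; }
    std::size_t deferredCount() const { return deferred_.size(); }

private:
    bool inSync_ = false;
    std::vector<std::string> deferred_;
    std::vector<std::string> sent_;
};
```

A nonzero `deferredCount()` during a sync window would point at code that tries to send after the sync call, which is the situation the quoted advice warns about.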

Thanks,
Evghenii

>
> Gengbin
>
> On Tue, Oct 4, 2011 at 10:43 AM, Evghenii Gaburov
> <e-gaburov AT northwestern.edu>
> wrote:
>>> This program does not PUP the MainCB callback member variable.
>>> Variables which are not PUP'd will not retain their value after
>>> migration. Therefore every migrated element will be calling an
>>> uninitialized callback in ResumeFromSync.
>> So, the freeze still occurs even after MainCB is passed to PUP.
>>
>> The test program I posted in the previous listing
>> sometimes freezes with Greedy[Comm]LB, Refine[Comm]LB & MetisLB, but not
>> with RotateLB,
>>
>> when ckAboutToMigrate() & ckJustMigrated() are defined.
>>
>> #if 1
>> void ckAboutToMigrate() {}
>> void ckJustMigrated() {}
>> #endif
>>
>> Any idea what may happen here?
>>
>> While I do not use these in my simulation code, I still experience freezes
>> at ResumeFromSync()
>> after the code has run for about an hour and after dozens of AtSync()
>> calls. I cannot reproduce
>> this behaviour in that simple test code, but maybe this is related to the
>> fact that in production code
>> I move a lot of data...
>>
>> Any help will be of great value!
>>
>> Cheers,
>> Evghenii
>>
>>
>>
>> --
>> Evghenii Gaburov,
>> e-gaburov AT northwestern.edu
>>

--
Evghenii Gaburov,
e-gaburov AT northwestern.edu