
charm - Re: [charm] [ppl] load balancer question (freeze/crash)

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Pritish Jetley <pjetley2 AT illinois.edu>
  • To: Evghenii Gaburov <e-gaburov AT northwestern.edu>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>, Gengbin Zheng <zhenggb AT gmail.com>
  • Subject: Re: [charm] [ppl] load balancer question (freeze/crash)
  • Date: Tue, 4 Oct 2011 12:45:01 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hi Evghenii,

You need to pup only those data members of a chare which are required after migration. In your case, this would be just MainCB.
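To make the point concrete, here is a minimal, self-contained sketch of the idea. It does not use the Charm++ headers: `ToyPuper`, `ToyElement`, `kScratchSize`, and `migrate` are hypothetical stand-ins invented for illustration, and `mainCallbackId` merely plays the role of MainCB. The one member needed after migration goes through the packer, while the temporary workspace is simply rebuilt on the receiving side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Toy stand-in for a packer/unpacker (NOT the Charm++ PUP::er API).
// The same pup() routine is used for both directions.
class ToyPuper {
public:
    ToyPuper(std::vector<char>& buf, bool unpacking)
        : buf_(buf), unpacking_(unpacking), pos_(0) {}

    bool isUnpacking() const { return unpacking_; }

    // Copy a trivially copyable value to or from the byte buffer.
    template <typename T>
    void sync(T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "sync() only handles trivially copyable types");
        if (unpacking_) {
            assert(pos_ + sizeof(T) <= buf_.size());
            std::memcpy(&value, buf_.data() + pos_, sizeof(T));
        } else {
            buf_.resize(pos_ + sizeof(T));
            std::memcpy(buf_.data() + pos_, &value, sizeof(T));
        }
        pos_ += sizeof(T);
    }

private:
    std::vector<char>& buf_;
    bool unpacking_;
    std::size_t pos_;
};

constexpr std::size_t kScratchSize = 4;  // hypothetical workspace size

struct ToyElement {
    int mainCallbackId = -1;      // plays the role of MainCB: must survive
    std::vector<double> scratch;  // temporary data: rebuilt, never sent

    void pup(ToyPuper& p) {
        p.sync(mainCallbackId);   // the only state that travels
        if (p.isUnpacking()) {
            scratch.assign(kScratchSize, 0.0);  // reconstruct on arrival
        }
    }
};

// Simulate a migration: pack the source, then unpack into a fresh element.
ToyElement migrate(ToyElement& src) {
    std::vector<char> wire;
    ToyPuper packer(wire, false);
    src.pup(packer);

    ToyElement dst;
    ToyPuper unpacker(wire, true);
    dst.pup(unpacker);
    return dst;
}
```

Calling `migrate` on an element whose `mainCallbackId` is 7 yields a copy that still holds 7 and has a freshly zeroed `scratch`, whatever the old workspace contained. If the `p.sync(mainCallbackId)` line were dropped, the copy would keep the default of -1, which is the toy analogue of calling an uninitialized callback.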

On Tue, Oct 4, 2011 at 12:37 PM, Evghenii Gaburov <e-gaburov AT northwestern.edu> wrote:
> Make sure you call the parent when you overload those two functions,
> something like the following:
>
>      void ckAboutToMigrate() { CBase_LB_Test::ckAboutToMigrate(); }
>      void ckJustMigrated() { CBase_LB_Test::ckJustMigrated(); }
Okay, that solves the problem.
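
For readers following along, the effect of an empty override can be illustrated with a small, self-contained sketch. It is only a toy model: `ToyBase`, `ForgetsParent`, `CallsParent`, and `simulateMigration` are hypothetical names, and the counter stands in for whatever bookkeeping the real parent class performs, which this thread does not describe.

```cpp
#include <cassert>

// Hypothetical stand-in for a generated parent class. Its hooks perform
// some bookkeeping that the (toy) runtime relies on after a migration.
class ToyBase {
public:
    virtual ~ToyBase() = default;
    virtual void aboutToMigrate() { ++hooksSeen_; }
    virtual void justMigrated() { ++hooksSeen_; }

    // The toy runtime considers a migration complete only if both
    // parent hooks actually ran.
    bool bookkeepingComplete() const { return hooksSeen_ == 2; }

private:
    int hooksSeen_ = 0;
};

// Empty overrides: the parent hooks are silently skipped.
class ForgetsParent : public ToyBase {
public:
    void aboutToMigrate() override {}
    void justMigrated() override {}
};

// Overrides that forward to the parent before doing their own work.
class CallsParent : public ToyBase {
public:
    void aboutToMigrate() override { ToyBase::aboutToMigrate(); }
    void justMigrated() override { ToyBase::justMigrated(); }
};

// Drive one migration through the virtual hooks, as a runtime would.
bool simulateMigration(ToyBase& element) {
    element.aboutToMigrate();
    element.justMigrated();
    return element.bookkeepingComplete();
}
```

In this model `simulateMigration` returns false for `ForgetsParent` and true for `CallsParent`: a subclass that overrides the hooks without forwarding leaves the parent's state untouched, so anything waiting on that state would wait forever.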


> For your production code, make sure you write pup functions that
> pack/unpack all class variables.
Even temporary variables that are reconstructed during the PUP process and, in principle,
do not require migration? Could this be a cause of deadlocks if I do not PUP them?

> Also look at possible race conditions in the code. For example, after
> calling AtSync() (assuming you are using periodic load balancing), the
> caller should not send new messages. It should wait for the
> ResumeFromSync() call.
Okay, I will double check that.

Thanks,
 Evghenii

>
> Gengbin
>
> On Tue, Oct 4, 2011 at 10:43 AM, Evghenii Gaburov
> <e-gaburov AT northwestern.edu> wrote:
>>> This program does not PUP the MainCB callback member variable.
>>> Variables which are not PUP'd will not retain their value after
>>> migration.  Therefore every migrated element will be calling an
>>> uninitialized callback in ResumeFromSync.
>> So, the freeze still occurs even after MainCB is passed to PUP.
>>
>> The test program I posted in the previous listing
>> sometimes freezes with Greedy[Comm]LB, Refine[Comm]LB & MetisLB, but not with RotateLB,
>>
>> when ckAboutToMigrate() & ckJustMigrated() are defined.
>>
>> #if 1
>>     void ckAboutToMigrate() {}
>>     void ckJustMigrated() {}
>> #endif
>>
>> Any idea what may be happening here?
>>
>> While my simulation code does not use these, I still experience freezes at ResumeFromSync()
>> after the code has run for about an hour and after dozens of AtSync() calls. I cannot reproduce
>> this behaviour in that simple test code, but maybe this is related to the fact that in production code
>> I move a lot of data...
>>
>> Any help will be of great value!
>>
>> Cheers,
>>  Evghenii
>>
>>
>>
>> --
>> Evghenii Gaburov, e-gaburov AT northwestern.edu
>>
>> _______________________________________________
>> charm mailing list
>> charm AT cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/charm
>> _______________________________________________
>> ppl mailing list
>> ppl AT cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/ppl
>>

--
Evghenii Gaburov, e-gaburov AT northwestern.edu



--
Pritish Jetley
Doctoral Candidate, Computer Science
University of Illinois at Urbana-Champaign


