
charm - Re: [charm] [ppl] load balancer question (freeze/crash)

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Gengbin Zheng <zhenggb AT gmail.com>
  • To: Evghenii Gaburov <e-gaburov AT northwestern.edu>
  • Cc: Eric Bohm <ebohm AT illinois.edu>, "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] [ppl] load balancer question (freeze/crash)
  • Date: Tue, 4 Oct 2011 12:17:30 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Make sure you call the parent class's version when you override those
two functions, something like the following:

void ckAboutToMigrate() { CBase_LB_Test::ckAboutToMigrate(); }
void ckJustMigrated() { CBase_LB_Test::ckJustMigrated(); }

For your production code, make sure you write pup functions that
pack/unpack all class variables.
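
To illustrate why every member has to go through pup(), here is a minimal, self-contained sketch. It does not use Charm++ at all: MockPuper is a hypothetical stand-in for a PUP::er-style serializer, and Element, callbackId, includeCallback and migrate() are names invented for this example. The sketch only assumes that migration means "pack on the source, build a fresh object, unpack on the destination".

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a PUP::er-style serializer (NOT the Charm++ API).
class MockPuper {
public:
    enum class Mode { Packing, Unpacking };

    explicit MockPuper(Mode m, std::vector<unsigned char> bytes = {})
        : mode_(m), buf_(std::move(bytes)) {}

    bool isUnpacking() const { return mode_ == Mode::Unpacking; }
    const std::vector<unsigned char>& bytes() const { return buf_; }

    // Pack or unpack one trivially copyable value, depending on the mode.
    template <typename T>
    void operator|(T& v) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "only trivially copyable types are handled here");
        unsigned char* raw = reinterpret_cast<unsigned char*>(&v);
        if (mode_ == Mode::Packing) {
            buf_.insert(buf_.end(), raw, raw + sizeof(T));
        } else {
            assert(pos_ + sizeof(T) <= buf_.size());
            std::memcpy(raw, buf_.data() + pos_, sizeof(T));
            pos_ += sizeof(T);
        }
    }

    // Vectors are stored as a length followed by the elements.
    void operator|(std::vector<double>& v) {
        std::size_t n = v.size();
        *this | n;
        if (isUnpacking()) v.resize(n);
        for (double& x : v) *this | x;
    }

private:
    Mode mode_;
    std::vector<unsigned char> buf_;
    std::size_t pos_ = 0;
};

// Hypothetical migratable element.
struct Element {
    int step = 0;
    std::vector<double> data;
    int callbackId = -1;  // stands in for a callback member such as MainCB

    // includeCallback exists only to demonstrate the bug; a real pup()
    // should always pack/unpack every member.
    void pup(MockPuper& p, bool includeCallback) {
        p | step;
        p | data;
        if (includeCallback) p | callbackId;
    }
};

// Simulated migration: pack the source, then unpack into a fresh object.
Element migrate(Element& src, bool includeCallback) {
    MockPuper packer(MockPuper::Mode::Packing);
    src.pup(packer, includeCallback);

    Element dst;  // freshly constructed on the "destination"
    MockPuper unpacker(MockPuper::Mode::Unpacking, packer.bytes());
    dst.pup(unpacker, includeCallback);
    return dst;
}
```

With includeCallback set to false, the migrated copy keeps step and data but comes back with callbackId at its default of -1. In this mock the lost member is merely reset; in a real program it may hold an arbitrary value.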
Also look at possible race conditions in the code. For example, after
calling AtSync() (assuming you are using periodic load balancing), the
caller should not send new messages. It should wait for the
ResumeFromSync() call before resuming.
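
One way to enforce the "no new messages after AtSync()" rule is to hold outgoing messages in a queue until the resume arrives. The sketch below is plain C++ with no Charm++ calls: SyncGuardedSender, send(), atSync() and resumeFromSync() are hypothetical names that only stand in for the real entry points, and buffering is an assumption of this example rather than something the runtime does for you.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: defers sends between AtSync() and ResumeFromSync().
class SyncGuardedSender {
public:
    // Deliver immediately unless a load-balancing step is in progress.
    void send(const std::string& msg) {
        if (inSync_) {
            pending_.push_back(msg);  // hold until the resume arrives
        } else {
            sent_.push_back(msg);     // stands in for a real remote invocation
        }
    }

    // Call this at the point where the element would call AtSync().
    void atSync() { inSync_ = true; }

    // Call this from the element's ResumeFromSync() handler.
    void resumeFromSync() {
        inSync_ = false;
        for (const std::string& m : pending_) sent_.push_back(m);
        pending_.clear();
    }

    const std::vector<std::string>& sent() const { return sent_; }
    std::size_t pendingCount() const { return pending_.size(); }

private:
    bool inSync_ = false;
    std::vector<std::string> pending_;
    std::vector<std::string> sent_;
};
```

If such a helper lived inside a migratable element, its flag and pending queue would themselves be class variables and would need to be packed and unpacked along with everything else.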

Gengbin

On Tue, Oct 4, 2011 at 10:43 AM, Evghenii Gaburov
<e-gaburov AT northwestern.edu>
wrote:
>> This program does not PUP the MainCB callback member variable.
>> Variables which are not PUP'd will not retain their value after
>> migration.  Therefore every migrated element will be calling an
>> uninitialized callback in ResumeFromSync.
> So, the freeze still occurs even after MainCB is passed to PUP.
>
> The test program I posted in the previous listing sometimes freezes
> with Greedy[Comm]LB, Refine[Comm]LB & MetisLB, but not with RotateLB,
> when ckAboutToMigrate() & ckJustMigrated() are defined.
>
> #if 1
>     void ckAboutToMigrate() {}
>     void ckJustMigrated() {}
> #endif
>
> Any idea what may happen here?
>
> While in my simulation code I do not use these, I still experience
> freezes at ResumeFromSync() after having the code run for about an
> hour and after dozens of AtSync() calls. I cannot reproduce this
> behaviour in that simple test code, but maybe this is related to the
> fact that in production code I move a lot of data...
>
> Any help will be of great value!
>
> Cheers,
>  Evghenii
>
>
>
> --
> Evghenii Gaburov,
> e-gaburov AT northwestern.edu




