charm - Re: [charm] segfault upon migration

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Nicolas Bock <nicolasbock AT gmail.com>
  • To: Jonathan Lifflander <jliffl2 AT illinois.edu>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] segfault upon migration
  • Date: Mon, 19 Aug 2013 12:33:03 -0600
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hi Jonathan and Nikhil,

Thanks, that did it. Although I had read section 10.2 many times, it never occurred to me that [threaded] is also necessary for migration.

Thanks again,

nick


On Fri, Aug 16, 2013 at 7:35 PM, Jonathan Lifflander <jliffl2 AT illinois.edu> wrote:
Hey,

Nikhil and I looked over your code and noticed another problem. It's
actually not related to the load balancing. For an entry method to call
a "sync" method, it must be threaded (see manual section 12.2).
Imagine the scenario where the object whose "sync" method it calls is
on the same processor: the caller must suspend so that the method can
execute. We need to add more runtime error checking to make sure this
is the case and print out a useful error message.

So the fix is to make "doSomething" threaded. Then the code seems to work fine.
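
For readers without the attachment, the change might look roughly like the interface (.ci) sketch below. Only the names Data, Work, info and doSomething come from this thread. The module name, the 1D array dimensions, and the DataMsg reply type are assumptions made for illustration, so the real declarations may differ.

```
// Hypothetical interface file; module name, array dimensions and
// DataMsg are assumed, not taken from the attached code.
module work {
  message DataMsg;                  // assumed reply type for info()

  array [1D] Data {
    entry Data();
    // Blocking call: the caller waits for the returned message.
    entry [sync] DataMsg *info();
  };

  array [1D] Work {
    entry Work();
    // [threaded] gives this entry method its own user-level thread,
    // so it can suspend while it waits for Data::info() to return.
    entry [threaded] void doSomething();
  };
};
```

Without the [threaded] attribute, doSomething would have no thread of its own to suspend while the sync call is outstanding.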

What was happening, as far as I can tell, is that the method was
waiting for the return value while the load balancer was trying to
move the chare. That was not valid because, without a thread, the
stack state was not migratable. I'm surprised the code didn't hang
before you encountered this problem.

Jonathan

On Fri, Aug 16, 2013 at 6:02 PM, Nicolas Bock <nicolasbock AT gmail.com> wrote:
> Hi,
>
> please have a look at the attached code. The code consists of two chare
> arrays, one holding some data, one doing some work. The main code calls a
> reduction on the Work array which gets information from the Data array to do
> something. When I run this (make run) on more than one PE with the
> GreedyCommLB load balancer the code segfaults at random points when the load
> balancer kicks in. I think what's going on is that the Data::info() call in
> Work::doSomething() suspends the chare and it just so happens that it
> sometimes is migrated while being suspended. If I comment out the code block
> that calls Data::info() the program executes just fine.
>
> Is what I am thinking correct, or is there another problem in the code that
> I have overlooked?
>
> Thanks already,
>
> nick
>
>



