
charm - Re: [charm] segfault upon migration

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Nicolas Bock <nicolasbock AT gmail.com>
  • To: Jonathan Lifflander <jliffl2 AT illinois.edu>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] segfault upon migration
  • Date: Tue, 20 Aug 2013 13:53:02 -0600
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hi,

I wrote too soon. The program is still segfaulting. In order to isolate what is ultimately causing this behavior, I trimmed the program further. The attached version is, I guess, as basic as it gets in terms of a reduction on a chare array. When I declare Work::doSomething() as [threaded] in migration.ci, the program segfaults after a few iterations. When that method is declared as a plain entry method, the code runs fine. Since Work::doSomething() is not suspending itself, the [threaded] attribute is not necessary, but is it harmful?

Thanks,

nick



On Mon, Aug 19, 2013 at 12:33 PM, Nicolas Bock <nicolasbock AT gmail.com> wrote:
Hi Jonathan and Nikhil,

Thanks, that did it. Although I had read section 10.2 many times, it never occurred to me that [threaded] is also necessary for migration.

Thanks again,

nick


On Fri, Aug 16, 2013 at 7:35 PM, Jonathan Lifflander <jliffl2 AT illinois.edu> wrote:
Hey,

Nikhil and I looked over your code and noticed another problem. It's
actually not related to the load balancing. For an entry method to call
a "sync" method, it must be threaded (see manual section 12.2).
Imagine the scenario where the object whose "sync" method it is calling
is on the same processor: the caller must suspend so that the method can
execute. We need to add more runtime error checking to make sure this is
the case and print out a useful error message.

So the fix is to make "doSomething" threaded. Then the code seems to work fine.
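
As a rough sketch of what that change looks like in migration.ci: the attached file is not reproduced in this thread, so the message type DataMsg, the 1D array dimensions, and the exact signatures below are assumptions for illustration only. Only the names Data, Work, info, and doSomething come from the discussion.

```ci
module migration {
  // Hypothetical message type carrying the value returned by Data::info().
  message DataMsg;

  array [1D] Data {
    entry Data();
    // A [sync] entry method returns its result to the caller,
    // so the caller has to block until the reply arrives.
    entry [sync] DataMsg *info();
  };

  array [1D] Work {
    entry Work();
    // [threaded] gives this entry method its own user-level thread,
    // which is what lets it suspend while it waits on Data::info().
    entry [threaded] void doSomething();
  };
};
```

Without the [threaded] attribute on doSomething, the call to info() has no thread to suspend, which is the situation described above.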

What was happening (as far as I can tell) is that the method was
waiting for the return value, and the load balancer was trying to move
it. This was not valid because of the lack of a thread, hence the
stack state was not migratable. I'm surprised the code didn't hang
before you encountered this problem.

Jonathan

On Fri, Aug 16, 2013 at 6:02 PM, Nicolas Bock <nicolasbock AT gmail.com> wrote:
> Hi,
>
> please have a look at the attached code. The code consists of two chare
> arrays, one holding some data, one doing some work. The main code calls a
> reduction on the Work array which gets information from the Data array to do
> something. When I run this (make run) on more than one PE with the
> GreedyCommLB load balancer the code segfaults at random points when the load
> balancer kicks in. I think what's going on is that the Data::info() call in
> Work::doSomething() suspends the chare and it just so happens that it
> sometimes is migrated while being suspended. If I comment out the code block
> that calls Data::info() the program executes just fine.
>
> Is what I am thinking correct, or is there another problem in the code that
> I have overlooked?
>
> Thanks already,
>
> nick
>
>


Attachment: migration.tar.bz2
Description: BZip2 compressed data
