charm - Re: [charm] Program hang when using load balancing and lots of PEs

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Phil Miller <mille121 AT illinois.edu>
  • To: Robert Steinke <rsteinke AT uwyo.edu>
  • Cc: Charm Mailing List <charm AT cs.illinois.edu>
  • Subject: Re: [charm] Program hang when using load balancing and lots of PEs
  • Date: Tue, 27 Jan 2015 16:14:34 -0600
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

The first thing to try would be running with the option "+LBDebug 3" to get some visibility into what's happening in the LB infrastructure. Could you send us output from such a run?

Also, how many objects are you running with across the whole job?
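As a rough sketch of what such a run might look like: the snippet below only assembles and prints a launch command. The binary name `./my_app`, the log file name, and the PE count are placeholders, not details from this thread. It also assumes the balancer is chosen at run time with `+balancer MetisLB`; if the program was linked with `-balancer MetisLB`, that flag can probably be dropped.

```shell
# Placeholders (assumptions): substitute your own binary and PE count.
APP=./my_app
PES=1024

# +balancer picks the LB strategy at run time; +LBDebug 3 raises the
# verbosity of the load balancing framework's diagnostic output.
CMD="./charmrun +p${PES} ${APP} +balancer MetisLB +LBDebug 3"

# Dry run: print the command rather than launching the job.
echo "$CMD"

# To launch for real and keep the output to send to the list:
# $CMD 2>&1 | tee lbdebug.log
```

Capturing both stdout and stderr into one file keeps the LB messages in order relative to the program's own output, which may help pin down where the hang occurs.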

On Tue, Jan 27, 2015 at 3:51 PM, Robert Steinke <rsteinke AT uwyo.edu> wrote:
I have a program that hangs when I run on a large number of PEs with load balancing enabled (I'm using MetisLB). On 512 or fewer processors it runs fine. On 1024 processors it hangs shortly after I call CkStartLB (I'm using TurnManualLBOn). If I don't call CkStartLB(), it also runs fine on 1024 processors.

Is this a problem that someone else has encountered before?

Should I try to dig into this myself, or is someone more familiar with the load balancer willing to look into it? In the latter case, I will put my effort into creating a minimal test case that reproduces the problem.

Thanks
Bob Steinke

_______________________________________________
charm mailing list
charm AT cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/charm



