charm - RE: [charm] code crash when run with migration based LB - charm++ 6.7.1

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: "Vipul Harsh, -" <vharsh2 AT illinois.edu>
  • To: Fouzhan Hosseini <F.Hosseini AT leeds.ac.uk>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: RE: [charm] code crash when run with migration based LB - charm++ 6.7.1
  • Date: Wed, 5 Oct 2016 17:37:36 +0000
  • Accept-language: en-IN, en-US

Hi Fouzhan,

The CkMulticast library handles migrations. Contributions should work even with old values of the cookie (the CkSectionInfo object) after the chare has migrated. But make sure that you pup the CkSectionInfo object in your pup routine.
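
To illustrate why the cookie has to go through the pup routine, here is a minimal sketch in plain C++. It is not Charm++ code: `Packer`, `SectionCookie`, `Worker`, and `migrate` are hypothetical stand-ins for `PUP::er`, `CkSectionInfo`, a chare array element, and the runtime's migration step. The point is only that state which is a data member and is packed in `pup` survives the move, which presumably means keeping the cookie as a member rather than a function-local variable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for PUP::er: the same object either packs or unpacks.
class Packer {
public:
    Packer() : unpacking_(false), pos_(0) {}
    explicit Packer(const std::vector<char>& bytes)
        : buf_(bytes), unpacking_(true), pos_(0) {}

    // Copy raw bytes into the buffer (packing) or out of it (unpacking).
    void bytes(void* data, std::size_t n) {
        if (unpacking_) {
            assert(pos_ + n <= buf_.size());
            std::memcpy(data, buf_.data() + pos_, n);
            pos_ += n;
        } else {
            const char* c = static_cast<const char*>(data);
            buf_.insert(buf_.end(), c, c + n);
        }
    }

    const std::vector<char>& buffer() const { return buf_; }

private:
    std::vector<char> buf_;
    bool unpacking_;
    std::size_t pos_;
};

// Hypothetical stand-in for CkSectionInfo (the "cookie"); fields are invented.
struct SectionCookie {
    int sectionId = -1;
    int rootPe = -1;

    void pup(Packer& p) {
        p.bytes(&sectionId, sizeof sectionId);
        p.bytes(&rootPe, sizeof rootPe);
    }
};

// Hypothetical stand-in for a chare array element.
struct Worker {
    int iteration = 0;
    SectionCookie cookie;  // kept as a member so pup() can reach it

    void pup(Packer& p) {
        p.bytes(&iteration, sizeof iteration);
        cookie.pup(p);     // without this line the cookie would be lost
    }
};

// Simulate a migration: pack on the source, unpack into a fresh object.
Worker migrate(Worker& w) {
    Packer out;
    w.pup(out);
    Packer in(out.buffer());
    Worker copy;
    copy.pup(in);
    return copy;
}
```

In an actual Charm++ program the analogous step would be to pup the CkSectionInfo member inside the chare's own pup routine, next to the existing `__sdag_pup(p);` call.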

Thanks and Regards,
Vipul Harsh

From: Fouzhan Hosseini [F.Hosseini AT leeds.ac.uk]
Sent: 03 October 2016 18:36:46
To: charm AT cs.uiuc.edu
Subject: [charm] code crash when run with migration based LB - charm++ 6.7.1

Dear All,


I have written a Charm++ program that works fine on either a multi-core machine or a cluster. However, when it is linked and run with the available migration-based load balancing strategies (e.g. +balancer GreedyLB), it usually crashes with the error message "corrupted double-linked list" or a segmentation fault. I have been trying to track down the problem but am not sure where it is coming from. I have a few questions.

I am new to the Charm++ community; I hope this is the right place to raise questions, ask for help, and report bugs.


1) There are two chare arrays in my code, and a PUP method is implemented for both. I only have simple entry methods (no threaded or sync methods), but I make heavy use of Structured Dagger (SDAG) to express coordination between entry methods (for, if, and when statements, and matching on reference numbers). "__sdag_pup(p);" is added in the PUP methods. Is there anything else I need to add to my code to be able to use migration-based LB?


2) I am using the CkMulticast library with array sections and section reductions. Each array section contributes to only one reduction, so I've defined a local variable of type CkSectionInfo in the relevant chare member functions, which is updated by calling "CkGetSectionInfo()". I do not quite understand how CkSectionInfo objects are updated in the CkMulticast library when a chare migrates, so I wondered whether this could cause a problem.


3) There is an entry method called Merger(), which is written in SDAG. In this method there is a when statement waiting on another entry method called RecvBSlabSet1(). RecvBSlabSet1() is called when a section reduction on the other array completes. These two entry methods are often mentioned in the error message stack trace. I am including the stack trace in case it is useful. Both of these entry methods belong to a chare array called JointContourNet.


** Error in `JCN': *** Error in `JCN': corrupted double-linked list: 0x000000000298f860 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7b184)[0x7ffff6957184]
/lib64/libc.so.6(+0x7d235)[0x7ffff6959235]
JCN(_ZN4SDAG10MsgClosureD0Ev+0x24)[0x60e0b4]
JCN(_ZN4SDAG6BufferD0Ev+0x46)[0x60e126]
JCN(_ZN15JointContourNet7_when_0EPN23Closure_JointContourNet16Merger_4_closureEi+0x2bc)[0x4c82ac]
JCN(_ZN15JointContourNet13RecvBSlabSet1EP14CkReductionMsg+0x188)[0x4c8af8]
JCN(CkDeliverMessageFree+0x22)[0x530652]
JCN(_ZN14CkLocRec_local11invokeEntryEP12CkMigratablePvib+0x240)[0x54a570]
JCN(_ZN14CkLocRec_local7deliverEP14CkArrayMessage11CkDeliver_ti+0x314)[0x54b504]
JCN(_ZN8CkLocMgr7deliverEP9CkMessage11CkDeliver_ti+0xec)[0x546fdc]
JCN(_Z15_processHandlerPvP11CkCoreState+0x437)[0x537327]
JCN(CsdScheduleForever+0x48)[0x5f9ff8]
JCN(CsdScheduler+0x2d)[0x5fa28d]
JCN(ConverseInit+0x3ea)[0x5f8f6a]
JCN(main+0x2c)[0x4bcd5c]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7ffff68fdb15]


Regards, 

Fouzhan 



