charm - Re: [charm] Load balancing with Charm++ >= 6.6.1

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Phil Miller <mille121 AT illinois.edu>
  • To: James Bordner <jobordner AT gmail.com>
  • Cc: charm <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] Load balancing with Charm++ >= 6.6.1
  • Date: Wed, 6 Jul 2016 13:48:08 -0500

OK, thanks. I'll try to get some follow-up on this in motion. I
suspect your test case that can reproduce it may be necessary. A
reduced case is always nice, but even if it's "build Cello and run it
with this example", we'll manage.


On Wed, Jul 6, 2016 at 1:44 PM, James Bordner <jobordner AT gmail.com> wrote:
> Hi Phil--yes, it's also observed in both 6.7.0 and 6.7.1.
>
> On Wed, Jul 6, 2016 at 11:33 AM, Phil Miller <mille121 AT illinois.edu> wrote:
>>
>> Hi James,
>>
>> Could you specifically confirm that this crash is still observed with
>> 6.7.0 and 6.7.1? I feel like this is somewhat familiar, but would have
>> to do more digging than the brief search I've done to figure out
>> where/why/when.
>>
>> Phil
>>
>> On Wed, Jul 6, 2016 at 1:28 PM, James Bordner <jobordner AT gmail.com> wrote:
>> > Hello,
>> >
>> > I have a load balancing test that works with Charm++ versions <= 6.6.0,
>> > but with Charm++ >= 6.6.1 I get the following error:
>> >
>> > [0] Assertion "n<len" failed in file cklists.h line 221.
>> > ------------- Processor 0 Exiting: Called CmiAbort ------------
>> > Reason:
>> > [0] Stack Traceback:
>> > [0:0] CmiAbort+0x5b [0x6a51ea]
>> > [0:1] __cmi_assert+0x42 [0x6af4c3]
>> > [0:2] _ZN5CkVecI9LDObjDataEixEm+0x32 [0x58d928]
>> > [0:3] _ZN6BaseLB7LDStats7getHashERK8_LDObjidRK7_LDOMid+0xd5 [0x636deb]
>> > [0:4] _ZN6BaseLB7LDStats7getHashERK9_LDObjKey+0x47 [0x636ee1]
>> > [0:5] _ZN6BaseLB7LDStats11getSendHashER10LDCommData+0x33 [0x636f17]
>> > [0:6] _ZN9CentralLB27removeCommDataOfDeletedObjsEPN6BaseLB7LDStatsE+0x99 [0x63ede3]
>> > [0:7] _ZN9CentralLB11LoadBalanceEv+0x1ef [0x63cc39]
>> > [0:8] _ZN17CkIndex_CentralLB22_call_LoadBalance_voidEPvS0_+0x30 [0x643dae]
>> > [0:9] CkDeliverMessageFree+0x4e [0x5b05c8]
>> > [0:10] [0x5b070e]
>> > [0:11] [0x5b082a]
>> > [0:12] [0x5b1f0d]
>> > [0:13] [0x5b1fb5]
>> > [0:14] _Z15_processHandlerPvP11CkCoreState+0x126 [0x5b24cc]
>> > [0:15] CmiHandleMessage+0x4d [0x6ac169]
>> > [0:16] CsdScheduleForever+0xad [0x6ac3ea]
>> > [0:17] CsdScheduler+0x16 [0x6ac31b]
>> > [0:18] [0x6aa334]
>> > [0:19] ConverseInit+0x32e [0x6aa7f1]
>> > [0:20] main+0x3f [0x5a0fb2]
>> > [0:21] __libc_start_main+0xf5 [0x7fb33dda8f45]
>> > [0:22] [0x5764ff]
>> > Fatal error on PE 0>
>> >
>> > I can provide more details, but thought I'd start by asking if this
>> > looks familiar to anyone?
>> >
>> > Thanks!
>> > James
>
>
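
The mangled C++ frame names in the traceback quoted above are easier to read once demangled. The following is a minimal sketch, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle` from `<cxxabi.h>`. The helper name `demangle` is hypothetical and is not part of Charm++.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper (not part of Charm++): returns the demangled form of an
// Itanium-ABI symbol name, or the input unchanged if it cannot be demangled.
// Pass only the symbol itself, without the "+0x32 [0x58d928]" suffix that the
// traceback appends to each frame.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc; the caller is responsible for freeing it.
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;  // not a valid mangled name, or allocation failed
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

Applied to frame 2, `demangle("_ZN5CkVecI9LDObjDataEixEm")` yields `CkVec<LDObjData>::operator[](unsigned long)`, which is consistent with the `n<len` assertion reported from cklists.h. Frame 7, `_ZN9CentralLB11LoadBalanceEv`, decodes to `CentralLB::LoadBalance()`. The same result is available from the command line by piping the traceback through `c++filt`.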


