
charm - Re: [charm] Load balancing with Charm++ >= 6.6.1

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: James Bordner <jobordner AT gmail.com>
  • To: Phil Miller <mille121 AT illinois.edu>
  • Cc: charm <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] Load balancing with Charm++ >= 6.6.1
  • Date: Wed, 6 Jul 2016 11:54:05 -0700

Great, thank you!

On Wed, Jul 6, 2016 at 11:48 AM, Phil Miller <mille121 AT illinois.edu> wrote:
OK, thanks. I'll try to get some follow-up on this in motion. I
suspect your test case that can reproduce it may be necessary. A
reduced case is always nice, but even if it's "build Cello and run it
with this example", we'll manage.


On Wed, Jul 6, 2016 at 1:44 PM, James Bordner <jobordner AT gmail.com> wrote:
> Hi Phil--yes, it's also observed in both 6.7.0 and 6.7.1.
>
> On Wed, Jul 6, 2016 at 11:33 AM, Phil Miller <mille121 AT illinois.edu> wrote:
>>
>> Hi James,
>>
>> Could you specifically confirm that this crash is still observed with
>> 6.7.0 and 6.7.1? I feel like this is somewhat familiar, but would have
>> to do more digging than the brief search I've done to figure out
>> where/why/when.
>>
>> Phil
>>
>> On Wed, Jul 6, 2016 at 1:28 PM, James Bordner <jobordner AT gmail.com> wrote:
>> > Hello,
>> >
>> > I have a load balancing test that works with Charm++ versions <= 6.6.0, but
>> > with Charm++ >= 6.6.1 I get the following error:
>> >
>> > [0] Assertion "n<len" failed in file cklists.h line 221.
>> > ------------- Processor 0 Exiting: Called CmiAbort ------------
>> > Reason:
>> > [0] Stack Traceback:
>> >   [0:0] CmiAbort+0x5b  [0x6a51ea]
>> >   [0:1] __cmi_assert+0x42  [0x6af4c3]
>> >   [0:2] _ZN5CkVecI9LDObjDataEixEm+0x32  [0x58d928]
>> >   [0:3] _ZN6BaseLB7LDStats7getHashERK8_LDObjidRK7_LDOMid+0xd5  [0x636deb]
>> >   [0:4] _ZN6BaseLB7LDStats7getHashERK9_LDObjKey+0x47  [0x636ee1]
>> >   [0:5] _ZN6BaseLB7LDStats11getSendHashER10LDCommData+0x33  [0x636f17]
>> >   [0:6] _ZN9CentralLB27removeCommDataOfDeletedObjsEPN6BaseLB7LDStatsE+0x99  [0x63ede3]
>> >   [0:7] _ZN9CentralLB11LoadBalanceEv+0x1ef  [0x63cc39]
>> >   [0:8] _ZN17CkIndex_CentralLB22_call_LoadBalance_voidEPvS0_+0x30  [0x643dae]
>> >   [0:9] CkDeliverMessageFree+0x4e  [0x5b05c8]
>> >   [0:10]   [0x5b070e]
>> >   [0:11]   [0x5b082a]
>> >   [0:12]   [0x5b1f0d]
>> >   [0:13]   [0x5b1fb5]
>> >   [0:14] _Z15_processHandlerPvP11CkCoreState+0x126  [0x5b24cc]
>> >   [0:15] CmiHandleMessage+0x4d  [0x6ac169]
>> >   [0:16] CsdScheduleForever+0xad  [0x6ac3ea]
>> >   [0:17] CsdScheduler+0x16  [0x6ac31b]
>> >   [0:18]   [0x6aa334]
>> >   [0:19] ConverseInit+0x32e  [0x6aa7f1]
>> >   [0:20] main+0x3f  [0x5a0fb2]
>> >   [0:21] __libc_start_main+0xf5  [0x7fb33dda8f45]
>> >   [0:22]   [0x5764ff]
>> > Fatal error on PE 0>
>> >
>> > I can provide more details, but thought I'd start by asking if this looks
>> > familiar to anyone?
>> >
>> > Thanks!
>> > James
>
>
