
charm - [charm] memory management errors after ckexit called

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Scott Field <sfield AT astro.cornell.edu>
  • To: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: [charm] memory management errors after ckexit called
  • Date: Tue, 16 Jun 2015 16:14:23 -0400
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hi,

  After I recently pulled a bleeding-edge version of the charm++ code, all of our regression tests fail with either a segmentation fault or "double free or corruption (!prev): 0x0000000001c4de20 ***". The error appears to occur after ckexit is called. Charm++ was built on my laptop with:

 >>> ./build charm++ multicore-linux32 gcc --with-production -j3 -std=c++11

  Using git's bisect utility, I was able to track down the first commit where things go wrong. The git hash and commit message are c96750026bbc7a9190f1381e7ac9ea56ae86f80e and "Bug #695:  disable comm thread in multicore builds". More specifically, if I edit line 200 of the file src/arch/util/machine-common-core.c from "#define CMK_SMP_NO_COMMTHD CMK_MULTICORE" to "#define CMK_SMP_NO_COMMTHD 0", the error message goes away and all tests pass again.

  Honestly, I don't really know why this change fixed the problem; it's pretty far under the hood.

  A few questions:

1) Is this list an appropriate place to post information about potential bugs?

2) Does this seem to be a charm++ bug introduced by that commit? Or a fix that has simply exposed a problem in our code? I had a hard time tracking down the source of the error. Oddly enough, I could not reproduce the same error when using valgrind (although it did report an "Uninitialised value was created by a stack allocation", which it tracked to one of the declaration files created by charmc). With MALLOC_CHECK_ set to 3, I get the following:

*** Error in `./Evolve1DScalarWave': free(): invalid pointer: 0x000000000203c920 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7338f)[0x7f4cebc2e38f]
/lib/x86_64-linux-gnu/libc.so.6(+0x81fb6)[0x7f4cebc3cfb6]
/lib/x86_64-linux-gnu/libc.so.6(+0x3c280)[0x7f4cebbf7280]
/lib/x86_64-linux-gnu/libc.so.6(+0x3c2a5)[0x7f4cebbf72a5]
./Evolve1DScalarWave[0x670b4a]
./Evolve1DScalarWave[0x5e39ed]
./Evolve1DScalarWave(CsdScheduleForever+0x48)[0x673e88]
./Evolve1DScalarWave(CsdScheduler+0x2d)[0x67413d]
./Evolve1DScalarWave(_ZN12ElementChareI16ScalarWaveSystemILi1EEE11endTimeStepEv+0x448)[0x580d3c]
./Evolve1DScalarWave(_ZN12ElementChareI16ScalarWaveSystemILi1EEE13endComputeRhsEv+0x5331DScalarWave': free(): invalid pointer: 0x000000000203c920 ***
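
In case it helps when reading the backtrace above: the mangled C++ symbols can be decoded with the demangler that ships with GCC and Clang. Below is a minimal sketch, assuming a toolchain that provides <cxxabi.h>; the helper name demangle is hypothetical and is not part of Charm++ or of our code.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper (not part of Charm++): returns the demangled form of a
// mangled C++ symbol, or the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so the caller must release it with free.
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;  // not a valid mangled name, or allocation failed
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

Calling demangle("_ZN12ElementChareI16ScalarWaveSystemILi1EEE11endTimeStepEv") should name the endTimeStep member of the ElementChare template instantiated on ScalarWaveSystem<1>, which is the last frame of our code before control enters CsdScheduler.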


Best,
Scott


