charm - Re: [charm] DDT or Totalview debugger usage

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Phil Miller <phil AT hpccharm.com>
  • To: "Gunter, David O" <dog AT lanl.gov>
  • Cc: "charm AT cs.illinois.edu" <charm AT cs.illinois.edu>
  • Subject: Re: [charm] DDT or Totalview debugger usage
  • Date: Thu, 25 Feb 2016 19:22:23 -0600

Hi David,

I have used DDT to debug Charm++ applications before, but I have not specifically used its specialized memory access error checking.

To simplify things, I would see if the error reproduces without the 'persistent' and 'smp' build options. That cuts out chunks of the runtime that I don't think are of interest here. Based on what I saw in working with some other folks at LANL, the same error should still reproduce on that simpler target.

As I understand it, DDT can attach to the binary and provide a great deal of information even without its special malloc implementation. Just to see what can be learned, can you build without "-L$DDTROOT/lib/64 -ldmallocthcxx -Wl,--allow-multiple-definition" and run the application to the point of the error with DDT attached?

To try to get the special malloc into the picture, I'll suggest linking the application with the additional flag "-memory os". This tells the charmc compiler/linker toolchain wrapper to link in its no-op memory allocator bindings that refer directly to the underlying malloc implementation. When doing this, I would suggest trying out a very simple application code, such as our test "simplearrayhello", and see if DDT can run on that properly. If a clean run goes through fine, I would then introduce an intentional memory access error in that example code, and ensure that DDT handles it as expected. Finally, if that works, then try the same with your real application.
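
As a rough illustration of the "intentional memory access error" step, here is a minimal sketch of a helper that could be dropped into a small test code. It is plain C++ rather than anything Charm++-specific, and the names fill_buffer and inject_error are hypothetical, not part of simplearrayhello or the Charm++ API. With inject_error set, it writes one element past the end of a malloc'd buffer, which is undefined behavior and the kind of access an instrumented malloc would be expected to report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not part of Charm++ or simplearrayhello).
// Fills a heap buffer of n ints and returns the last valid element.
// When inject_error is true, it deliberately writes one int past the
// end of the allocation. That write is undefined behavior and is the
// intentional bug a memory debugger should flag.
int fill_buffer(std::size_t n, bool inject_error) {
    assert(n > 0);  // buf[n - 1] below requires at least one element
    int *buf = static_cast<int *>(std::malloc(n * sizeof(int)));
    if (buf == nullptr) {
        return -1;  // allocation failed
    }
    // limit == n is a clean run; limit == n + 1 overruns by one int.
    std::size_t limit = inject_error ? n + 1 : n;
    for (std::size_t i = 0; i < limit; ++i) {
        buf[i] = static_cast<int>(i);
    }
    int last = buf[n - 1];
    std::free(buf);
    return last;
}
```

Calling fill_buffer(8, false) first gives the clean baseline run; switching the second argument to true then introduces the error, so the two runs can be compared under the debugger.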

If none of this helps, we may need to consult with Allinea to see about developing a direct binding from our runtime to their instrumented malloc library. This should not be too hard, but would require some technical documentation or assistance in understanding what their instrumented library expects.

Phil

On Thu, Feb 25, 2016 at 7:00 PM, Gunter, David O <dog AT lanl.gov> wrote:
Has anyone ever used Allinea’s DDT or the Totalview debugger to examine memory issues in the charm++ package?

We have a code that hits a memory error 40 iterations into a calculation and we cannot figure out what is causing it.  To make matters worse, this is on a Cray XC system where we are building things statically.

Each of those debuggers requires one to link codes to instrumented malloc libraries before running the code. I build charm++ like this (DDT example):

charm++ gni-crayxc persistent smp -j8 --no-build-shared -L$DDTROOT/lib/64 -ldmallocthcxx -Wl,--allow-multiple-definition -g -O0

and that seems to build a correct charm++ environment.  However, the code that is then built with charm++ will launch but hang at the first call to new (malloc), so we cannot even advance to the point where the real bug occurs.

I’m just curious if anyone else has ever done anything like this and if so, how?

Thanks,
david
--
David Gunter
HPC-5: Applications Readiness Team







