charm - Re: [charm] NAMD segfaulting in "setJcontext "

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Phil Miller <mille121 AT illinois.edu>
  • To: Ted Packwood <malice AT cray.com>
  • Cc: charm <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] NAMD segfaulting in "setJcontext "
  • Date: Wed, 31 Aug 2016 14:54:08 -0500

On Wed, Aug 31, 2016 at 1:54 PM, Ted Packwood <malice AT cray.com> wrote:
Hello-

I'm trying to determine what is causing a failure in NAMD when built with
the Cray compiler on a Cray XC30.

The failure is in "setJcontext", as you can see from the traceback below.

The charm++ build works fine with the included charm++ test "jacobi3d"
(4 ranks on 4 separate Broadwell chips, +ppn6)


I built charm++ with:
./build charm++ mpi-crayxc craycc smp

NAMD was built with:
./config CRAY-XC-cce --charm-arch mpi-crayxc.cce-smp-craycc --with-fftw3 --without-tcl --charm-opts -save

And was run with just one rank on a Broadwell chip.


The Intel compiler build of charm++ and NAMD works fine, so this appears to
be an issue with the Cray compiler.


I have a few questions:
1) Does anyone have an idea of what might cause this type of failure?
2) Any suggestions as to a possible solution, or build changes that might
solve the problem?


You could try the other user-level thread packages we include with Charm++, to see if one of them compiles and runs correctly. The options to try besides jcontext are 'context' and 'qt'. At link time, they are specified by "-thread FOO" on the charmc command line.
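
As a rough illustration of the "-thread FOO" suggestion, the sketch below only prints one candidate link line per alternative thread package; it does not invoke charmc. The names "pgm" and "pgm.o" are placeholders rather than files from this thread, and how the option gets into NAMD's own link step is not shown here.

```shell
# Dry-run sketch: print one link command per user-level thread package.
# "pgm" and "pgm.o" are placeholder names, not files from this thread.
link_cmd() {
  # $1 is the thread package name passed to charmc's -thread option
  echo "charmc -language charm++ -o pgm.$1 pgm.o -thread $1"
}

# The two alternatives to jcontext named above.
for pkg in context qt; do
  link_cmd "$pkg"
done
```

Running the printed commands against a small test program first should show whether either package gets past the crash before NAMD is rebuilt.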
 
3) Is there a simple charm++ test which mimics the Jcontext usage that
NAMD requires that might cause a similar failure?  I'd prefer to try to reproduce
this with a smaller test than NAMD.  :)

Tests that at least touch the user-level thread code include
tests/charm++/megatest
tests/converse/cthtest
tests/converse/megacon
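
One possible way to walk through those three directories is sketched below. It only prints the commands. CHARM_BUILD_DIR is a placeholder variable for the directory created by ./build, and the "test" make target is an assumption about the test Makefiles, so check each Makefile before relying on it.

```shell
# Placeholder: point this at the build directory created by ./build.
CHARM_BUILD_DIR="${CHARM_BUILD_DIR:-.}"

test_cmd() {
  # $1 is a test directory relative to the build directory.
  # The "test" target is assumed; adjust if the Makefile names it differently.
  echo "make -C $CHARM_BUILD_DIR/$1 test"
}

# Dry run: print one command per test directory listed above.
for t in tests/charm++/megatest tests/converse/cthtest tests/converse/megacon; do
  test_cmd "$t"
done
```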
 
4) If not, should I contact the NAMD folks instead?

No, this is the right place for issues at this level.

If none of the thread packages work under Cray's compiler, I'd refer that to the compiler team there, since there's not much we'll be able to do about fixing them. We've had no luck in getting Charm++ to build and pass its tests in full against the Cray CCE.
 



Core was generated by `./namd2.XC30.IVB.kay.PE604.cce853-g-O0-flex_mp-strict.mpich743.libsci16091.fftw'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00000000212c43eb in raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
37      ../nptl/sysdeps/unix/sysv/linux/pt-raise.c: No such file or directory.
(gdb) where
#0  0x00000000212c43eb in raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x0000000021417dc5 in abort () at abort.c:99
#2  0x0000000021173c72 in MPID_Abort ()
#3  0x0000000021123a75 in PMPI_Abort ()
#4  0x0000000020e75cd5 in LrtsAbort () at machine.c:1656
#5  <signal handler called>
#6  0x0000000020e71089 in setJcontext () at uJcontext.c:131
#7  0x0000000020e71100 in swapJcontext () at uJcontext.c:176
#8  0x0000000020e713b5 in CthResume () at libthreads-default.c:1669
#9  0x0000000020e78388 in CsdScheduleForever () at convcore.c:1901
#10 0x0000000020e78299 in CsdScheduler () at convcore.c:1837
#11 0x00000000200dd891 in BackEnd::suspend () at src/BackEnd.C:285
#12 0x0000000020b6e650 in ScriptTcl::suspend (this=0x41bcb1b0)
    at src/ScriptTcl.C:72
#13 0x0000000020b6e6ff in ScriptTcl::initcheck (this=0x41bcb1b0)
    at src/ScriptTcl.C:104
#14 0x0000000020b6e577 in ScriptTcl::run (this=0x41bcb1b0,
    scriptFile=0x7fffffff765e) at src/ScriptTcl.C:2076
#15 0x00000000200d6d49 in after_backend_init (argc=2, argv=0x7fffffff6678)
    at src/mainfunc.C:158
#16 0x00000000200dcff9 in slave_init (argc=2, argv=0x7fffffff6678)
    at src/BackEnd.C:140
#17 0x0000000020e745ec in ConverseRunPE$$CFE_id_d7e6ac3e_9d711be8 ()
    at machine-common-core.c:1293
#18 0x0000000020e71b9a in call_startfn$$CFE_id_d7e6ac3e_9d711be8 ()
    at machine-smp.c:415
#19 0x0000000020eeda44 in start_thread (arg=0x2aaaaad45700)
    at pthread_create.c:309
#20 0x00000000214756f9 in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111




