[charm] ParFUM Import Library


  • From: <kevin.mueller AT utas.utc.com>
  • To: charm AT lists.cs.illinois.edu
  • Subject: [charm] ParFUM Import Library
  • Date: Fri, 09 Dec 2016 13:49:32 -0600

I'm using a code that imports a surface mesh with ghosts into ParFUM.
When the call to ParFUM_createComm is performed, the code hangs in the
ParFUM_recreateSharedNodes function between Lines 270 and 289 of
src/libs/ck-libs/ParFUM/import.C. From the GDB backtrace, it appears to be
stuck at the MPI_Barrier on Line 288: MPI_Barrier(MPI_COMM_WORLD);

(gdb) where
#0 0x00007ffff0d99bb0 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff1703ffe in MPID_nem_tcp_connpoll () from /local/apps/mpich-ifort/build/lib/libmpi.so.12
#2 0x00007ffff16eecf7 in MPIDI_CH3I_Progress () from /local/apps/mpich-ifort/build/lib/libmpi.so.12
#3 0x00007ffff16deef1 in MPID_Iprobe () from /local/apps/mpich-ifort/build/lib/libmpi.so.12
#4 0x00007ffff16496d4 in PMPI_Iprobe () from /local/apps/mpich-ifort/build/lib/libmpi.so.12
#5 0x00007ffff3585523 in PumpMsgs () from /local/apps/charm++/charm-6.7.1/lib_so/libconv-cplus-y.so
#6 0x00007ffff35855ae in CmiNotifyStillIdle () from /local/apps/charm++/charm-6.7.1/lib_so/libconv-cplus-y.so
#7 0x00007ffff3338dde in CcdRaiseCondition () from /local/apps/charm++/charm-6.7.1/lib_so/libconv-core.so
#8 0x00007ffff33354b5 in CsdScheduleForever () from /local/apps/charm++/charm-6.7.1/lib_so/libconv-core.so
#9 0x00007ffff333569d in CsdScheduler () from /local/apps/charm++/charm-6.7.1/lib_so/libconv-core.so
#10 0x00007ffff3585d9a in ConverseInit () from /local/apps/charm++/charm-6.7.1/lib_so/libconv-cplus-y.so
#11 0x00007ffff3f5d6bc in main () from /local/apps/charm++/charm-6.7.1/lib_so/libckmain.so
#12 0x00007ffff0ccfb15 in __libc_start_main (main=0x401e10 <main@plt>, argc=6, ubp_av=0x7fffffffd6c8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd6b8) at libc-start.c:274
#13 0x00000000004021f9 in _start ()
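
To check whether this rank ever actually reaches the barrier (as opposed to
hanging earlier in the shared-node exchange), I am planning to instrument the
region around Line 288 with per-rank prints. A minimal sketch, using only
standard MPI calls; barrier_with_trace is my own hypothetical helper, not
part of ParFUM:

// Hypothetical diagnostic wrapper for the barrier at import.C Line 288:
// logs each rank on entry and on return, so a rank that never arrives
// (or never returns) is visible in the output.
#include <mpi.h>
#include <cstdio>

static void barrier_with_trace(MPI_Comm comm, const char *where) {
  int rank = -1, size = -1;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);
  fprintf(stderr, "[rank %d/%d] entering barrier at %s\n", rank, size, where);
  fflush(stderr);
  MPI_Barrier(comm);
  fprintf(stderr, "[rank %d/%d] passed barrier at %s\n", rank, size, where);
  fflush(stderr);
}

// Usage: replace MPI_Barrier(MPI_COMM_WORLD); with
// barrier_with_trace(MPI_COMM_WORLD, "import.C:288");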


The surface mesh is not very big (on the order of 70,000 triangles), and all
other parts of the code execute quickly. I have let the run sit for over 12
hours with no progress while it continues to occupy 100% of the CPU.

Does anyone have advice on how to debug what might be keeping the MPI_Barrier
from advancing? This is being run as a non-SMP process on a single core.
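
One thing I am considering is temporarily swapping the blocking barrier for a
non-blocking one plus a watchdog, so that a rank stuck in the collective
reports itself periodically. A sketch, assuming the MPI-3 MPI_Ibarrier that
this MPICH build should provide; barrier_with_timeout is again my own name:

// Hypothetical watchdog: completes exactly like MPI_Barrier, but prints a
// message from any rank that is still waiting after timeout_s seconds.
// Spinning on MPI_Test also keeps MPICH's progress engine moving.
#include <mpi.h>
#include <cstdio>

static void barrier_with_timeout(MPI_Comm comm, double timeout_s) {
  MPI_Request req;
  MPI_Ibarrier(comm, &req);
  double start = MPI_Wtime();
  int done = 0;
  while (!done) {
    MPI_Test(&req, &done, MPI_STATUS_IGNORE);
    if (!done && MPI_Wtime() - start > timeout_s) {
      int rank = -1;
      MPI_Comm_rank(comm, &rank);
      fprintf(stderr, "[rank %d] still in barrier after %.0f s\n",
              rank, timeout_s);
      fflush(stderr);
      start = MPI_Wtime();  // reset so the warning repeats periodically
    }
  }
}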

Thank you in advance and let me know if there is any other information I can
provide.


