charm - Re: [charm] ParFUM Import Library

  • From: Phil Miller <mille121 AT illinois.edu>
  • To: kevin.mueller AT utas.utc.com
  • Cc: charm <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] ParFUM Import Library
  • Date: Fri, 9 Dec 2016 14:09:31 -0600

Hi Kevin,

Just to make sure I understand the diagnosis correctly before diving in, your program produces the print on line 270, but not the print on line 290?

When you run the code, do you pass any +vp N option?

Phil

On Fri, Dec 9, 2016 at 1:49 PM, <kevin.mueller AT utas.utc.com> wrote:
I'm using a code that imports a surface mesh with ghosts into ParFUM. When the call to ParFUM_createComm is performed, the code hangs in the ParFUM_recreateSharedNodes function between Lines 270 and 289 of src/libs/ck-libs/ParFUM/import.C. From the GDB backtrace, it appears to be hung at the MPI_Barrier on Line 288: MPI_Barrier(MPI_COMM_WORLD);

(gdb) where
#0  0x00007ffff0d99bb0 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff1703ffe in MPID_nem_tcp_connpoll () from /local/apps/mpich-ifort/build/lib/libmpi.so.12
#2  0x00007ffff16eecf7 in MPIDI_CH3I_Progress () from /local/apps/mpich-ifort/build/lib/libmpi.so.12
#3  0x00007ffff16deef1 in MPID_Iprobe () from /local/apps/mpich-ifort/build/lib/libmpi.so.12
#4  0x00007ffff16496d4 in PMPI_Iprobe () from /local/apps/mpich-ifort/build/lib/libmpi.so.12
#5  0x00007ffff3585523 in PumpMsgs () from /local/apps/charm++/charm-6.7.1/lib_so/libconv-cplus-y.so
#6  0x00007ffff35855ae in CmiNotifyStillIdle () from /local/apps/charm++/charm-6.7.1/lib_so/libconv-cplus-y.so
#7  0x00007ffff3338dde in CcdRaiseCondition () from /local/apps/charm++/charm-6.7.1/lib_so/libconv-core.so
#8  0x00007ffff33354b5 in CsdScheduleForever () from /local/apps/charm++/charm-6.7.1/lib_so/libconv-core.so
#9  0x00007ffff333569d in CsdScheduler () from /local/apps/charm++/charm-6.7.1/lib_so/libconv-core.so
#10 0x00007ffff3585d9a in ConverseInit () from /local/apps/charm++/charm-6.7.1/lib_so/libconv-cplus-y.so
#11 0x00007ffff3f5d6bc in main () from /local/apps/charm++/charm-6.7.1/lib_so/libckmain.so
#12 0x00007ffff0ccfb15 in __libc_start_main (main=0x401e10 <main@plt>, argc=6, ubp_av=0x7fffffffd6c8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd6b8) at libc-start.c:274
#13 0x00000000004021f9 in _start ()
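
For reference, here is a minimal, self-contained MPI sketch (not ParFUM code; just my assumption about the semantics behind Line 288): MPI_Barrier(MPI_COMM_WORLD) returns only after every rank in the communicator has entered it, so a permanent hang there would normally mean at least one rank never reaches that line.

/* barrier_hang.c - minimal sketch of the failure mode, not the ParFUM code path.
 * Compile with an MPI C compiler and run with 2 or more ranks; rank 1
 * deliberately skips the barrier, so every other rank blocks in MPI_Barrier
 * forever. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank != 1 || size == 1) {
        printf("rank %d of %d: entering barrier\n", rank, size);
        fflush(stdout);
        MPI_Barrier(MPI_COMM_WORLD);   /* blocks until all ranks arrive */
        printf("rank %d: past barrier\n", rank);
    } else {
        printf("rank 1: never calls the barrier\n");
        fflush(stdout);
    }

    MPI_Finalize();
    return 0;
}

With only one rank in MPI_COMM_WORLD the barrier should return immediately, which is part of what I find confusing about the hang here.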


The surface mesh is not very big (on the order of 70,000 triangles), and all other parts of the code execute quickly. I have let the code sit for over 12 hours with no progress while it continues to occupy 100% of the CPU.

Do you have any advice on how to debug what might be causing the MPI_Barrier to fail to advance? This is being run as a non-SMP process on a single core.
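
In case it helps show what I have in mind, below is the sort of instrumentation I could put around the barrier at Line 288 (a sketch only: the helper name is mine, and I am assuming plain MPI calls and stderr output are usable at that point in import.C):

/* Debugging sketch (not actual import.C code): reports how many ranks
 * MPI_COMM_WORLD has, which ranks reach the barrier, and whether any of
 * them ever get past it. */
#include <mpi.h>
#include <stdio.h>

static void debug_barrier(const char *where)
{
    int rank = -1, size = -1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    fprintf(stderr, "[ParFUM debug] rank %d of %d entering barrier at %s\n",
            rank, size, where);
    fflush(stderr);

    MPI_Barrier(MPI_COMM_WORLD);   /* the call that appears to hang */

    fprintf(stderr, "[ParFUM debug] rank %d of %d passed barrier at %s\n",
            rank, size, where);
    fflush(stderr);
}

Calling debug_barrier("import.C:288") in place of the bare MPI_Barrier would at least show how many ranks the communicator actually has and which of them reach the call.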

Thank you in advance and let me know if there is any other information I can
provide.



