
charm - Re: [charm] [ppl] Fwd: backtrace of ChaNGa process

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Pritish Jetley <pjetley2 AT illinois.edu>
  • To: Shad Kirmani <sxk5292 AT cse.psu.edu>
  • Cc: charm AT cs.uiuc.edu, Jason Holmes <jholmes AT psu.edu>, Padma Raghavan <raghavan AT cse.psu.edu>
  • Subject: Re: [charm] [ppl] Fwd: backtrace of ChaNGa process
  • Date: Mon, 26 Mar 2012 12:58:54 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Have you successfully run any other Charm++ programs on this architecture?

Pritish

On Mon, Mar 26, 2012 at 12:22 PM, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:
> Hello,
>
> Sometimes at startup of ChaNGa compiled for ibverbs, the processes will hang
> for a long period of time at the beginning of the job.  A backtrace of a
> process looks like this:
>
> #0  0x00000038daa0b795 in pthread_spin_lock () from /lib64/libpthread.so.0
> #1  0x00002b93ecee7a7b in ibv_cmd_create_qp ()
>   from /usr/lib64/libmlx4-rdmav2.so
> #2  0x000000000061add0 in recvBarrierMessage ()
> #3  0x000000000061b882 in CmiBarrier ()
> #4  0x00000000006206ec in CmiTimerInit ()
> #5  0x00000000006216ec in ConverseCommonInit ()
> #6  0x000000000061d723 in ConverseInit ()
> #7  0x00000000005afd4c in main ()
>
> With the verbose flag added to charmrun, the hang occurs right after it says
> that all nodes are connected:
>
> ...
> Charmrun> Waiting for 62-th client to connect.
> Charmrun> Waiting for 63-th client to connect.
> Charmrun> All clients connected.
> Charmrun> IP tables sent.
> Charmrun> node programs all connected
>
> We did not see these hangs when ChaNGa was compiled for mpi-linux-x86_64
> instead of net-linux-x86_64 with ibverbs.  When the hang occurs, it either
> clears after a while and the job runs, or it lasts long enough that we give
> up and kill the job.
>
> This is on a RedHat Enterprise Linux 5 system using libibverbs-1.1.3-2.
>
> Thanks,
> Shad



--
Pritish Jetley
Doctoral Candidate, Computer Science
University of Illinois at Urbana-Champaign




