charm - Re: [charm] [ppl] Fwd: backtrace of ChaNGa process

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Shad Kirmani <sxk5292 AT cse.psu.edu>
  • To: Pritish Jetley <pjetley2 AT illinois.edu>
  • Cc: charm AT cs.uiuc.edu, Jason Holmes <jholmes AT psu.edu>, Padma Raghavan <raghavan AT cse.psu.edu>
  • Subject: Re: [charm] [ppl] Fwd: backtrace of ChaNGa process
  • Date: Mon, 26 Mar 2012 16:30:13 -0400
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hello Pritish,

No, I have not. I can try running the barnes code on this architecture, or would you suggest running something simpler? As you can see from the output below, charmrun hangs even before it enters the ChaNGa code, so I do not think this is a code issue.

Thanks,
Shad

On Mon, Mar 26, 2012 at 1:58 PM, Pritish Jetley <pjetley2 AT illinois.edu> wrote:
Have you successfully run any other Charm++ programs on this architecture?

Pritish

On Mon, Mar 26, 2012 at 12:22 PM, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:
> Hello,
>
> Sometimes at startup of ChaNGa compiled for ibverbs, the processes will hang
> for a long period of time at the beginning of the job.  A backtrace of a
> process looks like this:
>
> #0  0x00000038daa0b795 in pthread_spin_lock () from /lib64/libpthread.so.0
> #1  0x00002b93ecee7a7b in ibv_cmd_create_qp ()
>   from /usr/lib64/libmlx4-rdmav2.so
> #2  0x000000000061add0 in recvBarrierMessage ()
> #3  0x000000000061b882 in CmiBarrier ()
> #4  0x00000000006206ec in CmiTimerInit ()
> #5  0x00000000006216ec in ConverseCommonInit ()
> #6  0x000000000061d723 in ConverseInit ()
> #7  0x00000000005afd4c in main ()
>
> With the verbose flag added to charmrun, the hang occurs right after it says
> that all nodes are connected:
>
> ...
> Charmrun> Waiting for 62-th client to connect.
> Charmrun> Waiting for 63-th client to connect.
> Charmrun> All clients connected.
> Charmrun> IP tables sent.
> Charmrun> node programs all connected
>
> We did not see these hangs when ChaNGa was compiled for mpi-linux-x86_64
> instead of net-linux-x86_64 with ibverbs.  When the hang occurs, it either
> clears after a period of time and the job runs, or it lasts long enough
> that we give up and kill the job.
>
> This is on a RedHat Enterprise Linux 5 system using libibverbs-1.1.3-2.
>
> Thanks,
> Shad



--
Pritish Jetley
Doctoral Candidate, Computer Science
University of Illinois at Urbana-Champaign
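
Since the report includes a backtrace from only one process, it may help to check whether every rank is stuck in the same place. The following is a minimal sketch, not part of Charm++ or ChaNGa: the helper names parse_frames and group_by_stack are hypothetical, and it assumes one backtrace per process has already been collected as text (for example with something like `gdb -p <pid> -batch -ex bt` on each node). It extracts the function names from each backtrace and counts how many processes share an identical call stack.

```python
import re
from collections import Counter

# Matches gdb frame lines such as:
#   "#2  0x000000000061add0 in recvBarrierMessage ()"
# Continuation lines ("  from /usr/lib64/...") do not start with "#N" and are skipped.
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([\w:~<>]+)\s*\(")


def parse_frames(text):
    """Return the function names in a gdb backtrace, innermost frame first."""
    frames = []
    for line in text.splitlines():
        # Drop email quote markers ("> ") and indentation before matching.
        line = line.lstrip("> \t")
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group(2))
    return frames


def group_by_stack(backtraces):
    """Count how many backtraces share each distinct call stack."""
    return Counter(tuple(parse_frames(bt)) for bt in backtraces)


# The backtrace quoted in the original message.
SAMPLE = """\
> #0  0x00000038daa0b795 in pthread_spin_lock () from /lib64/libpthread.so.0
> #1  0x00002b93ecee7a7b in ibv_cmd_create_qp ()
>   from /usr/lib64/libmlx4-rdmav2.so
> #2  0x000000000061add0 in recvBarrierMessage ()
> #3  0x000000000061b882 in CmiBarrier ()
> #4  0x00000000006206ec in CmiTimerInit ()
> #5  0x00000000006216ec in ConverseCommonInit ()
> #6  0x000000000061d723 in ConverseInit ()
> #7  0x00000000005afd4c in main ()
"""

if __name__ == "__main__":
    for stack, count in group_by_stack([SAMPLE]).items():
        print(count, "process(es):", " <- ".join(stack))
```

If all ranks report the same stack, the hang is more likely in the shared startup path (here, queue pair creation during the initial barrier) than in a single misbehaving node; if a few ranks differ, those are the ones worth inspecting first.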



