charm - Re: [charm] [ppl] Fwd: backtrace of ChaNGa process

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Gengbin Zheng <gzheng AT illinois.edu>
  • To: Shad Kirmani <sxk5292 AT cse.psu.edu>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>, "cosmology-ppl AT cs.uiuc.edu" <cosmology-ppl AT cs.uiuc.edu>, Padma Raghavan <raghavan AT cse.psu.edu>, "Jetley, Pritish" <pjetley2 AT illinois.edu>
  • Subject: Re: [charm] [ppl] Fwd: backtrace of ChaNGa process
  • Date: Tue, 27 Mar 2012 16:08:13 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

The Charm development version currently has a bug, which is being fixed right now.
Please check out last night's version (from
http://charm.cs.illinois.edu/autobuild/cur/),
or check out a slightly older version:

git checkout 31523d896e844b811f27b179eff32187925d4524

Gengbin
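
The ChaNGa build failure quoted further down in this thread reports several errors that all name the same missing member. A minimal sketch for tallying such errors from a build log follows; it is only an illustration, not something used in the thread. The function name `tally_missing_members` and the regular expression are hypothetical, and the sample lines are three of the quoted errors rejoined onto single lines.

```python
import re
from collections import Counter

# Matches GCC diagnostics of the form:
#   file:line:col: error: 'struct S' has no member named 'm'
# GCC may print ASCII or typographic quotes depending on the locale.
MISSING_MEMBER = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+):(?P<col>\d+): error: "
    r"[‘'`](?:struct |class )?(?P<type>[\w:]+)[’'] has no member named "
    r"[‘'`](?P<member>\w+)[’']"
)


def tally_missing_members(log_text):
    """Count 'has no member named' errors per (type, member) pair."""
    counts = Counter()
    for raw in log_text.splitlines():
        m = MISSING_MEMBER.match(raw.strip())
        if m:
            counts[(m.group("type"), m.group("member"))] += 1
    return counts


# Three of the errors from the thread, with the mail line wrapping undone.
sample = """\
MultistepLB.C:373:55: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:378:43: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:483:25: error: ‘struct LDObjData’ has no member named ‘cpuTime’
"""

for (type_name, member), n in tally_missing_members(sample).items():
    print(f"{type_name}.{member}: {n} error(s)")  # LDObjData.cpuTime: 3 error(s)
```

When every error collapses to a single (type, member) pair like this, a mismatch between the application source and the headers it was compiled against is one plausible explanation, which fits the advice in this thread to pair the development ChaNGa with a matching Charm++ checkout. The thread itself does not state the exact cause.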

On Tue, Mar 27, 2012 at 3:43 PM, Shad Kirmani
<sxk5292 AT cse.psu.edu>
wrote:
> I got the latest version of utility with
> git clone git://charm.cs.uiuc.edu/cosmo/utility
> but the build failed.
>
> Thanks,
> Shad
>
>
> On Tue, Mar 27, 2012 at 1:52 PM, Pritish Jetley
> <pjetley2 AT illinois.edu>
> wrote:
>>
>> Shad, please download the development version of ChaNGa:
>>
>> git clone git://charm.cs.uiuc.edu/cosmo/changa
>>
>> Pritish
>>
>>
>> On Tue, Mar 27, 2012 at 12:39 PM, Shad Kirmani
>> <sxk5292 AT cse.psu.edu>
>> wrote:
>>>
>>> Hello Phil,
>>>
>>> I downloaded Charm++ 6.4.0 and compiled it with
>>> ./build ChaNGa net-linux-x86_64 ibverbs -O3
>>>
>>> I downloaded the latest ChaNGa code, but it does not compile when I run
>>> 'make'. This is the error that I get:
>>>
>>> ***********************************************************************************
>>> DECAPOLE              -I..  -I..   -c -o MultistepLB.o MultistepLB.C
>>> MultistepLB.C: In member function ‘void MultistepLB::mergeInstrumentedData(int, BaseLB::LDStats*)’:
>>> MultistepLB.C:373:55: error: ‘struct LDObjData’ has no member named ‘cpuTime’
>>> MultistepLB.C:378:43: error: ‘struct LDObjData’ has no member named ‘cpuTime’
>>> MultistepLB.C:378:100: error: ‘struct LDObjData’ has no member named ‘cpuTime’
>>> MultistepLB.C: In member function ‘void MultistepLB::printData(BaseLB::LDStats&, int, int*)’:
>>> MultistepLB.C:401:50: error: ‘struct LDObjData’ has no member named ‘cpuTime’
>>> MultistepLB.C: In member function ‘void MultistepLB::work(BaseLB::LDStats*, int)’:
>>> MultistepLB.C:483:25: error: ‘struct LDObjData’ has no member named ‘cpuTime’
>>> MultistepLB.C:483:69: error: ‘struct LDObjData’ has no member named ‘cpuTime’
>>> MultistepLB.C:493:25: error: ‘struct LDObjData’ has no member named ‘cpuTime’
>>> MultistepLB.C:493:75: error: ‘struct LDObjData’ has no member named ‘cpuTime’
>>> Fatal Error by charmc in directory /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
>>>    Command g++ -m64 -DCMK_GFORTRAN -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/../include -D__CHARMC__=1 -I../structures -DINTERLIST_VER=2 -DHEXADECAPOLE -I.. -I.. -O3 -fno-stack-protector -c MultistepLB.C -o MultistepLB.o returned error code 1
>>> charmc exiting...
>>> make: *** [MultistepLB.o] Error 1
>>>
>>> *****************************************************************************************************
>>>
>>> Thanks,
>>> Shad
>>>
>>>
>>> On Tue, Mar 27, 2012 at 12:05 PM, Phil Miller
>>> <mille121 AT illinois.edu>
>>> wrote:
>>>>
>>>> Could you try using the much more recently released Charm++ 6.4.0,
>>>> <http://charm.cs.illinois.edu/distrib/charm-6.4.0_src.tar.bz2>? Many
>>>> bugs have been fixed since 6.2, and one of them may be affecting your
>>>> usage.
>>>>
>>>> On Tue, Mar 27, 2012 at 10:51, Shad Kirmani
>>>> <sxk5292 AT cse.psu.edu>
>>>> wrote:
>>>> > Hello Pritish,
>>>> >
>>>> > I compiled Charm++ (Charm-6.2) with
>>>> > ./build ChaNGa net-linux-x86_64 ibverbs -O3
>>>> >
>>>> > and then did a 'make' on charm-6.2/tests/charm++/megatest.
>>>> >
>>>> > I then ran the executable pgm on 64 cores. It again hangs at the same
>>>> > place:
>>>> > Charmrun> Waiting for 62-th client to connect.
>>>> > Charmrun> Waiting for 63-th client to connect.
>>>> > Charmrun> All clients connected.
>>>> > Charmrun> IP tables sent.
>>>> > Charmrun> node programs all connected
>>>> >
>>>> > If you are willing to wait long enough, the code sometimes does progress
>>>> > and you get the following results:
>>>> > Megatest is running on 64 nodes 64 processors.
>>>> > test 0: initiated [inlineem (phil)]
>>>> > test 0: completed (0.01 sec)
>>>> > test 1: initiated [callback (olawlor)]
>>>> > test 1: completed (3.98 sec)
>>>> > test 2: initiated [immediatering (gengbin)]
>>>> > ....
>>>> > test 48: initiated [multi nodering (milind)]
>>>> > test 48: completed (0.02 sec)
>>>> > test 49: initiated [multi groupring (milind)]
>>>> > test 49: completed (0.02 sec)
>>>> > test 50: initiated [all-at-once]
>>>> > test 50: completed (0.26 sec)
>>>> > All tests completed, exiting
>>>> > Charmrun> Graceful exit.
>>>> >
>>>> >
>>>> > Thanks,
>>>> > Shad
>>>> >
>>>> > On Mon, Mar 26, 2012 at 4:43 PM, Pritish Jetley
>>>> > <pjetley2 AT illinois.edu>
>>>> > wrote:
>>>> >>
>>>> >> Try "megatest" first. You'll find this suite of tests in:
>>>> >> tests/charm++/megatest
>>>> >>
>>>> >> Pritish
>>>> >>
>>>> >> On Mon, Mar 26, 2012 at 3:30 PM, Shad Kirmani
>>>> >> <sxk5292 AT cse.psu.edu>
>>>> >> wrote:
>>>> >> > Hello Pritish,
>>>> >> >
>>>> >> > No, I have not. I can try running the barnes code on this architecture.
>>>> >> > Or do you suggest running something simpler? As you can see in the output
>>>> >> > below, Charmrun hangs even before it enters the ChaNGa code, so I do not
>>>> >> > think this is a code issue.
>>>> >> >
>>>> >> > Thanks,
>>>> >> > Shad
>>>> >> >
>>>> >> >
>>>> >> > On Mon, Mar 26, 2012 at 1:58 PM, Pritish Jetley
>>>> >> > <pjetley2 AT illinois.edu>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> Have you successfully run any other Charm++ programs on this
>>>> >> >> architecture?
>>>> >> >>
>>>> >> >> Pritish
>>>> >> >>
>>>> >> >> On Mon, Mar 26, 2012 at 12:22 PM, Shad Kirmani
>>>> >> >> <sxk5292 AT cse.psu.edu>
>>>> >> >> wrote:
>>>> >> >> > Hello,
>>>> >> >> >
>>>> >> >> > Sometimes at startup of ChaNGa compiled for ibverbs, the processes will
>>>> >> >> > hang for a long period of time at the beginning of the job.  A backtrace
>>>> >> >> > of a process looks like this:
>>>> >> >> >
>>>> >> >> > #0  0x00000038daa0b795 in pthread_spin_lock () from /lib64/libpthread.so.0
>>>> >> >> > #1  0x00002b93ecee7a7b in ibv_cmd_create_qp ()
>>>> >> >> >   from /usr/lib64/libmlx4-rdmav2.so
>>>> >> >> > #2  0x000000000061add0 in recvBarrierMessage ()
>>>> >> >> > #3  0x000000000061b882 in CmiBarrier ()
>>>> >> >> > #4  0x00000000006206ec in CmiTimerInit ()
>>>> >> >> > #5  0x00000000006216ec in ConverseCommonInit ()
>>>> >> >> > #6  0x000000000061d723 in ConverseInit ()
>>>> >> >> > #7  0x00000000005afd4c in main ()
>>>> >> >> >
>>>> >> >> > With the verbose flag added to charmrun, the hang occurs right after it
>>>> >> >> > says that all nodes are connected:
>>>> >> >> >
>>>> >> >> > ...
>>>> >> >> > Charmrun> Waiting for 62-th client to connect.
>>>> >> >> > Charmrun> Waiting for 63-th client to connect.
>>>> >> >> > Charmrun> All clients connected.
>>>> >> >> > Charmrun> IP tables sent.
>>>> >> >> > Charmrun> node programs all connected
>>>> >> >> >
>>>> >> >> > We did not see these hangs when ChaNGa was compiled for MPI-linux-x86_64
>>>> >> >> > instead of net-linux-x86_64 with ibverbs.  When the hang occurs, it can
>>>> >> >> > either go away after a period of time and the job runs, or it just hangs
>>>> >> >> > long enough that we give up and kill it.
>>>> >> >> >
>>>> >> >> > This is on a RedHat Enterprise Linux 5 system using libibverbs-1.1.3-2.
>>>> >> >> >
>>>> >> >> > Thanks,
>>>> >> >> > Shad
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > _______________________________________________
>>>> >> >> > charm mailing list
>>>> >> >> > charm AT cs.uiuc.edu
>>>> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/charm
>>>> >> >> >
>>>> >> >> > _______________________________________________
>>>> >> >> > ppl mailing list
>>>> >> >> > ppl AT cs.uiuc.edu
>>>> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/ppl
>>>> >> >> >
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >> --
>>>> >> >> Pritish Jetley
>>>> >> >> Doctoral Candidate, Computer Science
>>>> >> >> University of Illinois at Urbana-Champaign
>>>> >> >
>>>> >> >
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Pritish Jetley
>>>> >> Doctoral Candidate, Computer Science
>>>> >> University of Illinois at Urbana-Champaign
>>>> >
>>>> >
>>>> >
>>>> >
>>>
>>>
>>
>>
>>
>> --
>> Pritish Jetley
>> Doctoral Candidate, Computer Science
>> University of Illinois at Urbana-Champaign
>
>
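
The backtrace in the original report above is printed by gdb innermost frame first, which makes the startup path harder to read at a glance. A small sketch that turns it into a top-down call chain follows; the helper name `call_chain` and the frame regular expression are hypothetical, and the sample is the backtrace from the report with the mail quote markers removed.

```python
import re

# A gdb frame line looks like:
#   #1  0x00002b93ecee7a7b in ibv_cmd_create_qp ()
# The "0x... in" part is treated as optional, since gdb omits it for some frames.
FRAME = re.compile(
    r"^#(?P<num>\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(?P<func>[\w:~]+)\s*\("
)


def call_chain(backtrace_text):
    """Return function names ordered from outermost caller to innermost frame."""
    frames = []
    for raw in backtrace_text.splitlines():
        m = FRAME.match(raw.strip())
        if m:
            frames.append((int(m.group("num")), m.group("func")))
    # gdb numbers the innermost frame 0, so sort descending to read top-down.
    return [func for _, func in sorted(frames, reverse=True)]


sample = """\
#0  0x00000038daa0b795 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00002b93ecee7a7b in ibv_cmd_create_qp ()
  from /usr/lib64/libmlx4-rdmav2.so
#2  0x000000000061add0 in recvBarrierMessage ()
#3  0x000000000061b882 in CmiBarrier ()
#4  0x00000000006206ec in CmiTimerInit ()
#5  0x00000000006216ec in ConverseCommonInit ()
#6  0x000000000061d723 in ConverseInit ()
#7  0x00000000005afd4c in main ()
"""

print(" -> ".join(call_chain(sample)))
```

Read top-down, the chain runs from main through ConverseInit, ConverseCommonInit, CmiTimerInit and CmiBarrier down to pthread_spin_lock, so the process is still inside Converse startup. That is consistent with Shad's remark that the hang occurs before the ChaNGa code is entered.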




