
charm - Re: [charm] [ppl] Fwd: backtrace of ChaNGa process

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Shad Kirmani <sxk5292 AT cse.psu.edu>
  • To: Pritish Jetley <pjetley2 AT illinois.edu>
  • Cc: Phil Miller <mille121 AT illinois.edu>, charm AT cs.uiuc.edu, cosmology-ppl AT cs.uiuc.edu, Padma Raghavan <raghavan AT cse.psu.edu>
  • Subject: Re: [charm] [ppl] Fwd: backtrace of ChaNGa process
  • Date: Tue, 27 Mar 2012 16:43:57 -0400
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

I got the latest version of utility with 'git clone git://charm.cs.uiuc.edu/cosmo/utility', but the build failed.

Thanks,
Shad

On Tue, Mar 27, 2012 at 1:52 PM, Pritish Jetley <pjetley2 AT illinois.edu> wrote:
Shad, please download the development version of ChaNGa:


Pritish


On Tue, Mar 27, 2012 at 12:39 PM, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:
Hello Phil,

I downloaded Charm++ 6.4.0 and compiled it with
./build ChaNGa net-linux-x86_64 ibverbs -O3

I downloaded the latest ChaNGa code, but it does not compile when I run 'make'. This is the error I get:
***********************************************************************************
DECAPOLE              -I..  -I..   -c -o MultistepLB.o MultistepLB.C
MultistepLB.C: In member function ‘void MultistepLB::mergeInstrumentedData(int, BaseLB::LDStats*)’:
MultistepLB.C:373:55: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:378:43: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:378:100: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C: In member function ‘void MultistepLB::printData(BaseLB::LDStats&, int, int*)’:
MultistepLB.C:401:50: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C: In member function ‘void MultistepLB::work(BaseLB::LDStats*, int)’:
MultistepLB.C:483:25: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:483:69: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:493:25: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:493:75: error: ‘struct LDObjData’ has no member named ‘cpuTime’
Fatal Error by charmc in directory /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
   Command g++ -m64 -DCMK_GFORTRAN -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/../include -D__CHARMC__=1 -I../structures -DINTERLIST_VER=2 -DHEXADECAPOLE -I.. -I.. -O3 -fno-stack-protector -c MultistepLB.C -o MultistepLB.o returned error code 1
charmc exiting...
make: *** [MultistepLB.o] Error 1
*****************************************************************************************************
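
The diagnostics above repeat the same complaint at several source lines. A minimal sketch, assuming only the g++ message format shown in the log, that collapses such output into one entry per missing member; the helper name summarize_missing_members is hypothetical and not part of Charm++ or ChaNGa:

```python
import re
from collections import defaultdict

# Matches g++ diagnostics of the form
#   file:line:col: error: 'struct S' has no member named 'm'
# Both ASCII and typographic quotes are accepted.
MISSING_MEMBER = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+):(?P<col>\d+): error: "
    r"[‘'](?:struct |class )?(?P<type>[^’']+)[’'] has no member named "
    r"[‘'](?P<member>[^’']+)[’']"
)


def summarize_missing_members(log_text):
    """Map (type, member) to the sorted, de-duplicated source lines that use it."""
    hits = defaultdict(set)
    for raw in log_text.splitlines():
        match = MISSING_MEMBER.match(raw.strip())
        if match:
            key = (match.group("type"), match.group("member"))
            hits[key].add(int(match.group("line")))
    return {key: sorted(lines) for key, lines in hits.items()}


# Error lines copied from the build log above.
SAMPLE = """\
MultistepLB.C: In member function ‘void MultistepLB::mergeInstrumentedData(int, BaseLB::LDStats*)’:
MultistepLB.C:373:55: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:378:43: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:378:100: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:401:50: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:483:25: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:483:69: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:493:25: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:493:75: error: ‘struct LDObjData’ has no member named ‘cpuTime’
"""

if __name__ == "__main__":
    for (type_name, member), lines in summarize_missing_members(SAMPLE).items():
        print(f"{type_name}.{member}: referenced at lines {lines}")
```

Read this way, all eight errors reduce to a single mismatch: MultistepLB.C expects a cpuTime field in LDObjData that the installed Charm++ headers do not provide.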

Thanks,
Shad


On Tue, Mar 27, 2012 at 12:05 PM, Phil Miller <mille121 AT illinois.edu> wrote:
Could you try using the much more recently released Charm++ 6.4.0,
<http://charm.cs.illinois.edu/distrib/charm-6.4.0_src.tar.bz2>? Many
bugs have been fixed since 6.2, and one of them may be affecting your
usage.

On Tue, Mar 27, 2012 at 10:51, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:
> Hello Pritish,
>
> I compiled Charm++ (charm-6.2) with
> ./build ChaNGa net-linux-x86_64 ibverbs -O3
>
> and then did a 'make' on charm-6.2/tests/charm++/megatest.
>
> I then ran the executable pgm on 64 cores. It again hangs at the same
> place:
> Charmrun> Waiting for 62-th client to connect.
> Charmrun> Waiting for 63-th client to connect.
> Charmrun> All clients connected.
> Charmrun> IP tables sent.
> Charmrun> node programs all connected
>
> If you wait long enough, the code sometimes does make progress and
> you get the following results:
> Megatest is running on 64 nodes 64 processors.
> test 0: initiated [inlineem (phil)]
> test 0: completed (0.01 sec)
> test 1: initiated [callback (olawlor)]
> test 1: completed (3.98 sec)
> test 2: initiated [immediatering (gengbin)]
> ....
> test 48: initiated [multi nodering (milind)]
> test 48: completed (0.02 sec)
> test 49: initiated [multi groupring (milind)]
> test 49: completed (0.02 sec)
> test 50: initiated [all-at-once]
> test 50: completed (0.26 sec)
> All tests completed, exiting
> Charmrun> Graceful exit.
>
>
> Thanks,
> Shad
>
> On Mon, Mar 26, 2012 at 4:43 PM, Pritish Jetley <pjetley2 AT illinois.edu>
> wrote:
>>
>> Try "megatest" first. You'll find this suite of tests in:
>> tests/charm++/megatest
>>
>> Pritish
>>
>> On Mon, Mar 26, 2012 at 3:30 PM, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:
>> > Hello Pritish,
>> >
>> > No, I have not. I can try running the barnes code on this architecture.
>> > Or do you suggest running something simpler? As the output below shows,
>> > Charmrun hangs even before it enters the ChaNGa code, so I do not think
>> > this is a code issue.
>> >
>> > Thanks,
>> > Shad
>> >
>> >
>> > On Mon, Mar 26, 2012 at 1:58 PM, Pritish Jetley <pjetley2 AT illinois.edu>
>> > wrote:
>> >>
>> >> Have you successfully run any other Charm++ programs on this
>> >> architecture?
>> >>
>> >> Pritish
>> >>
>> >> On Mon, Mar 26, 2012 at 12:22 PM, Shad Kirmani <sxk5292 AT cse.psu.edu>
>> >> wrote:
>> >> > Hello,
>> >> >
>> >> > Sometimes at startup of ChaNGa compiled for ibverbs, the processes will
>> >> > hang for a long period of time at the beginning of the job.  A backtrace
>> >> > of a process looks like this:
>> >> >
>> >> > #0  0x00000038daa0b795 in pthread_spin_lock () from /lib64/libpthread.so.0
>> >> > #1  0x00002b93ecee7a7b in ibv_cmd_create_qp () from /usr/lib64/libmlx4-rdmav2.so
>> >> > #2  0x000000000061add0 in recvBarrierMessage ()
>> >> > #3  0x000000000061b882 in CmiBarrier ()
>> >> > #4  0x00000000006206ec in CmiTimerInit ()
>> >> > #5  0x00000000006216ec in ConverseCommonInit ()
>> >> > #6  0x000000000061d723 in ConverseInit ()
>> >> > #7  0x00000000005afd4c in main ()
>> >> >
>> >> > With the verbose flag added to charmrun, the hang occurs right after it
>> >> > says that all nodes are connected:
>> >> >
>> >> > ...
>> >> > Charmrun> Waiting for 62-th client to connect.
>> >> > Charmrun> Waiting for 63-th client to connect.
>> >> > Charmrun> All clients connected.
>> >> > Charmrun> IP tables sent.
>> >> > Charmrun> node programs all connected
>> >> >
>> >> > We did not see these hangs when ChaNGa was compiled for
>> >> > mpi-linux-x86_64 instead of net-linux-x86_64 with ibverbs.  When the
>> >> > hang occurs, it either clears after a period of time and the job runs,
>> >> > or it lasts long enough that we give up and kill the job.
>> >> >
>> >> > This is on a Red Hat Enterprise Linux 5 system using libibverbs-1.1.3-2.
>> >> >
>> >> > Thanks,
>> >> > Shad
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > charm mailing list
>> >> > charm AT cs.uiuc.edu
>> >> > http://lists.cs.uiuc.edu/mailman/listinfo/charm
>> >> >
>> >> > _______________________________________________
>> >> > ppl mailing list
>> >> > ppl AT cs.uiuc.edu
>> >> > http://lists.cs.uiuc.edu/mailman/listinfo/ppl
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Pritish Jetley
>> >> Doctoral Candidate, Computer Science
>> >> University of Illinois at Urbana-Champaign
>> >
>> >
>>
>>
>>
>> --
>> Pritish Jetley
>> Doctoral Candidate, Computer Science
>> University of Illinois at Urbana-Champaign
>
>
>
>




--
Pritish Jetley
Doctoral Candidate, Computer Science
University of Illinois at Urbana-Champaign
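
For the startup hang described in the original report, it can help to compare backtraces from several stuck processes. A minimal sketch, assuming gdb-style frame lines as quoted in the thread (with the email line wrapping undone); parse_backtrace and inside_function are hypothetical helper names, not part of gdb or Charm++:

```python
import re

# One gdb frame: "#N  0xADDR in function (args) [from library]".
FRAME = re.compile(
    r"^#(?P<index>\d+)\s+"
    r"(?:(?P<addr>0x[0-9a-fA-F]+) in )?"
    r"(?P<func>\S+) \(.*?\)"
    r"(?:\s+from\s+(?P<lib>\S+))?"
)


def parse_backtrace(text):
    """Return one dict per frame, innermost first, as gdb prints them."""
    frames = []
    for raw in text.splitlines():
        match = FRAME.match(raw.strip())
        if match:
            frames.append(
                {
                    "index": int(match.group("index")),
                    "func": match.group("func"),
                    "lib": match.group("lib"),  # None when gdb shows no library
                }
            )
    return frames


def inside_function(frames, name):
    """True if any frame in the stack belongs to the named function."""
    return any(frame["func"] == name for frame in frames)


# Backtrace copied from the original report in this thread.
SAMPLE_BACKTRACE = """\
#0  0x00000038daa0b795 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00002b93ecee7a7b in ibv_cmd_create_qp () from /usr/lib64/libmlx4-rdmav2.so
#2  0x000000000061add0 in recvBarrierMessage ()
#3  0x000000000061b882 in CmiBarrier ()
#4  0x00000000006206ec in CmiTimerInit ()
#5  0x00000000006216ec in ConverseCommonInit ()
#6  0x000000000061d723 in ConverseInit ()
#7  0x00000000005afd4c in main ()
"""

if __name__ == "__main__":
    frames = parse_backtrace(SAMPLE_BACKTRACE)
    print("innermost frame:", frames[0]["func"], "in", frames[0]["lib"])
    print("still in ConverseInit:", inside_function(frames, "ConverseInit"))
```

Applied to the reported trace, this shows the process is still inside ConverseInit, waiting in CmiBarrier, with the innermost frames in the InfiniBand and pthread libraries rather than in ChaNGa code.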



