charm - Re: [charm] [ppl] Fwd: backtrace of ChaNGa process

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Shad Kirmani <sxk5292 AT cse.psu.edu>
  • To: Pritish Jetley <pjetley2 AT illinois.edu>
  • Cc: Phil Miller <mille121 AT illinois.edu>, charm AT cs.uiuc.edu, cosmology-ppl AT cs.uiuc.edu, Padma Raghavan <raghavan AT cse.psu.edu>, Jason Holmes <jholmes AT psu.edu>
  • Subject: Re: [charm] [ppl] Fwd: backtrace of ChaNGa process
  • Date: Tue, 27 Mar 2012 18:09:43 -0400
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hello Pritish,

Here are the build errors on ChaNGa (Charm++ 6.4.0 from the git repository):

******************************************************************************************************************
[sxk5292@cyberstar changa]$ make
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -E     -DINTERLIST_VER=2       -DHEXADECAPOLE         -DCOOLING_NONE       MultistepLB.ci
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi: /usr/global/gcc/4.4.2/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi)
Fatal Error by charmc in directory /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
   Command /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi returned error code 1
charmc exiting...
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -E     -DINTERLIST_VER=2       -DHEXADECAPOLE         -DCOOLING_NONE       Orb3dLB.ci
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi: /usr/global/gcc/4.4.2/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi)
Fatal Error by charmc in directory /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
   Command /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi returned error code 1
charmc exiting...
Fatal Error by charmc in directory /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
   Command /lib/cpp -P -DINTERLIST_VER=2 -DHEXADECAPOLE -DCOOLING_NONE Orb3dLB.ci returned error code 2
charmc exiting...
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -E     -DINTERLIST_VER=2       -DHEXADECAPOLE         -DCOOLING_NONE       ParallelGravity.ci
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi: /usr/global/gcc/4.4.2/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi)
Fatal Error by charmc in directory /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
   Command /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi returned error code 1
charmc exiting...
Fatal Error by charmc in directory /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
   Command /lib/cpp -P -DINTERLIST_VER=2 -DHEXADECAPOLE -DCOOLING_NONE ParallelGravity.ci returned error code 2
charmc exiting...
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -O3 -I../utility/structures     -DINTERLIST_VER=2       -DHEXADECAPOLE         -DCOOLING_NONE       -I.. -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache/threadsafe_hashtable  -I..   -c -o DataManager.o DataManager.cpp
In file included from DataManager.cpp:6:
ParallelGravity.h:106:82: error: MultistepLB.decl.h: No such file or directory
ParallelGravity.h:107:74: error: Orb3dLB.decl.h: No such file or directory
In file included from InOutput.h:5,
                 from ParallelGravity.h:111,
                 from DataManager.cpp:6:
DataManager.h:14:34: error: ParallelGravity.decl.h: No such file or directory
In file included from DataManager.cpp:8:
Reductions.h:4:29: error: Reductions.decl.h: No such file or directory
In file included from InOutput.h:5,
                 from ParallelGravity.h:111,
                 from DataManager.cpp:6:
DataManager.h:56: error: expected class-name before ‘{’ token
DataManager.h:63: error: ‘CProxy_TreePiece’ does not name a type
In file included from DataManager.cpp:6:
ParallelGravity.h:115: error: ‘CProxy_Main’ does not name a type
ParallelGravity.h:124: error: ‘CProxy_TreePiece’ does not name a type
ParallelGravity.h:128: error: ‘CProxy_LvArray’ does not name a type
ParallelGravity.h:129: error: ‘CProxy_LvArray’ does not name a type
ParallelGravity.h:130: error: ‘CProxy_LvArray’ does not name a type
ParallelGravity.h:131: error: ‘CProxy_TreePiece’ does not name a type
ParallelGravity.h:132: error: ‘CProxy_DataManager’ does not name a type
ParallelGravity.h:174: error: expected class-name before ‘{’ token
ParallelGravity.h:262: error: expected class-name before ‘{’ token
ParallelGravity.h:272: error: expected class-name before ‘{’ token
ParallelGravity.h:284: error: expected class-name before ‘{’ token
ParallelGravity.h:389: error: expected class-name before ‘{’ token
ParallelGravity.h:392: error: ‘CProxy_Sorter’ does not name a type
In file included from ParallelGravity.h:549,
                 from DataManager.cpp:6:
Compute.h:43: error: ‘TreePiece’ has not been declared
Compute.h:46: error: ‘TreePiece’ has not been declared
Compute.h:68: error: ‘TreePiece’ has not been declared
Compute.h:69: error: ‘TreePiece’ has not been declared
Compute.h:70: error: ‘TreePiece’ has not been declared
Compute.h:71: error: ‘TreePiece’ has not been declared
Compute.h:72: error: ‘TreePiece’ has not been declared
Compute.h:124: error: ‘TreePiece’ has not been declared
Compute.h:125: error: ‘TreePiece’ has not been declared
Compute.h:126: error: ‘TreePiece’ has not been declared
Compute.h:129: error: ‘TreePiece’ has not been declared
Compute.h:130: error: ‘TreePiece’ has not been declared
Compute.h:131: error: ‘TreePiece’ has not been declared
Compute.h:155: error: ‘TreePiece’ has not been declared
Compute.h:158: error: ‘TreePiece’ has not been declared
Compute.h:159: error: ‘TreePiece’ has not been declared
Compute.h:160: error: ‘TreePiece’ has not been declared
Compute.h:163: error: ‘TreePiece’ has not been declared
Compute.h:180: error: ‘TreePiece’ has not been declared
Compute.h:213: error: ‘TreePiece’ has not been declared
Compute.h:217: error: ‘TreePiece’ has not been declared
Compute.h:219: error: ‘TreePiece’ has not been declared
In file included from DataManager.cpp:6:
ParallelGravity.h:556: error: expected class-name before ‘{’ token
ParallelGravity.h:937: error: ‘CProxy_TreePiece’ does not name a type
ParallelGravity.h: In member function ‘int TreePiece::getIndex()’:
ParallelGravity.h:642: error: ‘thisIndex’ was not declared in this scope
ParallelGravity.h: In constructor ‘TreePiece::TreePiece()’:
ParallelGravity.h:1194: error: class ‘TreePiece’ does not have any field named ‘pieces’
ParallelGravity.h:1194: error: ‘thisArrayID’ was not declared in this scope
ParallelGravity.h:1203: error: ‘usesAtSync’ was not declared in this scope
ParallelGravity.h: In constructor ‘TreePiece::TreePiece(CkMigrateMessage*)’:
ParallelGravity.h:1282: error: ‘usesAtSync’ was not declared in this scope
ParallelGravity.h: In destructor ‘TreePiece::~TreePiece()’:
ParallelGravity.h:1325: error: ‘thisIndex’ was not declared in this scope
ParallelGravity.h:1346: error: ‘thisIndex’ was not declared in this scope
In file included from DataManager.cpp:6:
ParallelGravity.h: At global scope:
ParallelGravity.h:1702: error: expected class-name before ‘{’ token
DataManager.cpp: In constructor ‘DataManager::DataManager(const CkArrayID&)’:
DataManager.cpp:25: error: ‘treePieces’ was not declared in this scope
DataManager.cpp:25: error: ‘CProxy_TreePiece’ was not declared in this scope
DataManager.cpp: In constructor ‘DataManager::DataManager(CkMigrateMessage*)’:
DataManager.cpp:28: error: class ‘DataManager’ does not have any field named ‘CBase_DataManager’
DataManager.cpp: In member function ‘void DataManager::acceptResponsibleIndex(const int*, int, const CkCallback&)’:
DataManager.cpp:60: error: ‘contribute’ was not declared in this scope
DataManager.cpp: In member function ‘void DataManager::acceptFinalKeys(const SFC::Key*, const int*, unsigned int*, int, const CkCallback&)’:
DataManager.cpp:115: error: ‘CkIndex_TreePiece’ has not been declared
DataManager.cpp:115: error: ‘treePieces’ was not declared in this scope
DataManager.cpp:117: error: ‘contribute’ was not declared in this scope
DataManager.cpp: In member function ‘void DataManager::collectSplitters(CkReductionMsg*)’:
DataManager.cpp:153: error: ‘CkIndex_TreePiece’ was not declared in this scope
DataManager.cpp:153: error: ‘treePieces’ was not declared in this scope
DataManager.cpp:153: error: ‘contribute’ was not declared in this scope
DataManager.cpp: In member function ‘void DataManager::pup(PUP::er&)’:
DataManager.cpp:161: error: ‘CBase_DataManager’ has not been declared
DataManager.cpp:162: error: ‘treePieces’ was not declared in this scope
DataManager.cpp: In member function ‘void DataManager::notifyPresence(Tree::GenericTreeNode*)’:
DataManager.cpp:171: error: ‘__nodelock’ was not declared in this scope
DataManager.cpp:182: error: ‘__nodelock’ was not declared in this scope
DataManager.cpp: In member function ‘void DataManager::combineLocalTrees(CkReductionMsg*)’:
DataManager.cpp:249: error: ‘contribute’ was not declared in this scope
DataManager.cpp: In member function ‘void DataManager::memoryStats(const CkCallback&)’:
DataManager.cpp:391: error: ‘contribute’ was not declared in this scope
DataManager.cpp: In member function ‘void DataManager::resetReadOnly(Parameters, const CkCallback&)’:
DataManager.cpp:404: error: ‘contribute’ was not declared in this scope
Fatal Error by charmc in directory /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
   Command g++ -m64 -DCMK_GFORTRAN -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/../include -D__CHARMC__=1 -I../utility/structures -DINTERLIST_VER=2 -DHEXADECAPOLE -DCOOLING_NONE -I.. -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache/threadsafe_hashtable -I.. -O3 -fno-stack-protector -c DataManager.cpp -o DataManager.o returned error code 1
charmc exiting...
make: *** [DataManager.o] Error 1
[sxk5292@cyberstar changa]$ ls

******************************************************************************************************************
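
A note on the log above: every charmxi failure reports that the libstdc++.so.6 under /usr/global/gcc/4.4.2 lacks the symbol version GLIBCXX_3.4.14, which GCC introduced with the 4.5 series. This suggests that charmxi was built with a newer compiler than the one whose runtime library is picked up when it runs. The later "No such file or directory" errors for the .decl.h headers would then follow from charmxi never generating them. The following is a minimal Python sketch, not part of the original thread, for checking which GLIBCXX versions a given libstdc++ file advertises. The function names are hypothetical, and it assumes the version tags appear as plain ASCII strings in the library file.

```python
import re

# Symbol-version tags such as GLIBCXX_3.4.13 appear as ASCII strings in the
# library file. Names like GLIBCXX_FORCE_NEW are skipped because the pattern
# requires digits after the underscore.
_GLIBCXX = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(blob: bytes) -> list:
    """Return the sorted, de-duplicated GLIBCXX versions found in raw bytes."""
    found = {
        tuple(int(part) for part in m.group(1).decode().split("."))
        for m in _GLIBCXX.finditer(blob)
    }
    return sorted(found)


def provides(blob: bytes, required: str) -> bool:
    """True if the bytes contain the tag GLIBCXX_<required>."""
    want = tuple(int(part) for part in required.split("."))
    return want in glibcxx_versions(blob)


def check_library(path: str, required: str) -> bool:
    """Read a shared library from disk and test it for the required tag."""
    with open(path, "rb") as fh:
        return provides(fh.read(), required)
```

Given the loader message in the log, check_library("/usr/global/gcc/4.4.2/lib64/libstdc++.so.6", "3.4.14") would be expected to return False on that system. Running the same check against the libstdc++ of the compiler that built Charm++ would show whether a mismatched library search path is the cause.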

Thanks,
Shad

On Tue, Mar 27, 2012 at 4:43 PM, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:
I got the latest version of utility with 'git clone git://charm.cs.uiuc.edu/cosmo/utility', but the build failed.

Thanks,
Shad


On Tue, Mar 27, 2012 at 1:52 PM, Pritish Jetley <pjetley2 AT illinois.edu> wrote:
Shad, please download the development version of ChaNGa:


Pritish


On Tue, Mar 27, 2012 at 12:39 PM, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:
Hello Phil,

I downloaded Charm++ 6.4.0 and compiled it with:
./build ChaNGa net-linux-x86_64 ibverbs -O3

I also downloaded the latest ChaNGa code, but it does not compile. This is the error I get when I run 'make' on ChaNGa:
***********************************************************************************
DECAPOLE              -I..  -I..   -c -o MultistepLB.o MultistepLB.C
MultistepLB.C: In member function ‘void MultistepLB::mergeInstrumentedData(int, BaseLB::LDStats*)’:
MultistepLB.C:373:55: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:378:43: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:378:100: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C: In member function ‘void MultistepLB::printData(BaseLB::LDStats&, int, int*)’:
MultistepLB.C:401:50: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C: In member function ‘void MultistepLB::work(BaseLB::LDStats*, int)’:
MultistepLB.C:483:25: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:483:69: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:493:25: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:493:75: error: ‘struct LDObjData’ has no member named ‘cpuTime’
Fatal Error by charmc in directory /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
   Command g++ -m64 -DCMK_GFORTRAN -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/../include -D__CHARMC__=1 -I../structures -DINTERLIST_VER=2 -DHEXADECAPOLE -I.. -I.. -O3 -fno-stack-protector -c MultistepLB.C -o MultistepLB.o returned error code 1
charmc exiting...
make: *** [MultistepLB.o] Error 1
*****************************************************************************************************
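
All eight diagnostics in this log carry the same message: 'struct LDObjData' has no member named 'cpuTime'. This suggests a single mismatch between this ChaNGa source and the load-balancing headers in Charm++ 6.4.0, not eight separate problems. As a minimal sketch (not from the thread; the function name is hypothetical), a few lines of Python can collapse GCC output of the form file:line:col: error: message into distinct messages with counts:

```python
import re
from collections import Counter

# Matches both "file:line: error: msg" and "file:line:col: error: msg".
_GCC_ERROR = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+)(?::\d+)?: error: (?P<msg>.*)$"
)


def group_errors(log: str) -> Counter:
    """Count GCC error lines by (file, message), ignoring line and column."""
    counts = Counter()
    for raw in log.splitlines():
        m = _GCC_ERROR.match(raw.strip())
        if m:
            counts[(m.group("file"), m.group("msg"))] += 1
    return counts


# A few lines copied from the log above, used only as sample input.
sample = """\
MultistepLB.C: In member function ‘void MultistepLB::work(BaseLB::LDStats*, int)’:
MultistepLB.C:483:25: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:483:69: error: ‘struct LDObjData’ has no member named ‘cpuTime’
MultistepLB.C:493:25: error: ‘struct LDObjData’ has no member named ‘cpuTime’
"""

for (path, msg), n in group_errors(sample).most_common():
    print(f"{n:3d}  {path}: {msg}")
```

Context lines such as "In member function ..." do not match the pattern and are skipped, so only true error lines are counted.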

Thanks,
Shad


On Tue, Mar 27, 2012 at 12:05 PM, Phil Miller <mille121 AT illinois.edu> wrote:
Could you try using the much more recently released Charm++ 6.4.0,
<http://charm.cs.illinois.edu/distrib/charm-6.4.0_src.tar.bz2>? Many
bugs have been fixed since 6.2, and one of them may be affecting your
usage.

On Tue, Mar 27, 2012 at 10:51, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:
> Hello Pritish,
>
> I compiled Charm++ (charm-6.2) with
> ./build ChaNGa net-linux-x86_64 ibverbs -O3
>
> and then did a 'make' on charm-6.2/tests/charm++/megatest.
>
> I then ran the executable pgm on 64 cores. It again hangs at the same
> place:
> Charmrun> Waiting for 62-th client to connect.
> Charmrun> Waiting for 63-th client to connect.
> Charmrun> All clients connected.
> Charmrun> IP tables sent.
> Charmrun> node programs all connected
>
> If you wait long enough, the code sometimes does make progress and
> you get the following results:
> Megatest is running on 64 nodes 64 processors.
> test 0: initiated [inlineem (phil)]
> test 0: completed (0.01 sec)
> test 1: initiated [callback (olawlor)]
> test 1: completed (3.98 sec)
> test 2: initiated [immediatering (gengbin)]
> ....
> test 48: initiated [multi nodering (milind)]
> test 48: completed (0.02 sec)
> test 49: initiated [multi groupring (milind)]
> test 49: completed (0.02 sec)
> test 50: initiated [all-at-once]
> test 50: completed (0.26 sec)
> All tests completed, exiting
> Charmrun> Graceful exit.
>
>
> Thanks,
> Shad
>
> On Mon, Mar 26, 2012 at 4:43 PM, Pritish Jetley <pjetley2 AT illinois.edu>
> wrote:
>>
>> Try "megatest" first. You'll find this suite of tests in:
>> tests/charm++/megatest
>>
>> Pritish
>>
>> On Mon, Mar 26, 2012 at 3:30 PM, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:
>> > Hello Pritish,
>> >
>> > No, I have not. I can try running the barnes code on this architecture.
>> > Or do you suggest running something simpler? As you can see from the
>> > output below, Charmrun hangs even before it enters the ChaNGa code, so I
>> > do not think this is a code issue.
>> >
>> > Thanks,
>> > Shad
>> >
>> >
>> > On Mon, Mar 26, 2012 at 1:58 PM, Pritish Jetley <pjetley2 AT illinois.edu>
>> > wrote:
>> >>
>> >> Have you successfully run any other Charm++ programs on this
>> >> architecture?
>> >>
>> >> Pritish
>> >>
>> >> On Mon, Mar 26, 2012 at 12:22 PM, Shad Kirmani <sxk5292 AT cse.psu.edu>
>> >> wrote:
>> >> > Hello,
>> >> >
>> >> > Sometimes at startup of ChaNGa compiled for ibverbs, the processes will
>> >> > hang for a long period of time at the beginning of the job.  A backtrace
>> >> > of a process looks like this:
>> >> >
>> >> > #0  0x00000038daa0b795 in pthread_spin_lock () from /lib64/libpthread.so.0
>> >> > #1  0x00002b93ecee7a7b in ibv_cmd_create_qp () from /usr/lib64/libmlx4-rdmav2.so
>> >> > #2  0x000000000061add0 in recvBarrierMessage ()
>> >> > #3  0x000000000061b882 in CmiBarrier ()
>> >> > #4  0x00000000006206ec in CmiTimerInit ()
>> >> > #5  0x00000000006216ec in ConverseCommonInit ()
>> >> > #6  0x000000000061d723 in ConverseInit ()
>> >> > #7  0x00000000005afd4c in main ()
>> >> >
>> >> > With the verbose flag added to charmrun, the hang occurs right after it
>> >> > says that all nodes are connected:
>> >> >
>> >> > ...
>> >> > Charmrun> Waiting for 62-th client to connect.
>> >> > Charmrun> Waiting for 63-th client to connect.
>> >> > Charmrun> All clients connected.
>> >> > Charmrun> IP tables sent.
>> >> > Charmrun> node programs all connected
>> >> >
>> >> > We did not see these hangs when ChaNGa was compiled for MPI-linux-x86_64
>> >> > instead of net-linux-x86_64 with ibverbs.  When the hang occurs, it can
>> >> > either go away after a period of time and the job runs or it just hangs
>> >> > long enough that we give up and kill it.
>> >> >
>> >> > This is on a Red Hat Enterprise Linux 5 system using libibverbs-1.1.3-2.
>> >> >
>> >> > Thanks,
>> >> > Shad
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > charm mailing list
>> >> > charm AT cs.uiuc.edu
>> >> > http://lists.cs.uiuc.edu/mailman/listinfo/charm
>> >> >
>> >> > _______________________________________________
>> >> > ppl mailing list
>> >> > ppl AT cs.uiuc.edu
>> >> > http://lists.cs.uiuc.edu/mailman/listinfo/ppl
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Pritish Jetley
>> >> Doctoral Candidate, Computer Science
>> >> University of Illinois at Urbana-Champaign
>> >
>> >
>>
>>
>>
>> --
>> Pritish Jetley
>> Doctoral Candidate, Computer Science
>> University of Illinois at Urbana-Champaign
>
>
>




--
Pritish Jetley
Doctoral Candidate, Computer Science
University of Illinois at Urbana-Champaign
