charm - Re: [charm] [ppl] Fwd: backtrace of ChaNGa process

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Shad Kirmani <sxk5292 AT cse.psu.edu>
  • To: Pritish Jetley <pjetley2 AT illinois.edu>
  • Cc: Phil Miller <mille121 AT illinois.edu>, charm AT cs.uiuc.edu, cosmology-ppl AT cs.uiuc.edu, Jason Holmes <jholmes AT psu.edu>, Padma Raghavan <raghavan AT cse.psu.edu>
  • Subject: Re: [charm] [ppl] Fwd: backtrace of ChaNGa process
  • Date: Wed, 28 Mar 2012 11:31:54 -0400
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hello,

I am still having trouble compiling the ChaNGa development version. Does the ChaNGa development version require the development version of Charm++? Right now I am using Charm++ 6.4.0.

*********************************************************************************
[sxk5292@cyberstar changa]$ make clean
rm -f core* DataManager.o Reductions.o TreePiece.o Sorter.o param.o GenericTreeNode.o ParallelGravity.o Ewald.o InOutput.o cosmo.o romberg.o runge.o dumpframe.o dffuncs.o moments.o MultistepLB.o Orb3dLB.o Orb3dLB_notopo.o MultistepLB_notopo.o TreeWalk.o Compute.o CacheInterface.o smooth.o Sph.o starform.o  *~ ChaNGa *.decl.h *.def.h charmrun conv-host 
[sxk5292@cyberstar changa]$ make
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -E     -DINTERLIST_VER=2       -DHEXADECAPOLE     -DCOOLING_NONE       MultistepLB.ci
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -E     -DINTERLIST_VER=2       -DHEXADECAPOLE     -DCOOLING_NONE       Orb3dLB.ci
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -E     -DINTERLIST_VER=2       -DHEXADECAPOLE     -DCOOLING_NONE       ParallelGravity.ci
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -O3 -I../utility/structures     -DINTERLIST_VER=2       -DHEXADECAPOLE     -DCOOLING_NONE       -I.. -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache/threadsafe_hashtable  -I..   -c -o DataManager.o DataManager.cpp
In file included from DataManager.h:14:0,
                 from InOutput.h:5,
                 from ParallelGravity.h:111,
                 from DataManager.cpp:6:
ParallelGravity.decl.h:4:29: fatal error: Reductions.decl.h: No such file or directory
compilation terminated.
Fatal Error by charmc in directory /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
   Command g++ -m64 -DCMK_GFORTRAN -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/../include -D__CHARMC__=1 -I../utility/structures -DINTERLIST_VER=2 -DHEXADECAPOLE -DCOOLING_NONE -I.. -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache/threadsafe_hashtable -I.. -O3 -fno-stack-protector -c DataManager.cpp -o DataManager.o returned error code 1
charmc exiting...
make: *** [DataManager.o] Error 1
[sxk5292@cyberstar changa]$ 

*********************************************************************************

Thanks,
Shad
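
The build log above stops at the first missing generated header. When a log contains many such errors, it can help to collect the missing file names and the Charm++ module each one belongs to. The following is a minimal Python sketch, not something posted in this thread: the function names `missing_headers` and `module_names` are hypothetical, and it assumes GCC's "No such file or directory" wording and the Charm++ convention that a module named Foo generates Foo.decl.h and Foo.def.h.

```python
import re

# Matches both "error:" and "fatal error:" lines from GCC. The \s+ tolerates
# messages that a mail client has wrapped before the word "directory".
MISSING_RE = re.compile(r"error: (\S+): No such file or\s+directory")


def missing_headers(log_text):
    """Return the missing file names in first-seen order, without duplicates."""
    seen = []
    for name in MISSING_RE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def module_names(headers):
    """Strip the .decl.h / .def.h suffix to get the Charm++ module name.

    Assumption: generated headers are named after the module they declare.
    Headers with any other suffix are ignored.
    """
    names = []
    for header in headers:
        for suffix in (".decl.h", ".def.h"):
            if header.endswith(suffix):
                module = header[: -len(suffix)]
                if module not in names:
                    names.append(module)
    return names
```

Applied to the log above, this would report only Reductions.decl.h, that is, the Reductions module. The make output shows charmc -E being run on MultistepLB.ci, Orb3dLB.ci, and ParallelGravity.ci but shows no step that would generate Reductions.decl.h, so that may be the place to look.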

On Wed, Mar 28, 2012 at 9:05 AM, Jason Holmes <jholmes AT psu.edu> wrote:
Hi Shad,

Did you perhaps compile part of this with gcc-4.5.x?  It comes with a libstdc++ that has versions up to GLIBCXX_3.4.14.  The libstdc++ that comes with gcc-4.4.2 only has versions up to GLIBCXX_3.4.13.

Thanks,

--
Jason Holmes
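
One way to check the point above is to list the GLIBCXX version tags embedded in each libstdc++.so.6, much as `strings libstdc++.so.6 | grep GLIBCXX` would. The following is a minimal Python sketch, not something posted in this thread: the function names `versions_in`, `glibcxx_versions`, and `provides` are hypothetical, and scanning raw bytes is only a heuristic, since a binary that merely requires a version also contains its tag.

```python
import re
from pathlib import Path

# Version tags look like GLIBCXX_3.4.13; names such as GLIBCXX_FORCE_NEW
# have no digits after the underscore and are deliberately not matched.
GLIBCXX_RE = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def versions_in(data):
    """Return the sorted, de-duplicated version tuples found in raw bytes."""
    found = set()
    for match in GLIBCXX_RE.finditer(data):
        parts = match.group(1).decode("ascii").split(".")
        found.add(tuple(int(p) for p in parts))
    return sorted(found)


def glibcxx_versions(library_path):
    """Scan a shared library file for GLIBCXX version tags."""
    return versions_in(Path(library_path).read_bytes())


def provides(library_path, required):
    """True if the tag for `required` (e.g. "3.4.14") appears in the file."""
    wanted = tuple(int(p) for p in required.split("."))
    return wanted in glibcxx_versions(library_path)
```

For example, `provides("/usr/global/gcc/4.4.2/lib64/libstdc++.so.6", "3.4.14")` would be expected to return False if that library really stops at GLIBCXX_3.4.13, which would be consistent with the charmxi error quoted below.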


On 03/27/2012 06:44 PM, Pritish Jetley wrote:
Note these errors:

/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi:
/usr/global/gcc/4.4.2/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not
found (required by
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi)

It seems that there is some issue with your standard C++ libraries. I
searched Google for the error and found the following posts:

http://stackoverflow.com/questions/5216399/usr-lib-libstdc-so-6-version-glibcxx-3-4-15-not-found
http://askubuntu.com/questions/88718/how-to-get-glibcxx-3-4-14


On Tue, Mar 27, 2012 at 5:09 PM, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:

   Hello Pritish,

   Here are the build errors on ChaNGa (Charm++ 6.4.0 from the git repository):

   ******************************************************************************************************************
   [sxk5292@cyberstar changa]$ make
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -E
   -DINTERLIST_VER=2       -DHEXADECAPOLE         -DCOOLING_NONE
   MultistepLB.ci
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi:
   /usr/global/gcc/4.4.2/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14'
   not found (required by
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi)

   Fatal Error by charmc in directory
   /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
       Command
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi returned
   error code 1
   charmc exiting...
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -E
   -DINTERLIST_VER=2       -DHEXADECAPOLE         -DCOOLING_NONE
   Orb3dLB.ci
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi:
   /usr/global/gcc/4.4.2/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14'
   not found (required by
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi)

   Fatal Error by charmc in directory
   /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
       Command
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi returned
   error code 1
   charmc exiting...

   Fatal Error by charmc in directory
   /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
       Command /lib/cpp -P -DINTERLIST_VER=2 -DHEXADECAPOLE
   -DCOOLING_NONE Orb3dLB.ci returned error code 2
   charmc exiting...
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -E
   -DINTERLIST_VER=2       -DHEXADECAPOLE         -DCOOLING_NONE
   ParallelGravity.ci
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi:
   /usr/global/gcc/4.4.2/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14'
   not found (required by
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi)

   Fatal Error by charmc in directory
   /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
       Command
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi returned
   error code 1
   charmc exiting...

   Fatal Error by charmc in directory
   /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
       Command /lib/cpp -P -DINTERLIST_VER=2 -DHEXADECAPOLE
   -DCOOLING_NONE ParallelGravity.ci returned error code 2
   charmc exiting...
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -O3
   -I../utility/structures     -DINTERLIST_VER=2
   -DHEXADECAPOLE         -DCOOLING_NONE       -I..
   -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache
   -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache/threadsafe_hashtable
   -I..   -c -o DataManager.o DataManager.cpp
   In file included from DataManager.cpp:6:
   ParallelGravity.h:106:82: error: MultistepLB.decl.h: No such file or
   directory
   ParallelGravity.h:107:74: error: Orb3dLB.decl.h: No such file or
   directory
   In file included from InOutput.h:5,
                     from ParallelGravity.h:111,
                     from DataManager.cpp:6:
   DataManager.h:14:34: error: ParallelGravity.decl.h: No such file or
   directory
   In file included from DataManager.cpp:8:
   Reductions.h:4:29: error: Reductions.decl.h: No such file or directory
   In file included from InOutput.h:5,
                     from ParallelGravity.h:111,
                     from DataManager.cpp:6:
   DataManager.h:56: error: expected class-name before ‘{’ token
   DataManager.h:63: error: ‘CProxy_TreePiece’ does not name a type
   In file included from DataManager.cpp:6:
   ParallelGravity.h:115: error: ‘CProxy_Main’ does not name a type
   ParallelGravity.h:124: error: ‘CProxy_TreePiece’ does not name a type
   ParallelGravity.h:128: error: ‘CProxy_LvArray’ does not name a type
   ParallelGravity.h:129: error: ‘CProxy_LvArray’ does not name a type
   ParallelGravity.h:130: error: ‘CProxy_LvArray’ does not name a type
   ParallelGravity.h:131: error: ‘CProxy_TreePiece’ does not name a type
   ParallelGravity.h:132: error: ‘CProxy_DataManager’ does not name a type
   ParallelGravity.h:174: error: expected class-name before ‘{’ token
   ParallelGravity.h:262: error: expected class-name before ‘{’ token
   ParallelGravity.h:272: error: expected class-name before ‘{’ token
   ParallelGravity.h:284: error: expected class-name before ‘{’ token
   ParallelGravity.h:389: error: expected class-name before ‘{’ token
   ParallelGravity.h:392: error: ‘CProxy_Sorter’ does not name a type
   In file included from ParallelGravity.h:549,
                     from DataManager.cpp:6:
   Compute.h:43: error: ‘TreePiece’ has not been declared
   Compute.h:46: error: ‘TreePiece’ has not been declared
   Compute.h:68: error: ‘TreePiece’ has not been declared
   Compute.h:69: error: ‘TreePiece’ has not been declared
   Compute.h:70: error: ‘TreePiece’ has not been declared
   Compute.h:71: error: ‘TreePiece’ has not been declared
   Compute.h:72: error: ‘TreePiece’ has not been declared
   Compute.h:124: error: ‘TreePiece’ has not been declared
   Compute.h:125: error: ‘TreePiece’ has not been declared
   Compute.h:126: error: ‘TreePiece’ has not been declared
   Compute.h:129: error: ‘TreePiece’ has not been declared
   Compute.h:130: error: ‘TreePiece’ has not been declared
   Compute.h:131: error: ‘TreePiece’ has not been declared
   Compute.h:155: error: ‘TreePiece’ has not been declared
   Compute.h:158: error: ‘TreePiece’ has not been declared
   Compute.h:159: error: ‘TreePiece’ has not been declared
   Compute.h:160: error: ‘TreePiece’ has not been declared
   Compute.h:163: error: ‘TreePiece’ has not been declared
   Compute.h:180: error: ‘TreePiece’ has not been declared
   Compute.h:213: error: ‘TreePiece’ has not been declared
   Compute.h:217: error: ‘TreePiece’ has not been declared
   Compute.h:219: error: ‘TreePiece’ has not been declared
   In file included from DataManager.cpp:6:
   ParallelGravity.h:556: error: expected class-name before ‘{’ token
   ParallelGravity.h:937: error: ‘CProxy_TreePiece’ does not name a type
   ParallelGravity.h: In member function ‘int TreePiece::getIndex()’:
   ParallelGravity.h:642: error: ‘thisIndex’ was not declared in this scope
   ParallelGravity.h: In constructor ‘TreePiece::TreePiece()’:
   ParallelGravity.h:1194: error: class ‘TreePiece’ does not have any
   field named ‘pieces’
   ParallelGravity.h:1194: error: ‘thisArrayID’ was not declared in
   this scope
   ParallelGravity.h:1203: error: ‘usesAtSync’ was not declared in this
   scope
   ParallelGravity.h: In constructor
   ‘TreePiece::TreePiece(CkMigrateMessage*)’:
   ParallelGravity.h:1282: error: ‘usesAtSync’ was not declared in this
   scope
   ParallelGravity.h: In destructor ‘TreePiece::~TreePiece()’:
   ParallelGravity.h:1325: error: ‘thisIndex’ was not declared in this
   scope
   ParallelGravity.h:1346: error: ‘thisIndex’ was not declared in this
   scope
   In file included from DataManager.cpp:6:
   ParallelGravity.h: At global scope:
   ParallelGravity.h:1702: error: expected class-name before ‘{’ token
   DataManager.cpp: In constructor ‘DataManager::DataManager(const
   CkArrayID&)’:
   DataManager.cpp:25: error: ‘treePieces’ was not declared in this scope
   DataManager.cpp:25: error: ‘CProxy_TreePiece’ was not declared in
   this scope
   DataManager.cpp: In constructor
   ‘DataManager::DataManager(CkMigrateMessage*)’:
   DataManager.cpp:28: error: class ‘DataManager’ does not have any
   field named ‘CBase_DataManager’
   DataManager.cpp: In member function ‘void
   DataManager::acceptResponsibleIndex(const int*, int, const
   CkCallback&)’:
   DataManager.cpp:60: error: ‘contribute’ was not declared in this scope
   DataManager.cpp: In member function ‘void
   DataManager::acceptFinalKeys(const SFC::Key*, const int*, unsigned
   int*, int, const CkCallback&)’:
   DataManager.cpp:115: error: ‘CkIndex_TreePiece’ has not been declared
   DataManager.cpp:115: error: ‘treePieces’ was not declared in this scope
   DataManager.cpp:117: error: ‘contribute’ was not declared in this scope
   DataManager.cpp: In member function ‘void
   DataManager::collectSplitters(CkReductionMsg*)’:
   DataManager.cpp:153: error: ‘CkIndex_TreePiece’ was not declared in
   this scope
   DataManager.cpp:153: error: ‘treePieces’ was not declared in this scope
   DataManager.cpp:153: error: ‘contribute’ was not declared in this scope
   DataManager.cpp: In member function ‘void DataManager::pup(PUP::er&)’:
   DataManager.cpp:161: error: ‘CBase_DataManager’ has not been declared
   DataManager.cpp:162: error: ‘treePieces’ was not declared in this scope
   DataManager.cpp: In member function ‘void
   DataManager::notifyPresence(Tree::GenericTreeNode*)’:
   DataManager.cpp:171: error: ‘__nodelock’ was not declared in this scope
   DataManager.cpp:182: error: ‘__nodelock’ was not declared in this scope
   DataManager.cpp: In member function ‘void
   DataManager::combineLocalTrees(CkReductionMsg*)’:
   DataManager.cpp:249: error: ‘contribute’ was not declared in this scope
   DataManager.cpp: In member function ‘void
   DataManager::memoryStats(const CkCallback&)’:
   DataManager.cpp:391: error: ‘contribute’ was not declared in this scope
   DataManager.cpp: In member function ‘void
   DataManager::resetReadOnly(Parameters, const CkCallback&)’:
   DataManager.cpp:404: error: ‘contribute’ was not declared in this scope

   Fatal Error by charmc in directory
   /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
       Command g++ -m64 -DCMK_GFORTRAN
   -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/../include
   -D__CHARMC__=1 -I../utility/structures -DINTERLIST_VER=2
   -DHEXADECAPOLE -DCOOLING_NONE -I..
   -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache
   -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache/threadsafe_hashtable
   -I.. -O3 -fno-stack-protector -c DataManager.cpp -o DataManager.o
   returned error code 1
   charmc exiting...
   make: *** [DataManager.o] Error 1
   [sxk5292@cyberstar changa]$ ls

   ******************************************************************************************************************

   Thanks,
   Shad


   On Tue, Mar 27, 2012 at 4:43 PM, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:

       I got the latest version of utility from git clone
       git://charm.cs.uiuc.edu/cosmo/utility
       <http://charm.cs.uiuc.edu/cosmo/utility> but the build failed.


       Thanks,
       Shad


       On Tue, Mar 27, 2012 at 1:52 PM, Pritish Jetley <pjetley2 AT illinois.edu> wrote:

           Shad, please download the development version of ChaNGa:

           git clone git://charm.cs.uiuc.edu/cosmo/changa
           <http://charm.cs.uiuc.edu/cosmo/changa>


           Pritish


           On Tue, Mar 27, 2012 at 12:39 PM, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:

               Hello Phil,

               I downloaded Charm++ 6.4.0. Compiled it with
               ./build ChaNGa net-linux-x86_64 ibverbs -O3

               I downloaded the latest ChaNGa code but the ChaNGa code
               is not compiling when I do a 'make'. This is the error
               that I get when I do a 'make' on ChaNGa:
               ***********************************************************************************
               DECAPOLE              -I..  -I..   -c -o MultistepLB.o
               MultistepLB.C
               MultistepLB.C: In member function ‘void
               MultistepLB::mergeInstrumentedData(int, BaseLB::LDStats*)’:
               MultistepLB.C:373:55: error: ‘struct LDObjData’ has no
               member named ‘cpuTime’
               MultistepLB.C:378:43: error: ‘struct LDObjData’ has no
               member named ‘cpuTime’
               MultistepLB.C:378:100: error: ‘struct LDObjData’ has no
               member named ‘cpuTime’
               MultistepLB.C: In member function ‘void
               MultistepLB::printData(BaseLB::LDStats&, int, int*)’:
               MultistepLB.C:401:50: error: ‘struct LDObjData’ has no
               member named ‘cpuTime’
               MultistepLB.C: In member function ‘void
               MultistepLB::work(BaseLB::LDStats*, int)’:
               MultistepLB.C:483:25: error: ‘struct LDObjData’ has no
               member named ‘cpuTime’
               MultistepLB.C:483:69: error: ‘struct LDObjData’ has no
               member named ‘cpuTime’
               MultistepLB.C:493:25: error: ‘struct LDObjData’ has no
               member named ‘cpuTime’
               MultistepLB.C:493:75: error: ‘struct LDObjData’ has no
               member named ‘cpuTime’
               Fatal Error by charmc in directory
               /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
                   Command g++ -m64 -DCMK_GFORTRAN
               -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/../include
               -D__CHARMC__=1 -I../structures -DINTERLIST_VER=2
               -DHEXADECAPOLE -I.. -I.. -O3 -fno-stack-protector -c
               MultistepLB.C -o MultistepLB.o returned error code 1
               charmc exiting...
               make: *** [MultistepLB.o] Error 1
               *****************************************************************************************************

               Thanks,
               Shad


               On Tue, Mar 27, 2012 at 12:05 PM, Phil Miller <mille121 AT illinois.edu> wrote:

                    Could you try using the much more recently released
                    Charm++ 6.4.0
                    <http://charm.cs.illinois.edu/distrib/charm-6.4.0_src.tar.bz2>?
                    Many bugs have been fixed since 6.2, and one of them may
                    be affecting your usage.

                    On Tue, Mar 27, 2012 at 10:51, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:
                    > Hello Pritish,
                    >
                     > I compiled Charm++ (Charm-6.2) with
                     > ./build ChaNGa net-linux-x86_64 ibverbs -O3
                     >
                     > and then did a 'make' in charm-6.2/tests/charm++/megatest.
                     >
                     > I then ran the executable pgm on 64 cores. It again hangs at the
                     > same place:
                    > Charmrun> Waiting for 62-th client to connect.
                    > Charmrun> Waiting for 63-th client to connect.
                    > Charmrun> All clients connected.
                    > Charmrun> IP tables sent.
                    > Charmrun> node programs all connected
                    >
                     > If you wait long enough, the code sometimes does progress and
                     > you get the following results:
                    > Megatest is running on 64 nodes 64 processors.
                    > test 0: initiated [inlineem (phil)]
                    > test 0: completed (0.01 sec)
                    > test 1: initiated [callback (olawlor)]
                    > test 1: completed (3.98 sec)
                    > test 2: initiated [immediatering (gengbin)]
                    > ....
                    > test 48: initiated [multi nodering (milind)]
                    > test 48: completed (0.02 sec)
                    > test 49: initiated [multi groupring (milind)]
                    > test 49: completed (0.02 sec)
                    > test 50: initiated [all-at-once]
                    > test 50: completed (0.26 sec)
                    > All tests completed, exiting
                    > Charmrun> Graceful exit.
                    >
                    >
                    > Thanks,
                    > Shad
                    >
                     > On Mon, Mar 26, 2012 at 4:43 PM, Pritish Jetley <pjetley2 AT illinois.edu> wrote:
                    >>
                     >> Try "megatest" first. You'll find this suite of tests in:
                     >> tests/charm++/megatest
                    >>
                    >> Pritish
                    >>
                     >> On Mon, Mar 26, 2012 at 3:30 PM, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:
                    >> > Hello Pritish,
                    >> >
                     >> > No, I have not. I can try running the barnes code on this architecture.
                     >> > Or do you suggest running something simpler? As you can see from the
                     >> > output below, Charmrun hangs even before it enters the ChaNGa code, so
                     >> > I do not think this is a code issue.
                    >> >
                    >> > Thanks,
                    >> > Shad
                    >> >
                    >> >
                     >> > On Mon, Mar 26, 2012 at 1:58 PM, Pritish Jetley <pjetley2 AT illinois.edu> wrote:
                    >> >>
                     >> >> Have you successfully run any other Charm++ programs on this
                     >> >> architecture?
                    >> >>
                    >> >> Pritish
                    >> >>
                     >> >> On Mon, Mar 26, 2012 at 12:22 PM, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:
                    >> >> > Hello,
                    >> >> >
                     >> >> > Sometimes at startup of ChaNGa compiled for ibverbs, the processes
                     >> >> > will hang for a long period of time at the beginning of the job.
                     >> >> > A backtrace of a process looks like this:
                    >> >> >
                     >> >> > #0  0x00000038daa0b795 in pthread_spin_lock () from /lib64/libpthread.so.0
                    >> >> > #1  0x00002b93ecee7a7b in ibv_cmd_create_qp ()
                    >> >> >   from /usr/lib64/libmlx4-rdmav2.so
                    >> >> > #2  0x000000000061add0 in recvBarrierMessage ()
                    >> >> > #3  0x000000000061b882 in CmiBarrier ()
                    >> >> > #4  0x00000000006206ec in CmiTimerInit ()
                    >> >> > #5  0x00000000006216ec in ConverseCommonInit ()
                    >> >> > #6  0x000000000061d723 in ConverseInit ()
                    >> >> > #7  0x00000000005afd4c in main ()
                    >> >> >
                     >> >> > With the verbose flag added to charmrun, the hang occurs right after
                     >> >> > it says that all nodes are connected:
                    >> >> >
                    >> >> > ...
                    >> >> > Charmrun> Waiting for 62-th client to connect.
                    >> >> > Charmrun> Waiting for 63-th client to connect.
                    >> >> > Charmrun> All clients connected.
                    >> >> > Charmrun> IP tables sent.
                    >> >> > Charmrun> node programs all connected
                    >> >> >
                     >> >> > We did not see these hangs when ChaNGa was compiled for
                     >> >> > MPI-linux-x86_64 instead of net-linux-x86_64 with ibverbs. When the
                     >> >> > hang occurs, it can either go away after a period of time and the job
                     >> >> > runs, or it just hangs long enough that we give up and kill it.
                    >> >> >
                     >> >> > This is on a RedHat Enterprise Linux 5 system using
                     >> >> > libibverbs-1.1.3-2.
                    >> >> >
                    >> >> > Thanks,
                    >> >> > Shad
                    >> >> >
                    >> >> >
                    >> >> > _______________________________________________
                    >> >> > charm mailing list
                    >> >> > charm AT cs.uiuc.edu <mailto:charm AT cs.uiuc.edu>

                    >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/charm
                    >> >> >
                    >> >> > _______________________________________________
                    >> >> > ppl mailing list
                    >> >> > ppl AT cs.uiuc.edu <mailto:ppl AT cs.uiuc.edu>

                    >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/ppl
                    >> >> >
                    >> >>
                    >> >>
                    >> >>
                    >> >> --
                    >> >> Pritish Jetley
                    >> >> Doctoral Candidate, Computer Science
                    >> >> University of Illinois at Urbana-Champaign
                    >> >
                    >> >
                    >>
                    >>
                    >>
                    >> --
                    >> Pritish Jetley
                    >> Doctoral Candidate, Computer Science
                    >> University of Illinois at Urbana-Champaign
                    >
                    >
                    >





           --
           Pritish Jetley
           Doctoral Candidate, Computer Science
           University of Illinois at Urbana-Champaign






--
Pritish Jetley
Doctoral Candidate, Computer Science
University of Illinois at Urbana-Champaign




