
charm - Re: [charm] [cosmology-ppl] [ppl] Fwd: backtrace of ChaNGa process

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

List archive

  • From: Tom Quinn <trq AT astro.washington.edu>
  • To: Shad Kirmani <sxk5292 AT cse.psu.edu>
  • Cc: Jason Holmes <jholmes AT psu.edu>, Phil Miller <mille121 AT illinois.edu>, charm AT cs.uiuc.edu, Pritish Jetley <pjetley2 AT illinois.edu>, Padma Raghavan <raghavan AT cse.psu.edu>, cosmology-ppl AT cs.uiuc.edu
  • Subject: Re: [charm] [cosmology-ppl] [ppl] Fwd: backtrace of ChaNGa process
  • Date: Wed, 28 Mar 2012 09:16:13 -0700 (PDT)
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

This error is a problem in the Makefile dependencies: do a "make Reductions.o" first, and then it should compile.

I also just committed a fix. Either do a git pull, or wait for tonight's update.
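
For readers following the include chain in the log quoted below (DataManager.cpp, ParallelGravity.h, InOutput.h, DataManager.h, ParallelGravity.decl.h, Reductions.decl.h), here is a minimal Python sketch of how one might list the charmc-generated *.decl.h headers that a source file pulls in, so they can be named as prerequisites in a Makefile. It is not part of ChaNGa. The file contents in SOURCES are hypothetical, heavily trimmed stand-ins, and the function names are made up for illustration.

```python
import re

# Matches lines such as:  #include "Reductions.decl.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def direct_includes(text):
    """Return the quoted #include targets found in one file's text."""
    return INCLUDE_RE.findall(text)


def generated_headers(start, sources):
    """Collect every *.decl.h reachable from `start` through quoted includes.

    `sources` maps a file name to its text. Files missing from the map
    (for example headers that charmc has not generated yet) are still
    reported, but cannot be scanned any further.
    """
    seen, stack, found = set(), [start], set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if name.endswith(".decl.h"):
            found.add(name)
        stack.extend(direct_includes(sources.get(name, "")))
    return sorted(found)


# Hypothetical stand-ins that mirror only the include chain seen in the log.
SOURCES = {
    "DataManager.cpp": '#include "ParallelGravity.h"\n',
    "ParallelGravity.h": '#include "InOutput.h"\n',
    "InOutput.h": '#include "DataManager.h"\n',
    "DataManager.h": '#include "ParallelGravity.decl.h"\n',
    "ParallelGravity.decl.h": '#include "Reductions.decl.h"\n',
}

if __name__ == "__main__":
    # Headers that must exist before DataManager.o can be compiled.
    print(generated_headers("DataManager.cpp", SOURCES))
```

With these stand-in files the result contains both ParallelGravity.decl.h and Reductions.decl.h. That matches the failure in the log: if nothing tells make to generate Reductions.decl.h before compiling DataManager.cpp, the compile fails, and building Reductions.o first presumably produces the header as a side effect.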

Tom Quinn Astronomy, University of Washington
Internet: trq AT astro.washington.edu
Phone: 206-685-9009

On Wed, 28 Mar 2012, Shad Kirmani wrote:

Hello,
I am still having trouble compiling the ChaNGa dev version. Does the ChaNGa dev
version require the dev version of Charm++? Right now, I am using Charm++
6.4.0.

***************************************************************************
******
[sxk5292@cyberstar changa]$ make clean
rm -f core* DataManager.o Reductions.o TreePiece.o Sorter.o param.o
GenericTreeNode.o ParallelGravity.o Ewald.o InOutput.o cosmo.o romberg.o
runge.o dumpframe.o dffuncs.o moments.o MultistepLB.o Orb3dLB.o
Orb3dLB_notopo.o MultistepLB_notopo.o TreeWalk.o Compute.o CacheInterface.o
smooth.o Sph.o starform.o  *~ ChaNGa *.decl.h *.def.h charmrun conv-host 
[sxk5292@cyberstar changa]$ make
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -E    
-DINTERLIST_VER=2       -DHEXADECAPOLE     -DCOOLING_NONE      
MultistepLB.ci
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -E    
-DINTERLIST_VER=2       -DHEXADECAPOLE     -DCOOLING_NONE       Orb3dLB.ci
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -E    
-DINTERLIST_VER=2       -DHEXADECAPOLE     -DCOOLING_NONE      
ParallelGravity.ci
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -O3
-I../utility/structures     -DINTERLIST_VER=2       -DHEXADECAPOLE    
-DCOOLING_NONE       -I..
-I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache
-I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache/threadsafe_hashtable  -I..   -c -o DataManager.o DataManager.cpp
In file included from DataManager.h:14:0,
                 from InOutput.h:5,
                 from ParallelGravity.h:111,
                 from DataManager.cpp:6:
ParallelGravity.decl.h:4:29: fatal error: Reductions.decl.h: No such file or directory
compilation terminated.
Fatal Error by charmc in directory
/gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
   Command g++ -m64 -DCMK_GFORTRAN
-I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/../include
-D__CHARMC__=1 -I../utility/structures -DINTERLIST_VER=2 -DHEXADECAPOLE
-DCOOLING_NONE -I..
-I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache
-I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache/threadsafe_hashtable -I.. -O3 -fno-stack-protector -c DataManager.cpp -o
DataManager.o returned error code 1
charmc exiting...
make: *** [DataManager.o] Error 1
[sxk5292@cyberstar changa]$

***************************************************************************
******

Thanks,
Shad

On Wed, Mar 28, 2012 at 9:05 AM, Jason Holmes <jholmes AT psu.edu> wrote:
Hi Shad,

Did you perhaps compile part of this with gcc-4.5.x?  It comes
with a libstdc++ that has versions up to GLIBCXX_3.4.14.  The
libstdc++ that comes with gcc-4.4.2 only has versions up to
GLIBCXX_3.4.13.
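
One way to check this is to list the GLIBCXX version tags stored as plain ASCII strings inside a given libstdc++.so.6. The Python sketch below does roughly what running `strings` on the library and filtering for GLIBCXX would do. The function names are made up for illustration, and the path you pass in is whatever library your loader actually picks up.

```python
import re
import sys

# Version tags appear in the library as plain ASCII, e.g. b"GLIBCXX_3.4.13".
# Requiring a digit after the underscore skips unrelated strings such as
# GLIBCXX_DEBUG_MESSAGE_LENGTH.
TAG_RE = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(blob):
    """Return the sorted, de-duplicated GLIBCXX versions found in raw bytes."""
    found = set()
    for match in TAG_RE.finditer(blob):
        parts = match.group(1).decode("ascii").split(".")
        found.add(tuple(int(p) for p in parts))
    return sorted(found)


def provides(blob, required):
    """True if the exact version `required` (e.g. "3.4.14") is present."""
    want = tuple(int(p) for p in required.split("."))
    return want in glibcxx_versions(blob)


if __name__ == "__main__":
    # Usage: python check_glibcxx.py /path/to/libstdc++.so.6 3.4.14
    path, required = sys.argv[1], sys.argv[2]
    with open(path, "rb") as handle:
        data = handle.read()
    newest = glibcxx_versions(data)[-1:]
    print("newest tag:", ".".join(str(n) for n in newest[0]) if newest else "none")
    print("provides", required, "->", provides(data, required))
```

If the library the loader finds at run time does not provide the version that charmxi was linked against, the "version `GLIBCXX_3.4.14' not found" error quoted below is the expected result.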

Thanks,

--
Jason Holmes

On 03/27/2012 06:44 PM, Pritish Jetley wrote:
Note these errors:

/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi:
/usr/global/gcc/4.4.2/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not
found (required by
/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi)

It seems that there is some issue with your standard C++ libraries. I
searched Google for the error and found the following posts:

http://stackoverflow.com/questions/5216399/usr-lib-libstdc-so-6-version-glibcxx-3-4-15-not-found
http://askubuntu.com/questions/88718/how-to-get-glibcxx-3-4-14


On Tue, Mar 27, 2012 at 5:09 PM, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:

   Hello Pritish,

   Here are the build errors on ChaNGa (Charm++ 6.4.0 from the git repository):

  **************************************************************************
****************************************
   
   [sxk5292@cyberstar changa]$ make
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -E
   -DINTERLIST_VER=2       -DHEXADECAPOLE         -DCOOLING_NONE
   MultistepLB.ci
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi:
   /usr/global/gcc/4.4.2/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14'
   not found (required by
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi)

   Fatal Error by charmc in directory
   /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
       Command
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi returned
   error code 1
   charmc exiting...
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -E
   -DINTERLIST_VER=2       -DHEXADECAPOLE         -DCOOLING_NONE
   Orb3dLB.ci
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi:
   /usr/global/gcc/4.4.2/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14'
   not found (required by
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi)

   Fatal Error by charmc in directory
   /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
       Command
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi returned
   error code 1
   charmc exiting...

   Fatal Error by charmc in directory
   /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
       Command /lib/cpp -P -DINTERLIST_VER=2 -DHEXADECAPOLE
   -DCOOLING_NONE Orb3dLB.ci returned error code 2
   charmc exiting...
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -E
   -DINTERLIST_VER=2       -DHEXADECAPOLE         -DCOOLING_NONE
   ParallelGravity.ci
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi:
   /usr/global/gcc/4.4.2/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14'
   not found (required by
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi)

   Fatal Error by charmc in directory
   /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
       Command
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmxi returned
   error code 1
   charmc exiting...

   Fatal Error by charmc in directory
   /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
       Command /lib/cpp -P -DINTERLIST_VER=2 -DHEXADECAPOLE
   -DCOOLING_NONE ParallelGravity.ci returned error code 2
   charmc exiting...
   /gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/charmc -O3
   -I../utility/structures     -DINTERLIST_VER=2
   -DHEXADECAPOLE         -DCOOLING_NONE       -I..
   -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache
   -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache/threadsafe_hashtable
   -I..   -c -o DataManager.o DataManager.cpp
   In file included from DataManager.cpp:6:
   ParallelGravity.h:106:82: error: MultistepLB.decl.h: No such file or directory
   ParallelGravity.h:107:74: error: Orb3dLB.decl.h: No such file or directory
   In file included from InOutput.h:5,
                     from ParallelGravity.h:111,
                     from DataManager.cpp:6:
   DataManager.h:14:34: error: ParallelGravity.decl.h: No such file or directory
   In file included from DataManager.cpp:8:
   Reductions.h:4:29: error: Reductions.decl.h: No such file or directory
   In file included from InOutput.h:5,
                     from ParallelGravity.h:111,
                     from DataManager.cpp:6:
   DataManager.h:56: error: expected class-name before ‘{’ token
   DataManager.h:63: error: ‘CProxy_TreePiece’ does not name a type
   In file included from DataManager.cpp:6:
   ParallelGravity.h:115: error: ‘CProxy_Main’ does not name a type
   ParallelGravity.h:124: error: ‘CProxy_TreePiece’ does not name a type
   ParallelGravity.h:128: error: ‘CProxy_LvArray’ does not name a type
   ParallelGravity.h:129: error: ‘CProxy_LvArray’ does not name a type
   ParallelGravity.h:130: error: ‘CProxy_LvArray’ does not name a type
   ParallelGravity.h:131: error: ‘CProxy_TreePiece’ does not name a type
   ParallelGravity.h:132: error: ‘CProxy_DataManager’ does not name a type
   ParallelGravity.h:174: error: expected class-name before ‘{’ token
   ParallelGravity.h:262: error: expected class-name before ‘{’ token
   ParallelGravity.h:272: error: expected class-name before ‘{’ token
   ParallelGravity.h:284: error: expected class-name before ‘{’ token
   ParallelGravity.h:389: error: expected class-name before ‘{’ token
   ParallelGravity.h:392: error: ‘CProxy_Sorter’ does not name a type
   In file included from ParallelGravity.h:549,
                     from DataManager.cpp:6:
   Compute.h:43: error: ‘TreePiece’ has not been declared
   Compute.h:46: error: ‘TreePiece’ has not been declared
   Compute.h:68: error: ‘TreePiece’ has not been declared
   Compute.h:69: error: ‘TreePiece’ has not been declared
   Compute.h:70: error: ‘TreePiece’ has not been declared
   Compute.h:71: error: ‘TreePiece’ has not been declared
   Compute.h:72: error: ‘TreePiece’ has not been declared
   Compute.h:124: error: ‘TreePiece’ has not been declared
   Compute.h:125: error: ‘TreePiece’ has not been declared
   Compute.h:126: error: ‘TreePiece’ has not been declared
   Compute.h:129: error: ‘TreePiece’ has not been declared
   Compute.h:130: error: ‘TreePiece’ has not been declared
   Compute.h:131: error: ‘TreePiece’ has not been declared
   Compute.h:155: error: ‘TreePiece’ has not been declared
   Compute.h:158: error: ‘TreePiece’ has not been declared
   Compute.h:159: error: ‘TreePiece’ has not been declared
   Compute.h:160: error: ‘TreePiece’ has not been declared
   Compute.h:163: error: ‘TreePiece’ has not been declared
   Compute.h:180: error: ‘TreePiece’ has not been declared
   Compute.h:213: error: ‘TreePiece’ has not been declared
   Compute.h:217: error: ‘TreePiece’ has not been declared
   Compute.h:219: error: ‘TreePiece’ has not been declared
   In file included from DataManager.cpp:6:
   ParallelGravity.h:556: error: expected class-name before ‘{’ token
   ParallelGravity.h:937: error: ‘CProxy_TreePiece’ does not name a type
   ParallelGravity.h: In member function ‘int TreePiece::getIndex()’:
   ParallelGravity.h:642: error: ‘thisIndex’ was not declared in this scope
   ParallelGravity.h: In constructor ‘TreePiece::TreePiece()’:
   ParallelGravity.h:1194: error: class ‘TreePiece’ does not have any field named ‘pieces’
   ParallelGravity.h:1194: error: ‘thisArrayID’ was not declared in this scope
   ParallelGravity.h:1203: error: ‘usesAtSync’ was not declared in this scope
   ParallelGravity.h: In constructor ‘TreePiece::TreePiece(CkMigrateMessage*)’:
   ParallelGravity.h:1282: error: ‘usesAtSync’ was not declared in this scope
   ParallelGravity.h: In destructor ‘TreePiece::~TreePiece()’:
   ParallelGravity.h:1325: error: ‘thisIndex’ was not declared in this scope
   ParallelGravity.h:1346: error: ‘thisIndex’ was not declared in this scope
   In file included from DataManager.cpp:6:
   ParallelGravity.h: At global scope:
   ParallelGravity.h:1702: error: expected class-name before ‘{’ token
   DataManager.cpp: In constructor ‘DataManager::DataManager(const CkArrayID&)’:
   DataManager.cpp:25: error: ‘treePieces’ was not declared in this scope
   DataManager.cpp:25: error: ‘CProxy_TreePiece’ was not declared in this scope
   DataManager.cpp: In constructor ‘DataManager::DataManager(CkMigrateMessage*)’:
   DataManager.cpp:28: error: class ‘DataManager’ does not have any field named ‘CBase_DataManager’
   DataManager.cpp: In member function ‘void DataManager::acceptResponsibleIndex(const int*, int, const CkCallback&)’:
   DataManager.cpp:60: error: ‘contribute’ was not declared in this scope
   DataManager.cpp: In member function ‘void DataManager::acceptFinalKeys(const SFC::Key*, const int*, unsigned int*, int, const CkCallback&)’:
   DataManager.cpp:115: error: ‘CkIndex_TreePiece’ has not been declared
   DataManager.cpp:115: error: ‘treePieces’ was not declared in this scope
   DataManager.cpp:117: error: ‘contribute’ was not declared in this scope
   DataManager.cpp: In member function ‘void DataManager::collectSplitters(CkReductionMsg*)’:
   DataManager.cpp:153: error: ‘CkIndex_TreePiece’ was not declared in this scope
   DataManager.cpp:153: error: ‘treePieces’ was not declared in this scope
   DataManager.cpp:153: error: ‘contribute’ was not declared in this scope
   DataManager.cpp: In member function ‘void DataManager::pup(PUP::er&)’:
   DataManager.cpp:161: error: ‘CBase_DataManager’ has not been declared
   DataManager.cpp:162: error: ‘treePieces’ was not declared in this scope
   DataManager.cpp: In member function ‘void DataManager::notifyPresence(Tree::GenericTreeNode*)’:
   DataManager.cpp:171: error: ‘__nodelock’ was not declared in this scope
   DataManager.cpp:182: error: ‘__nodelock’ was not declared in this scope
   DataManager.cpp: In member function ‘void DataManager::combineLocalTrees(CkReductionMsg*)’:
   DataManager.cpp:249: error: ‘contribute’ was not declared in this scope
   DataManager.cpp: In member function ‘void DataManager::memoryStats(const CkCallback&)’:
   DataManager.cpp:391: error: ‘contribute’ was not declared in this scope
   DataManager.cpp: In member function ‘void DataManager::resetReadOnly(Parameters, const CkCallback&)’:
   DataManager.cpp:404: error: ‘contribute’ was not declared in this scope

   Fatal Error by charmc in directory
   /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
       Command g++ -m64 -DCMK_GFORTRAN
   -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/../include
   -D__CHARMC__=1 -I../utility/structures -DINTERLIST_VER=2
   -DHEXADECAPOLE -DCOOLING_NONE -I..
   -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache
   -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/tmp/libs/ck-libs/cache/threadsafe_hashtable
   -I.. -O3 -fno-stack-protector -c DataManager.cpp -o DataManager.o
   returned error code 1
   charmc exiting...
   make: *** [DataManager.o] Error 1
   
   [sxk5292@cyberstar changa]$ ls

  **************************************************************************
****************************************

   Thanks,
   Shad


   On Tue, Mar 27, 2012 at 4:43 PM, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:

       I got the latest version of utility with
       git clone git://charm.cs.uiuc.edu/cosmo/utility
       but the build failed.

       Thanks,
       Shad


       On Tue, Mar 27, 2012 at 1:52 PM, Pritish Jetley <pjetley2 AT illinois.edu> wrote:

           Shad, please download the development version of ChaNGa:

           git clone git://charm.cs.uiuc.edu/cosmo/changa

           Pritish


           On Tue, Mar 27, 2012 at 12:39 PM, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:

               Hello Phil,

               I downloaded Charm++ 6.4.0 and compiled it with
               ./build ChaNGa net-linux-x86_64 ibverbs -O3

               I downloaded the latest ChaNGa code, but it does not
               compile when I run 'make'. This is the error that I get:
              
**************************************************************************
*********
               DECAPOLE              -I..  -I..   -c -o MultistepLB.o
               MultistepLB.C
               MultistepLB.C: In member function ‘void MultistepLB::mergeInstrumentedData(int, BaseLB::LDStats*)’:
               MultistepLB.C:373:55: error: ‘struct LDObjData’ has no member named ‘cpuTime’
               MultistepLB.C:378:43: error: ‘struct LDObjData’ has no member named ‘cpuTime’
               MultistepLB.C:378:100: error: ‘struct LDObjData’ has no member named ‘cpuTime’
               MultistepLB.C: In member function ‘void MultistepLB::printData(BaseLB::LDStats&, int, int*)’:
               MultistepLB.C:401:50: error: ‘struct LDObjData’ has no member named ‘cpuTime’
               MultistepLB.C: In member function ‘void MultistepLB::work(BaseLB::LDStats*, int)’:
               MultistepLB.C:483:25: error: ‘struct LDObjData’ has no member named ‘cpuTime’
               MultistepLB.C:483:69: error: ‘struct LDObjData’ has no member named ‘cpuTime’
               MultistepLB.C:493:25: error: ‘struct LDObjData’ has no member named ‘cpuTime’
               MultistepLB.C:493:75: error: ‘struct LDObjData’ has no member named ‘cpuTime’
               Fatal Error by charmc in directory
               /gpfs/home/sxk5292/group/embedding/ChaNGa-2.0/changa
                   Command g++ -m64 -DCMK_GFORTRAN
               -I/gpfs/home/sxk5292/group/embedding/charm-6.4.0/bin/../include
               -D__CHARMC__=1 -I../structures -DINTERLIST_VER=2
               -DHEXADECAPOLE -I.. -I.. -O3 -fno-stack-protector -c
               MultistepLB.C -o MultistepLB.o returned error code 1
               charmc exiting...
               make: *** [MultistepLB.o] Error 1
              
**************************************************************************
***************************

               Thanks,
               Shad


               On Tue, Mar 27, 2012 at 12:05 PM, Phil Miller <mille121 AT illinois.edu> wrote:

                   Could you try using the much more recently released
                   Charm++ 6.4.0,
                   <http://charm.cs.illinois.edu/distrib/charm-6.4.0_src.tar.bz2>?
                   Many bugs have been fixed since 6.2, and one of them may
                   be affecting your usage.

                   On Tue, Mar 27, 2012 at 10:51, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:
                    > Hello Pritish,
                    >
                    > I compiled Charm++ (Charm-6.2) with
                    > ./build ChaNGa net-linux-x86_64 ibverbs -O3
                    >
                    > and then did a 'make' in charm-6.2/tests/charm++/megatest.
                    >
                    > I then ran the executable pgm on 64 cores. It again hangs at the same
                    > place:
                    > Charmrun> Waiting for 62-th client to connect.
                    > Charmrun> Waiting for 63-th client to connect.
                    > Charmrun> All clients connected.
                    > Charmrun> IP tables sent.
                    > Charmrun> node programs all connected
                    >
                    > If you are willing to wait long enough, the code sometimes does progress and
                    > you get the following results:
                    > Megatest is running on 64 nodes 64 processors.
                    > test 0: initiated [inlineem (phil)]
                    > test 0: completed (0.01 sec)
                    > test 1: initiated [callback (olawlor)]
                    > test 1: completed (3.98 sec)
                    > test 2: initiated [immediatering (gengbin)]
                    > ....
                    > test 48: initiated [multi nodering (milind)]
                    > test 48: completed (0.02 sec)
                    > test 49: initiated [multi groupring (milind)]
                    > test 49: completed (0.02 sec)
                    > test 50: initiated [all-at-once]
                    > test 50: completed (0.26 sec)
                    > All tests completed, exiting
                    > Charmrun> Graceful exit.
                    >
                    >
                    > Thanks,
                    > Shad
                    >
                    > On Mon, Mar 26, 2012 at 4:43 PM, Pritish Jetley <pjetley2 AT illinois.edu>
                    > wrote:
                    >>
                    >> Try "megatest" first. You'll find this suite of tests in:
                    >> tests/charm++/megatest
                    >>
                    >> Pritish
                    >>
                    >> On Mon, Mar 26, 2012 at 3:30 PM, Shad Kirmani <sxk5292 AT cse.psu.edu> wrote:
                    >> > Hello Pritish,
                    >> >
                    >> > No, I have not. I can try running the barnes code on this
                    >> > architecture. Or do you suggest running something simpler?
                    >> > As you can see from the output below, Charmrun hangs even
                    >> > before it enters the ChaNGa code, so I do not think this is
                    >> > a code issue.
                    >> >
                    >> > Thanks,
                    >> > Shad
                    >> >
                    >> >
                    >> > On Mon, Mar 26, 2012 at 1:58 PM, Pritish Jetley <pjetley2 AT illinois.edu>
                    >> > wrote:
                    >> >>
                    >> >> Have you successfully run any other Charm++ programs on this
                    >> >> architecture?
                    >> >>
                    >> >> Pritish
                    >> >>
                    >> >> On Mon, Mar 26, 2012 at 12:22 PM, Shad Kirmani <sxk5292 AT cse.psu.edu>
                    >> >> wrote:
                    >> >> > Hello,
                    >> >> >
                    >> >> > Sometimes at startup of ChaNGa compiled for ibverbs, the
                    >> >> > processes will hang for a long period of time at the
                    >> >> > beginning of the job.  A backtrace of a process looks
                    >> >> > like this:
                    >> >> >
                    >> >> > #0  0x00000038daa0b795 in pthread_spin_lock () from /lib64/libpthread.so.0
                    >> >> > #1  0x00002b93ecee7a7b in ibv_cmd_create_qp () from /usr/lib64/libmlx4-rdmav2.so
                    >> >> > #2  0x000000000061add0 in recvBarrierMessage ()
                    >> >> > #3  0x000000000061b882 in CmiBarrier ()
                    >> >> > #4  0x00000000006206ec in CmiTimerInit ()
                    >> >> > #5  0x00000000006216ec in ConverseCommonInit ()
                    >> >> > #6  0x000000000061d723 in ConverseInit ()
                    >> >> > #7  0x00000000005afd4c in main ()
                    >> >> >
                    >> >> > With the verbose flag added to charmrun, the hang occurs
                    >> >> > right after it says that all nodes are connected:
                    >> >> >
                    >> >> > ...
                    >> >> > Charmrun> Waiting for 62-th client to connect.
                    >> >> > Charmrun> Waiting for 63-th client to connect.
                    >> >> > Charmrun> All clients connected.
                    >> >> > Charmrun> IP tables sent.
                    >> >> > Charmrun> node programs all connected
                    >> >> >
                    >> >> > We did not see these hangs when ChaNGa was compiled for
                    >> >> > MPI-linux-x86_64 instead of net-linux-x86_64 with ibverbs.
                    >> >> > When the hang occurs, it can either go away after a period
                    >> >> > of time and the job runs, or it just hangs long enough that
                    >> >> > we give up and kill it.
                    >> >> >
                    >> >> > This is on a RedHat Enterprise Linux 5 system using
                    >> >> > libibverbs-1.1.3-2.
                    >> >> >
                    >> >> > Thanks,
                    >> >> > Shad
                    >> >> >
                    >> >> >
                    >> >> >
_______________________________________________
                    >> >> > charm mailing list
                    >> >> >
charm AT cs.uiuc.edu
<mailto:charm AT cs.uiuc.edu>
                    >> >> >
http://lists.cs.uiuc.edu/mailman/listinfo/charm
                    >> >> >
                    >> >> >
_______________________________________________
                    >> >> > ppl mailing list
                    >> >> >
ppl AT cs.uiuc.edu
<mailto:ppl AT cs.uiuc.edu>
                    >> >> >
http://lists.cs.uiuc.edu/mailman/listinfo/ppl
                    >> >> >
                    >> >>
                    >> >>
                    >> >>
                    >> >> --
                    >> >> Pritish Jetley
                    >> >> Doctoral Candidate, Computer Science
                    >> >> University of Illinois at
Urbana-Champaign
                    >> >
                    >> >
                    >>
                    >>
                    >>
                    >> --
                    >> Pritish Jetley
                    >> Doctoral Candidate, Computer Science
                    >> University of Illinois at
Urbana-Champaign
                    >
                    >
                    >





           --
           Pritish Jetley
           Doctoral Candidate, Computer Science
           University of Illinois at Urbana-Champaign






--
Pritish Jetley
Doctoral Candidate, Computer Science
University of Illinois at Urbana-Champaign






