charm - Re: [charm] [ppl] icc compiler option



  • From: "Bennion, Brian" <Bennion1 AT llnl.gov>
  • To: Jim Phillips <jim AT ks.uiuc.edu>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] [ppl] icc compiler option
  • Date: Tue, 6 Sep 2011 16:30:54 -0700
  • Accept-language: en-US
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

It was worth a try. I have wanted to test an ibverbs build for a long time.

Can anyone explain to me what triggers might cause the send/receive error?
Brian

________________________________________
From: Jim Phillips [jim AT ks.uiuc.edu]
Sent: Tuesday, September 06, 2011 4:24 PM
To: Bennion, Brian
Cc: charm AT cs.uiuc.edu
Subject: RE: [ppl] [charm] icc compiler option

It's starting but finding some reason to crash. Maybe just try MPI.

-Jim


On Tue, 6 Sep 2011, Bennion, Brian wrote:

> ldd showed all libraries were linked.
>
> ++verbose was, well... verbose. The output is below.
> bennion1 237:~/g11Dir/NAMD_CVS-2011-08-29_Source/Linux-x86_64-ib-icc-sse3/tests> ../charmrun +p 12 ++verbose ++mpiexec ++remote-shell mympiexec /g/g14/bennion1/g11Dir/NAMD_CVS-2011-08-29_Source/Linux-x86_64-ib-icc-sse3/namd2 /g/g14/bennion1/g11Dir/NAMD_CVS-2011-08-29_Source/Linux-x86_64-ib-icc-sse3/test/apoa1.namd
> Charmrun> charmrun started...
> Charmrun> adding client 0: "127.0.0.1", IP:127.0.0.1
> Charmrun> adding client 1: "127.0.0.1", IP:127.0.0.1
> Charmrun> adding client 2: "127.0.0.1", IP:127.0.0.1
> Charmrun> adding client 3: "127.0.0.1", IP:127.0.0.1
> Charmrun> adding client 4: "127.0.0.1", IP:127.0.0.1
> Charmrun> adding client 5: "127.0.0.1", IP:127.0.0.1
> Charmrun> adding client 6: "127.0.0.1", IP:127.0.0.1
> Charmrun> adding client 7: "127.0.0.1", IP:127.0.0.1
> Charmrun> adding client 8: "127.0.0.1", IP:127.0.0.1
> Charmrun> adding client 9: "127.0.0.1", IP:127.0.0.1
> Charmrun> adding client 10: "127.0.0.1", IP:127.0.0.1
> Charmrun> adding client 11: "127.0.0.1", IP:127.0.0.1
> Charmrun> Charmrun = 192.168.117.53, port = 56501
> Charmrun> IBVERBS version of charmrun
> Charmrun> Sending "$CmiMyNode 192.168.117.53 56501 387 0" to client 0.
> Charmrun> find the node program "/g/g14/bennion1/g11Dir/NAMD_CVS-2011-08-29_Source/Linux-x86_64-ib-icc-sse3/namd2" at "/g/g11/petefred/NAMD_CVS-2011-08-29_Source/Linux-x86_64-ib-icc-sse3/tests" for 0.
> Charmrun> Starting mympiexec ./charmrun.387
> Charmrun> mpiexec started
> Charmrun> node programs all started
> Charmrun> Waiting for 0-th client to connect.
> This is dollar star -n 12 ./charmrun.387
> srun: Job is in held state, pending scheduler release
> srun: job 1000783 queued and waiting for resources
> srun: job 1000783 has been allocated resources
> Charmrun remote shell(127.0.0.1.0)> remote responding...
> Charmrun remote shell(127.0.0.1.0)> remote responding...
> Charmrun remote shell(127.0.0.1.0)> remote responding...
> Charmrun remote shell(127.0.0.1.0)> remote responding...
> Charmrun remote shell(127.0.0.1.0)> remote responding...
> Charmrun remote shell(127.0.0.1.0)> remote responding...
> Charmrun remote shell(127.0.0.1.0)> remote responding...
> Charmrun remote shell(127.0.0.1.0)> remote responding...
> Charmrun remote shell(127.0.0.1.0)> remote responding...
> Charmrun remote shell(127.0.0.1.0)> remote responding...
> Charmrun remote shell(127.0.0.1.0)> remote responding...
> Charmrun remote shell(127.0.0.1.0)> remote responding...
> Charmrun remote shell(127.0.0.1.0)> starting node-program...
> Charmrun remote shell(127.0.0.1.0)> starting node-program...
> Charmrun remote shell(127.0.0.1.0)> starting node-program...
> Charmrun remote shell(127.0.0.1.0)> starting node-program...
> Charmrun remote shell(127.0.0.1.0)> starting node-program...
> Charmrun remote shell(127.0.0.1.0)> starting node-program...
> Charmrun remote shell(127.0.0.1.0)> starting node-program...
> Charmrun remote shell(127.0.0.1.0)> starting node-program...
> Charmrun remote shell(127.0.0.1.0)> starting node-program...
> Charmrun remote shell(127.0.0.1.0)> starting node-program...
> Charmrun remote shell(127.0.0.1.0)> starting node-program...
> Charmrun remote shell(127.0.0.1.0)> starting node-program...
> Charmrun> Waiting for 1-th client to connect.
> Charmrun> Waiting for 2-th client to connect.
> Charmrun> Waiting for 3-th client to connect.
> Charmrun> Waiting for 4-th client to connect.
> Charmrun> Waiting for 5-th client to connect.
> Charmrun> Waiting for 6-th client to connect.
> Charmrun> Waiting for 7-th client to connect.
> Charmrun> Waiting for 8-th client to connect.
> Charmrun> Waiting for 9-th client to connect.
> Charmrun> Waiting for 10-th client to connect.
> Charmrun> Waiting for 11-th client to connect.
> Charmrun> All clients connected.
> Charmrun> IP tables sent.
> Charmrun> node programs all connected
> Charmrun> started all node programs in 39.031 seconds.
> Charmrun: error on request socket--
> Socket closed before recv.
> bennion1 238:~/g11Dir/NAMD_CVS-2011-08-29_Source/Linux-x86_64-ib-icc-sse3/tests>
> Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
> Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
> Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
> Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
> Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
> Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
> Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
> Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
> Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
> Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
> Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
> Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
>
> ________________________________________
> From: Jim Phillips [jim AT ks.uiuc.edu]
> Sent: Tuesday, September 06, 2011 3:52 PM
> To: Bennion, Brian
> Cc: charm AT cs.uiuc.edu
> Subject: RE: [ppl] [charm] icc compiler option
>
> No charmrun_err files? Anything missing when you run ldd?
>
> Try adding ++verbose.
>
> -Jim
>
>
> On Tue, 6 Sep 2011, Bennion, Brian wrote:
>
>> OK, I should have seen that one.
>> I get a little farther now....
>>
>> ../charmrun +p 12 ++mpiexec ++remote-shell mympiexec /g/g14/bennion1/g11Dir/NAMD_CVS-2011-08-29_Source/Linux-x86_64-ib-icc-sse3/namd2 /g/g14/bennion1/g11Dir/NAMD_CVS-2011-08-29_Source/Linux-x86_64-ib-icc-sse3/test/apoa1.namd
>> Charmrun> IBVERBS version of charmrun
>> This is dollar star -n 12 ./charmrun.32109
>> srun: Job is in held state, pending scheduler release
>> srun: job 1000777 queued and waiting for resources
>> srun: job 1000777 has been allocated resources
>> Charmrun> started all node programs in 7.786 seconds.
>> Charmrun: error on request socket--
>> Socket closed before recv.
>>
>>
>> ________________________________________
>> From: Jim Phillips [jim AT ks.uiuc.edu]
>> Sent: Tuesday, September 06, 2011 3:28 PM
>> To: Bennion, Brian
>> Cc: charm AT cs.uiuc.edu
>> Subject: RE: [ppl] [charm] icc compiler option
>>
>> You probably don't want the "shift; shift;" at the beginning, since it's
>> stripping off the "-n 12" arguments that I assume srun needs.
>>
>> -Jim
>>
>>
>> On Tue, 6 Sep 2011, Bennion, Brian wrote:
>>
>>> Hello Jim,
>>>
>>> OK, this seems like it's going to be painful.
>>> mpiexec doesn't exist on sierra.llnl.gov
>>>
>>> mympiexec file is below
>>> #!/bin/csh
>>> echo "This is dollar start "$*
>>> shift; shift; exec srun $* -p pdebug
>>>
>>> when the command below is executed
>>>
>>> ../charmrun +p 12 ++mpiexec ++remote-shell mympiexec ../namd2 apoa1.namd
>>>
>>> The output is
>>>
>>> Charmrun> IBVERBS version of charmrun
>>> This is dollar star: -n 12 ./charmrun.24175
>>> srun: Job is in held state, pending scheduler release
>>> srun: job 1000773 queued and waiting for resources
>>> Charmrun> error 0 attaching to node:
>>> Timeout waiting for node-program to connect
>>>
>>> The charmrun.24175 file is a script that seems to be hanging somewhere.
>>> Its contents are pasted below:
>>>
>>>
>>> #!/bin/sh
>>> Echo() {
>>> echo 'Charmrun remote shell(127.0.0.1.0)>' $*
>>> }
>>> Exit() {
>>> if [ $1 -ne 0 ]
>>> then
>>> Echo Exiting with error code $1
>>> fi
>>> exit $1
>>> }
>>> Find() {
>>> loc=''
>>> for dir in `echo $PATH | sed -e 's/:/ /g'`
>>> do
>>> test -f "$dir/$1" && loc="$dir/$1"
>>> done
>>> if [ "x$loc" = x ]
>>> then
>>> Echo $1 not found in your PATH "($PATH)"--
>>> Echo set your path in your ~/.charmrunrc
>>> Exit 1
>>> fi
>>> }
>>> test -f "$HOME/.charmrunrc" && . "$HOME/.charmrunrc"
>>> DISPLAY='sierra0:16.0';export DISPLAY
>>> NETMAGIC="24175";export NETMAGIC
>>> CmiMyNode=$OMPI_COMM_WORLD_RANK
>>> test -z "$CmiMyNode" && CmiMyNode=$MPIRUN_RANK
>>> test -z "$CmiMyNode" && CmiMyNode=$PMI_RANK
>>> test -z "$CmiMyNode" && CmiMyNode=$PMI_ID
>>> test -z "$CmiMyNode" && (Echo Could not detect rank from environment ;
>>> Exit 1)
>>> export CmiMyNode
>>> NETSTART="$CmiMyNode 192.168.112.1 48334 24175 0";export NETSTART
>>> CmiMyNodeSize='1'; export CmiMyNodeSize
>>> CmiMyForks='0'; export CmiMyForks
>>> CmiNumNodes=$OMPI_COMM_WORLD_SIZE
>>> test -z "$CmiNumNodes" && CmiNumNodes=$MPIRUN_NPROCS
>>> test -z "$CmiNumNodes" && CmiNumNodes=$PMI_SIZE
>>> test -z "$CmiNumNodes" && (Echo Could not detect node count from
>>> environment ; Exit 1)
>>> export CmiNumNodes
>>> PATH="$PATH:/bin:/usr/bin:/usr/X/bin:/usr/X11/bin:/usr/local/bin:/usr/X11R6/bin:/usr/openwin/bin"
>>> if test ! -x "/g/g11/NAMD_CVS-2011-08-29_Source/Linux-x86_64-ib-icc-sse3/tests/../namd2"
>>> then
>>> Echo 'Cannot locate this node-program: /g/g11/NAMD_CVS-2011-08-29_Source/Linux-x86_64-ib-icc-sse3/tests/../namd2'
>>> Exit 1
>>> fi
>>> cd "/g/g11/NAMD_CVS-2011-08-29_Source/Linux-x86_64-ib-icc-sse3/tests"
>>> if test $? = 1
>>> then
>>> Echo 'Cannot propagate this current directory:'
>>> Echo '/g/g11/NAMD_CVS-2011-08-29_Source/Linux-x86_64-ib-icc-sse3/tests'
>>> Exit 1
>>> fi
>>> rm -f /tmp/charmrun_err.$$
>>> ("/g/g11/NAMD_CVS-2011-08-29_Source/Linux-x86_64-ib-icc-sse3/tests/../namd2"
>>> apoa1.namdq
>>> res=$?
>>> if [ $res -eq 127 ]
>>> then
>>> (
>>>
>>> "/g/g11/NAMD_CVS-2011-08-29_Source/Linux-x86_64-ib-icc-sse3/tests/../namd2"
>>> ldd
>>> "/g/g11/NAMD_CVS-2011-08-29_Source/Linux-x86_64-ib-icc-sse3/tests/../namd2"
>>> ) > /tmp/charmrun_err.$$ 2>&1
>>> fi
>>> ) < /dev/null 1> /dev/null 2> /dev/null
>>> sleep 1
>>> if [ -r /tmp/charmrun_err.$$ ]
>>> then
>>> cat /tmp/charmrun_err.$$
>>> rm -f /tmp/charmrun_err.$$
>>> Exit 1
>>> fi
>>> Exit 0
>>>
>>> From: Jim Phillips [jim AT ks.uiuc.edu]
>>> Sent: Tuesday, September 06, 2011 2:22 PM
>>> To: Bennion, Brian
>>> Cc: charm AT cs.uiuc.edu
>>> Subject: RE: [ppl] [charm] icc compiler option
>>>
>>> You use charmrun with the ++mpiexec option and a mympiexec script that
>>> runs your srun with whatever options it needs. See this bit in notes.txt:
>>>
>>> -- Linux Clusters with InfiniBand or Other High-Performance Networks --
>>>
>>> Charm++ provides a special ibverbs network layer that uses InfiniBand
>>> networks directly through the OpenFabrics OFED ibverbs library. This
>>> avoids efficiency and portability issues associated with MPI. Look for
>>> pre-built ibverbs NAMD binaries or specify ibverbs when building Charm++.
>>>
>>> Writing batch job scripts to run charmrun in a queueing system can be
>>> challenging. Since most clusters provide directions for using mpiexec
>>> to launch MPI jobs, charmrun provides a ++mpiexec option to use mpiexec
>>> to launch non-MPI binaries. If "mpiexec -np <procs> ..." is not
>>> sufficient to launch jobs on your cluster you will need to write an
>>> executable mympiexec script like the following from TACC:
>>>
>>> #!/bin/csh
>>> shift; shift; exec ibrun $*
>>>
>>> The job is then launched (with full paths where needed) as:
>>>
>>> charmrun +p<procs> ++mpiexec ++remote-shell mympiexec namd2 <configfile>
>>>
>>>
>>> -Jim
>>>
>>>
>>> On Tue, 6 Sep 2011, Bennion, Brian wrote:
>>>
>>>> Thanks for the patch. It is a bit more sophisticated than what I would
>>>> have come up with.
>>>>
>>>> How does one start an ibverbs namd2.8 executable when more than 1 node
>>>> is needed? With MPI builds I just tell our "srun" scheduler that I need
>>>> 144 tasks and it assigns the PEs appropriately.
>>>>
>>>> The same syntax only produced 144 separate but identical jobs.
>>>>
>>>> Brian
>>>>
>>>> ________________________________________
>>>> From: Jim Phillips [jim AT ks.uiuc.edu]
>>>> Sent: Tuesday, September 06, 2011 5:36 AM
>>>> To: Bennion, Brian
>>>> Cc: charm AT cs.uiuc.edu
>>>> Subject: RE: [ppl] [charm] icc compiler option
>>>>
>>>> Hi,
>>>>
>>>> The attached patch should fix this.
>>>>
>>>> -Jim
>>>>
>>>>
>>>> On Mon, 5 Sep 2011, Bennion, Brian wrote:
>>>>
>>>>> bennion1 35:~> icc -v
>>>>> icc version 12.1.0 (gcc version 4.1.2 compatibility)
>>>>> ld /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/crt1.o
>>>>> /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/crti.o
>>>>> /usr/lib/gcc/x86_64-redhat-linux/4.1.2/crtbegin.o --eh-frame-hdr
>>>>> -dynamic-linker /lib64/ld-linux-x86-64.so.2
>>>>> -L/usr/local/tools/ifort-12.1.023-beta/lib -o a.out
>>>>> -L/usr/local/tools/icc-12.1.023-beta/compiler/lib/intel64
>>>>> -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2
>>>>> -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64
>>>>> -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../.. -L/lib64 -L/lib
>>>>> -L/usr/lib64 -L/usr/lib -rpath /usr/local/tools/icc-12.1.023-beta/lib
>>>>> -rpath /usr/local/tools/ifort-12.1.023-beta/lib -Bstatic -limf -lsvml
>>>>> -Bdynamic -lm -Bstatic -lipgo -ldecimal --as-needed -Bdynamic -lcilkrts
>>>>> -lstdc++ --no-as-needed -lgcc -lgcc_s -Bstatic -lirc -Bdynamic -lc
>>>>> -lgcc -lgcc_s -Bstatic -lirc_s -Bdynamic -ldl -lc
>>>>> /usr/lib/gcc/x86_64-redhat-linux/4.1.2/crtend.o
>>>>> /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/crtn.o
>>>>> /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/crt1.o: In
>>>>> function `_start':
>>>>> (.text+0x20): undefined reference to `main'
>>>>>
>>>>> -----Original Message-----
>>>>> From: Jim Phillips [mailto:jim AT ks.uiuc.edu]
>>>>> Sent: Saturday, September 03, 2011 1:32 PM
>>>>> To: Bennion, Brian
>>>>> Cc: charm AT cs.uiuc.edu
>>>>> Subject: Re: [ppl] [charm] icc compiler option
>>>>>
>>>>>
>>>>> Actually, the Intel 12.x compilers use Version too:
>>>>>
>>>>> [jphillip@kidlogin2 ~]$ icc -v
>>>>> Version 12.0.4
>>>>>
>>>>> Brian, what does icc -v return for you? On what platform?
>>>>>
>>>>> -Jim
>>>>>
>>>>>
>>>>> On Sat, 3 Sep 2011, Jim Phillips wrote:
>>>>>
>>>>>>
>>>>>> This must be new in the Intel 12.x compilers.
>>>>>>
>>>>>> What does "icc -v" look like for you?
>>>>>>
>>>>>> -Jim
>>>>>>
>>>>>>
>>>>>> On Fri, 2 Sep 2011, Bennion, Brian wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> In the charm632 version that ships with namd2.8 there is a small bug in
>>>>>>> the build scripts. Specifically, if icc is requested as the compiler, the
>>>>>>> build will fail because icc was not found. In the cc-icc.sh script the
>>>>>>> grep command is looking for "Version", but all icc -v commands only show
>>>>>>> "version". The grep command should have the "-i" option to check for both
>>>>>>> spellings.
>>>>>>>
>>>>>>> Brian
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> charm mailing list
>>>>>>> charm AT cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/charm
>>>>>>> _______________________________________________
>>>>>>> ppl mailing list
>>>>>>> ppl AT cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/ppl
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
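
To pull together the launch pattern discussed in this thread: Jim's point is that the "shift; shift;" in the TACC example strips the "-n <procs>" arguments that charmrun hands to the wrapper, and srun needs that task count; srun also expects its own options to come before the program it launches, so a partition flag belongs ahead of "$*". A minimal sketch of such a wrapper follows (untested; the mympiexec name and the pdebug partition are taken from the messages above, and other sites may need different srun options):

#!/bin/csh
# mympiexec: wrapper handed to charmrun via "++mpiexec ++remote-shell mympiexec".
# charmrun invokes it as:  mympiexec -n <procs> ./charmrun.<pid>
# Keep the -n <procs> arguments (no "shift; shift;") and put srun options first.
exec srun -p pdebug $*

The job is then launched, as in the notes.txt excerpt above, with:

charmrun +p<procs> ++mpiexec ++remote-shell mympiexec namd2 <configfile>

This only reproduces the launch; it does not explain the "Socket closed before recv" error, which Jim attributes to the node program starting and then crashing.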




