
charm - Re: [charm] [ppl] Cannot launch on stampede with = 16k processes

  • From: Phil Miller <mille121 AT illinois.edu>
  • To: Scott Field <sfield AT astro.cornell.edu>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>, Jim Phillips <jim AT ks.uiuc.edu>
  • Subject: Re: [charm] [ppl] Cannot launch on stampede with = 16k processes
  • Date: Wed, 12 Nov 2014 19:41:14 -0600
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

On Wed, Nov 12, 2014 at 6:58 PM, Scott Field <sfield AT astro.cornell.edu> wrote:
Hi Jim, Bilge and the rest of this list,


Hi Scott.
 
  Thank you for the quick response. In short, neither of these solutions worked for the 16K cpu (1 process per cpu) case.

  Before getting into the details, has anyone successfully executed a Charm++ job on Stampede's large queue using 1024 nodes? If so, I would be grateful to know (i) the build of Charm++ and (ii) the charmrun options used to launch the job. I've tried a variety of settings and, as a result, I'm quickly burning through CPU time.

Launching a full 16k processes that connect back to charmrun is probably doable, but rather wasteful if you're trying to get performance results. You'll be much better off using an SMP build of Charm++, as described below.
 
  Jim: thanks for sharing your home directory with me! I copied over your mpiexec script, which appears to give the same errors as the script I've been using.

  Bilge: I had previously tried with a 

   ./build charm++ net-linux-x86_64 ibverbs --with-production -j8

 charm++ build. I didn't mention this because I had done more experimentation with charmrun command-line options using the vanilla net build. However, I've also been unsuccessful launching charmrun with 16K CPUs using ibverbs. I've now explored a variety of batch/timeout/scalable-start options (without success) and have built with both

./build charm++ verbs-linux-x86_64 --with-production -j8

and

./build charm++ net-linux-x86_64 ibverbs --with-production -j8

...which might be equivalent?

These are very similar, but not quite equivalent. They reflect two generations of software architecture in the Charm++ network communications layer - the former verbs-... is newer, and the latter net-...-ibverbs is older. The verbs-... should be preferable, I think.

You'll want to add the 'smp' option to your build, so that Charm++ can run multiple worker threads in a single process. As I understand it, this provides unequivocally better performance for NAMD at medium and large scales.

When you launch, you'll want to launch just one or two processes per node (using appropriate options for ++mpiexec/ibrun), and then pass an additional argument +ppn 15 (one process per node, one thread driving external communications) or +ppn 7 (two processes per node).
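
Concretely, I'd expect the build and launch to look roughly like the following. This is only a sketch on my part (I haven't run these exact lines on Stampede, so double-check the +ppn/++ppn spelling against charmrun's help output); the executable name and the mympiexec wrapper are taken from your message below, and 15360 = 1024 nodes x 15 worker threads per node, leaving one core per node for the communication thread:

   ./build charm++ verbs-linux-x86_64 smp --with-production -j8
   ./charmrun ./Evolve1DScalarWave +p15360 ++ppn 15 ++mpiexec ++remote-shell mympiexec

With one process per node, charmrun then only has to accept 1024 connections back instead of 16384 (or 2048 with two processes per node and +ppn 7).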

I've been able to launch hello-world-type MPI programs on all 1024 nodes, so I believe it is possible in principle to do the same with Charm++.

I would be grateful for any suggestions. I am currently looking into using a nodelist instead of mpiexec. However, as previously noted, I'm burning through SUs and would like to focus my remaining SUs on solutions most likely to work.
 
Using a nodelist instead of mpiexec very likely would not help. It seems that the trouble is in charmrun accepting all of the connections back to it, rather than in getting the processes launched. Running in SMP mode will reduce that by a factor of 8-16.
 

Thanks in advance for any tips!

Best,
Scott

On Tue, Nov 11, 2014 at 1:51 PM, Bilge Acun <acun2 AT illinois.edu> wrote:
You should use Infiniband build of Charm++ on Stampede instead of the regular net build. Your build command should look like:

>> ./build charm++ verbs-linux-x86_64 --with-production -j8

On 11 November 2014 11:59, Jim Phillips <jim AT ks.uiuc.edu> wrote:

Here is my (possibly useless) suggestion:


Take a look at /home1/00288/tg455591/NAMD_scripts/runbatch_latest

I use:   ++scalable-start ++mpiexec ++remote-shell $SCRIPTDIR/mpiexec

where $SCRIPTDIR/mpiexec looks like:

#!/bin/csh -f

# drop -n <N>
shift
shift

exec /usr/local/bin/ibrun $*


It looks like you're doing about the same thing.  I can't say I've ever
actually tried 16k processes, though.  At some point we switch to smp.

You may want to try launching an MPI hello world with ibrun at that scale
just to be sure it works at all.  Good luck.
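
For what it's worth, that sanity check can be a throwaway batch script along
these lines.  Just a sketch: "hello_mpi" stands in for any trivial MPI
program built with the mvapich2 mpicc, and the queue name and limits are
whatever you are already using for the Charm++ runs.

#!/bin/csh
#SBATCH -J hello-test
#SBATCH -p large           # Stampede's large-job queue
#SBATCH -N 1024            # nodes
#SBATCH -n 16384           # total MPI tasks (16 per node)
#SBATCH -t 00:05:00

# hello_mpi is a placeholder; any MPI hello world compiled with mpicc will do
ibrun ./hello_mpi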

Jim


On Tue, 11 Nov 2014, Scott Field wrote:

> Hi,
>
> I am running large jobs (as part of a scaling test) on stampede. My
> complete module list is "TACC-paths, cluster-paths, cluster, cmake/2.8.9,
> mvapich2/1.9a2,
> Linux, xalt/0.4.4, TACC, gcc/4.7.1, mkl/13.0.2.146"
>
> and charm++ (the most recent version from git) has been built with
>
>>>> ./build charm++ net-linux-x86_64 --with-production -j8
>
> The scaling tests span 1 node (16 procs) up to 1024 nodes (16384 procs).
> When I hit 256 nodes charmrun starts reporting problems. Typically I
> execute 4 charmruns in a single sbatch submission. At 256 nodes the first
> one fails:
>
> TACC: Starting up job 4421563
> TACC: Setting up parallel environment for MVAPICH2+mpispawn.
> TACC: Starting parallel tasks...
> Charmrun> error 4466 attaching to node:
> Timeout waiting for node-program to connect
>
> while the next three succeed. At 16384 procs all 4 charmrun jobs fail with
> the same error (although the error number is different). My "base" command
> is
>
>>>> ./charmrun ./Evolve1DScalarWave +p16384 ++mpiexec ++remote-shell
> mympiexec
>
> where Evolve1DScalarWave is the executable and mympiexec is
>
> #!/bin/csh
> # drop the "-n <N>" that charmrun prepends, then hand the rest to ibrun
> shift; shift; exec ibrun $*
>
> Finally, I've tried numerous possible combinations of the following command
> line options
>
> ++scalable-start
> ++timeout XXX
> ++batch YYY
>
> Where XXX is one of 60, 100, 1000 and YYY is one of 10, 64 and 128. None of
> these worked. Using only scalable-start I get a slightly modified error
> message
>
>
> Charmrun> error 93523 attaching to node:
> Error in accept.
>
>
> For all three options enabled and a large timeout I get about 200,000 lines
> from charmrun showing these same lines over and over (with the numbers
> different):
>
> Charmrun remote shell(127.0.0.1.0)> remote responding...
> Charmrun remote shell(127.0.0.1.0)> starting node-program...
> Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
> Charmrun> Waiting for 16305-th client to connect.
> Charmrun> client 6763 connected (IP=129.114.77.58 data_port=42030)
> Charmrun> adding client 12805: "127.0.0.1", IP:127.0.0.1
>
> until finally the job fails with
>
> [c401-004.stampede.tacc.utexas.edu:mpispawn_1][spawn_processes] Failed to
> execvp() 'sfield':  (2)
>
> Sometimes this last message is seen immediately after "TACC: Starting
> parallel tasks...".
>
> I've been able to reproduce this problem with the jacobi2D charm++ example.
>
> Any help or suggestions would be greatly appreciated!
>
> Best,
> Scott
>



--
Bilge Acun
PhD Candidate at University of Illinois at Urbana-Champaign
Computer Science Department






