
charm - Re: [charm] Trying to run my FMM charm on a Cluster

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Mustafa Abdul Jabbar <musbar AT gmail.com>
  • To: Michael Robson <mprobson AT illinois.edu>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] Trying to run my FMM charm on a Cluster
  • Date: Tue, 15 Apr 2014 02:01:58 +0300
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Thanks Michael, 
Actually, I fixed that one by giving the shell script execute permission with chmod.
Now I am facing another problem: ibverbs doesn't support more than a couple thousand processors.
Adding ++scalable-start and ++batch slows things down drastically.

I rebuilt Charm++ with MPI, as suggested by a blog post. Of course, building wasn't easy either. After building, I hit another problem on the Stampede cluster:

[TACC]: Job submission is not allowed from this host  /usr/bin/setarch x86_64. Please submit

[TACC]: through one of the available login resources.

Is there any straightforward way to get more processors? I suspect that the way I am initializing the node array is naive. I am creating a 1D Node array of size CkNumPes(), which implies that the creating process somehow needs to establish a network connection with all of the nodes. This is crazy if it's true, and increasing MaxStartups, as suggested by some posts, doesn't sound tempting to me, not to mention that I would have to be a sudoer on the cluster to do that.






On Mon, Apr 14, 2014 at 10:41 PM, Michael Robson <mprobson AT illinois.edu> wrote:
Hello,

Instead of your current mpiexec strategy you might try

++mpiexec ++remote-shell "ibrun"

If that doesn't work, you may need to add -o 0 to tell ibrun to use an offset of 0. The mpiexec portion would then be:

++mpiexec ++remote-shell "ibrun -o 0"
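
As a rough sketch, the two variants above could be combined with the +p16 and ./a.out from the quoted job script below. The variable names (NPROCS, APP, USE_OFFSET, LAUNCHER, CMD) are made up for illustration, and the script only prints the command line rather than running it, so it is untested on Stampede:

```shell
#!/bin/bash
# Sketch only: assembles the charmrun command line suggested above.
# NPROCS and APP are placeholders taken from the quoted job script (+p16, ./a.out).
NPROCS=16
APP=./a.out

# Hypothetical switch: set USE_OFFSET=1 to pass "-o 0" to ibrun,
# which is the fallback suggested if the plain form does not work.
USE_OFFSET=${USE_OFFSET:-0}

if [ "$USE_OFFSET" -eq 1 ]; then
  LAUNCHER="ibrun -o 0"
else
  LAUNCHER="ibrun"
fi

# Print the command instead of executing it, so the sketch is safe
# to run outside a batch job.
CMD="./charmrun +p${NPROCS} ++mpiexec ++remote-shell \"${LAUNCHER}\" ${APP}"
echo "$CMD"
```

Inside the sbatch script, the printed line would replace the existing ./charmrun line.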

Hope that helps and let us know if you run into any more trouble.

Sincerely,
Michael

On Sat, Apr 12, 2014 at 11:10 AM, Mustafa Abdul Jabbar <musbar AT gmail.com> wrote:
Hello,
I have been trying to run my Charm++ job on a cluster that uses sbatch for submitting job scripts and ibrun for running them.
My first trial with ibrun worked, but it yielded a job that ran independently on each processor (i.e., as if each job were run locally).

Some threads said that I need to write a wrapper script for mpiexec to deal with ibrun.

So I wrote this script and named it mympiexec:
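#!/bin/csh
# Drop the first two arguments that charmrun passes (assumed to be the
# process-count flag and its value), then hand the rest to ibrun.
shift; shift; exec ibrun $*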

And this was my job script:

#!/bin/bash
#SBATCH -J FMM
#SBATCH -o %j32.o
#SBATCH -p normal
#SBATCH -n 16
#SBATCH -t 00:10:00
./charmrun +p16 ++mpiexec ++remote-shell ~/exafmm-charm/examples/charmexec ./a.out

I get "couldn't locate mpiexec program" when that runs. I tried putting the script in one of the directories on my PATH, but it was of no use.

Here's my nodelist, "~/.nodelist":

group main ++shell ssh
host localhost







