Re: [charm] [ppl] Best way to run Charm++ apps on an Infiniband cluster


  • From: Jim Phillips <jim AT ks.uiuc.edu>
  • To: Jozsef Bakosi <jbakosi AT gmail.com>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] [ppl] Best way to run Charm++ apps on an Infiniband cluster
  • Date: Mon, 16 Feb 2015 14:14:16 -0600 (CST)
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>


Try "charmrun ++mpiexec +p32 ./hello" which will use mpiexec internally to launch across nodes.

Jim


On Fri, 13 Feb 2015, Jozsef Bakosi wrote:

Hi folks,

I'm wondering what the best way is to run Charm++ applications on clusters
with Infiniband interconnects. So far I've been successfully building and
running my app with Charm++ built by the following command, which uses
MPI:

./build AMPI mpi-linux-x86_64 mpicxx
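
With this build we launch the binary like any other MPI program, e.g. (the process count and binary name here are just placeholders):

mpiexec -n 32 ./my_app

and, as far as I understand, the charmrun produced by the MPI build simply wraps mpiexec.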

But now I'm wondering if the "ibverbs" build option provides better
performance on Infiniband clusters. We have Qlogic and Mellanox Infiniband
Fat-Tree interconnects. To experiment with this, I have successfully built
Charm++ using the following command:

./build AMPI net-linux-x86_64 ibverbs

But when I try to
run net-linux-x86_64-ibverbs/tests/charm++/simplearrayhello on two compute
nodes, I get

$ ./charmrun +p32 ./hello
Charmrun> IBVERBS version of charmrun
mcmd: connect failed: Connection refused (32x)
Charmrun> Error 1 returned from rsh (localhost:0)
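
From the error it looks like this charmrun starts remote processes over rsh by default and falls back to localhost when it cannot find a node list. Based on my reading of the Charm++ manual, a nodelist-based launch would look roughly like this (host names are made up for illustration, and I'm not sure I have the options right):

$ cat ./nodelist
group main
host node01
host node02
$ ./charmrun +p32 ++nodelist ./nodelist ++remote-shell ssh ./hello

But I may well be missing a simpler way to do this.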

So my questions are:

1. Can I expect better performance on Infiniband clusters using build
options other than MPI?
2. Do I also have to contact our system admins to allow access to interconnect
software layers lower than MPI, so that Charm++ code (I assume ibverbs) can use
them?
3. Am I missing something else?
4. Are the best ways to build Charm++ for specific hardware documented
somewhere?

Thanks in advance, and please let me know if you need more information on
the clusters.
Jozsef




