Re: [charm] [ppl] Best way to run Charm++ apps on an Infiniband cluster


  • From: Eric Bohm <ebohm AT illinois.edu>
  • To: <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] [ppl] Best way to run Charm++ apps on an Infiniband cluster
  • Date: Tue, 17 Feb 2015 14:55:54 -0600
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

We have not found a reliable way to detect this at build time.
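
A rough runtime check (not something charmrun does for you) is to look at which HCAs the node exposes: QLogic (TrueScale) adapters normally register through the ib_qib driver and show up as qib* devices, while Mellanox adapters show up as mlx4_*/mlx5_* devices. A sketch, assuming libibverbs-utils is installed and the usual device naming:

$ ibv_devinfo | grep hca_id      # reports something like "hca_id: qib0" on a QLogic node
$ ls /sys/class/infiniband/      # lists the same device names via sysfs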

On 02/17/2015 02:40 PM, Jim Phillips wrote:

Is this documented anywhere? Is there a way to detect this at runtime?

Jim


On Tue, 17 Feb 2015, Bilge Acun wrote:

Hi Jozef,

For QLogic hardware, the QLOGIC macro needs to be enabled when building
Charm++.
Can you try building Charm++ again with the -DQLOGIC option added?
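
For example, the full rebuild might look like the line below (a sketch, not a verified command: the AMPI target is carried over from your original build, and the build script passes unrecognized flags such as -DQLOGIC through to the compiler):

$ ./build AMPI net-linux-x86_64 ibverbs -DQLOGIC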

Thanks,

--

*Bilge Acun*
*PhD Candidate at University of Illinois at Urbana-Champaign*
*Computer Science Department*

On 17 February 2015 at 09:33, Jozsef Bakosi <jbakosi AT gmail.com> wrote:

Thanks, Jim and Abhinav, this helps. However, this is what I get after
building Charm++ with "net-linux-x86_64 ibverbs" and trying to
run simplearrayhello:

$ ./charmrun +p32 ./hello ++mpiexec
Charmrun> IBVERBS version of charmrun
Charmrun> started all node programs in 2.129 seconds.
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: Failed to change qp state to RTS: you may need some
device-specific parameters in machine-ibevrbs
...
(32x)
...
[0] Stack Traceback:
[0:0] CmiAbort+0x40 [0x54bac0]
[0:1] initInfiOtherNodeData+0x168 [0x54bfd8]
[0:2] ConverseInit+0xe8a [0x5569fa]
[0:3] main+0x26 [0x4857e6]
[0:4] __libc_start_main+0xfd [0x2abbb434cd5d]
[0:5] [0x47ffd9]
Fatal error on PE 0> Failed to change qp state to RTS: you may need some
device-specific parameters in machine-ibevrbs

And here is what I get after building with "net-linux-x86_64 ibverbs
smp":

$ ./charmrun +p32 ./hello ++mpiexec
Charmrun> IBVERBS version of charmrun
Charmrun> started all node programs in 0.856 seconds.
Charmrun: error on request socket--
Socket closed before recv.

Any other clue as to what I'm still missing?

Thanks,
Jozsef

On Mon, Feb 16, 2015 at 8:57 PM, Abhinav Bhatele <bhatele AT illinoisalumni.org> wrote:

Hi Jozsef,

Please find some answers inline:


On Fri, Feb 13, 2015 at 8:19 AM, Jozsef Bakosi <jbakosi AT gmail.com> wrote:

Hi folks,

I'm wondering what the best way is to run Charm++ applications on
clusters with Infiniband interconnects. So far I've been successfully
building and running my app with Charm++ built by the following command,
which uses MPI:

./build AMPI mpi-linux-x86_64 mpicxx

But now I'm wondering if the "ibverbs" build option provides better
performance on Infiniband clusters. We have QLogic and Mellanox Infiniband
Fat-Tree interconnects. To experiment with this, I have successfully built
Charm++ using the following command:

./build AMPI net-linux-x86_64 ibverbs

But when I try to
run net-linux-x86_64-ibverbs/tests/charm++/simplearrayhello on two compute
nodes, I get

$ ./charmrun +p32 ./hello
Charmrun> IBVERBS version of charmrun
mcmd: connect failed: Connection refused (32x)
Charmrun> Error 1 returned from rsh (localhost:0)

So my questions are:

1. Can I expect better performance on Infiniband clusters using build
options other than MPI?


Yes, typically you would expect the ibverbs build to perform better
than the MPI build. You can try the four builds below:

mpi-linux-x86_64 mpicxx
mpi-linux-x86_64 mpicxx smp

net-linux-x86_64 ibverbs
net-linux-x86_64 ibverbs smp
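
A sketch of what the full build invocations might look like, assuming the AMPI target from your original command (--with-production is optional and simply selects the optimized production settings):

$ ./build AMPI mpi-linux-x86_64 mpicxx --with-production
$ ./build AMPI mpi-linux-x86_64 mpicxx smp --with-production
$ ./build AMPI net-linux-x86_64 ibverbs --with-production
$ ./build AMPI net-linux-x86_64 ibverbs smp --with-production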


2. Do I also have to ask our system admins for access to the interconnect's
lower-level software layers (below MPI) so that Charm++ code
(I assume ibverbs) can use them?


No. As Jim pointed out, you can use ++mpiexec or manually specify the
nodes that have been allocated to you in a nodelist file; see the sketch
below and the manual:
http://charm.cs.illinois.edu/manuals/html/charm++/C.html
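
As a rough sketch (node001 and node002 are placeholders for whatever hosts your job was given), a nodelist file along these lines:

group main ++shell ssh
host node001
host node002

can then be passed to charmrun:

$ ./charmrun +p32 ./hello ++nodelist ./mynodes

while ++mpiexec skips the file entirely and lets the system's mpiexec launch the node programs.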


3. Am I missing something else?
4. Are the best ways to build Charm++ for specific hardware documented
somewhere?


Hopefully someone else will answer this, but my guess is no.



Thanks in advance, and please let me know if you need more
information on the clusters.
Jozsef





--
Abhinav Bhatele, people.llnl.gov/bhatele
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory









