
charm - Re: [charm] [ppl] Best way to run Charm++ apps on an Infiniband cluster



  • From: Bilge Acun <acun2 AT illinois.edu>
  • To: "Bohm, Eric J" <ebohm AT illinois.edu>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] [ppl] Best way to run Charm++ apps on an Infiniband cluster
  • Date: Tue, 17 Feb 2015 15:07:37 -0600
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

I don't think this is documented anywhere; I'll add it to the manual.

If that's helpful, I can extend the abort message to state that the QLOGIC macro needs to be enabled for Qlogic hardware.

On 17 February 2015 at 14:55, Bohm, Eric J <ebohm AT illinois.edu> wrote:
We have not found a reliable way to detect this at build time.

On 02/17/2015 02:40 PM, Jim Phillips wrote:
>
> Is this documented anywhere?  Is there a way to detect this at runtime?
>
> Jim
>
>
> On Tue, 17 Feb 2015, Bilge Acun wrote:
>
>> Hi Jozef,
>>
>> For Qlogic hardware, the QLOGIC macro needs to be enabled when building
>> Charm++.
>> Can you try building Charm++ again with the -DQLOGIC option added?
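>>
>> For example, with the ibverbs build line from your message below, that
>> would look something like this (extra options on the build line should
>> be passed through to the compiler):
>>
>>   ./build AMPI net-linux-x86_64 ibverbs -DQLOGIC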
>>
>> Thanks,
>>
>> --
>>
>> *Bilge Acun*
>> *PhD Candidate at University of Illinois at Urbana-Champaign*
>> *Computer Science Department*
>>
>> On 17 February 2015 at 09:33, Jozsef Bakosi <jbakosi AT gmail.com> wrote:
>>
>>>  Thanks, Jim and Abhinav, this helps. However, this is what I get after
>>> building Charm++ with "net-linux-x86_64 ibverbs" and trying to
>>> run simplearrayhello:
>>>
>>>  $ ./charmrun +p32 ./hello ++mpiexec
>>> Charmrun> IBVERBS version of charmrun
>>> Charmrun> started all node programs in 2.129 seconds.
>>> ------------- Processor 0 Exiting: Called CmiAbort ------------
>>> Reason: Failed to change qp state to RTS: you may need some
>>> device-specific parameters in machine-ibevrbs
>>> ...
>>> (32x)
>>> ...
>>> [0] Stack Traceback:
>>>   [0:0] CmiAbort+0x40  [0x54bac0]
>>>   [0:1] initInfiOtherNodeData+0x168  [0x54bfd8]
>>>   [0:2] ConverseInit+0xe8a  [0x5569fa]
>>>   [0:3] main+0x26  [0x4857e6]
>>>   [0:4] __libc_start_main+0xfd  [0x2abbb434cd5d]
>>>   [0:5]   [0x47ffd9]
>>> Fatal error on PE 0> Failed to change qp state to RTS: you may need
>>> some
>>> device-specific parameters in machine-ibevrbs
>>>
>>>  And here is what I get after building with "net-linux-x86_64 ibverbs
>>> smp":
>>>
>>>  $ ./charmrun +p32 ./hello ++mpiexec
>>> Charmrun> IBVERBS version of charmrun
>>> Charmrun> started all node programs in 0.856 seconds.
>>> Charmrun: error on request socket--
>>> Socket closed before recv.
>>>
>>>  Any other clue as to what I'm still missing?
>>>
>>>  Thanks,
>>> Jozsef
>>>
>>> On Mon, Feb 16, 2015 at 8:57 PM, Abhinav Bhatele <
>>> bhatele AT illinoisalumni.org> wrote:
>>>
>>>> Hi Jozsef,
>>>>
>>>>  Please find some answers inline:
>>>>
>>>>
>>>> On Fri, Feb 13, 2015 at 8:19 AM, Jozsef Bakosi <jbakosi AT gmail.com>
>>>> wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>>  I'm wondering what is the best way to run Charm++ applications on
>>>>> clusters with Infiniband interconnects. So far I've been successfully
>>>>> building and running my app using Charm++, built by the following
>>>>> command,
>>>>> which uses MPI:
>>>>>
>>>>>  ./build AMPI mpi-linux-x86_64 mpicxx
>>>>>
>>>>>  But now I'm wondering if the "ibverbs" build option provides better
>>>>> performance on Infiniband clusters. We have Qlogic and Mellanox
>>>>> Infiniband
>>>>> Fat-Tree interconnects. To experiment with this, I have
>>>>> successfully built
>>>>> Charm++ using the following command:
>>>>>
>>>>>  ./build AMPI net-linux-x86_64 ibverbs
>>>>>
>>>>>  But when I try to
>>>>> run net-linux-x86_64-ibverbs/tests/charm++/simplearrayhello on two
>>>>> compute
>>>>> nodes, I get
>>>>>
>>>>>  $ ./charmrun +p32 ./hello
>>>>> Charmrun> IBVERBS version of charmrun
>>>>> mcmd: connect failed: Connection refused (32x)
>>>>> Charmrun> Error 1 returned from rsh (localhost:0)
>>>>>
>>>>>  So my questions are:
>>>>>
>>>>>  1. Can I expect better performance on Infiniband clusters using
>>>>> build
>>>>> options other than MPI?
>>>>>
>>>>
>>>>  Yes, typically you would expect the ibverbs build to perform better
>>>> than the MPI build. You can try the four builds below:
>>>>
>>>>  mpi-linux-x86_64 mpicxx
>>>>  mpi-linux-x86_64 mpicxx smp
>>>>
>>>>  net-linux-x86_64 ibverbs
>>>>  net-linux-x86_64 ibverbs smp
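>>>>
>>>>  For the smp builds, the launch line changes a little: ++ppn sets the
>>>> number of worker threads per process. As a sketch (the 16-per-node
>>>> split here is just an assumption; in smp mode each process also runs a
>>>> communication thread, so you may want to leave a core free for it):
>>>>
>>>>  ./charmrun +p32 ++ppn 16 ./hello ++mpiexec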
>>>>
>>>>
>>>>>  2. Do I also have to contact our system admins to allow access to
>>>>> lower (than MPI) level software layers for the interconnect so
>>>>> Charm++ code
>>>>> (I assume ibverbs) can use it?
>>>>>
>>>>
>>>>  No, as Jim pointed out, you can use ++mpiexec or manually specify the
>>>> nodelist that has been allocated to you:
>>>> http://charm.cs.illinois.edu/manuals/html/charm++/C.html
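>>>>
>>>>  A minimal nodelist file, with placeholder hostnames, looks like this:
>>>>
>>>>  group main
>>>>    host node01
>>>>    host node02
>>>>
>>>>  and can be passed on the launch line with something like:
>>>>
>>>>  ./charmrun +p32 ./hello ++nodelist ./mynodelist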
>>>>
>>>>
>>>>>  3. Am I missing something else?
>>>>> 4. Are the best ways to build Charm++ for specific hardware
>>>>> documented
>>>>> somewhere?
>>>>>
>>>>
>>>>  Hopefully, someone else will answer this but my guess is no.
>>>>
>>>>
>>>>>
>>>>>  Thanks in advance, and please let me know if you need more
>>>>> information on the clusters.
>>>>>  Jozsef
>>>>>
>>>>>
>>>>
>>>>
>>>>  --
>>>> Abhinav Bhatele, people.llnl.gov/bhatele
>>>> Center for Applied Scientific Computing, Lawrence Livermore National
>>>> Laboratory
>>>>
>>>
>>>
>>
>>

_______________________________________________
charm mailing list
charm AT cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/charm



--
Bilge Acun
PhD Candidate at University of Illinois at Urbana-Champaign
Computer Science Department


