
charm - Re: [charm] [ppl] Using Charm AMPI

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

List archive

Re: [charm] [ppl] Using Charm AMPI


  • From: Jim Phillips <jim AT ks.uiuc.edu>
  • To: Sam White <white67 AT illinois.edu>
  • Cc: Scott Field <sfield AT astro.cornell.edu>, Leonardo Duarte <leo.duarte AT gmail.com>, Charm Mailing List <charm AT cs.illinois.edu>
  • Subject: Re: [charm] [ppl] Using Charm AMPI
  • Date: Fri, 30 Oct 2015 12:45:18 -0500 (CDT)


You just need to do the experiment. For the latest NAMD running in smp mode we do better using 32 cores per node, although the GPU version may have some serial bottleneck issues where using 8 (of 16) is faster.

Jim


On Fri, 30 Oct 2015, Sam White wrote:

Yes, Charm++ will view Blue Waters as having 32 SMP cores per node, but it
depends on the application and system whether using one PE per hardware
thread (hyperthread) or per core will perform best. On Blue Waters I
believe there is little to no benefit from using all 32 hardware threads
for NAMD and many applications because there are only 16 floating point
units per node, so two hardware threads share one FPU. On a system with
heavier-weight, more independent hyperthread hardware, this will differ.
But experiment with your application and use whichever configuration
performs best for you!
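
As a rough illustration, here is what the two configurations could look like as aprun launch lines on a Cray XE node. This is only a sketch: the executable name and node count are placeholders, and the even-core mapping assumes that pairs of adjacent integer cores share one FPU.

    # One worker PE per hardware thread: 31 workers, communication thread on core 31
    aprun -n <nodes> -N 1 -d 32 ./myapp +ppn 31 +pemap 0-30 +commap 31

    # One worker PE per FPU: 15 workers on even cores, communication thread on core 30
    aprun -n <nodes> -N 1 -d 32 ./myapp +ppn 15 +pemap 0-28:2 +commap 30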

- Sam

On Fri, Oct 30, 2015 at 11:16 AM, Scott Field
<sfield AT astro.cornell.edu>
wrote:

Hi,

I'm glad to hear it worked out!

I have a follow-up question (one which may be more appropriate for
another thread). On Blue Waters I've been launching 32 (or 31) threads per
node for smp builds and 32 (or 31) processes per node for non-smp builds.
Should I be using 16 instead? In Sam's example he uses 16 cores/node.

Does anyone have experience comparing Charm++ on Blue Waters when
viewing each node as having 16 vs. 32 cores? The documentation seems to suggest
viewing the system as having 32 cores (
https://bluewaters.ncsa.illinois.edu/charm
).

Best,
Scott

On Fri, Oct 30, 2015 at 3:52 AM, Leonardo Duarte
<leo.duarte AT gmail.com>
wrote:

Hello everyone,

Thanks a lot for all the answers.
You were right. The problem was the +pemap and +commap parameters.
Since I was not defining them, the same thread was being used for both
worker and communication duties.
Now my simple example that was taking 11 minutes runs in just 3 seconds. I
know I still have a lot to improve, but 11 minutes was too weird to be right.
I was looking for some mistake in my build or run command line, and that
was it.
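
For completeness, a minimal sketch of the kind of launch line this refers to, assuming a Cray-style aprun launch as elsewhere in this thread; the program name and core numbers are placeholders. The point is simply that the worker threads and the communication thread are pinned to disjoint cores, so they no longer share a hardware thread.

    # 15 worker threads on cores 0-14, communication thread alone on core 15
    aprun -n 1 -N 1 -d 16 ./myapp +ppn 15 +pemap 0-14 +commap 15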

Just for the record, I changed back to PrgEnv-gnu since all of you
said there is no reason for it to be slow.

Thank you all for the help.

Leonardo.






