
charm - Re: [charm] [ppl] Using Charm AMPI

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

List archive

Re: [charm] [ppl] Using Charm AMPI


  • From: Jim Phillips <jim AT ks.uiuc.edu>
  • To: Sam White <white67 AT illinois.edu>
  • Cc: Scott Field <sfield AT astro.cornell.edu>, Leonardo Duarte <leo.duarte AT gmail.com>, Charm Mailing List <charm AT cs.illinois.edu>
  • Subject: Re: [charm] [ppl] Using Charm AMPI
  • Date: Fri, 30 Oct 2015 12:45:18 -0500 (CDT)


You just need to do the experiment. For the latest NAMD running in smp mode we do better using 32 cores per node, although the GPU version may have some serial bottleneck issues where using 8 (of 16) is faster.

Jim


On Fri, 30 Oct 2015, Sam White wrote:

Yes, Charm++ will view Blue Waters as having 32 SMP cores per node, but it
depends on the application and system whether using one PE per hardware
thread (hyperthread) or per core will perform best. On Blue Waters I
believe there is little to no benefit from using all 32 hardware threads
for NAMD and many applications because there are only 16 floating point
units per node, so two hardware threads share one FPU. On a system with
heavier-weight, more independent hyperthread hardware, this will differ.
But experiment with your application and use whichever configuration
performs best for you!
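
As a rough illustration, here is what the two configurations could look like as aprun launch lines on a Cray XE node. This is only a sketch: the executable name and node count are placeholders, and the even-core mapping assumes that pairs of adjacent integer cores share one FPU.

    # One worker PE per hardware thread: 31 workers, communication thread on core 31
    aprun -n <nodes> -N 1 -d 32 ./myapp +ppn 31 +pemap 0-30 +commap 31

    # One worker PE per FPU: 15 workers on even cores, communication thread on core 30
    aprun -n <nodes> -N 1 -d 32 ./myapp +ppn 15 +pemap 0-28:2 +commap 30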

- Sam

On Fri, Oct 30, 2015 at 11:16 AM, Scott Field
<sfield AT astro.cornell.edu>
wrote:

Hi,

I'm glad to hear it worked out!

I have a follow-up question (one which may be more appropriate for
another thread). On Blue Waters I've been launching 32 (or 31) threads per
node for smp builds and 32 (or 31) processes per node for non-smp builds.
Should I be using 16 instead? In Sam's example he uses 16 cores/node.

Does anyone have experience comparing Charm++ on Blue Waters when
viewing each node as having 16 vs. 32 cores? The documentation seems to suggest
viewing the system as having 32 cores (
https://bluewaters.ncsa.illinois.edu/charm
).

Best,
Scott

On Fri, Oct 30, 2015 at 3:52 AM, Leonardo Duarte
<leo.duarte AT gmail.com>
wrote:

Hello everyone,

Thanks a lot for all the answers.
You were right. The problem was the +pemap and +commap parameters.
Since I was not defining them, the same thread was being used for both
worker and communication duties.
Now my simple example that was taking 11 minutes runs in just 3 seconds. I
know I still have a lot to improve, but 11 minutes was too weird to be right.
I was looking for some mistake in my build or run command line, and that
was it.
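
For completeness, a minimal sketch of the kind of launch line this refers to, assuming a Cray-style aprun launch as elsewhere in this thread; the program name and core numbers are placeholders. The point is simply that the worker threads and the communication thread are pinned to disjoint cores, so they no longer share a hardware thread.

    # 15 worker threads on cores 0-14, communication thread alone on core 15
    aprun -n 1 -N 1 -d 16 ./myapp +ppn 15 +pemap 0-14 +commap 15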

Just for the record, I changed back to PrgEnv-gnu since all of you
said there is no reason for it to be slow.

Thank you all for the help.

Leonardo.






