
charm - Re: [charm] [ppl] Using Charm AMPI



  • From: Phil Miller <mille121 AT illinois.edu>
  • To: Sam White <white67 AT illinois.edu>
  • Cc: Scott Field <sfield AT astro.cornell.edu>, Leonardo Duarte <leo.duarte AT gmail.com>, Charm Mailing List <charm AT cs.illinois.edu>
  • Subject: Re: [charm] [ppl] Using Charm AMPI
  • Date: Fri, 30 Oct 2015 11:46:38 -0500

There's an additional twist to this that applies to any system with multiple sockets in each node. Regardless of whether hyperthreads are used, it's often desirable to run a separate process on the cores of each socket.
* This makes NUMA memory affinity trivial
* In communication-intensive applications, a single communication thread may not be sufficient to handle the message volume for the entire node. Running a process per socket also means one or more additional communication threads to divide this load. Jim may be able to comment on what configuration has been found to be optimal for NAMD.
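
For concreteness, a minimal sketch of what a one-process-per-socket launch could look like on a 2-socket, 32-thread node; the node/process counts, core numbering, and the "./app" binary name are illustrative assumptions, not something taken from Phil's message:

    # Assumed layout: socket 0 = cores 0-15, socket 1 = cores 16-31.
    # Two processes per node (one per socket), 15 worker threads each (+ppn 15),
    # with each process's communication thread pinned to the first core of its socket.
    # 64 nodes -> 128 processes; "./app" is a placeholder executable.
    aprun -n 128 -N 2 -d 16 ./app +ppn 15 +pemap 1-15,17-31 +commap 0,16

With this split, each process's memory stays on its local NUMA domain and the node gets two communication threads instead of one.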

Phil

On Fri, Oct 30, 2015 at 11:39 AM, Sam White <white67 AT illinois.edu> wrote:
Yes, Charm++ will view Blue Waters as having 32 SMP cores per node, but it depends on the application and system whether using one PE per hardware thread (hyperthread) or per core will perform best. On Blue Waters I believe there is little to no benefit from using all 32 hardware threads for NAMD and many applications because there are only 16 floating point units per node, so two hardware threads share one FPU. On a system with heavier-weight, more independent hyperthread hardware, this will differ. But experiment with your application and use whichever configuration performs best for you!
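
As a rough illustration of the two choices Sam describes (one PE per hardware thread vs. one per FPU), single-node launch lines might look like the following; the core numbering and the +pemap stride are assumptions to verify against your system, and "./app" is a placeholder:

    # One PE per hardware thread: 31 workers on cores 1-31, comm thread on core 0
    aprun -n 1 -N 1 -d 32 ./app +ppn 31 +pemap 1-31 +commap 0

    # One PE per FPU: workers on every other core so no two workers share a
    # floating point unit (+pemap 2-30:2 takes every 2nd core from 2 to 30)
    aprun -n 1 -N 1 -d 32 ./app +ppn 15 +pemap 2-30:2 +commap 0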

- Sam

On Fri, Oct 30, 2015 at 11:16 AM, Scott Field <sfield AT astro.cornell.edu> wrote:
Hi,

  I'm glad to hear it worked out!

  I have a follow-up question (one which may be more appropriate for another thread). On Blue Waters I've been launching 32 (or 31) threads per node for SMP builds and 32 (or 31) processes per node for non-SMP builds. Should I be using 16 instead? In Sam's example he uses 16 cores/node.

  Does anyone have experience comparing Charm++ on Blue Waters when viewing each node as having 16 vs. 32 cores? The documentation seems to suggest viewing the system as having 32 cores (https://bluewaters.ncsa.illinois.edu/charm).

Best,
Scott 

On Fri, Oct 30, 2015 at 3:52 AM, Leonardo Duarte <leo.duarte AT gmail.com> wrote:
Hello everyone,

Thanks a lot for all the answers.
You were right. The problem was the +pemap and +commap parameters.
Since I was not defining them, the same thread was being used for both the worker and communication.
Now my simple example that was taking 11 minutes runs in just 3 seconds. I know I still have a lot to improve, but 11 minutes was too strange to be right.
I had been looking for a build or run command-line mistake, and that was it.
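
For anyone who finds this thread with the same symptom, a minimal sketch of a launch line that keeps the worker threads and the communication thread on separate cores (one SMP process per 32-core node; the node count, core numbering, and "./app" binary name are placeholders):

    # Without +pemap/+commap the comm thread can end up sharing a core with a worker.
    # Here 31 workers are pinned to cores 1-31 and the comm thread to core 0.
    aprun -n 4 -N 1 -d 32 ./app +ppn 31 +pemap 1-31 +commap 0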

Just for the record, I changed back to PrgEnv-gnu since all of you said there is no reason for it to be slow.

Thank you all for the help.

Leonardo.





