
Re: [charm] Profiling and tuning charm++ applications


  • From: Eric Bohm <ebohm AT illinois.edu>
  • To: <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] Profiling and tuning charm++ applications
  • Date: Wed, 22 Jul 2015 11:50:16 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hello Alex,

Charm++ applications can easily reach peak utilization. However, there are a number of factors that may be affecting your performance. The MPI target for Charm++ is one of the simplest to build, but it is unlikely to be the one that gives the best performance. For single-node scalability you will probably see better performance using a different target; try multicore-linux64.
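For example, a single-node build could look like the following (a sketch based on your build line below; the --with-production flag and the -j16 parallel-make value are my assumptions, so adjust them to your setup):

    ./build charm++ multicore-linux64 --with-production -j16 2>&1 | tee build.log

The multicore-linux64 target uses shared memory within the node and bypasses the MPI stack entirely, which usually reduces messaging overhead for runs confined to one node.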

It is difficult to diagnose your specific problem in the abstract; however, the most common cause of poor single-core utilization is overly fine granularity in the simulation decomposition. The substantial drop from 1 to 2 cores suggests that a load-imbalance issue may also be present, but I recommend you examine compute granularity first. A modest increase in work per chare is likely to help. The Projections tool can be used to evaluate the current situation.
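For example, to collect traces for Projections you can relink with tracing enabled and point the run at a log directory (a sketch; "pgm" and "proj-logs" are placeholder names, and the +p8 count is just an example):

    charmc -o pgm pgm.o -tracemode projections
    ./charmrun +p8 ./pgm +traceroot ./proj-logs

Loading the resulting .sts/.log files in the Projections client and looking at the Time Profile and Usage Profile views should show whether the PEs are sitting idle (too little work per chare) or spending their time in runtime overhead.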

Regarding process switching, you can force affinity by appending the +setcpuaffinity flag, and you can choose specific bindings with the +pemap L[-U[:S[.R]+O]] argument. See section C.2.2 of the manual (http://charm.cs.illinois.edu/manuals/html/charm++/manual.html) for details.
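For example, on your two 8-core E5-2690 sockets you could pin one worker thread per core (a sketch; the 0-15 map assumes a simple linear core numbering, so check your topology first with a tool such as hwloc):

    ./charmrun +p16 ./pgm +setcpuaffinity +pemap 0-15

With an MPI build the same +setcpuaffinity/+pemap options can be passed after the program name under mpirun; they are parsed by the Charm++ runtime regardless of the launcher.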

On 07/22/2015 11:24 AM, Alexander Frolov wrote:
Hi!

I am profiling my application with Projections and found that the usage profile is terribly low (~45%) for the cases when 2 or more cores are used (at the moment I am investigating scalability within a single SMP node). For a single PE the usage profile is about 65% (which does not look good either).

I would suppose that something is wrong with the MPI environment (e.g. the MPI processes are continuously being switched between cores). But maybe the problem is in the Charm++ configuration?

Has anybody encountered similar behavior with Charm++ applications, i.e. no scalability where it would be expected?
Any suggestions would be very much appreciated! :-)

Hardware:
2x Intel(R) Xeon(R) CPU E5-2690 with 65868940 kB of memory

System software:
icpc version 14.0.1, impi/4.1.0.030

Charm++ runtime:
./build charm++ mpi-linux-x86_64 mpicxx -verbose 2>&1

PS: I also tried building with the --with-production option, but it did not improve the situation significantly.

Best,
   Alex


_______________________________________________
charm mailing list
charm AT cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/charm



