charm - Re: [charm] charm++ compiling problem



  • From: Phil Miller <mille121 AT illinois.edu>
  • To: Haowei Huang <huangh AT in.tum.de>
  • Cc: Charm Mailing List <charm AT cs.illinois.edu>
  • Subject: Re: [charm] charm++ compiling problem
  • Date: Tue, 1 Jun 2010 14:54:48 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

On Tue, Jun 1, 2010 at 14:24, Haowei Huang
<huangh AT in.tum.de>
wrote:
>>  If you really need a value out of some chare, you
>> can write a synchronous entry method with a non-void return type to
>> get that value.
>
>    So you mean that we can only use entry methods to access the properties
> of a chare, and normal methods can't, right?
> However, the return value of an entry method must be void, as described in
> your manual.

There are several ways you can get around the normal restriction:

1. If you always know who's going to be requesting a value, the holder
of that value can directly invoke some entry method on the requester
to pass the value.
2. If you want to maintain modularity, the getValue() entry method can
take a CkCallback argument that refers to an entry method on the
caller, and the owner of the value can send it through the callback.
3. If you mark an entry method with the [sync] annotation in the ci
file, then it can have a non-void return type. The caller will block
until that method returns its result. This requires that the caller
itself be running in a [threaded] entry method or a similar construct.
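
As a rough sketch of option 3 (the names Holder, Requester, getValue,
and holderProxy are hypothetical; the exact return-type rules are in
the Charm++ manual):

  // In the .ci interface file:
  chare Holder {
    entry Holder();
    entry [sync] int getValue();    // non-void return allowed because of [sync]
  };

  chare Requester {
    entry Requester();
    entry [threaded] void run();    // caller runs in a user-level thread, so it can block
  };

  // In the corresponding C++ file:
  void Requester::run() {
    int v = holderProxy.getValue(); // blocks this user-level thread until the value arrives
    // ... use v ...
  }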

There's also a fourth, lower-level construct called CkFutures that is
used to implement [sync] entry methods.
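
A rough sketch of that futures interface (the CkFuture calls are as
documented in the Charm++ manual; WorkMsg, Worker, and workerProxy are
hypothetical):

  // In a [threaded] entry method on the requester:
  CkFuture f = CkCreateFuture();
  workerProxy.compute(f);                   // hand the future to the worker
  WorkMsg *m = (WorkMsg *)CkWaitFuture(f);  // blocks this user-level thread
  // ... use m->value ...
  CkReleaseFuture(f);

  // On the worker, when the value is ready:
  void Worker::compute(CkFuture f) {
    WorkMsg *m = new WorkMsg;               // message carrying the result
    CkSendToFuture(f, m);
  }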

The approach with the lowest overhead at runtime is probably the
callback-based #2, since the runtime doesn't have to create and tear
down a user-level thread per invocation. This can make the code a
little harder to follow, since control has to 'return' to another entry
method. The Structured Dagger control-flow notation solves that issue
by letting you write higher-level code that sequences calls to
particular entry methods.
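
To make approach #2 concrete, here is a hedged sketch (ValueMsg,
Holder, Requester, and recvValue are hypothetical names):

  // The requester creates a callback aimed at its own entry method:
  CkCallback cb(CkIndex_Requester::recvValue(NULL), thisProxy);
  holderProxy.getValue(cb);

  // The holder sends the value through the callback:
  void Holder::getValue(CkCallback cb) {
    ValueMsg *m = new ValueMsg;
    m->value = value;
    cb.send(m);
  }

  // Execution 'returns' to the requester here:
  void Requester::recvValue(ValueMsg *m) { /* ... use m->value ... */ }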

>> This is the 'asynchronous remote method
>> invocation' that's central to the Charm++ programming model.
>  I know the concept of asynchronous remote method invocation, which means
> that the communication between chares is based on messages. However, the
> number of messages and the size of each message must be small, because
> there are so many chares at runtime. How do you solve the communication
> efficiency problem? Do you just treat messages between chares as MPI
> messages, or use aggregation strategies to solve the "many small messages"
> problem? As far as I know, after compiling a Charm++ program, we get an MPI
> executable and one process is responsible for the computation of multiple
> chares. So what is the relation between chares and threads? Multiple
> threads within an MPI process?

We don't directly address the issue of small messages, because in
general they are not a substantial problem. The communication layer
(MPI, net, bluegene, etc) may implement aggregation optimizations, but
you shouldn't worry about it unless performance analysis has indicated
that it's an issue. Even then, the appropriate response is usually to
adjust the grain-size of your decomposition. Chares responsible for a
larger amount of work will generally move larger chunks of data. In
cases where there really must be many small messages, and they really
are a problem, we provide a communication optimization library to which
programmers can delegate particular communication operations; it can
perform that sort of aggregation explicitly.

With very narrow exceptions, each chare is assigned to a particular
processor. On plain MPI and Net targets, we have one process (MPI
rank) per processor, each running one OS thread. All chares assigned
to that processor run in that OS thread. On SMP and multicore targets
(including MPI-SMP), each node runs one process, with one OS thread on
each processor. Again, chares are assigned to a particular processor,
and so run in the thread executing on that processor.

Charm provides user-level threads to implement threaded entry methods
and higher-level constructs like Adaptive MPI and the unstructured
mesh framework (ParFUM). These are very lightweight. For a threaded
entry method, the associated thread will be assigned to whatever
processor owns the underlying chare. For threaded programming models,
these user-level threads are themselves migratable entities.




