Re: [charm] Pe numbering


  • From: Phil Miller <mille121 AT illinois.edu>
  • To: François Tessier <francois.tessier AT inria.fr>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] Pe numbering
  • Date: Fri, 8 Nov 2013 08:27:23 -0800

PE numbering is 'arbitrary' in the same way that the ranks of MPI processes are arbitrary. It depends on several things, some of which you may be able to control in your execution environment:

====
1. Process launching:
There are a number of different ways that Charm++ programs get launched across a parallel machine. The most general case is charmrun in our own net*/verbs* builds. Charmrun takes a number of PEs to launch and a list of physical nodes on which to launch them. Some of those nodes may be re-used round-robin if more PEs are requested than there are nodes. I believe ++cpus N on a node will allocate a 'block' of N PEs on that node before continuing through the list (there is an example nodelist and launch line after this block).

Note that some nodes may have fewer PEs assigned than they have cores, some may have more PEs assigned than cores, or even both in the same job with a suitably weird configuration. IIRC, we print a warning at startup if PEs (plus communication threads, where applicable) oversubscribe the available logical cores on any given hardware node.

The mpirun launchers on many large systems (which we sometimes use with charmrun via ++mpiexec) also often have options for block, cyclic, and arbitrary mappings of ranks to processors.

2. CPU affinity settings
By default, PEs are not 'pinned' to any particular core, physical or logical. The OS on each hardware node is left free to assign and reassign that mapping according to its scheduler's whims. When the runtime options +setcpuaffinity and +pemap are given, the runtime sets the mapping explicitly.

3. Job partitioning
A feature that you may not have encountered yet is the recently added ability to 'partition' a single Charm++ job into multiple smaller jobs, each of which mostly behaves as if it is an independent execution of the program. There are then special routines to send messages between partitions. This is in use for replica-exchange logic in NAMD, and possibly other Charm++ codes.

When a job is started with multiple partitions, it is divided along boundaries of separate OS processes ('charm nodes'). The division of those nodes among partitions and the numbering of PEs within each partition is determined by the partitioning strategy, which can range from a simple linear map from the global ranks, to a network topology aware calculation.
====
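
To make points 1 and 2 concrete, here is roughly what a small run might look like with a net* build. The host names and core counts are made up, and you should double-check the exact nodelist and affinity syntax against the manual for your version; treat this as a sketch, not a recipe. If a file ./nodes contains

    group main
    host nodeA ++cpus 4
    host nodeB ++cpus 4

then a launch line along the lines of

    ./charmrun +p8 ++nodelist ./nodes ./pgm +setcpuaffinity +pemap 0-3

starts 8 PEs in blocks of 4 per host, and pins the PEs on each host to logical cores 0-3 in order instead of leaving the placement to the OS scheduler.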

That all said, to get the data you want without controlling it directly, it may be easiest to look at the functionality we expose in our cputopology routines. For any PE, you can ask which bit of hardware it's assigned to, which PEs live on which bit of hardware, and so forth.
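
For example, here is a minimal sketch of those queries. The CmiPhysical* names below are the cputopology entry points as I remember them; verify them against conv-core/cputopology.C or converse.h in your tree before relying on this.

#include "charm++.h"

void printMyTopology() {
  int pe   = CkMyPe();               // this PE's logical rank
  int node = CmiPhysicalNodeID(pe);  // hardware node hosting this PE
  int rank = CmiPhysicalRank(pe);    // this PE's rank within that hardware node

  int *pes, npes;
  CmiGetPesOnPhysicalNode(node, &pes, &npes);  // all PEs sharing this hardware node

  CkPrintf("PE %d is rank %d of %d on physical node %d of %d\n",
           pe, rank, npes, node, CmiNumPhysicalNodes());
}

Calling something like this from every PE will show you how PEs are laid out across the hardware, however the mapping was produced by the mechanisms above.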

I hope that all helps.

Phil



On Fri, Nov 8, 2013 at 5:31 AM, François Tessier <francois.tessier AT inria.fr> wrote:
Hello,

I would like to know if the number returned by my_chare.getCurrentPe()
corresponds to a physical core number or a logical one. I ran some tests
that seem to show it uses a physical core numbering but I need to be
sure :-).

Thanks,

François

--
___________________
François TESSIER
PhD Student at University of Bordeaux
Inria - Runtime Team
Tel : 0033.5.24.57.41.52
francois.tessier AT inria.fr
http://runtime.bordeaux.inria.fr/ftessier/
PGP 0x8096B5FA


