RE: [charm] When to migrate


  • From: "Van Der Wijngaart, Rob F" <rob.f.van.der.wijngaart AT intel.com>
  • To: "Chandrasekar, Kavitha" <kchndrs2 AT illinois.edu>, "White, Samuel T" <white67 AT illinois.edu>
  • Cc: Phil Miller <unmobile AT gmail.com>, "Totoni, Ehsan" <ehsan.totoni AT intel.com>, "Langer, Akhil" <akhil.langer AT intel.com>, "Harshitha Menon" <harshitha.menon AT gmail.com>, "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: RE: [charm] When to migrate
  • Date: Tue, 6 Dec 2016 22:51:20 +0000
  • Accept-language: en-US

Hi all,

 

I have been running my code with the AMPI_Migrate calls placed as indicated in the figure below. I delay that call by one to five time steps after the load changes, but I see zero effect on performance (this is on a shared-memory system, using isomalloc, so the migration does not require serious data transfer). I would expect that with a delay of zero time steps the runtime has no information yet about the changed load, and hence does not know how to migrate. With one step it knows something, and with a couple more even more. Delaying much longer is no good, because the load will have changed again before an effective migration has been executed. So I would expect performance to increase initially with increasing delay, and then to decrease if I delay migration even more. The data is noisy, but it is clear that the migration delay has a negligible effect.
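
In outline, my loop currently looks like this (a sketch: do_step(), load_changed_at(), and the delay D are placeholders for my test code; AMPI_INFO_LB_SYNC is the predefined load-balancing hints object from the AMPI manual):

  int since_change = -1;                  /* steps since the last load change */
  for (int t = 0; t < T; t++) {
    do_step(t);                           /* regular (possibly imbalanced) work */
    if (load_changed_at(t)) since_change = 0;
    else if (since_change >= 0) since_change++;
    if (since_change == D) {              /* migrate D steps after the change */
      AMPI_Migrate(AMPI_INFO_LB_SYNC);
      since_change = -1;
    }
  }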

I have two questions:

1. How does the runtime actually collect load-balance data? Is collection continuous, or does it happen only when AMPI_Migrate is called? If the latter, I am in trouble, because the runtime would not be using the migration delay to learn more about the load balance.

2. How can I make effective use (if any) of the start/stop measurement calls? Should they be used to demarcate a few time steps before a load change occurs, so the runtime can learn?

Thanks!

 

Rob

 

From: Van Der Wijngaart, Rob F
Sent: Monday, December 05, 2016 10:28 AM
To: 'Chandrasekar, Kavitha' <kchndrs2 AT illinois.edu>; 'White, Samuel T' <white67 AT illinois.edu>
Cc: 'Phil Miller' <unmobile AT gmail.com>; Totoni, Ehsan <ehsan.totoni AT intel.com>; Langer, Akhil <akhil.langer AT intel.com>; 'Harshitha Menon' <harshitha.menon AT gmail.com>; 'charm AT cs.uiuc.edu' <charm AT cs.uiuc.edu>
Subject: RE: [charm] When to migrate

 

Hi Kavitha,

 

One last question: if I have an AMPI code that I have linked with commonLB and that I run with a valid +balancer argument, but I do not call AMPI_Migrate anywhere in the code, does the runtime still collect load-balance information? If so, is there a charmrun command-line option that prevents the runtime from doing that? I know I could use these functions:

AMPI_Load_stop_measure(void)

AMPI_Load_start_measure(void)

but I prefer to specify it on the command line (otherwise I need to supply yet another input parameter and parse it in the code). Thanks!

 

Rob

 

From: Van Der Wijngaart, Rob F
Sent: Friday, December 02, 2016 3:19 PM
To: 'Chandrasekar, Kavitha' <kchndrs2 AT illinois.edu>; 'White, Samuel T' <white67 AT illinois.edu>
Cc: 'Phil Miller' <unmobile AT gmail.com>; Totoni, Ehsan <ehsan.totoni AT intel.com>; Langer, Akhil <akhil.langer AT intel.com>; 'Harshitha Menon' <harshitha.menon AT gmail.com>; 'charm AT cs.uiuc.edu' <charm AT cs.uiuc.edu>
Subject: RE: [charm] When to migrate

 

Below is to illustrate how I understood where I should place the AMPI_Migrate calls in my simulation. Is this correct? Thanks!

 

Rob

 

[Inline figure: placement of the AMPI_Migrate calls in the simulation time-step loop; attached as image001.emz]

From: Van Der Wijngaart, Rob F
Sent: Friday, December 02, 2016 1:53 PM
To: 'Chandrasekar, Kavitha' <kchndrs2 AT illinois.edu>; White, Samuel T <white67 AT illinois.edu>
Cc: Phil Miller <unmobile AT gmail.com>; Totoni, Ehsan <ehsan.totoni AT intel.com>; Langer, Akhil <akhil.langer AT intel.com>; Harshitha Menon <harshitha.menon AT gmail.com>; charm AT cs.uiuc.edu
Subject: RE: [charm] When to migrate

 

Thanks, Kavitha, but now I am a little confused. How should I use AMPI_Load_start/stop_measure in conjunction with AMPI_Migrate? Are they only used with the Metabalancer, or also with the fixed balancers? In my case changes in load occur only at discrete points in time; the load does not grow or shrink continually. So migration, if done at all, should happen at one of those discrete points, or it will have no effect. If I am supposed to bracket those points with calls to AMPI_Load_start/stop_measure, would the sequence of calls be something like this: AMPI_Load_start_measure, wait a few time steps for the load to change, AMPI_Migrate, wait a few more time steps, AMPI_Load_stop_measure? A sketch of what I mean is below.
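
In pseudo-C (do_step() and the step count K are placeholders; AMPI_INFO_LB_SYNC is the predefined load-balancing hints object from the AMPI manual):

  AMPI_Load_start_measure();
  for (int i = 0; i < K; i++) do_step();   /* wait for the load to change */
  AMPI_Migrate(AMPI_INFO_LB_SYNC);
  for (int i = 0; i < K; i++) do_step();   /* a few more steps */
  AMPI_Load_stop_measure();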

Thanks!

 

Rob

 

From: Chandrasekar, Kavitha [mailto:kchndrs2 AT illinois.edu]
Sent: Friday, December 02, 2016 1:10 PM
To: Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com>; White, Samuel T <white67 AT illinois.edu>
Cc: Phil Miller <unmobile AT gmail.com>; Totoni, Ehsan <ehsan.totoni AT intel.com>; Langer, Akhil <akhil.langer AT intel.com>; Harshitha Menon <harshitha.menon AT gmail.com>; charm AT cs.uiuc.edu
Subject: RE: [charm] When to migrate

 

When using load balancers without Metabalancer, it is sufficient to make the AMPI_Migrate calls when the imbalance appears. Phil pointed out a couple of things regarding this:

 

1. The AMPI_Migrate calls need to be placed a few time steps after the imbalance appears, since load imbalance in the ranks becomes known to the load-balancing framework only at the end of a time step.

 

2. The information supplied to the load-balancing framework will be more accurate if the LB instrumentation is turned off to start with and turned on a few time steps before the load imbalance appears. This can be repeated each time load imbalance occurs.

The calls to turn instrumentation off and on are:

AMPI_Load_stop_measure(void)

AMPI_Load_start_measure(void)
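
As a sketch of item 2 (do_step(), change_step, and the windows W and D are application-specific placeholders; AMPI_INFO_LB_SYNC is the predefined hints object from the AMPI manual):

  AMPI_Load_stop_measure();               /* instrumentation off by default */
  for (int t = 0; t < T; t++) {
    if (t == change_step - W)             /* W steps before the known change */
      AMPI_Load_start_measure();
    do_step(t);
    if (t == change_step + D) {           /* a few (D) steps after the change */
      AMPI_Migrate(AMPI_INFO_LB_SYNC);
      AMPI_Load_stop_measure();           /* off again until the next change */
    }
  }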

 

A clarification regarding the use of +MetaLB: the option needs to be specified alongside the +balancer <loadbalancer> option.

 

Thanks,

Kavitha

 


From: Van Der Wijngaart, Rob F [rob.f.van.der.wijngaart AT intel.com]
Sent: Friday, December 02, 2016 2:20 PM
To: Chandrasekar, Kavitha; White, Samuel T
Cc: Phil Miller; Totoni, Ehsan; Langer, Akhil; Harshitha Menon; charm AT cs.uiuc.edu
Subject: RE: [charm] When to migrate

OK, thanks, Kavitha, I’ll do that. Should I apply this method to all load balancers, or only to MetaLB?

 

Rob

 

From: Chandrasekar, Kavitha [mailto:kchndrs2 AT illinois.edu]
Sent: Friday, December 02, 2016 12:07 PM
To: Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com>; White, Samuel T <white67 AT illinois.edu>
Cc: Phil Miller <unmobile AT gmail.com>; Totoni, Ehsan <ehsan.totoni AT intel.com>; Langer, Akhil <akhil.langer AT intel.com>; Harshitha Menon <harshitha.menon AT gmail.com>; charm AT cs.uiuc.edu
Subject: RE: [charm] When to migrate

 

It would be useful to call AMPI_Migrate every few time steps. Load-statistics collection happens at the AMPI_Migrate calls. If there is observed load imbalance, which I understand occurs when the refinement appears and disappears, then the Metabalancer calculates the load-balancing period based on historical data. So it would be useful to call it more often than only at the time step with the load imbalance.
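
For example (do_step() and LB_PERIOD are placeholders; the Metabalancer decides at which of these calls balancing actually happens):

  for (int t = 0; t < T; t++) {
    do_step(t);
    if (t % LB_PERIOD == 0)               /* every few time steps */
      AMPI_Migrate(AMPI_INFO_LB_SYNC);
  }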

 

Thanks,

Kavitha

 


From: Van Der Wijngaart, Rob F [rob.f.van.der.wijngaart AT intel.com]
Sent: Friday, December 02, 2016 1:00 PM
To: Chandrasekar, Kavitha; White, Samuel T
Cc: Phil Miller; Totoni, Ehsan; Langer, Akhil; Harshitha Menon; charm AT cs.uiuc.edu
Subject: RE: [charm] When to migrate

Hi Kavitha,

 

After a lot of debugging and switching to the 6.7.1 development version (that fixed the string problem, as you and Sam noted), I can now run my Adaptive MPI code consistently and without fail, both with and without explicit PUP routines (currently on a shared memory system). I haven’t tried the meta load balancer yet, but will do so shortly. I did want to share the structure of my code with you, to make sure I am and will be doing the right thing. This is an Adaptive Mesh Refinement code, in which I intermittently add a new discretization grid (a refinement) to the original grid (AKA background grid). I do this in a very controlled fashion, where I exactly specify the interval (in number of time steps) at which the refinement appears, and how long it is present. This is a cyclical process. Obviously, the amount of work goes up (for some ranks) when a refinement appears, and goes down again when it disappears.

Right now I place an AMPI_Migrate call each time a refinement has just appeared and each time it has just disappeared, so each time I call it something has changed. I vary a number of parameters in my current test suite, including the over-decomposition factor and the load-balancing policy (RefineLB, RefineSwapLB, RefineCommLB, GreedyLB, and GreedyCommLB); I will add MetaLB in my next round of tests. My question is whether my approach for when to call AMPI_Migrate is correct. Simply put, I only call AMPI_Migrate when the work structure (the assignment of work to ranks) has changed, and not otherwise. What do you think, should I call it every time step? Note that calling it every so many time steps without regard for when the refinement appears and disappears would not make much sense: I'd be sampling the workload distribution at a frequency unrelated to the refinement frequency. A sketch of my current placement follows.
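
(refinement_appeared(), refinement_disappeared(), and do_step() stand in for my AMR code; AMPI_INFO_LB_SYNC is the predefined hints object from the AMPI manual):

  for (int t = 0; t < T; t++) {
    do_step(t);
    if (refinement_appeared(t) || refinement_disappeared(t))
      AMPI_Migrate(AMPI_INFO_LB_SYNC);    /* only when the work assignment changed */
  }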

Thanks in advance!

 

Rob

 

From: Chandrasekar, Kavitha [mailto:kchndrs2 AT illinois.edu]
Sent: Tuesday, November 22, 2016 1:29 PM
To: White, Samuel T <white67 AT illinois.edu>; Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com>
Cc: Phil Miller <unmobile AT gmail.com>; Totoni, Ehsan <ehsan.totoni AT intel.com>; Langer, Akhil <akhil.langer AT intel.com>; Harshitha Menon <harshitha.menon AT gmail.com>; charm AT cs.uiuc.edu
Subject: RE: [charm] When to migrate

 

The meta-balancer capability to decide when to invoke a load balancer is available with the +MetaLB command line argument. It relies on the AMPI_Migrate calls to collect statistics to decide when to invoke the load balancer. However, in the current release, there is a bug in AMPI_Migrate's string handling, so it might not work correctly. 

 

The meta-balancer capability to select the optimal load balancing strategy is expected to be merged to mainline charm in the near future. I will update the manual to include the usage of meta-balancer.

 

Thanks,

Kavitha


From: samt.white AT gmail.com [samt.white AT gmail.com] on behalf of Sam White [white67 AT illinois.edu]
Sent: Tuesday, November 22, 2016 3:09 PM
To: Van Der Wijngaart, Rob F
Cc: Phil Miller; Totoni, Ehsan; Langer, Akhil; Harshitha Menon; charm AT cs.uiuc.edu; Chandrasekar, Kavitha
Subject: Re: [charm] When to migrate

Yes, Kavitha will respond on the Metabalancer. MPI_Comms are plain ints in AMPI. We should really have APIs in our C and Fortran PUP interfaces to hide these details from users, so thanks for pointing it out.

-Sam

 

On Tue, Nov 22, 2016 at 3:01 PM, Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com> wrote:

Meanwhile, it would still be great to learn about the status of the meta-balancer. Thanks!

 

Rob

 

From: samt.white AT gmail.com [mailto:samt.white AT gmail.com] On Behalf Of Sam White
Sent: Tuesday, November 22, 2016 12:23 PM
To: Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com>
Cc: Phil Miller <unmobile AT gmail.com>; Totoni, Ehsan <ehsan.totoni AT intel.com>; Langer, Akhil <akhil.langer AT intel.com>; Harshitha Menon <harshitha.menon AT gmail.com>; charm AT cs.uiuc.edu; Chandrasekar, Kavitha <kchndrs2 AT illinois.edu>
Subject: Re: [charm] When to migrate

 

If you are using MPI_Comm_split(), you can completely ignore the text in that section of the AMPI manual. That text refers specifically to MPI-3's routine MPI_Comm_split_type() [1], which can be used to create subcommunicators per shared-memory node by passing the flag MPI_COMM_TYPE_SHARED. If ranks migrate out of a node, such a communicator becomes invalid (its ranks no longer share the same address space).

For any other kind of communicator (created via MPI_Comm_dup, MPI_Comm_split, etc.), the user does not need to do anything special before/after calling AMPI_Migrate().

[1] http://www.mpich.org/static/docs/v3.2/www3/MPI_Comm_split_type.html
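
For illustration, a minimal sketch of the case in question (standard MPI-3 API):

  MPI_Comm node_comm;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &node_comm);
  /* node_comm groups the ranks that share memory; if a migration moves a
     rank to another node, node_comm is invalidated and may only be freed: */
  MPI_Comm_free(&node_comm);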

-Sam

 

On Tue, Nov 22, 2016 at 2:19 PM, Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com> wrote:

Thanks, Phil! Yeah, my statement about derived communicators was too broad, but in my app I do indeed use MPI_Comm_split to create communicators.

 

Rob

 

From: Phil Miller [mailto:unmobile AT gmail.com]
Sent: Tuesday, November 22, 2016 12:17 PM
To: Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com>; White, Samuel T <white67 AT illinois.edu>
Cc: Totoni, Ehsan <ehsan.totoni AT intel.com>; Langer, Akhil <akhil.langer AT intel.com>; Harshitha Menon <harshitha.menon AT gmail.com>; charm AT cs.uiuc.edu; Kavitha Chandrasekar <kchndrs2 AT illinois.edu>
Subject: RE: [charm] When to migrate

 

Sam should be better able to answer your exact query. In brief, that remark in the manual is specifically about MPI_Comm_split_type, which is used to get a subcommunicator whose ranks share a physical resource. It doesn't affect derived communicators in general.

 

On Nov 22, 2016 2:03 PM, "Van Der Wijngaart, Rob F" <rob.f.van.der.wijngaart AT intel.com> wrote:

Hello Kavitha,

 

I was just talking with Ehsan and Akhil about logistics of dynamic load balancing in Adaptive MPI applications, see below. Can you give me an update on the status of the meta-balancer? Meanwhile, I ran into a funny issue with my application. I am using MPI_Comm_split to create multiple communicators. This is what I read in the Adaptive MPI manual:

Note that migrating ranks around the cores and nodes of a system can change which ranks share physical resources, such as memory. A consequence of this is that communicators created via MPI_Comm_split_type are invalidated by calls to AMPI_Migrate that result in migration which breaks the semantics of that communicator type. The only valid routine to call on such communicators is MPI_Comm_free.

 

We also provide callbacks that user code can register with the runtime system to be invoked just before and right after migration: AMPI_Register_about_to_migrate and AMPI_Register_just_migrated respectively. Note that the callbacks are only invoked on those ranks that are about to actually migrate or have just actually migrated.

 

So is the idea that, before a migration, I call MPI_Comm_free on derived communicators, and reconstitute them after the migration by invoking MPI_Comm_split again? Something like the sketch below?
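
(sub_comm, my_color, and my_key are application-specific placeholders; AMPI_INFO_LB_SYNC is the predefined hints object from the AMPI manual):

  MPI_Comm_free(&sub_comm);                        /* before the migration */
  AMPI_Migrate(AMPI_INFO_LB_SYNC);
  MPI_Comm_split(MPI_COMM_WORLD, my_color, my_key, &sub_comm);  /* rebuild */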

Thanks!

 

Rob

 

From: Langer, Akhil
Sent: Tuesday, November 22, 2016 10:07 AM
To: Totoni, Ehsan <ehsan.totoni AT intel.com>; Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com>
Subject: Re: When to migrate

 

I think it is Kavitha Chandrasekar (kchndrs2 AT illinois.edu) who is continuing the work. Harshitha (harshitha.menon AT gmail.com) is now at LLNL. 

 

From: "Totoni, Ehsan" <ehsan.totoni AT intel.com>
Date: Tuesday, November 22, 2016 at 12:03 PM
To: "Van Der Wijngaart, Rob F" <rob.f.van.der.wijngaart AT intel.com>, Akhil Langer <akhil.langer AT intel.com>
Subject: RE: When to migrate

 

The person working on it (Harshitha) has left recently and I don’t know who picked up the work. I suggest sending an email to the mailing list. Hopefully, the meta-balancer is in a usable shape.

 

-Ehsan

 

From: Van Der Wijngaart, Rob F
Sent: Tuesday, November 22, 2016 9:33 AM
To: Totoni, Ehsan <ehsan.totoni AT intel.com>; Langer, Akhil <akhil.langer AT intel.com>
Subject: RE: When to migrate

 

Thanks, Ehsan! Indeed, my workload is iterative. The structure is as follows:

for (t=0; t<T; t++) {
  if (t%period < duration && criterion(my_rank)) do_extra_work();
  do_regular_work();
}

So whenever the time step is a multiple of the period, some ranks (depending on the criterion function) start doing extra work for duration steps. As you can see, there is a hierarchy in the iterative workload behavior.

Whom should I contact about the meta-balancer?

Thanks again!

 

Rob

 

From: Totoni, Ehsan
Sent: Tuesday, November 22, 2016 9:24 AM
To: Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com>; Langer, Akhil <akhil.langer AT intel.com>
Subject: RE: When to migrate

 

Hi Rob,

 

If the workload is iterative, where in the iteration AMPI_Migrate() is called shouldn't matter in principle for measurement-based load balancing. Of course, there are tricky cases where this doesn't work (e.g., a few long, variant iterations). There is also a meta-balancer that automatically decides how often load balancing should be invoked, and which load balancer. I can't find it in the manual, so I suggest sending them an email to make them document it :)

 

Is your workload different than typical iterative applications?

 

Best,

Ehsan

 

p.s. MPI_Migrate() has been renamed to AMPI_Migrate(); the MPI_ prefix is no longer used for AMPI-specific calls.

 

From: Van Der Wijngaart, Rob F
Sent: Tuesday, November 22, 2016 9:01 AM
To: Langer, Akhil <akhil.langer AT intel.com>; Totoni, Ehsan <ehsan.totoni AT intel.com>
Subject: When to migrate

 

Hi Akhil and Ehsan,

 

I have a silly question. I put together a workload designed to test the capabilities of runtimes to do dynamic load balancing. It's a very controlled environment: for a while nothing happens to the load, but at discrete points in time I either remove work from or add work to an MPI rank (depending on the strategy chosen, this can be quite dramatic, e.g. a rank having no work to do at all for a while, then getting a chore, and after a while dropping that chore again). I am adding PUP routines and migrate calls to the workload to test it using Adaptive MPI. The question is when I should invoke MPI_Migrate: just before the load per rank changes, or right after? Because the period during which I add or remove work from a rank can be short, this could make quite a difference. The workload is cyclic, so in principle the runtime can learn from load changes in the past.

Thanks for any advice you can offer!

 

Rob

 

 

Attachment: image001.emz (also rendered as a PNG image)



