Re: [charm] How to verify that AMPI load balancing works?


  • From: Marcin Mielniczuk <marmistrz.dev AT zoho.eu>
  • To: Sam White <white67 AT illinois.edu>
  • Cc: "charm AT lists.cs.illinois.edu" <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] How to verify that AMPI load balancing works?
  • Date: Mon, 3 Jun 2019 17:54:23 +0200

Hi Sam,

I understand that AMPI_Register_just_migrated is the proper way to do step 4? If so, I can confirm that the migration only happens once, on the first call to AMPI_Migrate.
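
For reference, my registration looks roughly like this (a trimmed-down
sketch, not my exact code; I declare CkMyPe() by hand and assume it is a
legitimate way to print the PE a rank is on):

    #include <mpi.h>
    #include <cstdio>

    extern "C" int CkMyPe(void);   /* Charm++ runtime: PE this rank is on */

    static void on_just_migrated(void) {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        std::printf("rank %d just migrated, now on PE %d\n", rank, CkMyPe());
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        AMPI_Register_just_migrated(on_just_migrated);
        /* ... solver loop with periodic AMPI_Migrate(...) ... */
        MPI_Finalize();
        return 0;
    }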

I do call AMPI_Migrate multiple times. This can be seen from the "trying to migrate" lines in the stdout and from the following lines in the source code: https://github.com/marmistrz/heat_solver/blob/master/main.cpp#L286-L291

Regards,
Marcin

On 03.06.2019 16:15, Sam White wrote:
Hi Marcin,

This is how we recommend enabling and testing dynamic load balancing in an AMPI program:
1. Insert periodic calls to AMPI_Migrate(...) with the MPI_Info for LB.
2. Link with "-memory isomalloc -module CommonLBs".
3. First run with "+balancer RotateLB +LBDebug 3".
4. Verify that multiple rounds of migration are happening (with RotateLB, every rank should migrate at each call to AMPI_Migrate()). The AMPI manual has info on how to print the current PE # that a rank is on; see the sketch after this list.
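
As a rough sketch of steps 1 and 4 (the "ampi_load_balance" info key is
the one from the manual; do_work(), nsteps, and lb_period are
placeholders for your application, and CkMyPe() is the Charm++ runtime
call that returns the PE a rank is currently on):

    #include <mpi.h>
    #include <cstdio>

    extern "C" int CkMyPe(void);   /* Charm++ runtime: current PE */

    void do_work(int step);        /* placeholder: your computation */

    void solve(int nsteps, int lb_period) {
        MPI_Info hints;
        MPI_Info_create(&hints);
        MPI_Info_set(hints, "ampi_load_balance", "sync");

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int step = 0; step < nsteps; step++) {
            do_work(step);
            if (step % lb_period == 0) {
                std::printf("rank %d on PE %d, calling AMPI_Migrate\n",
                            rank, CkMyPe());
                AMPI_Migrate(hints);   /* collective; ranks may move here */
            }
        }
        MPI_Info_free(&hints);
    }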

Then you can experiment with other load balancing strategies and options. You should not make calls to AMPI_Register_pup() when using Isomalloc for migration. Isomalloc is essentially a substitute for writing explicit PUP routines at the application level.
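
Concretely, the link line for step 2 would look something like the
following (assuming the ampicxx wrapper from your Charm++ build; charmc
-language ampi with the same flags should work too):

    ampicxx -o heat_solver main.cpp -memory isomalloc -module CommonLBs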

For your issue, are you sure that you are calling AMPI_Migrate() more than once? When running with +LBDebug, the LB strategy prints some info each time it is invoked, so the output in your log file suggests that it's not being called more than once for some reason.
You may also want to run with +LBTestPeSpeed so that the LB framework takes into consideration the heterogeneity of the nodes you are running on.
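
For example, a run command combining these flags might look like this
(a sketch modeled on your command line below, not something I have run):

    ./charmrun +p12 -hostfile hostfile ./heat_solver --size 14400 \
        --steps 200 --noresults +vp60 +balancer RotateLB +LBDebug 3 \
        +LBTestPeSpeed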

Let me know if this helps or if you still see migration only happening once. We will improve the manual based on your feedback, so thanks for getting in touch with us!
-Sam





On Mon, Jun 3, 2019 at 8:45 AM Marcin Mielniczuk <marmistrz.dev AT zoho.eu> wrote:
Hi,

I'm evaluating possible options for distributed computing in LAN
networks and came across AMPI. Currently I'm trying to get some hands-on
experience.

I have a toy project to test on, which I have ported to AMPI. [1]
According to the AMPI documentation, no manual PUP routines should be
needed when using isomalloc, so I have only added an AMPI_Migrate call
and an AMPI_Register_just_migrated handler. I'm not sure if this is
correct, because all the AMPI examples seem to have manual PUP routines,
even when using isomalloc.

I'm trying to verify if the processes are actually being migrated.
It appears that the just_migrated handler is never called; moreover,
when running with +LBDebug, the only CharmLB logs refer to the first
call to AMPI_Migrate. It looks like the LB doesn't even consider
migration later on.

While GreedyLB may have decided that the load imbalance isn't large
enough, I have also tried RotateLB, which should always migrate, but
it appears not to. The behavior persists even if I
add extra artificial CPU load on one of the nodes, which should cause a
large load imbalance.

All in all, it looks like AMPI doesn't migrate any process even when run
with a load balancer.

My command line is:

    ./charmrun +p12 -hostfile hostfile --mca btl_tcp_if_include <LAN subnet> \
        ./heat_solver --size 14400 --steps 200 --noresults +vp60 \
        +balancer RotateLB +LBDebug 100

My setup is: 2 computers on a common LAN (Ethernet), without a shared
file system. One node has 4 CPUs, the other has 8 CPUs. The CPUs differ
between the nodes: one is an Intel Core i7-6700 (4 GHz), the other an
AMD Ryzen 7 1700 (8 cores, 3 GHz).

I have attached execution logs. Is the lack of migration just a
programming error on my side or is it an AMPI bug?

Regards,
Marcin

[1] https://github.com/marmistrz/heat_solver



