charm - Re: [charm] Using Load balancers in charm++

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Aditya Kiran Pandare <apandar AT ncsu.edu>
  • To: "charm AT lists.cs.illinois.edu" <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] Using Load balancers in charm++
  • Date: Sat, 16 Sep 2017 10:29:35 -0400

Hello,

This is just an update to my previous question about using load balancers in charm++. After Dr Kale and Dr Pilla pointed me in the right direction, I finally was able to get load balancing to have an effect on the computation time.

The reason was that my code did not have iterations defined by entry-method invocations. Once this was corrected, load balancing clearly showed an effect on the computation time.

Here are some preliminary results using 8 PEs on 8 separate compute nodes. The top timeline is without using load balancers, and the bottom one is using RefineLB.

Thank you everyone, for your help.

Regards,

--
Aditya K Pandare
Graduate Research Assistant
Computational Fluid Dynamics Lab A
3211, Engineering Building III
Department of Mechanical and Aerospace Engineering (MAE)
North Carolina State University

On Wed, Sep 13, 2017 at 6:49 PM, Aditya Kiran Pandare <apandar AT ncsu.edu> wrote:
Thank you for your responses Dr Kale and Dr Pilla.

That is true. My code did not have iterations. I am now writing an iterative version of the calculation, and will send an update on this list as soon as I have the results.

Thanks again for your help; I really appreciate it.

--
Aditya K Pandare
Graduate Research Assistant
Computational Fluid Dynamics Lab A
3211, Engineering Building III
Department of Mechanical and Aerospace Engineering (MAE)
North Carolina State University

On Wed, Sep 13, 2017 at 6:17 PM, Kale, Laxmikant V <kale AT illinois.edu> wrote:

If you don’t have an iterative computation (with many timesteps or iterations), you cannot use periodic load balancers.

(and if you do have an iterative computation, I’d suggest using a much lower LBPeriod).
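To illustrate why a periodic balancer needs iterations, here is a toy model in plain C++. It is not the Charm++ API, and every name in it is hypothetical. Loads observed during one iteration are used to remap objects before the next one, so a run with a single pass has no later iteration that could benefit. As a simplification, the toy counts the balancing interval in iterations; it does not model the +LBPeriod flag.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Time for one iteration: PEs run concurrently, so the busiest PE sets the pace.
double iteration_time(const std::vector<double>& load,
                      const std::vector<int>& pe_of, int num_pes) {
    assert(num_pes > 0 && load.size() == pe_of.size());
    std::vector<double> pe_load(num_pes, 0.0);
    for (std::size_t i = 0; i < load.size(); ++i) pe_load[pe_of[i]] += load[i];
    return *std::max_element(pe_load.begin(), pe_load.end());
}

// Greedy remap from measured loads: heaviest object first, each placed on the
// PE that is least loaded so far.
std::vector<int> greedy_remap(const std::vector<double>& measured, int num_pes) {
    assert(num_pes > 0);
    std::vector<std::size_t> order(measured.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return measured[a] > measured[b]; });
    std::vector<double> pe_load(num_pes, 0.0);
    std::vector<int> pe_of(measured.size(), 0);
    for (std::size_t obj : order) {
        auto least = std::min_element(pe_load.begin(), pe_load.end());
        pe_of[obj] = static_cast<int>(least - pe_load.begin());
        *least += measured[obj];
    }
    return pe_of;
}

// Total time for `iterations` steps; remaps after every `lb_every` steps (0 = never).
double run(const std::vector<double>& load, std::vector<int> pe_of,
           int num_pes, int iterations, int lb_every) {
    double total = 0.0;
    for (int it = 1; it <= iterations; ++it) {
        total += iteration_time(load, pe_of, num_pes);  // loads are "measured" here
        if (lb_every > 0 && it % lb_every == 0) pe_of = greedy_remap(load, num_pes);
    }
    return total;
}
```

With four objects of load {8, 8, 1, 1} block-mapped onto two PEs, a single iteration costs 16 time units whether or not remapping is enabled, because the remap only takes effect on the following iteration. Over four iterations the total drops from 64 to 43 once remapping is switched on.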

 

Is it a program with chare-arrays or just 40 singleton chares (which is what I think you are doing)? For singleton chares, with no periodic behavior, you should use “seed-balancers” (Section 7.7 of the charm++ manual: http://charm.cs.illinois.edu/manuals/html/charm++/7.html#SECTION01670000000000000000)

 

If you just post the program (or email it to me), we can answer your question better.

 

As to why 12 instead of 40: how are you terminating the program (i.e., when do you call CkExit())? It has to be after everything is finished. You may need to use quiescence detection (see the manual).

 

          -Sanjay

 

From: Aditya Kiran Pandare <apandar AT ncsu.edu>
Reply-To: Aditya Kiran Pandare <apandar AT ncsu.edu>
Date: Wednesday, September 13, 2017 at 8:29 AM
To: Vinicius Freitas <vinicius.mct.freitas AT gmail.com>
Cc: Laércio Lima Pilla <laercio.pilla AT ufsc.br>, "charm AT lists.cs.illinois.edu" <charm AT lists.cs.illinois.edu>
Subject: Re: [charm] Using Load balancers in charm++

 

Hello Vinicius,

I have tried using GreedyLB (as Dr Pilla suggested) with the same results: no load balancing effects. I've now tried RefineLB with the +LBDebug verbose output enabled. I've attached the log file and timeline.

This seems a very useful option; it shows how chares have been migrated. But there is still no improvement in the results.


--
Aditya K Pandare
Graduate Research Assistant
Computational Fluid Dynamics Lab A
3211, Engineering Building III
Department of Mechanical and Aerospace Engineering (MAE)
North Carolina State University

 

On Tue, Sep 12, 2017 at 7:55 PM, Vinicius Freitas <vinicius.mct.freitas AT gmail.com> wrote:

Dear Aditya,

As Laércio said, DistributedLB won't work unless you have multiple compute nodes to run it on.

To check for migrations, you can run your charm++ program with +LBDebug 1 or +LBDebug 2; this makes centralized load balancers print some information about their execution.

So I'd suggest trying the following line and checking the results again:
  ./charmrun +p8 ./mandel 4000 0.8 +cs +balancer RefineLB +LBPeriod 1.0 +LBDebug 2

Best regards,


-- 
Vinicius Marino Calvo Torres de Freitas
Computer Science Undergraduate Student (Aluno de graduação em Ciência da Computação)
Research Assistant at the Embedded Computing Laboratory at UFSC
UFSC - CTC - INE - ECL, Brazil
Email: vinicius.mctf AT grad.ufsc.br or vinicius.mct.freitas AT gmail.com 
Tel: +55 (48) 96163803

 

2017-09-12 19:52 GMT-03:00 Laércio Lima Pilla <laercio.pilla AT ufsc.br>:

Dear Aditya,

I could be wrong, but I think DistributedLB is not configured to work when running on a single compute node, as is the situation that you are presenting.

Have you tried running centralized load balancers, like GreedyLB or RefineLB?

Do you have access to a cluster where you could try to run using multiple compute nodes?
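As a rough illustration of what a refinement-style centralized balancer aims for, the sketch below moves objects off the most loaded PE one at a time and counts the moves. It is a toy model in plain C++ under stated assumptions, not the actual RefineLB code; `RefineResult` and `refine` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct RefineResult {
    std::vector<int> pe_of;  // object-to-PE mapping after refinement
    int migrations;          // number of moves performed
};

// Repeatedly move one object from the most loaded PE to the least loaded PE,
// as long as the receiving PE stays strictly below the donor's current load.
RefineResult refine(const std::vector<double>& load, std::vector<int> pe_of, int num_pes) {
    assert(num_pes > 0 && load.size() == pe_of.size());
    std::vector<double> pe_load(num_pes, 0.0);
    for (std::size_t i = 0; i < load.size(); ++i) pe_load[pe_of[i]] += load[i];

    int migrations = 0;
    for (;;) {
        const int hi = static_cast<int>(
            std::max_element(pe_load.begin(), pe_load.end()) - pe_load.begin());
        const int lo = static_cast<int>(
            std::min_element(pe_load.begin(), pe_load.end()) - pe_load.begin());
        // Pick the heaviest object on `hi` whose move is an improvement.
        int best = -1;
        for (std::size_t i = 0; i < load.size(); ++i) {
            if (pe_of[i] != hi || load[i] <= 0.0) continue;
            if (pe_load[lo] + load[i] >= pe_load[hi]) continue;
            if (best < 0 || load[i] > load[best]) best = static_cast<int>(i);
        }
        if (best < 0) break;  // no improving move left
        pe_load[hi] -= load[best];
        pe_load[lo] += load[best];
        pe_of[best] = lo;
        ++migrations;
    }
    return {pe_of, migrations};
}
```

Starting from the existing mapping rather than from scratch is the design point here: objects that are already well placed stay where they are, and only the moves that reduce the imbalance are made.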

Best regards,

On 2017-09-12 19:11, Aditya Kiran Pandare wrote:

Hello,

I'm a graduate student from NC State University and am new to parallel programming and the charm++ environment. I'm working on using charm to parallelize a Mandelbrot set calculation. I was able to do this without load balancing; so the next step is trying to use a load balancer, specifically DistributedLB. I'm currently trying the "periodical load balancing mode". I was hoping to get some help from this mailing-list about a few questions I have.
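For context on why a Mandelbrot calculation is a natural load-balancing test, the sketch below estimates the work per image row. It is a minimal illustration in plain C++, not the code discussed in this thread: the row-wise decomposition, the viewport [-2, 1] x [-1.5, 1.5], and the function names are all assumptions.

```cpp
#include <cassert>

// Number of iterations of z -> z*z + c before |z| exceeds 2, capped at max_iter.
int escape_iterations(double cx, double cy, int max_iter) {
    double x = 0.0, y = 0.0;
    int n = 0;
    while (n < max_iter && x * x + y * y <= 4.0) {
        const double x_next = x * x - y * y + cx;
        y = 2.0 * x * y + cy;
        x = x_next;
        ++n;
    }
    return n;
}

// Work for one image row: the sum of iteration counts over its pixels.
// Assumed viewport: real axis [-2, 1], imaginary axis [-1.5, 1.5].
long long row_cost(int row, int width, int height, int max_iter) {
    assert(width > 1 && height > 1 && row >= 0 && row < height);
    const double cy = -1.5 + 3.0 * row / (height - 1);
    long long cost = 0;
    for (int col = 0; col < width; ++col) {
        const double cx = -2.0 + 3.0 * col / (width - 1);
        cost += escape_iterations(cx, cy, max_iter);
    }
    return cost;
}
```

Rows near the middle of this viewport pass through the set and hit the iteration cap on many pixels, while rows near the top and bottom edges escape after a couple of iterations. An even split of rows across PEs therefore gives very uneven work.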

 

The problem I'm facing is that, even when I use a load balancer, I don't see any change in the PE usage (as compared to no load balancer). I've attached the timelines for the case with and without DistributedLB for comparison (timeline_distLB.pdf, timeline_noLB.pdf). I'm trying to debug my code to find the reason why I cannot see any effect of load balancing. I have a hunch that the chares are not getting migrated at all. I have attached the screen outputs when I run with and without the load balancer (DistLB.log, NoLB.log). As you can see, I have run with the +cs flag.

 

My questions:

 

1) Is there a way to check chare-migration in charm++?

 

2) In this test, the number of chares is 40 (as seen in the "Load distribution" screen output). However, the "Total chares" shows only 12 created. Could you explain how I can interpret this?

 

3) Also, if we compare the outputs of the two tests, it can be seen that there are differences in the "mesgs for groups" column of the statistics table. Does this mean that Load Balancing is actually being used by the code, but in an incorrect way?

 

To make sure I got the compilation, etc. right, here's how I proceeded:

 

First, I compiled and linked the code with "-module CommonLBs". Now, I'm trying to run the code on 8 cores of a single node.

 

Then, the command I used to run the code: ./charmrun +p8 ./mandel 4000 0.8 +cs +balancer DistributedLB +LBPeriod 1.0

(here ./mandel takes two arguments, an int and a double)

 

Any help is appreciated.

 

Thank you,

 

--
Aditya K Pandare
Graduate Research Assistant
Computational Fluid Dynamics Lab A
3211, Engineering Building III
Department of Mechanical and Aerospace Engineering (MAE)
North Carolina State University

 

--

Laércio Lima Pilla, PhD.
Associate Professor (Professor Adjunto)
UFSC - CTC - INE, Brazil
Email: laercio.pilla AT ufsc.br or laercio.lima.pilla AT gmail.com
Tel: +55 (48) 99152 8120, +55 (48) 3721 7564
Website: www.inf.ufsc.br/~pilla/

 

 



Attachment: 5000p_v0.96_compareLB.jpg
Description: JPEG image

Attachment: RefineLB.log
Description: Binary data
