
charm - Re: [charm] Issues with DistBaseLB in the LeanMD mini-app

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system




  • From: Vinicius Freitas <vinicius.mct.freitas AT gmail.com>
  • To: Harshitha Menon <gplkrsh2 AT illinois.edu>
  • Cc: charm <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] Issues with DistBaseLB in the LeanMD mini-app
  • Date: Thu, 10 Oct 2019 06:23:27 +0200
  • Authentication-results: illinois.edu; spf=pass smtp.mailfrom=vinimmbb AT gmail.com; dkim=pass header.d=gmail.com header.s=20161025; dmarc=pass header.from=gmail.com

Hello Harshitha,

We took some time to rerun the experiments, and with the flag the synchronization issue disappeared. It works!

Thank you for the support,

Vinicius F.

On Thu, 26 Sep 2019 at 15:39, Harshitha Menon <gplkrsh2 AT illinois.edu> wrote:
Vinicius,

It seems to me that not all the chares have finished migrating. Could you try passing the +LBSyncResume runtime option? +LBSyncResume forces a global barrier across all processors, so that they resume computation only after all migrations have completed. You can find more details here. If you are analyzing load balancing overhead and need to include migration time, you should turn this option on, so that the overall load balancing time covers data collection, the load balancing algorithm itself, and migration.



On Thu, Sep 26, 2019 at 4:03 AM Vinicius Freitas <vinicius.mct.freitas AT gmail.com> wrote:
Hello all,

I have been working with Charm++ and distributed load balancers for a few years now, and in that time LeanMD has been my favorite mini-app for evaluating new LB strategies, for several reasons. However, I have recently moved to a larger experimental platform, and I am now having problems with the application resuming after load balancing.
These issues seem to be exclusive to DistBaseLB, as they occur both with DistributedLB and with our own in-house strategies. Even though our strategies (and DistributedLB) ignore non-migratable chares, LeanMD crashes with a segmentation fault once it resumes, which does not happen with centralized LBs such as GreedyLB.

I have attached the machine configuration and the application output for this situation. We used Charm++ checked out at v6.9.0-rc3, compiled with mpicc. The modules loaded in the Slurm environment are also attached.

Do you have any idea what could be causing this? How should we proceed to keep experimenting with LeanMD, or should we drop it altogether? In the latter case, are there other MD mini-apps you have been using for experiments?

Thank you for your support and attention,
Regards,
-- 
Vinicius Marino Calvo Torres de Freitas
Computer Science Graduate Student
Research Assistant at the Embedded Computing Laboratory at UFSC (BR)
Research Intern at LRI at Univ. Paris-Sud (FR)
UFSC - CTC - INE - ECL, Brazil
Univ. Paris-Sud - LRI - ParSys, France
Email: vinicius.mctf AT posgrad.ufsc.br or vinicius.mct.freitas AT gmail.com
Tel: +55 (48) 996 163 803

