charm - Re: [charm] Migration of bound arrays


  • From: Jozsef Bakosi <jbakosi AT lanl.gov>
  • To: Ronak Buch <rabuch2 AT illinois.edu>
  • Cc: "charm AT lists.cs.illinois.edu" <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] Migration of bound arrays
  • Date: Thu, 24 Jan 2019 16:27:06 -0700

Follow-up question:

If ResumeFromSync is defined for a chare array element, can I rely on it
being called only after migration? (Or, if no LB is configured, only after
AtSync?)

The particular behavior I'm seeing is that I'm NOT setting usesAtSync,
NOT calling AtSync, and not even selecting an LB on the command line, yet
I still get output from ResumeFromSync. This can be reproduced by:

//usesAtSync = true;

...

else {
  step();
  //AtSync();
}

...

void ResumeFromSync() override {
  CkPrintf("A %d on %d\n",thisIndex,CkMyPe());
  //step();
}

Then running

$ ./charmrun +p4 ./bound_migrate 100000 +cs

If I don't even define ResumeFromSync, then, of course, I get no
migration even when configuring an LB:

$ ./charmrun +p4 ./bound_migrate 1000000 +balancer RandCentLB +LBDebug 1 +cs

Isn't ResumeFromSync only supposed to be called after AtSync?

If ResumeFromSync is where I'm supposed to put the call that continues
with the next iteration, but ResumeFromSync is called even when migration
is not configured, how can I ensure correct application logic?
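
One way I could imagine guarding against this (just a sketch; timeToMigrate()
and nextIteration() are illustrative names, not from the attached code) is to
continue from ResumeFromSync only when load balancing was actually requested:

class A : public CBase_A {
  bool m_migrating = false;        // true only between AtSync() and ResumeFromSync()

  void step() {
    if (timeToMigrate()) {         // hypothetical test for "time to migrate"
      m_migrating = true;
      AtSync();                    // next iteration continues from ResumeFromSync()
    } else {
      nextIteration();             // hypothetical: run the next iteration directly
    }
  }

  void ResumeFromSync() override {
    if (!m_migrating) return;      // ignore calls we never requested
    m_migrating = false;
    step();                        // resume iterating after migration
  }

  void pup( PUP::er& p ) override {
    p | m_migrating;               // the flag must survive migration
    // ... pup the other members ...
  }
};

But I'm not sure whether that is the intended usage.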

Thanks,
Jozsef

On 01.24.2019 11:03, Jozsef Bakosi wrote:
> Hi Ronak,
>
> Thanks for confirming the correctness. I have done some more digging and
> found a potential problem, which, however, seems unrelated to bound arrays.
>
> Consider the code attached. If you fill in the path to charmc in the
> Makefile, the code should be compilable and the problem reproducible:
>
> If quiescence detection is on and I run the code with an LB, I get
> quiescence triggered:
>
> =============================================================================
> $ ./charmrun +p4 ./bound_migrate 10 +balancer RandCentLB +LBDebug 1 +cs
>
> Running on 4 processors: ./bound_migrate 10 +balancer RandCentLB +LBDebug 1 +cs
> charmrun> /usr/bin/setarch x86_64 -R mpirun -np 4 ./bound_migrate 10 +balancer RandCentLB +LBDebug 1 +cs
> Charm++> Running on MPI version: 3.1
> Charm++> level of thread support used: MPI_THREAD_SINGLE (desired: MPI_THREAD_SINGLE)
> Charm++> Running in non-SMP mode: 4 processes (PEs)
> Converse/Charm++ Commit ID:
> Charm++> Using STL-based msgQ:
> Charm++> Using randomized msgQ. Priorities will not be respected!
> CharmLB> Verbose level 1, load balancing period: 0.5 seconds
> CharmLB> Load balancer assumes all CPUs are same.
> Charm++> Running on 1 hosts (2 sockets x 18 cores x 1 PUs = 36-way SMP)
> Charm++> cpu topology info is gathered in 0.001 seconds.
> CharmLB> RandCentLB created.
> Running on 4 processors using 10 elements
> ------------- Processor 0 Exiting: Called CmiAbort ------------
> Reason: Quiescence detected
> ...
> =============================================================================
>
> However, when an LB is not used, it runs without error:
>
> =============================================================================
> $ ./charmrun +p4 ./bound_migrate 10
>
> Running on 4 processors: ./bound_migrate 10
> charmrun> /usr/bin/setarch x86_64 -R mpirun -np 4 ./bound_migrate 10
> Charm++> Running on MPI version: 3.1
> Charm++> level of thread support used: MPI_THREAD_SINGLE (desired: MPI_THREAD_SINGLE)
> Charm++> Running in non-SMP mode: 4 processes (PEs)
> Converse/Charm++ Commit ID:
> Charm++> Using STL-based msgQ:
> Charm++> Using randomized msgQ. Priorities will not be respected!
> CharmLB> Load balancer assumes all CPUs are same.
> Charm++> Running on 1 hosts (2 sockets x 18 cores x 1 PUs = 36-way SMP)
> Charm++> cpu topology info is gathered in 0.002 seconds.
> Running on 4 processors using 10 elements
> A 0 on 0
> A 0 on 0
> A 0 on 0
> ...
> =============================================================================
>
> Also, if one turns off quiescence detection in bound_migrate.C:
>
> // CkStartQD(
> //   CkCallback( CkIndex_Main::quiescence(), thisProxy ) );
>
> migration now works fine:
>
> =============================================================================
> $ ./charmrun +p4 ./bound_migrate 10 +balancer RandCentLB +LBDebug 1 +cs
>
> Running on 4 processors: ./bound_migrate 10 +balancer RandCentLB +LBDebug 1 +cs
> charmrun> /usr/bin/setarch x86_64 -R mpirun -np 4 ./bound_migrate 10 +balancer RandCentLB +LBDebug 1 +cs
> Charm++> Running on MPI version: 3.1
> Charm++> level of thread support used: MPI_THREAD_SINGLE (desired: MPI_THREAD_SINGLE)
> Charm++> Running in non-SMP mode: 4 processes (PEs)
> Converse/Charm++ Commit ID:
> Charm++> Using STL-based msgQ:
> Charm++> Using randomized msgQ. Priorities will not be respected!
> CharmLB> Verbose level 1, load balancing period: 0.5 seconds
> CharmLB> Load balancer assumes all CPUs are same.
> Charm++> Running on 1 hosts (2 sockets x 18 cores x 1 PUs = 36-way SMP)
> Charm++> cpu topology info is gathered in 0.002 seconds.
> CharmLB> RandCentLB created.
> Running on 4 processors using 10 elements
>
> CharmLB> RandCentLB: PE [0] step 0 starting at 0.503235 Memory: 1.640625 MB
> CharmLB> RandCentLB: PE [0] strategy starting at 0.503726
> Calling RandCentLB strategy
> CharmLB> RandCentLB: PE [0] Memory: LBManager: 41 KB CentralLB: 1 KB
> CharmLB> RandCentLB: PE [0] #Objects migrating: 9, LBMigrateMsg size: 0.00 MB
> CharmLB> RandCentLB: PE [0] strategy finished at 0.503734 duration 0.000009 s
> A 1 on 1
> A 6 on 1
> A 8 on 1
> A 9 on 2
> A 0 on 2
> A 4 on 2
> A 2 on 3
> A 5 on 0
> A 3 on 0
> A 7 on 0
> CharmLB> RandCentLB: PE [0] step 0 finished at 0.503937 duration 0.000702 s
>
>
> CharmLB> RandCentLB: PE [0] step 1 starting at 1.003931 Memory: 1.640625 MB
> CharmLB> RandCentLB: PE [0] strategy starting at 1.003990
> Calling RandCentLB strategy
> CharmLB> RandCentLB: PE [0] Memory: LBManager: 41 KB CentralLB: 1 KB
> CharmLB> RandCentLB: PE [0] #Objects migrating: 8, LBMigrateMsg size: 0.00 MB
> CharmLB> RandCentLB: PE [0] strategy finished at 1.003997 duration 0.000007 s
> A 7 on 0
> CharmLB> RandCentLB: PE [0] step 1 finished at 1.004095 duration 0.000164 s
>
> ...
>
> All done
> Charm Kernel Summary Statistics:
> Proc 0: [8 created, 8 processed]
> Proc 1: [0 created, 0 processed]
> Proc 2: [0 created, 0 processed]
> Proc 3: [0 created, 0 processed]
> Total Chares: [8 created, 8 processed]
> Charm Kernel Detailed Statistics (R=requested P=processed):
>
>          Create    Mesgs     Create    Mesgs     Create    Mesgs
>          Chare     for       Group     for       Nodegroup for
> PE   R/P Mesgs     Chares    Mesgs     Groups    Mesgs     Nodegroups
> ---- --- --------- --------- --------- --------- --------- ----------
>    0  R          8         0        10       191         0          0
>       P          8         0        10       203         0          0
>    1  R          0         0         0        80         0          0
>       P          0         0        10        78         0          0
>    2  R          0         0         0        76         0          0
>       P          0         0        10        70         0          0
>    3  R          0         0         0        75         0          0
>       P          0         0        10        74         0          0
> [Partition 0][Node 0] End of program
> =============================================================================
>
> I'm not sure if this matters, but note I am running this with randomized
> queues, on top of OpenMPI, and in non-SMP mode, with Charm++ built with:
>
> $ ./build charm++ mpi-linux-x86_64 --enable-error-checking
> --with-prio-type=int --enable-randomized-msgq --suffix randq-debug
> --build-shared -j36 -w -g
>
> I have bumped into this while trying to switch from migration using
> non-blocking AtSync to blocking AtSync (using ResumeFromSync). It
> appears that when quiescence detection is enabled and one uses blocking
> AtSync migration, quiescence is always detected (right within the first
> call to AtSync) and ResumeFromSync is never called. Non-blocking AtSync
> migration appears fine with quiescence detection. Is this a bug, or am I
> not using this correctly?
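>
> (To be explicit about what I mean by the two styles, here is a rough
> sketch; timeToBalance() and nextIteration() are illustrative names, not
> from the attached code:)
>
> // Non-blocking style: request load balancing and keep iterating right away.
> void step_nonblocking() {
>   if (timeToBalance()) AtSync();   // returns immediately
>   nextIteration();
> }
>
> // Blocking style: stop after AtSync() and continue only from ResumeFromSync().
> void step_blocking() {
>   if (timeToBalance()) AtSync();   // nextIteration() deferred until after LB
>   else nextIteration();
> }
>
> void ResumeFromSync() /* override */ { nextIteration(); }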
>
> Eric Mikida: I suspect this was the reason why I could not switch to
> using ResumeFromSync in branch
> https://github.com/quinoacomputing/quinoa/tree/ResumeFromSync.
>
> Thanks,
> Jozsef
>
> On 01.10.2019 13:42, Ronak Buch wrote:
> > Your code seems to be correct; this should migrate A and B together. The
> > only potential issue I can see is that the code you've provided doesn't
> > use resumeFromSync, which should be used to guarantee that migrations
> > have actually completed.
> >
> > Traditionally, m_a isn't pupped; it's set as a readonly instead and then
> > just used globally, but pupping it as you are doing should work.
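> >
> > (A rough sketch of the readonly variant, in case it helps; aProxy is an
> > illustrative name, not from your code:)
> >
> > // in a .ci file:
> > //   readonly CProxy_A aProxy;
> >
> > // at file scope in the .C file:
> > CProxy_A aProxy;
> >
> > // in Main's constructor, before creating the arrays:
> > //   aProxy = CProxy_A::ckNew();
> >
> > // B then needs no proxy member and nothing extra to pup:
> > class B : public CBase_B {
> > public:
> >   A* aptr() { return aProxy[ thisIndex ].ckLocal(); }
> > };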
> >
> > Waiting for resumeFromSync should make it safe. I believe resumeFromSync
> > will only be called on A, so you can do some messaging to make B know
> > that.
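> >
> > (For example, something along these lines; migrationDone() is a made-up
> > entry method on B:)
> >
> > void A::ResumeFromSync() {
> >   // The bound B element has the same index as this A element, so tell it
> >   // that migration has completed and it is safe to dereference A again.
> >   m_b[ thisIndex ].migrationDone();   // m_b: a CProxy_B held by A (illustrative)
> > }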
> >
> >
> > All of the above is what I think should be true based on how I understand
> > your sample code and description. I'll try to create a simple proof of
> > concept to replicate your issue and do some additional checking and
> > testing. If you have compilable sample code that you can share that
> > exhibits this problem, that would also be appreciated.
> >
> > Thanks,
> > Ronak
> >
> > On Tue, Jan 8, 2019 at 5:21 PM Jozsef Bakosi <jbakosi AT lanl.gov> wrote:
> >
> > > Hi folks,
> > >
> > > Consider the following:
> > >
> > > =========================================
> > > class A : public CBase_A {
> > >   public:
> > >     A() { usesAtSync = true; }
> > > };
> > >
> > > class B : public CBase_B {
> > >   public:
> > >     B( CProxy_A& a ) : m_a( a ) {}
> > >
> > >     void pup( PUP::er& p ) { p | m_a; }
> > >
> > >     A* aptr() { return m_a[ thisIndex ].ckLocal(); }
> > >
> > >     void step() { if (time to migrate) aptr()->AtSync(); }
> > >
> > >   private:
> > >     CProxy_A m_a;
> > > };
> > >
> > >
> > > CProxy_A a = CProxy_A::ckNew();
> > >
> > > CkArrayOptions bound;
> > > bound.bindTo( a );
> > > CProxy_B b = CProxy_B::ckNew( bound );
> > >
> > > a[0].insert();
> > > b[0].insert( a );
> > >
> > > a.doneInserting();
> > > b.doneInserting();
> > > =========================================
> > >
> > > In words: I have two bound chare arrays, A and B. In B I hold a proxy to
> > > A and dereference it via aptr() to get a raw pointer to the bound
> > > object.
> > >
> > > B::step() is called during iterations and, when it is time to migrate,
> > > calls A::AtSync(). Calling aptr() has worked fine so far without
> > > migration. When I migrate, however, I randomly get segfaults from
> > > aptr().
> > >
> > > Questions:
> > >
> > > 1. Is this a correct way of migrating A and B together?
> > >
> > > 2. Is this a correct way of using usesAtSync and AtSync?
> > >
> > > 3. Is it mandatory to pup m_a as written above?
> > >
> > > 4. How do I know that calling aptr() is safe after migration? In other
> > > words, how do I know that when I call B::aptr() A has already arrived?
> > >
> > > Thanks,
> > > Jozsef

> module a {
>
>   array [1D] A {
>     entry A( int maxit );
>   }
>
> };

> module b {
>
>   array [1D] B {
>     entry B();
>   }
>
> }

> mainmodule main {
>
>   extern module a;
>   extern module b;
>
>   readonly CProxy_Main mainProxy;
>
>   mainchare Main {
>     entry Main(CkArgMsg *m);
>     entry [reductiontarget] void done();
>     entry void quiescence();
>   };
>
> };

> #include <stdio.h>
>
> #include "main.decl.h"
> #include "a.decl.h"
> #include "b.decl.h"
>
> CProxy_Main mainProxy;
>
> class A : public CBase_A {
>
>   public:
>     explicit A( int maxit ) : m_maxit(maxit), m_it(0) {
>       usesAtSync = true;
>       step();
>     }
>
>     explicit A( CkMigrateMessage* ) {}
>
>     void pup( PUP::er &p ) override {
>       p | m_maxit;
>       p | m_it;
>     }
>
>     void ResumeFromSync() override {
>       CkPrintf("A %d on %d\n",thisIndex,CkMyPe());
>       step();
>     }
>
>   private:
>     int m_maxit;
>     int m_it;
>
>     void step() {
>       if (++m_it == m_maxit)
>         contribute( CkCallback(CkReductionTarget(Main,done), mainProxy) );
>       else {
>         AtSync();
>       }
>     }
> };
>
> class B : public CBase_B {
>   public:
>     explicit B() { }
>     explicit B( CkMigrateMessage* ) {}
> };
>
> /*mainchare*/
> class Main : public CBase_Main {
>   public:
>     Main(CkArgMsg* m) {
>       int nelem=5;
>       if (m->argc > 1) nelem=atoi(m->argv[1]);
>       delete m;
>
>       CkPrintf("Running on %d processors using %d elements\n",
>                CkNumPes(),nelem);
>       mainProxy = thisProxy;
>
>       // CkStartQD(
>       //   CkCallback( CkIndex_Main::quiescence(), thisProxy ) );
>
>       CkArrayOptions bound;
>       CProxy_A a = CProxy_A::ckNew();
>       //CProxy_B b = CProxy_B::ckNew( bound );
>
>       for (int i=0; i<nelem; ++i) {
>         a[i].insert( 10 );
>         //b[i].insert();
>       }
>       a.doneInserting();
>       //b.doneInserting();
>     };
>
>     void quiescence() {
>       CkAbort("Quiescence detected");
>     }
>
>     void done() {
>       CkPrintf("All done\n");
>       CkExit();
>     };
> };
>
> #include "a.def.h"
> #include "b.def.h"
> #include "main.def.h"

> LB=-module CommonLBs
>
> CHARMC=<path-to-charm-install>/bin/charmc $(OPTS)
>
> OBJS = bound_migrate.o
>
> all: bound_migrate
>
> bound_migrate: $(OBJS)
> 	$(CHARMC) -language charm++ -module CommonLBs -o bound_migrate $(OBJS)
>
> main.decl.h: main.ci
> 	$(CHARMC) main.ci
>
> a.decl.h: a.ci
> 	$(CHARMC) a.ci
>
> b.decl.h: b.ci
> 	$(CHARMC) b.ci
>
> clean:
> 	rm -f *.decl.h *.def.h *.o bound_migrate charmrun
>
> bound_migrate.o: bound_migrate.C main.decl.h a.decl.h b.decl.h
> 	$(CHARMC) -c bound_migrate.C


