
List: charm AT lists.cs.illinois.edu (Charm++ parallel programming system list archive)

RE: [charm] Adaptive MPI


  • From: "Van Der Wijngaart, Rob F" <rob.f.van.der.wijngaart AT intel.com>
  • To: Sam White <white67 AT illinois.edu>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: RE: [charm] Adaptive MPI
  • Date: Wed, 23 Nov 2016 17:28:04 +0000
  • Accept-language: en-US

Hi Sam,

 

The first experiment was successful, but the isomalloc example hangs; see below. Unless it is a symptom of something bigger, I am not going to worry about the latter, since I wasn't planning to use isomalloc for heap migration anyway. My regular MPI code, on which the AMPI version is based, runs fine for all the parameters I have tried, but I reckon it may contain a memory bug that manifests itself only with load balancing.

 

Rob

 

rfvander@klondike:~/Cjacobi3D$ make

/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx  -c jacobi.C

/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx  -o jacobi jacobi.o -module CommonLBs -lm

/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx  -c -DNO_PUP jacobi.C -o jacobi.iso.o

/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx  -o jacobi.iso jacobi.iso.o -module CommonLBs -memory isomalloc

/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx  -c -tlsglobal jacobi.C -o jacobi.tls.o

/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx  -o jacobi.tls jacobi.tls.o -tlsglobal -module CommonLBs #-memory isomalloc

/opt/charm/charm-6.7.0/multicore-linux64/bin/../lib/libconv-util.a(sockRoutines.o): In function `skt_lookup_ip':

sockRoutines.c:(.text+0x334): warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking

/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx  -c jacobi-get.C

/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx  -o jacobi-get jacobi-get.o -module CommonLBs -lm

rfvander@klondike:~/Cjacobi3D$ ./charmrun +p3 ./jacobi 2 2 2 +vp8 +balancer RotateLB +LBDebug 1

Running command: ./jacobi 2 2 2 +vp8 +balancer RotateLB +LBDebug 1 +p3

Charm++: standalone mode (not using charmrun)

Charm++> Running in Multicore mode:  3 threads

Converse/Charm++ Commit ID: v6.7.0-1-gca55e1d

Warning> Randomization of stack pointer is turned on in kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try run with '+isomalloc_sync'. 

CharmLB> Verbose level 1, load balancing period: 0.5 seconds

CharmLB> Load balancer assumes all CPUs are same.

Charm++> Running on 1 unique compute nodes (16-way SMP).

Charm++> cpu topology info is gathered in 0.000 seconds.

[0] RotateLB created

iter 1 time: 0.142733 maxerr: 2020.200000

iter 2 time: 0.157225 maxerr: 1696.968000

iter 3 time: 0.172039 maxerr: 1477.170240

iter 4 time: 0.146178 maxerr: 1319.433024

iter 5 time: 0.123098 maxerr: 1200.918072

iter 6 time: 0.131063 maxerr: 1108.425519

iter 7 time: 0.138213 maxerr: 1033.970839

iter 8 time: 0.138295 maxerr: 972.509242

iter 9 time: 0.138113 maxerr: 920.721889

iter 10 time: 0.121553 maxerr: 876.344030

CharmLB> RotateLB: PE [0] step 0 starting at 1.489509 Memory: 72.253906 MB

CharmLB> RotateLB: PE [0] strategy starting at 1.489573

CharmLB> RotateLB: PE [0] Memory: LBManager: 920 KB CentralLB: 3 KB

CharmLB> RotateLB: PE [0] #Objects migrating: 8, LBMigrateMsg size: 0.00 MB

CharmLB> RotateLB: PE [0] strategy finished at 1.489592 duration 0.000019 s

CharmLB> RotateLB: PE [0] step 0 finished at 1.507922 duration 0.018413 s

iter 11 time: 0.152840 maxerr: 837.779089

iter 12 time: 0.136401 maxerr: 803.868831

iter 13 time: 0.138095 maxerr: 773.751705

iter 14 time: 0.139319 maxerr: 746.772667

iter 15 time: 0.139327 maxerr: 722.424056

iter 16 time: 0.141794 maxerr: 700.305763

iter 17 time: 0.142484 maxerr: 680.097726

iter 18 time: 0.141056 maxerr: 661.540528

iter 19 time: 0.153895 maxerr: 644.421422

iter 20 time: 0.198588 maxerr: 628.564089

[Partition 0][Node 0] End of program

rfvander@klondike:~/Cjacobi3D$ ./charmrun +p3 ./jacobi.iso 2 2 2 +vp8 +balancer RotateLB +LBDebug 1

Running command: ./jacobi.iso 2 2 2 +vp8 +balancer RotateLB +LBDebug 1 +p3

Charm++: standalone mode (not using charmrun)

Charm++> Running in Multicore mode:  3 threads

^C

rfvander@klondike:~/Cjacobi3D$ ./charmrun +p3 ./jacobi.iso 2 2 2 +vp8 +balancer RotateLB +LBDebug 1 +isomalloc_sync

Running command: ./jacobi.iso 2 2 2 +vp8 +balancer RotateLB +LBDebug 1 +isomalloc_sync +p3

Charm++: standalone mode (not using charmrun)

Charm++> Running in Multicore mode:  3 threads

 

From: samt.white AT gmail.com [mailto:samt.white AT gmail.com] On Behalf Of Sam White
Sent: Wednesday, November 23, 2016 7:10 AM
To: Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com>
Cc: charm AT cs.uiuc.edu
Subject: Re: Adaptive MPI

 

Can you try an example AMPI program with load balancing? You can try charm/examples/ampi/Cjacobi3D/, running with something like './charmrun +p3 ./jacobi 2 2 2 +vp8 +balancer RotateLB +LBDebug 1'. You can also test that example with Isomalloc by running jacobi.iso (and, as the warning in the Charm preamble output suggests, run with +isomalloc_sync). It might also help to build Charm++/AMPI with '-g' to get stack traces.
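The structure that example exercises is roughly the following sketch: a compute loop with a collective AMPI_Migrate() call at each iteration boundary. This is a minimal illustration assuming the 6.7-era AMPI C API, not the example's actual source; compute_iteration is a hypothetical stand-in for the real work.

#include <mpi.h>   /* AMPI's mpi.h also declares the AMPI extensions */

void compute_iteration(int iter);  /* hypothetical application kernel */

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  /* Heap data must be made migratable: either link with -memory isomalloc
     or register a PUP routine, as the jacobi example does when built
     without -DNO_PUP. */
  for (int iter = 0; iter < 20; iter++) {
    compute_iteration(iter);
    AMPI_Migrate();   /* collective migration point: the runtime may move
                         this virtual rank to another PE here */
  }
  MPI_Finalize();
  return 0;
}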

-Sam

 

On Wed, Nov 23, 2016 at 2:19 AM, Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com> wrote:

Hello Team,

 

I am trying to troubleshoot my Adaptive MPI code that uses dynamic load balancing. It crashes with a segmentation fault in AMPI_Migrate. I checked, and dchunkpup (which I supplied) is called within AMPI_Migrate and finishes on all ranks. That is not to say it is correct, but the crash is not happening there. It could have corrupted memory elsewhere, though, so I gutted it such that it only asks for and prints the MPI rank of the ranks entering it. I added graceful exit code after the call to AMPI_Migrate, but that is evidently not reached. I understand that this information is not enough for you to identify the problem, but at present I don't know where to start, since the error occurs in code that I did not write. Could you give me some pointers on where to start? Thanks!
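For context, a dchunkpup-style heap PUP routine typically has the shape sketched below. This is a generic sketch using Charm's pup_c.h C interface with a hypothetical chunk struct, not the routine from the code in question; a missing allocate-on-unpack or free-on-delete branch in such a routine can corrupt memory during migration.

#include <stdlib.h>
#include "pup_c.h"   /* Charm++ C PUP interface */

typedef struct { int n; double *val; } chunk;   /* hypothetical user data */

/* Generic shape of a heap PUP routine registered via MPI_Register(). */
void dchunkpup_sketch(pup_er p, void *c_) {
  chunk *c = (chunk *)c_;
  pup_int(p, &c->n);               /* sizes first, so unpacking can allocate */
  if (pup_isUnpacking(p))
    c->val = (double *)malloc(c->n * sizeof(double));
  pup_doubles(p, c->val, c->n);    /* sizes, packs, or unpacks the payload */
  if (pup_isDeleting(p))           /* the departing copy frees its heap data */
    free(c->val);
}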

Below is some relevant output. If I replace the RotateLB load balancer with RefineLB, some ranks do pass the AMPI_Migrate call, but that is evidently because the load balancer left them alone.

 

Rob

 

rfvander@klondike:~/esg-prk-devel/AMPI/AMR$ make clean; make amr USE_PUPER=1

rm -f amr.o MPI_bail_out.o wtime.o  amr *.optrpt *~ charmrun stats.json amr.decl.h amr.def.h

/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicc -O3 -std=c99  -DADAPTIVE_MPI -DRESTRICT_KEYWORD=0 -DVERBOSE=0  -DDOUBLE=1   -DRADIUS=2  -DSTAR=1 -DLOOPGEN=0  -DUSE_PUPER=1  -I../../include -c amr.c

In file included from amr.c:66:0:

../../include/par-res-kern_general.h: In function ‘prk_malloc’:

../../include/par-res-kern_general.h:136:11: warning: implicit declaration of function ‘posix_memalign’ [-Wimplicit-function-declaration]

     ret = posix_memalign(&ptr,alignment,bytes);

           ^

amr.c: In function ‘AMPI_Main’:

amr.c:842:14: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]

       printf("ERROR: rank %d's BG work tile smaller than stencil radius: %d\n",

              ^

amr.c:1080:14: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has type ‘long int’ [-Wformat=]

       printf("ERROR: rank %d's work tile %d smaller than stencil radius: %d\n",

              ^

amr.c:1518:14: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]

       printf("Rank %d about to call AMPI_Migrate in iter %d\n", my_ID, iter);

              ^

amr.c:1520:14: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]

       printf("Rank %d called AMPI_Migrate in iter %d\n", my_ID, iter);

              ^

/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicc -O3 -std=c99  -DADAPTIVE_MPI -DRESTRICT_KEYWORD=0 -DVERBOSE=0  -DDOUBLE=1   -DRADIUS=2  -DSTAR=1 -DLOOPGEN=0  -DUSE_PUPER=1  -I../../include -c ../../common/MPI_bail_out.c

In file included from ../../common/MPI_bail_out.c:51:0:

../../include/par-res-kern_general.h: In function ‘prk_malloc’:

../../include/par-res-kern_general.h:136:11: warning: implicit declaration of function ‘posix_memalign’ [-Wimplicit-function-declaration]

     ret = posix_memalign(&ptr,alignment,bytes);

           ^

/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicc -O3 -std=c99  -DADAPTIVE_MPI -DRESTRICT_KEYWORD=0 -DVERBOSE=0  -DDOUBLE=1   -DRADIUS=2  -DSTAR=1 -DLOOPGEN=0  -DUSE_PUPER=1  -I../../include -c ../../common/wtime.c

/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicc -language ampi -o amr   -O3 -std=c99  -DADAPTIVE_MPI amr.o MPI_bail_out.o wtime.o  -lm -module CommonLBs

cc1plus: warning: command line option ‘-std=c99’ is valid for C/ObjC but not for C++
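Incidentally, the -Wformat warnings above are easy to silence: in each case the flagged argument (iter at amr.c:1518, for example) is a long int printed with %d, so the specifier should be %ld, or the argument cast to int. A sketch of the fix for one of them:

/* iter is evidently a long int here (the flagged argument 3), so %ld: */
printf("Rank %d about to call AMPI_Migrate in iter %ld\n", my_ID, iter);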

 

rfvander@klondike:~/esg-prk-devel/AMPI/AMR$ /opt/charm/charm-6.7.0/bin/charmrun ./amr 20 1000 500 3 10 5 1 FINE_GRAIN +p 8 +vp 16 +balancer RotateLB +LBDebug 1

Running command: ./amr 20 1000 500 3 10 5 1 FINE_GRAIN +p 8 +vp 16 +balancer RotateLB +LBDebug 1

Charm++: standalone mode (not using charmrun)

Charm++> Running in Multicore mode:  8 threads

Converse/Charm++ Commit ID: v6.7.0-1-gca55e1d

Warning> Randomization of stack pointer is turned on in kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try run with '+isomalloc_sync'. 

CharmLB> Verbose level 1, load balancing period: 0.5 seconds

CharmLB> Load balancer assumes all CPUs are same.

Charm++> Running on 1 unique compute nodes (16-way SMP).

Charm++> cpu topology info is gathered in 0.001 seconds.

[0] RotateLB created

Parallel Research Kernels Version 2.17

MPI AMR stencil execution on 2D grid

Number of ranks                 = 16

Background grid size            = 1000

Radius of stencil               = 2

Tiles in x/y-direction on BG    = 4/4

Tiles in x/y-direction on ref 0 = 4/4

Tiles in x/y-direction on ref 1 = 4/4

Tiles in x/y-direction on ref 2 = 4/4

Tiles in x/y-direction on ref 3 = 4/4

Type of stencil                 = star

Data type                       = double precision

Compact representation of stencil loop body

Number of iterations            = 20

Load balancer                   = FINE_GRAIN

Refinement rank spread          = 16

Refinements:

   Background grid points       = 500

   Grid size                    = 3993

   Refinement level             = 3

   Period                       = 10

   Duration                     = 5

   Sub-iterations               = 1

Rank 12 about to call AMPI_Migrate in iter 0

Rank 12 entered dchunkpup

Rank 7 about to call AMPI_Migrate in iter 0

Rank 7 entered dchunkpup

Rank 8 about to call AMPI_Migrate in iter 0

Rank 8 entered dchunkpup

Rank 4 about to call AMPI_Migrate in iter 0

Rank 4 entered dchunkpup

Rank 15 about to call AMPI_Migrate in iter 0

Rank 15 entered dchunkpup

Rank 11 about to call AMPI_Migrate in iter 0

Rank 11 entered dchunkpup

Rank 3 about to call AMPI_Migrate in iter 0

Rank 1 about to call AMPI_Migrate in iter 0

Rank 1 entered dchunkpup

Rank 3 entered dchunkpup

Rank 13 about to call AMPI_Migrate in iter 0

Rank 13 entered dchunkpup

Rank 6 about to call AMPI_Migrate in iter 0

Rank 6 entered dchunkpup

Rank 0 about to call AMPI_Migrate in iter 0

Rank 0 entered dchunkpup

Rank 9 about to call AMPI_Migrate in iter 0

Rank 9 entered dchunkpup

Rank 5 about to call AMPI_Migrate in iter 0

Rank 5 entered dchunkpup

Rank 2 about to call AMPI_Migrate in iter 0

Rank 2 entered dchunkpup

Rank 10 about to call AMPI_Migrate in iter 0

Rank 10 entered dchunkpup

Rank 14 about to call AMPI_Migrate in iter 0

Rank 14 entered dchunkpup

CharmLB> RotateLB: PE [0] step 0 starting at 0.507547 Memory: 990.820312 MB

CharmLB> RotateLB: PE [0] strategy starting at 0.511685

CharmLB> RotateLB: PE [0] Memory: LBManager: 920 KB CentralLB: 19 KB

CharmLB> RotateLB: PE [0] #Objects migrating: 16, LBMigrateMsg size: 0.00 MB

CharmLB> RotateLB: PE [0] strategy finished at 0.511696 duration 0.000011 s

Segmentation fault (core dumped)

 

 



