charm - Re: [charm] CkLoop for a load balancer

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: François TESSIER <francois.tessier AT inria.fr>
  • To: Phil Miller <mille121 AT illinois.edu>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>, Gengbin Zheng <zhenggb AT gmail.com>
  • Subject: Re: [charm] CkLoop for a load balancer
  • Date: Fri, 18 Oct 2013 19:18:56 +0200
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

When I run the application with the run command above, it crashes (see attachment) or, sometimes, nothing happens. If I run it with +p8, it works perfectly (the application runs fine and the load balancer runs in parallel) but, of course, only on the first node...

So what doesn't work is running kNeighbor on 8 (or more) nodes, with 8 processes per node, while running my parallel load balancer on the master node every n iterations.

Thanks for your help

François
--
François TESSIER
PhD Student at University of Bordeaux
Inria - Runtime Team
Tel : 0033.5.24.57.41.52
francois.tessier AT inria.fr
PGP 0x8096B5FA
On 18/10/2013 17:23, Phil Miller wrote:
Please be more specific - what were the *problems* that you actually encountered? Everything you described seems to be reasonable.

Did it crash? Did it hang? Did the load balancer not run in parallel? Did you get unexpected output? What happened that was wrong?


On Fri, Oct 18, 2013 at 4:05 AM, François Tessier <francois.tessier AT inria.fr> wrote:
Hi!

With the help of some of you, I wrote a parallel load balancer using
CkLoop. But I am having problems running an application with this
load balancer. For example, I tried to run experiments with kNeighbor and
proceeded as follows:

- Build charm++ : ./build charm++ mpi-linux-x86_64-smp --with-production -j
- Go to tmp/libs/ck-libs/ckloop then make
- Compile kNeighbor with -module CkLoop

All these steps succeeded. My run command looks like this:

./charmrun +p64 -machinefile ~/machinefile ./kNeighbor +ppn8 64 50
262144 10 +balancer TreeMatchLB +LBDebug 1 +setcpuaffinity +pemap 0-7
+CmiSleepOnIdle

The target platform has 8 nodes with 8 cores each. I would like
to run kNeighbor on 64 processes and parallelize only the load
balancing with CkLoop.
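
To make the intended layout concrete, here is a small Python sketch of the arithmetic behind `+p64` and `+ppn8`. It assumes one process per node and the usual SMP arrangement of one communication thread per process in addition to the worker threads (the startup log mentions a comm thread). The function name `smp_layout` is hypothetical and only for illustration.

```python
def smp_layout(total_pes, workers_per_process, cores_per_node):
    """Return the process/thread counts implied by +p and +ppn (illustrative helper)."""
    # This sketch only handles the case where the PEs split evenly.
    if total_pes % workers_per_process != 0:
        raise ValueError("total_pes must be a multiple of workers_per_process")
    processes = total_pes // workers_per_process
    # Assumption: each SMP process also runs one communication thread.
    threads_per_process = workers_per_process + 1
    return {
        "processes": processes,
        "threads_per_process": threads_per_process,
        # Assumption: one process per node; a negative value means oversubscription.
        "spare_cores_per_node": cores_per_node - threads_per_process,
    }


# Values taken from the run command (+p64, +ppn8) and the platform description.
layout = smp_layout(total_pes=64, workers_per_process=8, cores_per_node=8)
print(layout)
```

With the values from this thread, the sketch gives 8 processes of 9 threads each, that is, one thread more than the 8 cores available on each node.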

Do you have any suggestions?

Thanks

François

--
___________________
François TESSIER
PhD Student at University of Bordeaux
Inria - Runtime Team
Tel : 0033.5.24.57.41.52
francois.tessier AT inria.fr
http://runtime.bordeaux.inria.fr/ftessier/
PGP 0x8096B5FA






Running on 8 processors: -machinefile /home/tessier/machinefile_par
./kNeighbor +ppn8 64 50 262144 10 +balancer TreeMatchLB +LBDebug 1
+setcpuaffinity +pemap 0-7 +CmiSleepOnIdle
charmrun> mpirun -np 8 -machinefile /home/tessier/machinefile_par
./kNeighbor +ppn8 64 50 262144 10 +balancer TreeMatchLB +LBDebug 1
+setcpuaffinity +pemap 0-7 +CmiSleepOnIdle
Charm++> Running on MPI version: 2.1
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired:
MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: numNodes 8, 8 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.5.0-beta1-959-g7414d2b
CharmLB> Verbose level 1, load balancing period: 0.5 seconds
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
Charm++> cpuaffinity PE-core map : 0-7
Charm++> Running on 8 unique compute nodes (8-way SMP).
Charm++> cpu topology info is gathered in 0.233 seconds.
[0] TreeMatchLB created
CkLoopLib is used in SMP with a simple dynamic scheduling (converse-level
notification) but not using node-level queue

Starting kNeighbor ...
[The CkLoopLib message above is repeated 63 more times]
[fourmi052:21146] *** Process received signal ***
[fourmi052:21146] Signal: Segmentation fault (11)
[fourmi052:21146] Signal code: Address not mapped (1)
[fourmi052:21146] Failing at address: 0x30
[fourmi052:21146] [ 0] /lib64/libpthread.so.0 [0x7ffff7bd2a90]
[fourmi052:21146] [ 1] ./kNeighbor(_ZN16FuncSingleHelperC1Ei+0x152) [0x4c0380]
[fourmi052:21146] [ 2] ./kNeighbor(_ZN24CkIndex_FuncSingleHelper32_call_FuncSingleHelper_marshall1EPvS0_+0x8f) [0x4c17ed]
[fourmi052:21146] [ 3] ./kNeighbor(CkDeliverMessageFree+0x31) [0x509f71]
[fourmi052:21146] [ 4] ./kNeighbor(_Z15_processHandlerPvP11CkCoreState+0xc3f) [0x50ef0f]
[fourmi052:21146] [ 5] ./kNeighbor(CsdScheduleForever+0x48) [0x5b0a18]
[fourmi052:21146] [ 6] ./kNeighbor(CsdScheduler+0x2d) [0x5b0c9d]
[fourmi052:21146] [ 7] ./kNeighbor [0x5aec18]
[fourmi052:21146] [ 8] ./kNeighbor [0x5aecbb]
[fourmi052:21146] [ 9] /lib64/libpthread.so.0 [0x7ffff7bcb070]
[fourmi052:21146] [10] /lib64/libc.so.6(clone+0x6d) [0x7ffff61c710d]
[fourmi052:21146] *** End of error message ***
[fourmi049:15495] [[26864,0],0]-[[26864,1],0] mca_oob_tcp_msg_recv: readv
failed: Connection reset by peer (104)
--------------------------------------------------------------------------
mpirun noticed that process rank 3 with PID 21146 on node fourmi052 exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Attachment: signature.asc
Description: OpenPGP digital signature



