
charm - Re: [charm] charmlu problem: Too little space to plan even one trailing update.

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system



  • From: Ekaterina Tutlyaeva <xgl AT rsc-tech.ru>
  • To: charm AT cs.uiuc.edu
  • Subject: Re: [charm] charmlu problem: Too little space to plan even one trailing update.
  • Date: Fri, 23 Dec 2016 14:29:03 +0300


Sorry, I found the solution myself: the problem was the Mem Threshold.
I tried Mem Threshold (MB): 60000, but it only works with the value increased 1000 times, to 60000000. Maybe the unit is not MB?
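
For anyone checking a threshold value against this run, here is a rough sketch of the sizing arithmetic. It assumes 8-byte double-precision matrix elements, and the unit interpretations it tries (bytes, KB, MB) are only hypotheses; how CharmLU actually reads the value is not confirmed here. The helper names `block_bytes` and `blocks_that_fit` are made up for illustration.

```python
# Rough sizing arithmetic for the run above. Assumes 8-byte (double) matrix
# elements; the unit interpretations below are hypotheses, not CharmLU facts.
BYTES_PER_ELEMENT = 8  # assumption: double precision


def block_bytes(block_size: int) -> int:
    """Bytes needed to hold one square block of the matrix."""
    return block_size * block_size * BYTES_PER_ELEMENT


def blocks_that_fit(threshold: int, unit_bytes: int, block_size: int) -> int:
    """How many whole blocks fit in `threshold` units of `unit_bytes` each."""
    return (threshold * unit_bytes) // block_bytes(block_size)


matrix_size, block_size = 20400, 204          # first two charmlu arguments
blocks_per_dim = matrix_size // block_size    # compare with "Chare Array size" in the log
matrix_bytes = matrix_size * matrix_size * BYTES_PER_ELEMENT

print(f"blocks per dimension: {blocks_per_dim}")
print(f"one block: {block_bytes(block_size)} bytes")
print(f"whole matrix: {matrix_bytes / 1e9:.2f} GB")

# Try both threshold values under each hypothetical unit.
for threshold in (60000, 60000000):
    for unit_name, unit_bytes in (("bytes", 1), ("KB", 1000), ("MB", 1000**2)):
        n = blocks_that_fit(threshold, unit_bytes, block_size)
        print(f"threshold {threshold} read as {unit_name}: {n} whole blocks fit")
```

Under these assumptions one 204 x 204 block needs 332928 bytes, so 60000 read as bytes holds no complete block, while 60000000 read as bytes holds 180. That would be consistent with what I saw, but it is only one possible explanation and does not prove what unit the program uses.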


By the way, could you please give me your best optimization advice for system benchmarking with CharmLU?


Thank you for your time!

./build LIBS  mpi-linux-x86_64 smp -j14 --with-refnum-type=int -axCORE-AVX2,MIC-AVX512

2016-12-23 13:45 GMT+03:00 Ekaterina Tutlyaeva <xgl AT rsc-tech.ru>:

Dear support,

I have a problem running charmlu.

Charm++ built successfully with:
./build LIBS  mpi-linux-x86_64 smp -j14 --with-refnum-type=int -axCORE-AVX2,MIC-AVX512

(Could you please advise on the best-optimized build options for Intel Broadwell and for the Intel Xeon Phi Knights Landing generation (the newest one, not the older MIC)?)

CharmLU also built successfully with MKL.
I'm using an Intel Xeon CPU E5-2698 v4 with 40 cores, allocated through Slurm, so I run 40 processes on the single allocated host, with a small matrix size for this test:
./charmrun +p40 --bootstrap ssh  -hosts=n00p012 ./charmlu 20400 204 60000
But I get the same error for all processes:

------------- Processor N Exiting: Called CmiAbort ------------
Reason: Too little space to plan even one trailing update.

(The full log is at the end of this message.)

Could you please give me a hint about what I am doing wrong?
By the way, there is free space on the partition, and classic MPI runs complete successfully.
Where can I look for the root of the problem?

Thank you very much for your time!


Execution log:
Running on 40 processors:  --bootstrap ssh -hosts=n00p012 ./charmlu 20400 204 60000
charmrun>  /usr/bin/setarch x86_64 -R  mpirun -np 40  --bootstrap ssh -hosts=n00p012 ./charmlu 20400 204 60000
Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: numNodes 40,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID:
Warning> Randomization of stack pointer is turned on in kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try run with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (80-way SMP).
Charm++> cpu topology info is gathered in 0.055 seconds.
Running LU compiled from revision: c9cf77459d7b6669b1340fa60b3939d20440c161
Running LU on 40 processors (40 nodes):
        Matrix size: 20400 X 20400
        Block size: 204 X 204
        Chare Array size: 100 X 100
        Pivot batch size: 51
        Mem Threshold (MB): 60000
        Send Limit: 2
        Mapping Scheme: 2 (Block Cyclic)
        Pivot Redn Scheduling: Off
        Useg Multicast from: Diagonal
Starting solve
[0] Array element at index 0 aborting:
[1] Array element at index 1 aborting:
CkMigratable 'BlockScheduler' aborting:
[2] Array element at index 2 aborting:
CkMigratable 'BlockScheduler' aborting:
------------- Processor 2 Exiting: Called CmiAbort ------------
Reason: Too little space to plan even one trailing update
[3] Array element at index 3 aborting:
CkMigratable 'BlockScheduler' aborting:
------------- Processor 3 Exiting: Called CmiAbort ------------
Reason: Too little space to plan even one trailing update
[4] Array element at index 4 aborting:
CkMigratable 'BlockScheduler' aborting:
[5] Array element at index 5 aborting:
CkMigratable 'BlockScheduler' aborting:
------------- Processor 5 Exiting: Called CmiAbort ------------
Reason: Too little space to plan even one trailing update
[6] Array element at index 6 aborting:
CkMigratable 'BlockScheduler' aborting:

... and so on, with the same error for all 40 processes


__________
Best regards,
Ekaterina



--
__________
Best regards,
Ekaterina Tutlyaeva


