
charm - Re: [charm] Using Charm AMPI

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Scott Field <sfield AT astro.cornell.edu>
  • To: Leonardo Duarte <leo.duarte AT gmail.com>
  • Cc: Charm Mailing List <charm AT cs.illinois.edu>
  • Subject: Re: [charm] Using Charm AMPI
  • Date: Thu, 29 Oct 2015 12:00:41 -0400

Hi Leonardo,

  I have a Charm++ application running on Blue Waters, and hopefully some of this will carry over to AMPI.

  In addition to the default Blue Waters environment, I use

module swap PrgEnv-cray PrgEnv-gnu/5.2.40
module load craype-hugepages2M
module load rca

  and my Charm++ build includes the option "persistent". To launch the application I run

>>> aprun -n 2 -r 1 -N 1 -d 31 ./ExecutableName +ppn 30 +pemap 1-30 +commap 0
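The thread counts in that launch line can be read as simple arithmetic over the cores on a node. The sketch below is one reading of the flags, not an official recipe: it assumes 32 cores per node (the quoted startup output reports "32-way SMP"), one core set aside with -r, and one communication thread per process alongside the worker threads. The variable names are hypothetical, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Assumption: 32 cores per compute node.
cores_per_node=32
# One core left to the system (aprun -r 1).
reserved_cores=1
# Threads available to the single process on each node (aprun -d).
depth=$((cores_per_node - reserved_cores))
# One of those threads is the communication thread (+commap 0);
# the remainder are worker threads (+ppn, pinned with +pemap).
workers=$((depth - 1))

cmd="aprun -n 2 -r $reserved_cores -N 1 -d $depth ./ExecutableName +ppn $workers +pemap 1-$workers +commap 0"
echo "$cmd"
```

With these assumptions the printed line matches the command above: 31 threads per process, of which 30 are workers.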

On startup, my Charm++ output looks different from yours. In particular, I see

"Charm++> Running in SMP mode: numNodes 2,  30 worker threads per process"

while yours reads

"Charm++> Running in SMP mode: numNodes 2,  1 worker threads per process"

These differences may or may not explain the errors you see, but hopefully this helps. Good luck!

Scott


On Thu, Oct 29, 2015 at 1:58 AM, Leonardo Duarte <leo.duarte AT gmail.com> wrote:
Hello Everyone,

I'm a PhD student in the CEE department at UIUC and I would
really appreciate it if anyone could help me with Charm.

I'm trying to run my code on Blue Waters, and I'm using a library that uses Charm++ AMPI.
With PrgEnv-gnu I was able to build and run everything correctly, but it ran extremely slowly.
Now I'm trying to use the native Cray environment.

I'm using this BW environment and modules:

PrgEnv-cray
module load craype-hugepages8M
module load rca

I built charm with this command line:

./build LIBS gni-crayxe craycc  smp  -j16  --with-production --build-shared -O3

My code is composed of many shared libraries that the application loads dynamically using dlopen, dlsym, etc.

I'm able to build my code using these command lines in my makefiles:

To compile code that does not use Charm:
CC -c -fPIC  -O2 -I../../core/include -I../../tecgraf/tops/include -o ../../obj/obj64/linear/Linux3/linear.o ../../plugins/behavior/linear/linear.cpp

To link code that does not use Charm:
CC -shared -Wl,-soname,liblinear.so.1 -o liblinear.so.1.0 ../../obj/obj64/linear/Linux3/linear.o -L../../tecgraf/tops/lib64/Linux3 -ltops -L../../bin/lib64/Linux3 -ltopsim

To compile code that uses Charm:
charmc -language model -c -fPIC -O2 -I../../core/include -I../../tecgraf/tops/include -I../../tecgraf/tops/include/vis -I../../../bin/charm/include -o ../../obj/obj64/parebepcg/Linux3/parebepcg.o ../../plugins/linearsystem/ebepcg/parebepcg.cpp

To link code that uses Charm:
charmc -shared -language ampi -Wl,-soname,libparebepcg.so.1 -o libparebepcg.so.1.0 ../../obj/obj64/parebepcg/Linux3/parebepcg.o -L../../tecgraf/tops/lib64/Linux3 -lpartops -ltopsrd -ltops -L../../bin/lib64/Linux3 -lpartopsim

To compile my app:
charmc -language model -c -fPIC  -O2 -I../../core/include -I../../tecgraf/tops/include -I../../tecgraf/tops/include/vis -I../../plugins -o ../../obj/obj64/partopsimapp/partopsimapp/Linux3/parmain.o ../../tests/app/parmain.cpp

To link my app:
charmc -language ampi -dynamic -o ../../bin/lib64/Linux3/partopsimapp ../../obj/obj64/partopsimapp/partopsimapp/Linux3/parmain.o -L../../tecgraf/tops/lib64/Linux3 -lpartops -ltopsrd -ltops -L../../bin/lib64/Linux3 -lpartopsim -lpartopsimlib -Wl, --no-as-needed -ldl

This is the error that I get:

_pmiu_daemon(SIGCHLD): [NID 16828] [c19-9c1s1n0] [Thu Oct 29 00:35:04 2015] PE RANK 0 exit signal Segmentation fault
[NID 16828] 2015-10-29 00:35:04 Apid 28607883: initiated application termination
_pmiu_daemon(SIGCHLD): [NID 16829] [c19-9c1s1n1] [Thu Oct 29 00:35:04 2015] PE RANK 1 exit signal Segmentation fault

I've put some extra information at the end of the email in case you need it.
I've read a lot on the internet and tried many things, but now I think I need some help.
Am I missing something? Is this the correct way to handle it?
I would really appreciate any suggestions.

Thank you.
Leonardo.

Extra information

These are my environment variables:

echo $PATH
.:/u/psp/duarte/bin/lua5:/u/psp/duarte/bin/tolua5:/u/psp/duarte/bin/charm/gni-crayxe-smp-craycc/bin:/u/psp/duarte/bin/charm/gni-crayxe-persistent-smp/bin:/sw/xe/darshan/2.3.0/darshan-2.3.0_cle52/bin:/sw/admin/scripts:/sw/user/scripts:/sw/xe/altd/bin:/usr/local/gsi-openssh-6.2p2-2/bin:/opt/java/jdk1.7.0_45/bin:/usr/local/globus-5.2.4/bin:/usr/local/globus-5.2.4/sbin:/opt/moab/8.1/bin:/opt/moab/8.1/sbin:/opt/torque/5.0.2-bwpatch/sbin:/opt/torque/5.0.2-bwpatch/bin:/opt/cray/mpt/7.2.0/gni/bin:/opt/cray/rca/1.0.0-2.0502.53711.3.125.gem/bin:/opt/cray/alps/5.2.1-2.0502.9041.11.6.gem/sbin:/opt/cray/alps/5.2.1-2.0502.9041.11.6.gem/bin:/opt/cray/dvs/2.5_0.9.0-1.0502.1873.1.142.gem/bin:/opt/cray/xpmem/0.1-2.0502.55507.3.2.gem/bin:/opt/cray/dmapp/7.0.1-1.0502.9501.5.211.gem/bin:/opt/cray/pmi/5.0.6-1.0000.10439.140.3.gem/bin:/opt/cray/ugni/5.0-1.0502.9685.4.24.gem/bin:/opt/cray/udreg/2.3.2-1.0502.9275.1.25.gem/bin:/opt/cray/cce/8.3.10/cray-binutils/x86_64-unknown-linux-gnu/bin:/opt/cray/cce/8.3.10/craylibs/x86-64/bin:/opt/cray/cce/8.3.10/cftn/bin:/opt/cray/cce/8.3.10/CC/bin:/opt/cray/craype/2.3.0/bin:/opt/cray/eslogin/eswrap/1.1.0-1.020200.1231.0/bin:/opt/modules/3.2.10.3/bin:/u/psp/duarte/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/lib/qt3/bin:/opt/cray/bin

echo $LD_LIBRARY_PATH
.:/u/psp/duarte/topsim/bin/lib64/Linux3:/u/psp/duarte/topsim/bin/libd64/Linux3:/u/psp/duarte/bin/charm/gni-crayxe-smp-craycc/lib_so:/u/psp/duarte/bin/charm/gni-crayxe-smp-craycc/lib:/u/psp/duarte/bin/charm/gni-crayxe-persistent-smp/lib:/u/psp/duarte/lib:/sw/xe/darshan/2.3.0/darshan-2.3.0_cle52/lib:/usr/local/globus-5.2.4/lib64:/usr/local/globus/lib64
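Both path variables above contain directories from more than one Charm++ build. A minimal sketch for listing the distinct build names found in such a variable follows, assuming the builds live under a directory named charm as they do here. The sample value is shortened from the LD_LIBRARY_PATH shown above, and the variable names are hypothetical.

```shell
#!/bin/sh
# Shortened copy of the LD_LIBRARY_PATH quoted above; on a real system
# this would be "$LD_LIBRARY_PATH" or "$PATH" instead.
sample_path=".:/u/psp/duarte/bin/charm/gni-crayxe-smp-craycc/lib_so:/u/psp/duarte/bin/charm/gni-crayxe-smp-craycc/lib:/u/psp/duarte/bin/charm/gni-crayxe-persistent-smp/lib:/u/psp/duarte/lib"

# Split on ':', keep entries under .../charm/, reduce each entry to the
# build directory name, and drop duplicates.
charm_builds=$(printf '%s\n' "$sample_path" \
  | tr ':' '\n' \
  | grep '/charm/' \
  | sed -e 's|.*/charm/||' -e 's|/.*||' \
  | sort -u)

echo "$charm_builds"
```

If more than one name is printed, the binaries and shared libraries picked up at run time may not all come from the build that was used to compile the application, which may be worth ruling out.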


My app output:

Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 8192K
Charm++> Running in SMP mode: numNodes 2,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.6.1-0-g74a2cc5
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 2 unique compute nodes (32-way SMP).
*** Topsim 0.1.0 ***
[0] topParInit() registered
[0] TopParContext created: 0!
[0] topParInit() array created
[1] TopParContext created: 1!
[1] topParInit() registered
[1] topParInit() array created
[0] topParInit() done!
[1] topParInit() done!
[0] PARTOPS: Slave started at processor 0, node: 0, rank: 0.
[0] PARTOPS: MODEL CREATED! rank: 0
[1] PARTOPS: Slave started at processor 1, node: 1, rank: 0.
[1] PARTOPS: MODEL CREATED! rank: 0
Plugin loaded libparebepcg.so
Plugin loaded libpartreader.so
Plugin loaded libisotropic.so
Plugin loaded liblinear.so
Plugin loaded libparsimp.so
Plugin loaded libbrick.so
Plugin loaded libpartreader.so
Plugin loaded libparebepcg.so
Plugin loaded libparloadcontrol.so
Plugin loaded libparwriter.so
Plugin loaded libparsimp.so
Plugin loaded libparjacobi.so
Plugin loaded libbrick.so
Plugin loaded libparwriter.so
Plugin loaded liblinear.so
Plugin loaded libisotropic.so
Plugin loaded libparloadcontrol.so
Plugin loaded libparjacobi.so
Application 28607883 exit codes: 139
Application 28607883 resources: utime ~2s, stime ~2s, Rss ~15384, inblocks ~10927, outblocks ~18489
Thu Oct 29 00:35:04 CDT 2015

This is my PBS script:

#!/bin/bash
### set the number of nodes
### set the number of PEs per node
#PBS -l nodes=2:ppn=1:xe
### set the wallclock time
#PBS -l walltime=00:20:00
### set the job name
#PBS -N topsim
### set the job stdout and stderr
#PBS -e topsim.err
#PBS -o topsim.out
### set email notification
#PBS -m bea
### In case of multiple allocations, select which one to charge
##PBS -A xyz

# NOTE: lines that begin with "#PBS" are not interpreted by the shell but ARE
# used by the batch system, whereas lines that begin with multiple # signs,
# like "##PBS" are considered "commented out" by the batch system
# and have no effect.

# If you launched the job in a directory prepared for the job to run within,
# you'll want to cd to that directory
# [uncomment the following line to enable this]
cd $PBS_O_WORKDIR

# Alternatively, the job script can create its own job-ID-unique directory
# to run within.  In that case you'll need to create and populate that
# directory with executables and perhaps inputs
# [uncomment and customize the following lines to enable this behavior]
# mkdir -p /scratch/sciteam/$USER/$PBS_JOBID
# cd /scratch/sciteam/$USER/$PBS_JOBID
# cp /scratch/job/setup/directory/* .

# To add certain modules that you do not have added via ~/.modules
. /opt/modules/default/init/bash # NEEDED to add module commands to shell
#module swap PrgEnv-cray PrgEnv-gnu
module add craype-hugepages8M
module add rca

#export CRAY_ROOTFS=DSL
echo $LD_LIBRARY_PATH

#export APRUN_XFER_LIMITS=1  # to transfer shell limits to the executable

### launch the application
### redirecting stdin and stdout if needed
### NOTE: (the "in" file must exist for input)

# used for timing
date

aprun -n2 -N1 ./partopsimapp ../../../tests/data/input/config/plugins_simp_parebepcg_jacobi_brick.lua ../../../tests/data/input/examples/CantSymm/CantSymm12_2.pos ../../../tests/data/output/CantSymm12_2_result.pos

# used for timing
date
### For more information see the man page for aprun



