
charm - Re: [charm] [EXTERNAL] Re: Segfault from TreeLB::loadConfigFile()?

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system


  • From: Jozsef Bakosi <jbakosi AT lanl.gov>
  • To: Ronak Buch <rabuch2 AT illinois.edu>
  • Cc: "charm AT lists.cs.illinois.edu" <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] [EXTERNAL] Re: Segfault from TreeLB::loadConfigFile()?
  • Date: Fri, 29 Oct 2021 11:39:23 -0600

Hi Ronak,

To reproduce the problem, please run:

$ docker pull quinoacomputing/quinoa-build:debian-inciter
$ docker run -ti quinoacomputing/quinoa-build:debian-inciter
quinoa@fae6ddf553da:~$ cd quinoa/build
quinoa@fae6ddf553da:~/quinoa/build$ ctest -R inciter:cfl_u0.9_migr_pe4 -V

As a reference, here are the contents of the dockerfile used to build the
image:

==========================
FROM debian:buster

# Install system-wide prerequisites
#RUN sed -i 's/stable main/stable main contrib non-free/g' /etc/apt/sources.list
RUN apt-get update -y && apt-get install -y m4 autoconf git cmake gfortran \
    gcc g++ openmpi-bin libopenmpi-dev gmsh libpugixml-dev libpstreams-dev \
    libboost-all-dev libblas-dev liblapack-dev liblapacke-dev zlib1g-dev \
    libhdf5-dev libhdf5-openmpi-dev libbackward-cpp-dev tao-pegtl-dev \
    binutils-dev libx11-dev libxpm-dev libxft-dev libxext-dev ninja-build flex \
    bison libdw-dev libdwarf-dev vim wget liblua5.3-dev libssl-dev

# Setup user
RUN adduser --gecos "" --disabled-password quinoa
USER quinoa
WORKDIR /home/quinoa
CMD ["/bin/bash"]
SHELL ["/bin/bash", "-c"]

# Clone quinoa non-recursively
RUN git clone http://github.com/quinoacomputing/quinoa.git

# Checkout commit to be tested
ARG COMMIT
RUN cd quinoa && git checkout $COMMIT && git log -1 HEAD
# Update submodules required by executable tested
RUN wget https://raw.githubusercontent.com/quinoacomputing/quinoa/$COMMIT/doc/pages/build.dox

ARG EXECUTABLE
ENV EXECUTABLE $EXECUTABLE
# Get git update submodule command for executable
RUN grep "^@ref $EXECUTABLE" build.dox -A2 | grep -v $EXECUTABLE | grep -v -e \
    "^$" > submodule_update.sh
RUN cat submodule_update.sh
# Pull in submodules required for the executable
RUN cd quinoa && git submodule init && git submodule update && cd external && \
    sh ../../submodule_update.sh
# Build TPLs
RUN cd quinoa && mkdir -p external/build && cd external/build && cmake \
    -D${EXECUTABLE^^}_ONLY=true -DCHARM_EXTRA_ARGS="--enable-error-checking" .. \
    && make -sj$(grep -c processor /proc/cpuinfo)
# Build code
RUN cd quinoa && mkdir -p build && cd build && cmake -GNinja -DRUNNER=mpirun \
    -DRUNNER_NCPUS_ARG=-n -DRUNNER_ARGS="--bind-to none -oversubscribe" \
    -DCMAKE_BUILD_TYPE=Debug ../src && ninja ${EXECUTABLE,,}
# Run tests
#RUN cd quinoa/build && npe=$(expr $(grep -c processor /proc/cpuinfo) / 2) && \
#    if [ ${EXECUTABLE} = unittest ]; then mpirun -n $npe Main/unittest -v -q; \
#    else ctest -j $npe --output-on-failure -LE extreme -R ${EXECUTABLE,,}; fi
==========================

The docker build command was:

docker build --build-arg COMMIT=develop --build-arg EXECUTABLE=inciter \
  -t quinoacomputing/quinoa-build:debian-inciter --shm-size=1g -f dockerfile .

Inside the image you can also find out how I built Charm++. The build command
is recorded in the CMake script

~/quinoa/external/build/charm/src/charm-stamp/charm-build-RELEASE.cmake

which tells you that the charm build command line was:

/home/quinoa/quinoa/external/install/gnu-x86_64/charm/buildold charm++ \
  mpi-linux-x86_64 --enable-error-checking --build-shared --with-production \
  -j36 -O3 -DNDEBUG

Please let me know if I can help.

Thanks,
Jozsef

On 10.23.2021 14:00, Ronak Buch wrote:
> I've been unable to reproduce this. I've tried both netlrts and MPI builds,
> running both Charm++ example programs and Quinoa with the command line
> you've given, with and without Docker. Are you still seeing these segfaults?
> Do you have any other ideas as to how to reproduce this?
>
> On 9/27/21 18:16, Jozsef Bakosi wrote:
> > The command line is
> >
> > $ mpirun -n 4 --bind-to none -oversubscribe \
> >     /home/quinoa/quinoa/build/Main/inciter -c slot_cyl_cfl_diagcg.q \
> >     -i unitsquare_01_3.6k.exo -v -l 10 -u 0.9 +balancer RandCentLB +LBDebug 1 +cs
> >
> > Here, -u 0.9 means we use overdecomposition.
> >
> > Thanks, Ronak
> > Jozsef
> >
> > On 09.27.2021 22:07, Buch, Ronak Akshay wrote:
> > > I haven't seen this before, but I'll investigate. What's your full
> > > command line?
> > >
> > > On Sep 27, 2021 16:02, Jozsef Bakosi <jbakosi AT lanl.gov> wrote:
> > > Hi folks,
> > >
> > > I'm running into a segfault, apparently from TreeLB::loadConfigFile(),
> > > that only occurs in a Docker container:
> > >
> > > ------------- Processor 1 Exiting: Caught Signal ------------
> > > Reason: Segmentation fault
> > > [1] Stack Traceback:
> > > [1:1] libpthread.so.0 0x7f00d55db730
> > > [1:2] libc.so.6 0x7f00d55641a7
> > > [1:3] inciter 0x5557cb625af7
> > > [1:4] inciter 0x5557cb6295cc TreeLB::loadConfigFile(CkLBOptions const&)
> > > [1:5] inciter 0x5557cb652017 TreeLB::TreeLB(CkLBOptions const&)
> > > [1:6] inciter 0x5557cb62a36b CkIndex_TreeLB::_call_TreeLB_marshall1(void*, void*)
> > >
> > > That's all I have as a trace. This is with v7.0.0-rc1, and it only
> > > happens with load balancing on, using RandCentLB.
> > >
> > > Does anyone have an idea what might be going wrong, and what other
> > > information you would need?
> > >
> > > Thanks,
> > > Jozsef
