charm - Re: [charm] Charm++ SMP runtime problem

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Filippo Gioachin <gioachin AT uiuc.edu>
  • To: Marius Micluta <marius AT biochim.ro>
  • Cc: charm AT cs.uiuc.edu
  • Subject: Re: [charm] Charm++ SMP runtime problem
  • Date: Mon, 1 Jun 2009 12:28:08 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Marius,

one problem that I see with your setup is that you are using an SMP
build of Charm++, while CharmDebug works only for non-SMP builds.
Can you try without "smp" on your build command?
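
As a rough illustration only (a sketch, not part of the Charm++ tooling): assuming build names are dash-separated option lists, like the net-linux-x86_64-smp-icc you mention, a hypothetical helper such as split_smp below derives the non-SMP build name to look for and reports whether "smp" was present.

```python
def split_smp(build_name):
    """Return (name_without_smp, had_smp) for a dash-separated build name.

    Assumption: options in the build name are separated by "-" and the SMP
    option appears as the literal component "smp".
    """
    parts = build_name.split("-")
    kept = [p for p in parts if p != "smp"]
    return "-".join(kept), len(kept) != len(parts)


if __name__ == "__main__":
    # Build name taken from the original message.
    print(split_smp("net-linux-x86_64-smp-icc"))
```

If the second value is True, the build is an SMP one and CharmDebug should be tried against the name in the first value instead.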

Regards,
Filippo

On Mon, Jun 1, 2009 at 06:15, Marius Micluta <marius AT biochim.ro> wrote:
>
> Dear CHARM developers,
>
> I get some erratic failures when launching NAMD with the
> net-linux-x86_64-smp-icc build of charmrun. I found the SMP version of
> charm++/NAMD to be 25-30% faster than the precompiled version on a small HPC
> cluster (4 compute nodes, each with 2 quad-core Xeon CPUs, Gigabit Ethernet
> interconnect) and much faster (2-3 times) than the MPI and TCP builds.
>
> Started with the ++p 32 ++ppn 8 charmrun command line options, NAMD runs
> fine sometimes, but in many cases charmrun strangely freezes after
> displaying "cpu topology info is being gathered!". The namd2 processes are
> launched on all four compute nodes, but while on two or three nodes the
> CPU load approaches 100% on all 8 cores, as in a normal run, on the
> other(s) the CPU load is near zero.
>
> I also tried to run charmdebug, but it fails. Launching from the command
> line the command configured by charmdebug (charmrun +pN
> /usr/local/NAMD/namd2 apoa1.namd ++server +cpd +DebugDisplay
> localhost:10.0), I get a curious error at the same point where the run
> freezes:
>
> Charm++> synchronizing isomalloc memory region...
> CPD: Frozen processor N+1
> [0] consolidated Isomalloc memory region: 0x2aaaab59b000 - 0x7a351707da18 (83404474 megs)
> Charm++> cpu topology info is being gathered!
> CPD: Frozen processor N
> CPD: Signal received on processor N: 11
> CPD: Frozen processor N
> ------------- Processor N Exiting: Caught Signal ------------
> Signal: segmentation violation,
>
> no matter what value between 1 and 32 I choose for N. When N=32, I get many
> "CPD: Frozen processor" lines, well beyond the number of processors that
> actually exist in the cluster. I also tried to specify the .nodelist file
> with the ++nodelist parameter, as well as other command line options, but
> with no effect.
>
> I compiled charm++ (version 6.1, packaged with the NAMD-2.7b1 source) both
> with an older 10.1.011 Intel compiler and with the latest 11.0.083 version
> from Intel, as well as with various optimization options, from -O0 to -O3,
> but the freeze still occurs sporadically.
>
>
> Best regards,
>
> Marius Micluta
>
> --
>
> Marius Micluta
>
> Institute of Biochemistry of the Romanian Academy
> Tel: +40 21 223 90 69
> Fax: +40 21 223 90 68



