charm - Re: [charm] Use of uninitialised value of size 8 from CkReductionMsg::buildNew()

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system


  • From: Jozsef Bakosi <jbakosi AT gmail.com>
  • To: Phil Miller <mille121 AT illinois.edu>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] Use of uninitialised value of size 8 from CkReductionMsg::buildNew()
  • Date: Wed, 23 Nov 2016 11:40:28 -0700

Thanks, Phil, for the help. I'm happy with the older gcc for now, but I will update to a new Charm++ version when you guys release next time.

On Wed, Nov 23, 2016 at 9:48 AM, Phil Miller <mille121 AT illinois.edu> wrote:
Ah, ok. I'm actually surprised anything at all ran as far as it did on that version with the newer compiler.

See this bug for why: https://charm.cs.illinois.edu/redmine/issues/1045
I recommend using the current development version at the moment, or backing off to gcc 5.x, or explicitly passing -fno-lifetime-dse to all of your compilations of Charm++ and application code built atop it.

Unfortunately, fixing this particular optimization incompatibility would require a rather ugly API change for us, and we don't even know what a good replacement API would look like.

Sorry for the hassle encountered. The fix for this is obviously part of the upcoming 6.8 release, but has not otherwise been packaged yet. Perhaps we should have identified this as something critical enough to roll up a 6.7.2 release just for that.
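
For readers unfamiliar with the flag: GCC's -flifetime-dse lets the optimizer treat stores made to an object's storage before its constructor starts as dead. The sketch below shows the general kind of pattern that can be affected: a class-specific operator new that fills in a field which the constructor then leaves alone. FragileMsg, Extra, and make_fragile are hypothetical names made up for illustration; this is not Charm++ code and not necessarily the exact pattern described in the linked bug.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical tag carrying the extra allocation size; not a Charm++ type.
struct Extra { std::size_t bytes; };

// Hypothetical message type illustrating the fragile pattern.
struct FragileMsg {
  std::size_t payloadBytes;  // filled in by operator new, NOT by the constructor

  FragileMsg() {}  // deliberately leaves payloadBytes untouched

  static void* operator new(std::size_t sz, Extra extra) {
    void* p = std::malloc(sz + extra.bytes);
    if (p == nullptr) throw std::bad_alloc();
    // Store into the raw storage before the object's lifetime begins.
    // payloadBytes is the first member, so it lives at offset 0.
    std::memcpy(p, &extra.bytes, sizeof extra.bytes);
    return p;
  }
  // Matching placement delete (used if the constructor throws) and usual delete.
  static void operator delete(void* p, Extra) noexcept { std::free(p); }
  static void operator delete(void* p) noexcept { std::free(p); }
};

// The new-expression calls operator new first and the constructor second.
// Code that later reads m->payloadBytes relies on the store above surviving
// the constructor call, which an optimizer applying lifetime-based dead store
// elimination is not obliged to preserve.
inline FragileMsg* make_fragile(std::size_t bytes) {
  return new (Extra{bytes}) FragileMsg;
}
```

Compiling with -fno-lifetime-dse, as suggested above, asks GCC not to apply that elimination, which is why it works around the problem without any source change.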

On Wed, Nov 23, 2016 at 10:42 AM, Jozsef Bakosi <jbakosi AT gmail.com> wrote:
tag v6.7.1 at bdf6a1b on Fri Apr 15 16:07:34 2016

On Wed, Nov 23, 2016 at 9:33 AM, Phil Miller <mille121 AT illinois.edu> wrote:
Just as a check, what version of Charm++ is this? One of the 6.7 releases, or some more recent git commit?

On Wed, Nov 23, 2016 at 9:46 AM, Jozsef Bakosi <jbakosi AT gmail.com> wrote:
Alright, here is what I have found.

The problem can be reproduced with the unmodified example in charm/examples/charm++/reductions/typed_reduction, which segfaults at the same memcpy (tested with valgrind). This happens only on cray; linux/openmpi is fine.

It only segfaults with the gcc/6.1.0 cray module and only if any optimization is used: i.e., -O0 is okay, -O[123] fails.

Swapping the gcc module to gcc/5.3.0 solves the problem: both debug and opt builds are fine with this earlier gcc version.

PrgEnv-intel/6.0.3 is also fine (only tried -O3).

A side question: Is it safe to reduce a uint64_t using CkReduction::sum_int? It appears to be so, but I'd like to be sure.

Jozsef
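
On the side question about uint64_t and CkReduction::sum_int, which the thread leaves unanswered, here is a small standalone sketch of what can go wrong. It assumes that sum_int adds the contributed bytes element-wise as C int values and that int is 32 bits; neither is stated in the thread. sum_as_ints and reduce_pair_as_ints are hypothetical stand-ins, not Charm++ functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");

// Hypothetical stand-in for an element-wise int sum reducer (not Charm++ code):
// adds bytes / sizeof(int) 32-bit elements from src into dst.
inline void sum_as_ints(void* dst, const void* src, std::size_t bytes) {
  unsigned char* d = static_cast<unsigned char*>(dst);
  const unsigned char* s = static_cast<const unsigned char*>(src);
  for (std::size_t i = 0; i < bytes / sizeof(int); ++i) {
    std::uint32_t a, b;  // unsigned arithmetic avoids signed-overflow UB
    std::memcpy(&a, d + i * 4, 4);
    std::memcpy(&b, s + i * 4, 4);
    a += b;              // any carry out of this 32-bit element is dropped
    std::memcpy(d + i * 4, &a, 4);
  }
}

// Reduce two 64-bit values by treating each one as two independent ints.
inline std::uint64_t reduce_pair_as_ints(std::uint64_t x, std::uint64_t y) {
  sum_as_ints(&x, &y, sizeof x);
  return x;
}
```

Under those assumptions, reduce_pair_as_ints(1, 2) gives 3, but reduce_pair_as_ints(0xFFFFFFFF, 1) gives 0 instead of 0x100000000, because the carry never reaches the upper half. That would explain why the reduction appears to work: the result is only right while the low 32 bits of the running sum never overflow. A reducer that matches the 64-bit type, if your Charm++ version provides one, or a custom reducer would avoid the issue.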

On Tue, Nov 22, 2016 at 10:19 PM, Phil Miller <mille121 AT illinois.edu> wrote:

On Tue, Nov 22, 2016 at 11:12 PM, Jozsef Bakosi <jbakosi AT gmail.com> wrote:

- What build configuration are you using on the Cray system? gni-crayxc? mpi-crayxc? smp or no? The full ./build command line would be useful, along with output from 'module list'

No smp. The build command and modules:

$ build charm++ mpi-crayxc --with-production -j40 -O3 -DNDEBUG


As you explore further, could you test out a build without the optimization flags above, and with -g? It would be useful to know if this is a case where compiler optimizations are changing behavior - likely indicating that the code is doing something invalid in the vicinity, but potentially a compiler bug.

In the same vein, could you try with an earlier version of the gcc module loaded, maybe a 4.8.x or 4.9.x? Or with Intel's compiler and corresponding PrgEnv-intel module?







