
charm - Re: [charm] errors when running on multiple physical nodes

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Evan Ramos <evan AT hpccharm.com>
  • To: Jakub Homola <jakub.homola AT vsb.cz>
  • Cc: charm <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] errors when running on multiple physical nodes
  • Date: Tue, 22 Oct 2019 17:56:15 -0500

Hi Jakub,

I believe this example is a red herring because `-memory paranoid` is currently unsupported in SMP mode. Inspection of the crash in a debugger indicates something is going wrong inside the paranoid mode's data structures. I don't observe this issue in a non-SMP build.
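One way to check this locally is to repeat the steps from the quoted message with `smp` dropped from the build line. The following is only a dry-run sketch: it assumes the same `netlrts-linux-x86_64 icc` target and the placeholder `/path/to/charmc` from the quoted message, and the `run` helper and `CHARMC` variable are hypothetical names introduced here. It prints the commands and executes them only when `RUN=1` is set.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print each command, execute only if RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Placeholder path copied from the quoted message; adjust to your install.
CHARMC="${CHARMC:-/path/to/charmc}"

# Same build line as in the quoted message, minus the "smp" option.
run ./build charm++ netlrts-linux-x86_64 icc -j

# Compile and run the example as before, keeping -memory paranoid.
run "$CHARMC" Hello.ci
run "$CHARMC" -g -memory paranoid *.cpp -o Hello.x
run ./Hello.x
```

If the heap-corruption report disappears with this build, that is consistent with the problem lying in paranoid mode under SMP rather than in the hello world program.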

--
Evan A. Ramos
Software Engineer
Charmworks, Inc.


On Tue, Oct 22, 2019 at 5:34 PM Jakub Homola <jakub.homola AT vsb.cz> wrote:

Hello,

Thanks for the answer.

 

So I tried a few things and found that there actually is some memory corruption. However, it seems to be in Charm++-generated code: a simple hello world program produces a heap corruption. I am attaching the program and the error message it printed.

 

I compiled the Charm++ library using the command `./build charm++ netlrts-linux-x86_64 icc smp -j`,

compiled the simple hello world program using `/path/to/charmc Hello.ci` and then `/path/to/charmc -g -memory paranoid *.cpp -o Hello.x`,

and ran the program using `./Hello.x`. I also tried running it using `./charmrun ./Hello.x ++local`, but the same error occurred.

The error happened even before anything in the mainchare constructor started executing. The same problem occurred on my local Ubuntu virtual machine as well as on a Salomon cluster node running CentOS.

 

I think that this could be somehow related to the original problem I had and would appreciate any help.

Thank you,

Jakub Homola

 

 

From: Evan Ramos
Sent: Monday, October 21, 2019 21:04
To: Jakub Homola
Cc: charm AT lists.cs.illinois.edu
Subject: Re: [charm] errors when running on multiple physical nodes

 

Hi Jakub,

Judging by the stack trace in your second error message, it is likely that the problem is somewhere in your code. I would highly recommend becoming familiar with the ++debug and ++debug-no-pause options, as they will allow you to investigate the issue directly using GDB. You may also want to rebuild Charm++ without the `--with-production` option to enable error checking in the runtime. This set of instructions may help you set up X forwarding: https://uisapp2.iu.edu/confluence-prd/pages/viewpage.action?pageId=280461906
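As a rough sketch of how these options are passed, they go on the `charmrun` command line alongside the program. The program name `./Hello.x` is a placeholder, and the `run` helper and `PROG` variable are hypothetical names introduced here. The comments reflect my understanding of the Charm++ manual rather than anything stated in this thread. The snippet only prints the command lines unless `RUN=1` is set.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print each command, execute only if RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Placeholder program name; substitute your own executable.
PROG="${PROG:-./Hello.x}"

# ++debug: as I understand it, starts each process under GDB in its own
# terminal window and waits for you to begin execution (hence X forwarding).
run ./charmrun "$PROG" ++debug

# ++debug-no-pause: same idea, but execution starts right away, so GDB
# stops only when the program crashes or hits a breakpoint.
run ./charmrun "$PROG" ++debug-no-pause
```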

Regards,
--
Evan A. Ramos
Software Engineer
Charmworks, Inc.

 

 



