charm - Re: [charm] Charm++ on Arcetri cluster

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Bilge Acun <acun2 AT illinois.edu>
  • To: "Evgeniia Belousova -X (ebelouso - AAP3 INC at Cisco)" <ebelouso AT cisco.com>
  • Cc: "Landon Noll (chongo)" <chongo AT cisco.com>, "Thomas Gilgan -X (thgilgan - AAP3 INC at Cisco)" <thgilgan AT cisco.com>, "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] Charm++ on Arcetri cluster
  • Date: Wed, 1 Apr 2015 16:51:34 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hi Evgeniia,

Can you share your whole code with me? I'll try to reproduce the problem. 
Does this problem occur only on the Arcetri cluster?

Thanks,
~Bilge

On 1 April 2015 at 16:28, Evgeniia Belousova -X (ebelouso - AAP3 INC at Cisco) <ebelouso AT cisco.com> wrote:
Hello all,

I have an issue with running a Charm++ app with charmrun, both in SMP and non-SMP modes, for example:

./charmrun ++p 2 my_app app_options
./charmrun +p2 my_app app_options
./charmrun my_app app_options

There is a nodelist file in the same directory containing the following information:

group main ++shell ssh
host pacini004
host pacini005
host pacini006

The program generates a 1D chare array with two elements:

CProxy_Array a = CProxy_Array::ckNew(arr_size); // Array class constructor has no arguments

The first chare in the array works fine, but the other reports a segmentation violation:

------------- Processor 1 Exiting: Caught Signal ------------
Reason: segmentation violation
Suggestion: Try running with '++debug', or linking with '-memory paranoid'
(memory paranoid requires '+netpoll' at runtime).
[1] Stack Traceback:
  [1:0]   [0x5361d4]
  [1:1]   [0x3f266326a0]
  [1:2] _ZN5Sieve5sieveEP11SieveReqMsg+0x182  [0x46dd92]
  [1:3] _ZN13CkIndex_Sieve23_call_sieve_SieveReqMsgEPvS0_+0x2b  [0x46ecd5]
  [1:4] CkDeliverMessageFree+0x28  [0x4a75b8]
  [1:5] _ZN14CkLocRec_local11invokeEntryEP12CkMigratablePvib+0x87  [0x4c7a57]
  [1:6] _ZN14CkLocRec_local7deliverEP14CkArrayMessage11CkDeliver_ti+0x18f  [0x4c8d7f]
  [1:7] _ZN8CkLocMgr7deliverEP9CkMessage11CkDeliver_ti+0x41a  [0x4c331a]
  [1:8] _Z15_processHandlerPvP11CkCoreState+0x493  [0x4ae543]
  [1:9] CsdScheduleForever+0x68  [0x53bb18]
  [1:10] CsdScheduler+0x2d  [0x53bc2d]
  [1:11] ConverseInit+0x34a  [0x5393fa]
  [1:12] main+0x27  [0x49df97]
  [1:13] __libc_start_main+0xfd  [0x3f2661ed5d]
  [1:14]   [0x46cba9]
Fatal error on PE 1> segmentation violation

What might cause this? The program works in standalone mode (./my_app app_options).

I am also trying to run charmrun with debugging options (./charmrun ++ssh-display ++debug-no-pause), but it hangs while waiting for the client to connect:

Charmrun> charmrun started...
Charmrun> using ./nodelist as nodesfile
Charmrun> adding client 0: "pacini004", IP:10.10.1.4
Charmrun> Charmrun = 173.36.252.226, port = 56750
start_nodes_rsh
Charmrun> Sending "0 173.36.252.226 56750 20113 0" to client 0.
Charmrun> find the node program "/home/ebelouso/testing/sieve/./sieve" at "/home/ebelouso/testing/sieve" for 0.
Charmrun> Starting ssh pacini004 -l ebelouso /bin/bash -f
Charmrun> remote shell (pacini004:0) started
Charmrun> node programs all started
Charmrun remote shell(pacini004.0)> remote responding...
Charmrun remote shell(pacini004.0)> using xterm /usr/bin/xterm
Charmrun remote shell(pacini004.0)> using debugger /usr/bin/gdb
Charmrun remote shell(pacini004.0)> starting node-program...
Charmrun remote shell(pacini004.0)> rsh phase successful.
Charmrun> Waiting for 0-th client to connect.

Are there any other options I should use for debugging?

Regards,

Evgeniia Belousova




--
Bilge Acun
PhD Candidate at University of Illinois at Urbana-Champaign
Computer Science Department


