[charm] FW: Charm++ on Arcetri cluster


  • From: "Desouza, Shanna Marie" <desouzas AT illinois.edu>
  • To: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: [charm] FW: Charm++ on Arcetri cluster
  • Date: Fri, 3 Apr 2015 10:26:12 +0000
  • Accept-language: en-US
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

The following message was auto-discarded. I have also fixed the issue that caused the auto-discard.

On 4/2/15, 8:27 PM, "Evgeniia Belousova -X (ebelouso - AAP3 INC at Cisco)" <ebelouso AT cisco.com> wrote:

Hi Bilge,

In my last e-mail I asked you about running a Charm++ app in the background, but it turned out that the issue is really about redirecting stdout and stderr. If I try to run my app using commands like these:
./test -f try +io_flush_system > output 2>&1
./test -f try +io_flush_user > output 2>&1
then for some reason my app cannot parse its arguments properly, so it dumps its help message and exits. Are there any hacks that could help work around this? Or perhaps there are Charm++ functions that can redirect stderr and stdout to a file?
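
One workaround I have been considering is to reopen stdout and stderr from inside the program itself instead of redirecting them in the shell. The sketch below is only an illustration: it uses plain C stdio (freopen), not any Charm++ API, and the file path is a placeholder, not something from my actual program.

#include <cstdio>

// Sketch only: reopen stdout/stderr onto a file before the app starts printing.
// The path is a placeholder, not taken from the real program.
static void redirect_output(const char *path) {
    if (freopen(path, "w", stdout) == NULL)
        perror("freopen stdout");
    if (freopen(path, "a", stderr) == NULL)
        perror("freopen stderr");
}

// e.g. call redirect_output("output.log") early in the program.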

Thank you!

Evgeniia

From: Evgeniia Belousova <ebelouso AT cisco.com>
Date: Wednesday, April 1, 2015 at 3:58 PM
To: Bilge Acun <acun2 AT illinois.edu>
Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>, "Landon Noll (chongo)" <chongo AT cisco.com>, "Thomas Gilgan -X (thgilgan - AAP3 INC at Cisco)" <thgilgan AT cisco.com>
Subject: Re: Charm++ on Arcetri cluster

Hi Bilge,

My whole code is attached. 

My program requires the GNU MP library, version 6.0.0a (attached as well). By default it installs into /usr/local (GNU MP manual: https://gmplib.org/manual/index.html#Top). When linking, you'll need to add BOTH -lgmp and -lgmpxx (for the GMP C and C++ libraries, respectively).
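
For reference, a minimal program that exercises the C++ bindings looks roughly like the sketch below (the file name and values are just placeholders); note that -lgmpxx has to come before -lgmp on the link line:

#include <gmpxx.h>
#include <iostream>

// Minimal check that both the C and C++ GMP libraries are installed and link.
// Build with something like: g++ gmptest.cpp -lgmpxx -lgmp
int main() {
    mpz_class n("123456789012345678901234567890");  // arbitrary big integer
    std::cout << n * 2 << std::endl;                 // exercises the gmpxx operators
    return 0;
}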

I cannot test my code on a different machine (it does work on my laptop, but there I can only use ./charmrun ++local).

When running the app, you need to provide the following argument: -f <file-name-prefix> (the app will create a file whose name starts with that prefix and dump some data into it).

By the way, what is the proper way to run a Charm++ app in the background? For some reason, my app cannot parse arguments correctly when I use nohup, both in standalone mode and with charmrun.

Thank you!

Evgeniia

From: Bilge Acun <acun2 AT illinois.edu>
Date: Wednesday, April 1, 2015 at 2:51 PM
To: Evgeniia Belousova <ebelouso AT cisco.com>
Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>, "Landon Noll (chongo)" <chongo AT cisco.com>, "Thomas Gilgan -X (thgilgan - AAP3 INC at Cisco)" <thgilgan AT cisco.com>
Subject: Re: Charm++ on Arcetri cluster

Hi Evgeniia,

Can you share your whole code with me? I'll try to reproduce the problem. 
Does this problem occur only on the Arcetri cluster?

Thanks,
~Bilge

On 1 April 2015 at 16:28, Evgeniia Belousova -X (ebelouso - AAP3 INC at Cisco) <ebelouso AT cisco.com> wrote:
Hello all,

I have an issue with running a Charm++ app with charmrun, both in SMP and non-SMP modes, for example:

./charmrun ++p 2 my_app app_options
./charmrun +p2 my_app app_options
./charmrun my_app app_options

There is a nodelist file in the same directory containing the following information:

group main ++shell ssh
host pacini004
host pacini005
host pacini006

The program generates a 1D chare array with two elements:

CProxy_Array a = CProxy_Array::ckNew(arr_size); // Array class constructor has no arguments
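
For context, the array is declared and created roughly along these lines (the module and class names here are illustrative, not the exact ones in the attached code):

// sieve.ci (sketch): a 1D chare array whose constructor takes no arguments
//   mainmodule sieve {
//     array [1D] Array {
//       entry Array();
//     };
//   }

// sieve.C (sketch)
#include "sieve.decl.h"

class Array : public CBase_Array {
public:
  Array() { /* per-element initialization; no constructor arguments */ }
  Array(CkMigrateMessage *m) {}
};

#include "sieve.def.h"

// From the mainchare, create arr_size (== 2 here) elements:
//   CProxy_Array a = CProxy_Array::ckNew(arr_size);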

The first chare in the array works fine, but the other reports a segmentation violation:

------------- Processor 1 Exiting: Caught Signal ------------
Reason: segmentation violation
Suggestion: Try running with '++debug', or linking with '-memory paranoid'
(memory paranoid requires '+netpoll' at runtime).
[1] Stack Traceback:
  [1:0]   [0x5361d4]
  [1:1]   [0x3f266326a0]
  [1:2] _ZN5Sieve5sieveEP11SieveReqMsg+0x182  [0x46dd92]
  [1:3] _ZN13CkIndex_Sieve23_call_sieve_SieveReqMsgEPvS0_+0x2b  [0x46ecd5]
  [1:4] CkDeliverMessageFree+0x28  [0x4a75b8]
  [1:5] _ZN14CkLocRec_local11invokeEntryEP12CkMigratablePvib+0x87  [0x4c7a57]
  [1:6] _ZN14CkLocRec_local7deliverEP14CkArrayMessage11CkDeliver_ti+0x18f  [0x4c8d7f]
  [1:7] _ZN8CkLocMgr7deliverEP9CkMessage11CkDeliver_ti+0x41a  [0x4c331a]
  [1:8] _Z15_processHandlerPvP11CkCoreState+0x493  [0x4ae543]
  [1:9] CsdScheduleForever+0x68  [0x53bb18]
  [1:10] CsdScheduler+0x2d  [0x53bc2d]
  [1:11] ConverseInit+0x34a  [0x5393fa]
  [1:12] main+0x27  [0x49df97]
  [1:13] __libc_start_main+0xfd  [0x3f2661ed5d]
  [1:14]   [0x46cba9]
Fatal error on PE 1> segmentation violation

What might be causing this? The program works fine in standalone mode (./my_app app_options).

In addition, I'm trying to use charmrun with debugging options (./charmrun ++ssh-display ++debug-no-pause), but it hangs waiting for the client to connect:

Charmrun> charmrun started...
Charmrun> using ./nodelist as nodesfile
Charmrun> adding client 0: "pacini004", IP:10.10.1.4
Charmrun> Charmrun = 173.36.252.226, port = 56750
start_nodes_rsh
Charmrun> Sending "0 173.36.252.226 56750 20113 0" to client 0.
Charmrun> find the node program "/home/ebelouso/testing/sieve/./sieve" at "/home/ebelouso/testing/sieve" for 0.
Charmrun> Starting ssh pacini004 -l ebelouso /bin/bash -f
Charmrun> remote shell (pacini004:0) started
Charmrun> node programs all started
Charmrun remote shell(pacini004.0)> remote responding...
Charmrun remote shell(pacini004.0)> using xterm /usr/bin/xterm
Charmrun remote shell(pacini004.0)> using debugger /usr/bin/gdb
Charmrun remote shell(pacini004.0)> starting node-program...
Charmrun remote shell(pacini004.0)> rsh phase successful.
Charmrun> Waiting for 0-th client to connect.

Are there any other options I should use for debugging?

Regards,

Evgeniia Belousova




--
Bilge Acun
PhD Candidate at University of Illinois at Urbana-Champaign
Computer Science Department


