charm - Re: [charm] how to run SimpleArrayHello on multiple PCs

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Gengbin Zheng <zhenggb AT gmail.com>
  • To: Chiara Orsini <chiara.orsini AT iet.unipi.it>
  • Cc: charm <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] how to run SimpleArrayHello on multiple PCs
  • Date: Tue, 29 Jun 2010 15:23:46 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

How different are these two machines?
It looks like the program starts OK on both machines, but then crashes
at a very early stage.

Gengbin
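
One way to answer the "how different" question is to run the same few commands on both hosts and compare the output. The following is only a minimal sketch: the probe list (`uname -m`, `uname -r`, `getconf GNU_LIBC_VERSION`) and the helper names are assumptions chosen for illustration, not something taken from this thread or from Charm++, and a mismatch is a hint rather than a confirmed cause of the crash.

```python
import subprocess

# Commands whose output often differs between mismatched machines.
# This probe list is an illustrative assumption, not from the thread.
PROBES = {
    "arch": ["uname", "-m"],
    "kernel": ["uname", "-r"],
    "libc": ["getconf", "GNU_LIBC_VERSION"],
}


def run_over_ssh(host, command):
    """Run `command` on `host` through ssh and return its stripped stdout."""
    result = subprocess.run(
        ["ssh", host] + command,
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout.strip()


def collect_facts(host, run=run_over_ssh):
    """Return {probe name: output} for one host."""
    return {name: run(host, cmd) for name, cmd in PROBES.items()}


def diff_facts(facts_a, facts_b):
    """Return {probe name: (value_a, value_b)} for probes that disagree."""
    return {
        name: (facts_a.get(name), facts_b.get(name))
        for name in sorted(set(facts_a) | set(facts_b))
        if facts_a.get(name) != facts_b.get(name)
    }
```

With passwordless ssh already set up, `diff_facts(collect_facts("IP_A"), collect_facts("IP_B"))` returns an empty dict when every probe agrees, and otherwise lists each differing probe with both values.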

On Tue, Jun 29, 2010 at 9:10 AM, Chiara Orsini
<chiara.orsini AT iet.unipi.it> wrote:
> Dear charm++ users,
>
>
> I tried to run SimpleArrayHello program on multiple PCs, but the program
> ends with this error:
>
> Charmrun> node programs all connected
> Charmrun: error on request socket--
> Socket closed before recv.
>
> I read the FAQ, but linking with -memory paranoid did not give me any useful
> information.
> Consider two hosts, A and B. I configured A and B to be accessible to each
> other through ssh (without a password).
> I can run BasicHelloWorld on host A (launching it from host A) successfully.
> I can run BasicHelloWorld on host B (launching it from host A) successfully.
> However, I cannot run BasicHelloWorld on hosts A and B together (launching
> it from host A). This is my nodelist file:
> group main ++shell ssh
> host IP_B
> host IP_A
> This is the output that I get from running this command:
> ./charmrun PROGRAM_NAME +p2 ++verbose
> Charmrun> charmrun started...
> Charmrun> using ./nodelist as nodesfile
> Charmrun> adding client 0: "IP_B", IP:IP_B
> Charmrun> adding client 1: "IP_A", IP:IP_A
> Charmrun> Charmrun = IP_A, port = 50501
> Charmrun> Sending "0 IP_A 50501 2547 0" to client 0.
> Charmrun> find the node program "/PATH/PROGRAM_NAME/prefix" at
> "/PATH/PROGRAM_NAME" for 0.
> Charmrun> node 0: xterm is xterm
> Charmrun> Starting ssh IP_B -l user /bin/sh -f
> Charmrun> remote shell (IP_B:0) started
> Charmrun> Sending "1 IP_A 50501 2547 0" to client 1.
> Charmrun> find the node program "/PATH/PROGRAM_NAME/prefix" at
> "/PATH/PROGRAM_NAME" for 1.
> Charmrun> node 1: xterm is xterm
> Charmrun> Starting ssh IP_A -l user /bin/sh -f
> Charmrun> remote shell (IP_A:1) started
> Charmrun> node programs all started
> Charmrun remote shell(IP_A.1)> remote responding...
> Charmrun remote shell(IP_A.1)> using xterm /usr/X11R6/bin/xterm
> Charmrun remote shell(IP_A.1)> starting node-program...
> Charmrun remote shell(IP_A.1)> rsh phase successful.
> Charmrun remote shell(IP_B.0)> remote responding...
> Charmrun remote shell(IP_B.0)> using xterm /usr/X11R6/bin/xterm
> Charmrun remote shell(IP_B.0)> starting node-program...
> Charmrun remote shell(IP_B.0)> rsh phase successful.
> Charmrun> Waiting for 0-th client to connect.
> Charmrun> Waiting for 1-th client to connect.
> Charmrun> client 1 connected (IP=IP_A data_port=54659)
> Charmrun> client 0 connected (IP=IP_B data_port=34817)
> Charmrun> All clients connected.
> Charmrun> IP tables sent.
> Charmrun> node programs all connected
> Charmrun: error on request socket--
> Socket closed before recv.
>
>
> Could anyone explain how to solve this problem?
> Any information you can give me would be greatly appreciated.
> Thank you in advance.
> Best regards,
> Chiara
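
To make the nodelist in the message above easier to read alongside the verbose log, here is a small sketch that parses the three-line file and prints the hosts in file order. The parser is hypothetical and handles only the `group` and `host` lines shown in the message; it is not charmrun's own parsing code. The numbering it prints matches the "adding client 0" and "adding client 1" lines in the log.

```python
def parse_nodelist(text):
    """Parse nodelist text into {group: {"options": [...], "hosts": [...]}}."""
    groups = {}
    current = None
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        keyword, rest = fields[0], fields[1:]
        if keyword == "group" and rest:
            current = rest[0]
            groups[current] = {"options": rest[1:], "hosts": []}
        elif keyword == "host" and rest and current is not None:
            # Only the host name is kept; per-host options are ignored here.
            groups[current]["hosts"].append(rest[0])
    return groups


# The nodelist exactly as quoted in the message.
NODELIST = """\
group main ++shell ssh
host IP_B
host IP_A
"""

parsed = parse_nodelist(NODELIST)
for index, host in enumerate(parsed["main"]["hosts"]):
    print(f"client {index}: {host}")
# prints:
# client 0: IP_B
# client 1: IP_A
```

Looping over the parsed hosts is also a convenient starting point for testing each machine on its own before trying both together.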




