
charm - Re: [charm] charmrun timeout problem

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Phil Miller <mille121 AT illinois.edu>
  • To: Dominic Roehm <dominic.roehm AT gmail.com>, Charm Mailing List <charm AT cs.illinois.edu>
  • Subject: Re: [charm] charmrun timeout problem
  • Date: Sun, 23 Nov 2014 11:47:25 -0600
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Could you also send the output of the command "ip addr"? It looks like charmrun is listening on an interface bound to the address "10.1.255.254", but that may not be an address that the nodes you're running on can connect back to.
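One way to check this from the other side, as a rough sketch and not something charmrun itself provides: run a small TCP probe on one of the compute nodes against the address and port that charmrun prints. The helper name `can_connect` and the 3-second default timeout are assumptions made for illustration; the address and port come from the log quoted below, and the port changes on every run.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False
```

Calling `can_connect("10.1.255.254", 45243)` on a compute node only tells you something while charmrun is still waiting for clients, since the port is closed once charmrun exits. A result of False would be consistent with the nodes being unable to reach that interface, but it does not prove it.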

On Sat, Nov 22, 2014 at 10:07 AM, Dominic Roehm <dominic.roehm AT gmail.com> wrote:
Hi Phil,

I used

 ./build charm++ net-linux-x86_64 --with-production -j8

to build Charm++ 6.6.0 from the tarball.

The log files with the output and error output of the build are attached. To start the simulation I used:

../../CoHMM/charm_bin/charmrun +p16  ++mpiexec ../../CoHMM/charm_bin/2D_Kriging input.json +stacksize 512000 ++verbose


Dominic


On 11/22/2014 01:38 AM, Phil Miller wrote:

Could you show us the build of Charm++ you're using and the full charmrun command you used?

On Nov 21, 2014 11:42 AM, "Dominic Roehm" <dominic.roehm AT gmail.com> wrote:
Hi,

I tried to run my Charm++ code on two 8-core nodes of my local cluster, but I
get a timeout from the node-program. It times out during startup. The code
itself runs successfully on other workstations and clusters. Does anyone have
an idea what the problem is, or how to get more information about the issue?
Error message:

Dominic

Charmrun> charmrun started...
Charmrun> adding client 0: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 1: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 2: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 3: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 4: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 5: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 6: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 7: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 8: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 9: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 10: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 11: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 12: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 13: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 14: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 15: "127.0.0.1", IP:127.0.0.1
Charmrun> Charmrun = 10.1.255.254, port = 45243
Charmrun> Sending "$CmiMyNode 10.1.255.254 45243 8527 0" to client 0.
Charmrun> find the node program
"/home/dominic/work/hmm/benchmarks/charm_tp2/../../CoHMM/charm_bin/2D_Kriging"
at "/home/dominic/work/hmm/benchmarks/charm_tp2" for 0.
Charmrun> Starting mpiexec ./charmrun.8527
Charmrun> mpiexec started
Charmrun> node programs all started
Charmrun> Waiting for 0-th client to connect.
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun> error 0 attaching to node:
Timeout waiting for node-program to connect
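
To put the addresses in the log above side by side, here is a small sketch that extracts the address charmrun advertises and the addresses it registered for the clients. The regular expressions are assumptions inferred only from the line shapes in this particular log, not from any documented charmrun output format, and `summarize_charmrun_log` is a hypothetical helper name.

```python
import re

# Patterns inferred from the log lines in this thread; assumed, not a documented format.
CLIENT_RE = re.compile(r'Charmrun> adding client (\d+): "[^"]*", IP:([\d.]+)')
LISTEN_RE = re.compile(r"Charmrun> Charmrun = ([\d.]+), port = (\d+)")


def summarize_charmrun_log(text: str) -> dict:
    """Collect the client IPs and the address/port charmrun advertises."""
    # Map client index -> IP so repeated lines for one client count once.
    clients = {int(m.group(1)): m.group(2) for m in CLIENT_RE.finditer(text)}
    listen = LISTEN_RE.search(text)
    return {
        "client_ips": sorted(set(clients.values())),
        "num_clients": len(clients),
        "charmrun_addr": listen.group(1) if listen else None,
        "charmrun_port": int(listen.group(2)) if listen else None,
    }
```

Applied to the log in this message, the summary would list 16 clients, all registered as 127.0.0.1, next to the charmrun address 10.1.255.254 and port 45243. That only makes the two sets of addresses easy to compare; whether the compute nodes can actually reach the charmrun address still has to be checked on the cluster.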