charm - Re: [charm] charmrun timeout problem

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Phil Miller <mille121 AT illinois.edu>
  • To: Dominic Roehm <dominic.roehm AT gmail.com>
  • Cc: charm AT cs.uiuc.edu
  • Subject: Re: [charm] charmrun timeout problem
  • Date: Fri, 21 Nov 2014 18:38:46 -0600
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Could you show us the build of Charm++ you're using and the full charmrun command you used?

On Nov 21, 2014 11:42 AM, "Dominic Roehm" <dominic.roehm AT gmail.com> wrote:
Hi,

I tried to run my Charm++ code on two 8-core nodes of my local cluster, but
the node program times out during startup. The same code runs successfully
on other workstations and clusters. Does anyone have an idea what the
problem is, or how to get more information about the issue? Error message:

Dominic

Charmrun> charmrun started...
Charmrun> adding client 0: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 1: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 2: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 3: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 4: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 5: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 6: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 7: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 8: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 9: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 10: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 11: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 12: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 13: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 14: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 15: "127.0.0.1", IP:127.0.0.1
Charmrun> Charmrun = 10.1.255.254, port = 45243
Charmrun> Sending "$CmiMyNode 10.1.255.254 45243 8527 0" to client 0.
Charmrun> find the node program
"/home/dominic/work/hmm/benchmarks/charm_tp2/../../CoHMM/charm_bin/2D_Kriging"
at "/home/dominic/work/hmm/benchmarks/charm_tp2" for 0.
Charmrun> Starting mpiexec ./charmrun.8527
Charmrun> mpiexec started
Charmrun> node programs all started
Charmrun> Waiting for 0-th client to connect.
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun> error 0 attaching to node:
Timeout waiting for node-program to connect
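
One way to get a little more information out of a log like the one above is to summarize which addresses charmrun registered its clients under. The following is only a minimal sketch: the helper names `clients_per_ip` and `only_loopback` are hypothetical and not part of Charm++, and the embedded `LOG` is just a three-line excerpt of the output above. In this run every client is listed under 127.0.0.1; whether that is related to the timeout is not established here.

```python
import re
from collections import Counter

# Excerpt of the charmrun output quoted above (first 3 of the 16 client lines).
LOG = '''\
Charmrun> adding client 0: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 1: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 2: "127.0.0.1", IP:127.0.0.1
'''

# Matches lines of the form:  Charmrun> adding client N: "host", IP:addr
CLIENT_RE = re.compile(r'^Charmrun> adding client (\d+): "([^"]*)", IP:(\S+)$')


def clients_per_ip(log_text):
    """Count how many clients charmrun registered under each IP address."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CLIENT_RE.match(line.strip())
        if match:
            counts[match.group(3)] += 1
    return counts


def only_loopback(counts):
    """True if at least one client was found and all are on 127.x.x.x."""
    return bool(counts) and all(ip.startswith("127.") for ip in counts)


if __name__ == "__main__":
    counts = clients_per_ip(LOG)
    for ip, n in sorted(counts.items()):
        print(f"{ip}: {n} client(s)")
    if only_loopback(counts):
        print("All clients were registered on the loopback address.")
```

To use it on a full run, save the charmrun output to a file and pass its contents to `clients_per_ip` instead of the `LOG` excerpt.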
_______________________________________________
charm mailing list
charm AT cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/charm