
charm - [charm] how to run processes on a distributed environment

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system




  • From: Chiara Orsini <chiara.orsini AT iet.unipi.it>
  • To: charm AT cs.uiuc.edu
  • Subject: [charm] how to run processes on a distributed environment
  • Date: Wed, 23 Jun 2010 16:06:17 +0200
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Dear Charm users,

I tried to run the simplearrayhello program on multiple machines (using a nodelist file), but I ran into some problems.

Specifically:
1) I built charm-6.2.0-net-darwin-x86_64 on a MacBook (Mac OS X 10.6.4, IP address A) and on an iMac (Mac OS X 10.6.4, IP address B).
2) Then I configured node A and node B for passwordless ssh access.
3) I successfully compiled the charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello/ test program on both computers.
4) Here is the nodelist file I saved on node B:

group main ++shell ssh 
host localhost
host node_B_name
host node_A_IP_address



5)  If I run ./charmrun hello +p2 ++verbose in a terminal on B, the program ends; indeed, localhost and node_B_name both refer to the local machine.
     Even though the program ends, I get this error message: "could not lookup DNS configuration info service: (ipc/send) invalid destination port".
     This is the complete output:

Charmrun> charmrun started...
Charmrun> using ./nodelist as nodesfile
Charmrun> adding client 0: "localhost", IP:127.0.0.1
Charmrun> adding client 1: "node_B_name", IP:node_B_IP_address
Charmrun> Charmrun = node_B_IP_address, port = 51729
Charmrun> Sending "0 node_B_IP_address 51729 4610 0" to client 0.
Charmrun> find the node program "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello/hello" at "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello" for 0.
Charmrun> Starting ssh localhost -l user /bin/sh -f
Charmrun> remote shell (localhost:0) started
Charmrun> Sending "1 node_B_IP_address 51729 4610 0" to client 1.
Charmrun> find the node program "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello/hello" at "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello" for 1.
Charmrun> Starting ssh node_B_name -l user /bin/sh -f
Charmrun> remote shell (node_B_name:1) started
Charmrun> node programs all started
Charmrun remote shell(localhost.0)> remote responding...
Charmrun remote shell(localhost.0)> starting node-program...
Charmrun remote shell(localhost.0)> rsh phase successful.
Charmrun remote shell(node_B_name.1)> remote responding...
Charmrun remote shell(node_B_name.1)> starting node-program...
Charmrun remote shell(node_B_name.1)> rsh phase successful.
Charmrun> Waiting for 0-th client to connect.
Charmrun> Waiting for 1-th client to connect.
Charmrun> client 0 connected (IP=127.0.0.1 data_port=58160)
Charmrun> client 1 connected (IP=node_B_IP_address data_port=58504)
Charmrun> All clients connected.
Charmrun> IP tables sent.
Charmrun> node programs all connected
Charm++: scheduler running in netpoll mode.
Charm++> cpu topology info is being gathered.
could not lookup DNS configuration info service: (ipc/send) invalid destination port
could not lookup DNS configuration info service: (ipc/send) invalid destination port
Charm++> Running on 1 unique compute nodes (2-way SMP).
Running Hello on 2 processors for 5 elements
Hello 0 created
Hello 1 created
Hello 2 created
Hi[17] from element 0
Hi[18] from element 1
Hi[19] from element 2
All done
Hello 3 created
Hello 4 created
Hi[20] from element 3
Hi[21] from element 4
Charmrun> Graceful exit.




6)  If I run ./charmrun hello +p3 ++verbose in a terminal on node B, the program does not end. This time the program should also start a process on computer A.
     When I run this command, I can see two hello processes running on computer B and one hello process running on computer A.
     As I said, the program never ends. This is the output I get:

Charmrun> charmrun started...
Charmrun> using ./nodelist as nodesfile
Charmrun> adding client 0: "localhost", IP:127.0.0.1
Charmrun> adding client 1: "node_B_name", IP:node_B_IP_address
Charmrun> adding client 2: "node_A_IP_address", IP:node_A_IP_address
Charmrun> Charmrun = node_B_IP_address, port = 51794
Charmrun> Sending "0 node_B_IP_address 51794 4640 0" to client 0.
Charmrun> find the node program "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello/hello" at "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello" for 0.
Charmrun> Starting ssh localhost -l user /bin/sh -f
Charmrun> remote shell (localhost:0) started
Charmrun> Sending "1 node_B_IP_address 51794 4640 0" to client 1.
Charmrun> find the node program "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello/hello" at "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello" for 1.
Charmrun> Starting ssh node_B_name -l user /bin/sh -f
Charmrun> remote shell (node_B_name:1) started
Charmrun> Sending "2 node_B_IP_address 51794 4640 0" to client 2.
Charmrun> find the node program "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello/hello" at "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello" for 2.
Charmrun> Starting ssh node_A_IP_address -l user /bin/sh -f
Charmrun> remote shell (node_A_IP_address:2) started
Charmrun> node programs all started
Charmrun remote shell(localhost.0)> remote responding...
Charmrun remote shell(localhost.0)> starting node-program...
Charmrun remote shell(localhost.0)> rsh phase successful.
Charmrun remote shell(node_B_name.1)> remote responding...
Charmrun remote shell(node_B_name.1)> starting node-program...
Charmrun remote shell(node_B_name.1)> rsh phase successful.
Charmrun remote shell(node_A_IP_address.2)> remote responding...
Charmrun remote shell(node_A_IP_address.2)> starting node-program...
Charmrun remote shell(node_A_IP_address.2)> rsh phase successful.
Charmrun> Waiting for 0-th client to connect.
Charmrun> Waiting for 1-th client to connect.
Charmrun> Waiting for 2-th client to connect.
Charmrun> client 0 connected (IP=127.0.0.1 data_port=60925)
Charmrun> client 1 connected (IP=node_B_IP_address data_port=59590)
Charmrun> client 2 connected (IP=node_A_IP_address data_port=49701)
Charmrun> All clients connected.
Charmrun> IP tables sent.
Charmrun> node programs all connected
Charm++: scheduler running in netpoll mode.
Charm++> cpu topology info is being gathered.
could not lookup DNS configuration info service: (ipc/send) invalid destination port
could not lookup DNS configuration info service: (ipc/send) invalid destination port
could not lookup DNS configuration info service: (ipc/send) invalid destination port
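
For reference, the setup in steps 2 and 4 can be sanity-checked before running charmrun with a small sketch like the one below. It is only an illustration, not part of Charm++: the function names parse_nodelist and ssh_check_command are hypothetical, the parser assumes the simple "group"/"host" line layout shown in step 4 (hosts listed before any group line are placed in a group called "main", which is an assumption), and the ssh options used are the standard OpenSSH BatchMode and ConnectTimeout settings.

```python
import subprocess


def parse_nodelist(text):
    """Parse nodelist text into {group_name: [host, ...]}.

    Assumes the simple layout from step 4: a "group NAME [options]" line
    followed by "host NAME [options]" lines. Other lines are ignored.
    """
    groups = {}
    current = None
    for raw in text.splitlines():
        tokens = raw.split()
        if len(tokens) < 2:
            continue  # blank or incomplete line
        if tokens[0] == "group":
            current = tokens[1]
            groups.setdefault(current, [])
        elif tokens[0] == "host":
            if current is None:
                # Assumption: hosts with no preceding group go into "main".
                current = "main"
                groups.setdefault(current, [])
            groups[current].append(tokens[1])
    return groups


def ssh_check_command(host):
    """Build an ssh command that fails instead of prompting for a password."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


if __name__ == "__main__":
    with open("nodelist") as f:
        hosts = parse_nodelist(f.read()).get("main", [])
    for host in hosts:
        # Exit status 0 means a non-interactive login to this host worked.
        status = subprocess.call(ssh_check_command(host))
        print(host, "ok" if status == 0 else "FAILED (status %d)" % status)
```

Run from the directory containing the nodelist file on node B, this prints one line per host. It only confirms that non-interactive ssh logins work from B to each host; it does not test whether the hello process on A can open connections back to the other machines.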


I would be very grateful if someone could explain to me why I cannot run the simplearrayhello program on more than one machine.

Thank you for your attention.

Best regards,

Chiara Orsini

 



