charm - [charm] Unable to run charm++ on infiniband interface

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Michel Espinoza-Fonseca <mef AT ddt.biochem.umn.edu>
  • To: "charm AT cs.illinois.edu" <charm AT cs.illinois.edu>
  • Subject: [charm] Unable to run charm++ on infiniband interface
  • Date: Wed, 8 Aug 2012 12:04:45 -0500
  • Accept-language: en-US
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hi --
 
Recently I tried to run NAMD using charm++ (charmrun) with InfiniBand support (ibverbs) on our HP Linux cluster running CentOS. I tested both the precompiled charmrun and a version I compiled myself. I normally submit jobs with the following command line:
 

charmrun ++remote-shell ssh ++p 1400 ++verbose ++nodelist namd.hostfile namd2 my_job.in

 

The problem appears shortly after the job starts and normally ends with charmrun terminating (i.e., NAMD does not even start). Most of the time I get the following error:
 

Charmrun> charmrun started...
Charmrun> using namd.hostfile as nodesfile
Charmrun> remote shell (node0004:0) started
Charmrun> remote shell (node0010:1) started
Charmrun> remote shell (node0020:2) started
...
ERROR> starting rsh: Resource temporarily unavailable
ssh_keysign: fork: Resource temporarily unavailable
ssh_keysign: fork: Resource temporarily unavailable
key_sign failed
ssh_keysign: fork: Resource temporarily unavailable
key_sign failed
ssh_keysign: fork: Resource temporarily unavailable
...
key_sign failed
Permission denied (publickey,keyboard-interactive,hostbased)

 

This error recurs even after adding "CONV_RSH=ssh" to the PBS file or changing user limits (i.e., ulimit -u). I got the job running perhaps once or twice out of tens of attempts. Interestingly, I also tried the SMP build, which seems to work fine when "++ppn" is added to the command line, although NAMD scales poorly with it compared to the ibverbs build.
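
For reference, here is a minimal sketch of one way to compare the per-user process limit (what "ulimit -u" reports) against the number of processes a launch might need. It assumes the "fork: Resource temporarily unavailable" messages come from hitting that limit on the node where charmrun runs, which is not confirmed here. The helper name nproc_headroom and the use of the "++p" value (1400) as an upper bound on forked children are illustrative assumptions, not something charmrun documents.

```python
import resource


def nproc_headroom(requested, soft_limit, already_running=0):
    """Return how many process slots remain after `requested` forks.

    A negative result means the soft limit would be exceeded.
    Returns None when the limit is unlimited.
    All arguments are plain integers supplied by the caller.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return soft_limit - already_running - requested


if __name__ == "__main__":
    # RLIMIT_NPROC is the per-user process limit on Linux (ulimit -u).
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    # Assumption: up to one child per requested process (++p 1400).
    headroom = nproc_headroom(1400, soft)
    print("soft limit:", soft, "hard limit:", hard)
    if headroom is None:
        print("process limit is unlimited")
    elif headroom < 0:
        print("limit too low by", -headroom, "processes")
    else:
        print("headroom:", headroom, "processes")
```

Running this inside the PBS job, rather than in a login shell, would show the limit that actually applies to charmrun, since batch systems can set limits different from those of an interactive session.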

 

My question is whether the problem could be related to the configuration of our system, or whether I'm missing something that prevents charmrun from starting properly.
 
Thanks,
Michel


 


