
charm - [charm] charm 6.2.2 and ibverbs

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system



  • From: Lukasz Flis <l.flis AT cyf-kr.edu.pl>
  • To: "Undisclosed.Recipients":
  • Subject: [charm] charm 6.2.2 and ibverbs
  • Date: Tue, 23 Nov 2010 01:54:05 +0100
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>
  • Organization: ACC Cyfronet

Dear Charm++ users and developers,

While testing our new cluster with QLogic-based InfiniBand, we encountered
a problem with NAMD (built on Charm++ 6.2.2):

An attempt to run the ibverbs-based build ends with the following error:

...
Charmrun> node programs all started
Charmrun remote shell(n12-1-2.local.0)> remote responding...
Charmrun remote shell(n12-1-2.local.1)> remote responding...
Charmrun remote shell(n12-1-2.local.0)> starting node-program...
Charmrun remote shell(n12-1-2.local.0)> rsh phase successful.
Charmrun remote shell(n12-1-2.local.1)> starting node-program...
Charmrun remote shell(n12-1-2.local.1)> rsh phase successful.
Charmrun> Waiting for 0-th client to connect.
Charmrun> Waiting for 1-th client to connect.
Charmrun> All clients connected.
Charmrun> IP tables sent.
Charmrun> node programs all connected
Charmrun: error on request socket--
Socket closed before recv.


Using strace, we obtained additional information on the problem:

write(11, "\32\0\0\0\36\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0"..., 120) = -1 EINVAL (Invalid argument)
write(2, "Failed to modify QP to RTS\n", 27) = 27
 | 00000  46 61 69 6c 65 64 20 74  6f 20 6d 6f 64 69 66 79  Failed to modify |
 | 00010  20 51 50 20 74 6f 20 52  54 53 0a                  QP to RTS.       |
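As a rough aid for reading the failing write(11, ...) above: if fd 11 is the /dev/infiniband/uverbsN device, each write() should begin with an 8-byte header (u32 command, u16 in_words, u16 out_words), with in_words counting 32-bit words including the header. The sketch below decodes only that header from the bytes strace printed. The header layout, the little-endian byte order (x86_64) and the value 26 for the MODIFY_QP command are assumptions based on the Linux <rdma/ib_user_verbs.h> ABI of that era, not something confirmed in this report; the function name is hypothetical.

```python
import struct

# Assumption: command number of MODIFY_QP in enum ib_uverbs_write_cmds
# (<rdma/ib_user_verbs.h>); treat as illustrative, not authoritative.
IB_USER_VERBS_CMD_MODIFY_QP = 26


def decode_uverbs_header(buf: bytes) -> dict:
    """Hypothetical helper: decode the assumed 8-byte uverbs command header
    (u32 command, u16 in_words, u16 out_words), little-endian."""
    command, in_words, out_words = struct.unpack_from("<IHH", buf, 0)
    return {
        "command": command,
        "in_words": in_words,
        "out_words": out_words,
        # in_words counts 32-bit words, header included.
        "in_bytes": in_words * 4,
    }


# The first 32 bytes of the failing write(11, ...) exactly as strace printed
# them (Python bytes literals accept the same octal escapes).
prefix = b"\32\0\0\0\36\0\0\0" + b"\0" * 20 + b"\5\0\0\0"

hdr = decode_uverbs_header(prefix)
print(hdr)
print("MODIFY_QP" if hdr["command"] == IB_USER_VERBS_CMD_MODIFY_QP else "other command")
```

Under these assumptions the header reads as command 26 with 30 input words, i.e. 120 bytes, which matches the length passed to write() and is consistent with the "Failed to modify QP to RTS" message that follows: the kernel (or the QLogic ipath/qib driver) is rejecting the attributes of the QP state transition itself.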


The problem was confirmed to be Charm++-related by running the basic megatest
(pgm) from the Charm++ test suite; the error was exactly the same.

The Open MPI (1.4.3) build of Charm++, running over IB verbs, didn't report any problems.

Additional information:
IB stack: OFED 1.5.2
libibverbs-1.1.4-0.14.gb6c138b
libipathverbs-1.2-2.el5

IB cards: InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 01)

The same version of Charm++ works properly on Mellanox HCAs with OFED 1.4.2.

I am not sure whether it's a Charm++ problem or an IB-related one. Any comments
or ideas on how to debug the problem are welcome.


Best Regards
--
Lukasz Flis


