charm - Re: [charm] Errors when running charm++ v6.6 with obverts for the qlogic infiniband interface.

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Abhishek Gupta <gupta59 AT illinois.edu>
  • To: "Low, John J." <jlow AT mcs.anl.gov>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] Errors when running charm++ v6.6 with obverts for the qlogic infiniband interface.
  • Date: Wed, 7 May 2014 17:00:27 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hello,

Can you please test with the attached patches? Apply qlogic.patch first; if that does not work, apply qlogic2.patch on top of it.

Thanks,


Abhishek


On Wed, May 7, 2014 at 9:04 AM, Low, John J. <jlow AT mcs.anl.gov> wrote:
Charm++ developers,

I have made several attempts to build charm++ on a Xeon-based cluster with a QLogic QDR InfiniBand network.  I built charm++ with the following command:

./build charm++ net-linux-x86_64 ibverbs icc --with-production

When I test this build with the hello program, I get the following errors.

************************************************************************
Charmrun> IBVERBS version of charmrun
Charmrun> started all node programs in 0.104 seconds.
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: Failed to change qp state to RTS: you may need some device-specific parameters in machine-ibevrbs
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: Failed to change qp state to RTS: you may need some device-specific parameters in machine-ibevrbs
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: Failed to change qp state to RTS: you may need some device-specific parameters in machine-ibevrbs
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: Failed to change qp state to RTS: you may need some device-specific parameters in machine-ibevrbs
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: Failed to change qp state to RTS: you may need some device-specific parameters in machine-ibevrbs
[0] Stack Traceback:
  [0:0] CmiAbort+0x4c  [0x51d92c]
  [0:1] initInfiOtherNodeData+0x180  [0x51d100]
  [0:2]   [0x511d48]
  [0:3] ConverseInit+0x13a6  [0x51ab66]
  [0:4] main+0x57  [0x46e567]
  [0:5] __libc_start_main+0xfd  [0x34cc41ed1d]
  [0:6]   [0x46a199]
[0] Stack Traceback:
  [0:0] CmiAbort+0x4c  [0x51d92c]
  [0:1] initInfiOtherNodeData+0x180  [0x51d100]
  [0:2]   [0x511d48]
  [0:3] ConverseInit+0x13a6  [0x51ab66]
  [0:4] main+0x57  [0x46e567]
  [0:5] __libc_start_main+0xfd  [0x34cc41ed1d]
  [0:6]   [0x46a199]
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: Failed to change qp state to RTS: you may need some device-specific parameters in machine-ibevrbs
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: Failed to change qp state to RTS: you may need some device-specific parameters in machine-ibevrbs
Fatal error on PE 0> Failed to change qp state to RTS: you may need some device-specific parameters in machine-ibevrbs
************************************************************************
I am using version 13.1.3 of the Intel compilers.

Any suggestions on how to build a working ibverbs version of Charm++ for the QLogic PSM interface would be helpful.  We find that the apoa1 benchmark for NAMD 2.9 and Charm++ over MVAPICH2 does not scale past a few hundred cores on this machine.  We would like to see good scaling up to a few thousand cores for NAMD, and I think an ibverbs build of Charm++ would help.

Thanks,

John J. Low
Principal Computational Science Specialist
Computing, Environment and Life Sciences
Building 240, 2143
9700 South Cass Avenue
Argonne National Laboratory
Argonne, IL 60439.
630-252-0045
www.linkedin.com/pub/john-low/15/8b0/5aa/



_______________________________________________
charm mailing list
charm AT cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/charm


From 08a5ab57e10f115eff65846298deaa2d9eee8eb3 Mon Sep 17 00:00:00 2001
From: Abhishek <gupta59 AT illinois.edu>
Date: Wed, 7 May 2014 16:51:45 -0500
Subject: [PATCH] Ibverbs - Qlogic bug fix

---
 src/arch/net/machine-ibverbs.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/arch/net/machine-ibverbs.c b/src/arch/net/machine-ibverbs.c
index 2d72cc6..bba6a8a 100644
--- a/src/arch/net/machine-ibverbs.c
+++ b/src/arch/net/machine-ibverbs.c
@@ -897,8 +897,8 @@ struct infiOtherNodeData *initInfiOtherNodeData(int node,int addr[3]){
 	// Error code 22 means that there was an invalid parameter when calling to this verbs, try with qlogic-specific parameters 
 	if (err==22)
 	{
-	attr.timeout 	    = 26;
-	attr.retry_cnt 	    = 20;
+	attr.timeout 	    = 14;
+	attr.retry_cnt 	    = 7;
 
 
 	MACHSTATE3(3,"Retry:dlid 0x%x qp 0x%x psn 0x%x",attr.ah_attr.dlid,attr.dest_qp_num,attr.sq_psn);
-- 
1.7.1
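
A note on the values in the patch above, as a sketch rather than a statement of what the driver actually checks: in the InfiniBand verbs interface the local ACK timeout of a reliable-connection QP is a 5-bit field (0 to 31, meaning 4.096 us * 2^timeout) and the retry count is a 3-bit field (0 to 7). Under that assumption, the old retry_cnt of 20 is out of range, which would be consistent with error 22 (EINVAL on Linux) persisting on the retry path. The helper names below are hypothetical and are not part of machine-ibverbs.c.

```c
#include <errno.h>

/* Field limits for RC queue pairs: 5-bit timeout, 3-bit retry count. */
#define QP_TIMEOUT_MAX   31u
#define QP_RETRY_CNT_MAX 7u

/* Hypothetical helper: returns 0 if both values fit their fields,
 * EINVAL (22 on Linux) otherwise. It only mirrors the field widths;
 * it does not call into libibverbs. */
int check_rts_attrs(unsigned timeout, unsigned retry_cnt)
{
    if (timeout > QP_TIMEOUT_MAX || retry_cnt > QP_RETRY_CNT_MAX)
        return EINVAL;
    return 0;
}

/* Local ACK timeout in microseconds: 4.096 us * 2^timeout.
 * A timeout of 0 means "wait forever"; it is reported here as 0.0. */
double ack_timeout_us(unsigned timeout)
{
    if (timeout == 0 || timeout > QP_TIMEOUT_MAX)
        return 0.0;
    return 4.096 * (double)(1ULL << timeout);
}
```

With these helpers, check_rts_attrs(26, 20) reports EINVAL while check_rts_attrs(14, 7) returns 0, and ack_timeout_us(14) comes to roughly 67 ms per attempt.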

From f4a68b340d2e710622e4ff943264c11c1065ccb7 Mon Sep 17 00:00:00 2001
From: Abhishek <gupta59 AT illinois.edu>
Date: Wed, 7 May 2014 16:54:30 -0500
Subject: [PATCH] Qlogic specific define for verbs

---
 src/arch/net/machine-ibverbs.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/arch/net/machine-ibverbs.c b/src/arch/net/machine-ibverbs.c
index bba6a8a..cab43f2 100644
--- a/src/arch/net/machine-ibverbs.c
+++ b/src/arch/net/machine-ibverbs.c
@@ -31,7 +31,7 @@
 
 #include <infiniband/verbs.h>
 
-//#define QLOGIC
+#define QLOGIC
 #ifndef QLOGIC
 enum ibv_mtu mtu = IBV_MTU_2048;
 #else
-- 
1.7.1