
charm - Re: [charm] Charm++/Converse library build errors

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Nitin Bhat <nitin AT hpccharm.com>
  • To: "Ortega, Bob" <bobo AT mail.smu.edu>
  • Cc: "charm AT cs.illinois.edu" <charm AT cs.illinois.edu>
  • Subject: Re: [charm] Charm++/Converse library build errors
  • Date: Mon, 26 Oct 2020 16:14:38 -0500

Hi Bob, 

Thanks for your email. 

Can you confirm your UCX version by sending us the output of 'ucx_info -v'?

I think you’re correct in your assessment. When I tried an older UCX version (1.3), I was able to reproduce your issue. However, from UCX 1.4 onwards, those functions (ucp_get_nb and ucp_put_nb) are part of the UCX API, and charm builds correctly with the ucx backend. If it’s available, could you try upgrading to hpcx 2.2 or above? Alternatively, you can install UCX (and Open MPI) directly rather than through hpcx.

It’s also important to note that there is a known bug (a hang) in UCX versions before 1.9. Although it has so far been reproduced on only one machine (Frontera @ TACC), it did affect a few specific runs of NAMD, and hence it could affect your runs as well. Since the recently released UCX version 1.9 fixes that bug, I recommend that you use that version. It looks like hpcx 2.7 ships with that version of UCX.
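As a rough way to check an installed UCX against the two thresholds above (1.4 for ucp_put_nb/ucp_get_nb, 1.9 for the hang fix), something like the following sketch could be used. The function names version_ge and classify_ucx are made up for illustration, the script assumes a `sort` that supports `-V` (GNU coreutils), and the parsing of the 'ucx_info -v' output is a guess at its format that may need adjusting.

```shell
#!/bin/sh
# Succeeds if dotted version $1 is greater than or equal to $2.
# Assumes GNU sort, whose -V flag orders version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: classify a UCX version against the thresholds
# mentioned in this thread (1.4 and 1.9).
classify_ucx() {
    if ! version_ge "$1" 1.4; then
        echo "too-old"                # ucp_put_nb / ucp_get_nb not available
    elif ! version_ge "$1" 1.9; then
        echo "builds-but-known-hang"  # builds, but predates the 1.9 fix
    else
        echo "ok"
    fi
}

# If ucx_info is on the PATH, try to pull a version number out of its output.
# The sed pattern is an assumption about the output format.
if command -v ucx_info >/dev/null 2>&1; then
    ver=$(ucx_info -v | sed -n 's/.*[Vv]ersion[ =]*\([0-9][0-9.]*\).*/\1/p' | head -n 1)
    if [ -n "$ver" ]; then
        echo "UCX $ver: $(classify_ucx "$ver")"
    else
        echo "could not parse a version from ucx_info -v"
    fi
fi
```

Running it on the build host and on a compute node is worthwhile, since the two may pick up different UCX installations.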

Additionally, when you use a separate UCX build, make sure that charm picks up the UCX libraries from the right place by passing the path to the UCX build directory to the charm build command with '--basedir=<ucx-base-dir>'. You also need to have <ucx-base-dir>/lib in your LD_LIBRARY_PATH when you run the compiled binary.
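A minimal sketch of those two steps, assuming UCX was installed under a hypothetical prefix /opt/ucx-1.9.0 (substitute your own path). The build command is the one from the original report with --basedir appended; it is printed rather than executed so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical UCX install prefix; replace with the real location.
UCX_BASE="${UCX_BASE:-/opt/ucx-1.9.0}"

# Build command from the original report, with --basedir pointing at UCX.
# Run it from the top of the charm source tree once it looks right.
BUILD_CMD="./build charm++ ucx-linux-x86_64 icc ompipmix --with-production --basedir=$UCX_BASE"
echo "$BUILD_CMD"

# At run time, put the UCX libraries first on the loader search path,
# keeping whatever was already there.
export LD_LIBRARY_PATH="$UCX_BASE/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

The LD_LIBRARY_PATH setting has to be in effect in the environment that launches the job (for example, in the batch script), not only in the shell where charm was built.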

Another comment/question is about the charm build. Is there a reason you’re not using the SMP mode? For larger runs, that mode shows significantly improved performance over the non-SMP mode (your current build command). 
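For reference, an SMP build is normally requested by adding the smp option to the build line. The sketch below only assembles and prints the command; it assumes the same architecture, compiler, and ompipmix options as the original command, and the variable names are illustrative.

```shell
#!/bin/sh
# Options from the original (non-SMP) build command.
ARCH="ucx-linux-x86_64"
BASE_OPTS="icc ompipmix"

# Adding "smp" to the option list requests the SMP mode of charm.
SMP_BUILD_CMD="./build charm++ $ARCH $BASE_OPTS smp --with-production"
echo "$SMP_BUILD_CMD"
```

An SMP binary is usually launched with extra thread-layout arguments (such as +ppn); the right values depend on the machine, so check the Charm++ and NAMD documentation for your node layout.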

Let us know if you have any additional questions. 

Thanks,
Nitin


On Oct 23, 2020, at 9:14 AM, Ortega, Bob <bobo AT mail.smu.edu> wrote:

Following the NAMD 2.14 Release Notes, I am attempting to build and test the Charm++/Converse library, Infiniband UCX OpenMPI PMIx version.
Using the following command produces errors,
 
./build charm++ ucx-linux-x86_64 icc ompipmix --with-production
 
Resulting in,
 
Performing '/usr/bin/gmake charm++ OPTS=-optimize -production QUIET=' in ucx-linux-x86_64-ompipmix-icc/tmp
/usr/bin/gmake -C libs/ck-libs/completion
gmake[1]: Entering directory `/users/bobo/NAMD/NAMD_2.14_Source/charm-6.10.2/ucx-linux-x86_64-ompipmix-icc/tmp/libs/ck-libs/completion'
gmake[1]: Nothing to be done for `all'.
gmake[1]: Leaving directory `/users/bobo/NAMD/NAMD_2.14_Source/charm-6.10.2/ucx-linux-x86_64-ompipmix-icc/tmp/libs/ck-libs/completion'
SRCBASE=../../src ./commitid.sh
Dev mode
../bin/charmc  -optimize -production   -I. -o machine.o machine.C
In file included from machine.C(719):
machine-onesided.C(125): error: identifier "ucp_put_nb" is undefined
          statusReq = ucp_put_nb(ep, ncpyOpInfo->srcPtr,
                      ^
 
In file included from machine.C(719):
machine-onesided.C(136): error: identifier "ucp_get_nb" is undefined
          statusReq = ucp_get_nb(ep, (void*)ncpyOpInfo->destPtr,
                      ^
 
compilation aborted for machine.C (code 2)
Fatal Error by charmc in directory /users/bobo/NAMD/NAMD_2.14_Source/charm-6.10.2/ucx-linux-x86_64-ompipmix-icc/tmp
   Command icpc -fpic -DCMK_GFORTRAN -I../bin/../include -I/usr/include/ -I./proc_management/ -I./proc_management/simple_pmi/ -D__CHARMC__=1 -I. -O2 -fno-stack-protector -c machine.C -o machine.o returned error code 2
charmc exiting...
gmake: *** [machine.o] Error 1
-------------------------------------------------
Charm++ NOT BUILT. Either cd into ucx-linux-x86_64-ompipmix-icc/tmp and try
to resolve the problems yourself, visit
for more information. Otherwise, email the developers at charm AT cs.illinois.edu
 
 
I have consulted with Mellanox, thinking this might be an hpcx toolkit related problem. Their assessment is that it is not a Mellanox issue, but perhaps an hpcx version issue. We are using hpcx version 2.1, and Mellanox notes that ucp_put_nb and ucp_get_nb exist in newer HPCX/UCX versions, 2.2 or higher.
 
Would you agree that this could be the issue, or is there another way to resolve these errors?
 
Thank you,
Bob



