
charm - Re: [charm] Charm++/Converse library build errors



  • From: Nitin Bhat <nitin AT hpccharm.com>
  • To: "Ortega, Bob" <bobo AT mail.smu.edu>
  • Cc: "charm AT cs.illinois.edu" <charm AT cs.illinois.edu>
  • Subject: Re: [charm] Charm++/Converse library build errors
  • Date: Thu, 29 Oct 2020 18:40:14 -0500
  • Authentication-results: illinois.edu; spf=none smtp.mailfrom=nitin AT hpccharm.com; dkim=pass header.d=hpccharm-com.20150623.gappssmtp.com header.s=20150623; dmarc=none

Hi Bob, 

Thanks, that makes sense. The older UCX version is causing the compilation issue while building charm; it should be fixed by a newer version of ucx/hpcx.

Yes, MPI and Verbs are valid choices for charm++ builds on Infiniband machines. However, some of our recent experiments have shown that the UCX layer performs slightly better than MPI (and the launcher for verbs doesn’t scale very well). 

The notes.txt file inside the NAMD source directory is a good starting place for documentation that shows you how to build NAMD and run it in SMP mode. You can also contact the NAMD folks directly through the mailing list (https://www.ks.uiuc.edu/Research/namd/mailing_list/) and look at archived posts.
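For reference, a typical SMP-mode NAMD launch via charmrun looks roughly like the sketch below. The process count, thread counts, and input file name are illustrative placeholders, not values from this thread; consult notes.txt for the exact options for your build.

```shell
# Illustrative only: launch an SMP build of NAMD with charmrun.
# +p28    -> 28 worker threads in total
# ++ppn 7 -> 7 worker threads per process (plus one communication
#            thread per process), i.e. 4 processes here
./charmrun +p28 ++ppn 7 ./namd2 input.namd
```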

Thanks,
Nitin

On Oct 26, 2020, at 6:14 PM, Ortega, Bob <bobo AT mail.smu.edu> wrote:

Nitin,
 
Thank you so much for your detailed response.  As requested, here is the output from the 'ucx_info -v' command,
 
[bobo@login05 Linux-x86_64-icc_MPI]$ ucx_info -v
# UCT version=1.3.0 revision 0b45e29
# configured with: --enable-optimizations --disable-logging --disable-debug --disable-assertions --disable-params-check --with-knem=/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel7-u4-x86-64-MOFED-CHECKER/hpcx_root/hpcx-v2.1.0-gcc-MLNX_OFED_LINUX-4.1-1.0.2.0-redhat7.4-x86_64/knem --prefix=/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel7-u4-x86-64-MOFED-CHECKER/hpcx_root/hpcx-v2.1.0-gcc-MLNX_OFED_LINUX-4.1-1.0.2.0-redhat7.4-x86_64/ucx
 
As you can see, it appears we are using the 1.3.0 version.  Now that I have confirmation on what is preventing us from completing this build, I will put in the request for an hpcx (Mellanox toolkit) upgrade, but am not sure how and when that will take place. But I do thank you for providing some background information related to the errors.
 
On another note, although I have been able to successfully build all the Charm++/Converse libraries (except ucx-linux-x86_64) listed in the NAMD 2.14 Release Notes, the only charm-arch builds (successful with both icc and gcc) that I have done significant tests with are:
 
verbs-linux-x86_64
mpi-linux-x86_64
 
This testing has been focused on utilizing MPI to attempt to generate parallel runs of NAMD/Charm that are as efficient as possible. 
 
I am new to the HPC group here.  They have been able to run NAMD/Charm, but only serially.  To your question about SMP, I’m open to conducting such tests, gathering data, and comparing results.  The main objective is to help the chemists run NAMD much faster.  
 
To this end, I’d be extremely interested in your suggestions for SMP (especially the data showing significant performance improvement), any further detailed documentation on running NAMD/Charm in parallel (both distributed and shared memory versions), and anything else you think might be worth my efforts.
 
Thanks again for your help and feedback!
 
Bob
 
 
 
 
From: Nitin Bhat <nitin AT hpccharm.com>
Date: Monday, October 26, 2020 at 4:15 PM
To: "Ortega, Bob" <bobo AT mail.smu.edu>
Cc: "charm AT cs.illinois.edu" <charm AT cs.illinois.edu>
Subject: Re: [charm] Charm++/Converse library build errors
 
Hi Bob, 
 
Thanks for your email. 
 
Can you confirm your UCX version by sending us the output of 'ucx_info -v'?
 
I think you’re correct in your assessment. When I tried with an older ucx version (1.3), I was able to replicate your issue. Starting with ucx 1.4, those functions (ucp_get_nb and ucp_put_nb) became part of the UCX API, and charm builds correctly with the ucx backend. If it’s available, could you try upgrading to hpcx 2.2 or above? Alternatively, you can use UCX (and Open MPI) installed directly, outside of hpcx.
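As a quick sanity check (a sketch, not part of the original reply), the reported UCX version can be compared against 1.4 in the shell before attempting the build. The hard-coded "1.3.0" stands in for the output of 'ucx_info -v' and matches the version on Bob's cluster:

```shell
# Substitute the version reported by `ucx_info -v`; 1.3.0 is what
# Bob's cluster reports. ucp_put_nb/ucp_get_nb entered the UCX API in 1.4.
ver="1.3.0"
major=${ver%%.*}          # -> 1
rest=${ver#*.}
minor=${rest%%.*}         # -> 3
if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 4 ]; }; then
  echo "UCX $ver: new enough for the charm ucx build"
else
  echo "UCX $ver: too old for the charm ucx build; upgrade to UCX >= 1.4"
fi
```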
 
It’s also important to note that there was a known bug (hang) with UCX versions before 1.9. Although it has so far been reproduced on only one machine (Frontera @ TACC), it did affect a few specific runs of NAMD, and hence it could affect your runs as well. Since the recently released UCX version 1.9 fixes that bug, I recommend that you use that version. It looks like hpcx 2.7 ships with that version of UCX.
 
Additionally, when you use a UCX build, make sure that charm picks up the ucx libraries from the right place by passing the path to the ucx build directory to the charm build command with '--basedir=<ucx-base-dir>'. You will also need <ucx-base-dir>/lib in your LD_LIBRARY_PATH when you run the compiled binary.
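Concretely, those two steps might look like the following. The UCX prefix shown is a hypothetical placeholder, not a path from this thread; substitute your own install location:

```shell
# Hypothetical UCX install prefix; substitute your own.
UCX_DIR=/opt/hpcx/ucx

# Point the charm build at the UCX headers and libraries.
./build charm++ ucx-linux-x86_64 icc ompipmix --with-production --basedir=$UCX_DIR

# Make the UCX shared libraries visible to the compiled binary at run time.
export LD_LIBRARY_PATH=$UCX_DIR/lib:$LD_LIBRARY_PATH
```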
 
Another comment/question is about the charm build. Is there a reason you’re not using the SMP mode? For larger runs, that mode shows significantly improved performance over the non-SMP mode (your current build command). 
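For comparison, an SMP build differs from the non-SMP build command only in the added 'smp' option (a sketch based on the charm build options discussed in this thread):

```shell
# Same target and options as the non-SMP build, plus `smp`,
# which enables the shared-memory (multi-threaded) mode.
./build charm++ ucx-linux-x86_64 icc ompipmix smp --with-production
```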
 
Let us know if you have any additional questions. 
 
Thanks,
Nitin
 


On Oct 23, 2020, at 9:14 AM, Ortega, Bob <bobo AT mail.smu.edu> wrote:
 
Following the NAMD 2.14 Release Notes, I am attempting to build and test the Charm++/Converse library, Infiniband UCX OpenMPI PMIx version.
Using the following command produces errors,
 
./build charm++ ucx-linux-x86_64 icc ompipmix --with-production
 
Resulting in,
 
Performing '/usr/bin/gmake charm++ OPTS=-optimize -production QUIET=' in ucx-linux-x86_64-ompipmix-icc/tmp
/usr/bin/gmake -C libs/ck-libs/completion
gmake[1]: Entering directory `/users/bobo/NAMD/NAMD_2.14_Source/charm-6.10.2/ucx-linux-x86_64-ompipmix-icc/tmp/libs/ck-libs/completion'
gmake[1]: Nothing to be done for `all'.
gmake[1]: Leaving directory `/users/bobo/NAMD/NAMD_2.14_Source/charm-6.10.2/ucx-linux-x86_64-ompipmix-icc/tmp/libs/ck-libs/completion'
SRCBASE=../../src ./commitid.sh
Dev mode
../bin/charmc  -optimize -production   -I. -o machine.o machine.C
In file included from machine.C(719):
machine-onesided.C(125): error: identifier "ucp_put_nb" is undefined
          statusReq = ucp_put_nb(ep, ncpyOpInfo->srcPtr,
                      ^
 
In file included from machine.C(719):
machine-onesided.C(136): error: identifier "ucp_get_nb" is undefined
          statusReq = ucp_get_nb(ep, (void*)ncpyOpInfo->destPtr,
                      ^
 
compilation aborted for machine.C (code 2)
Fatal Error by charmc in directory /users/bobo/NAMD/NAMD_2.14_Source/charm-6.10.2/ucx-linux-x86_64-ompipmix-icc/tmp
   Command icpc -fpic -DCMK_GFORTRAN -I../bin/../include -I/usr/include/ -I./proc_management/ -I./proc_management/simple_pmi/ -D__CHARMC__=1 -I. -O2 -fno-stack-protector -c machine.C -o machine.o returned error code 2
charmc exiting...
gmake: *** [machine.o] Error 1
-------------------------------------------------
Charm++ NOT BUILT. Either cd into ucx-linux-x86_64-ompipmix-icc/tmp and try
to resolve the problems yourself, visit
for more information. Otherwise, email the developers at charm AT cs.illinois.edu
 
 
I have consulted with Mellanox, thinking this might be an hpcx toolkit related problem. Their assessment is that it is not a Mellanox issue, but perhaps an hpcx version issue. We are using hpcx version 2.1, and Mellanox notes that ucp_put_nb and ucp_get_nb exist in newer HPCX/UCX versions, 2.2 or higher.
 
Would you agree that this could be the issue, or is there another way to resolve these errors?
 
Thank you,
Bob



