
charm - Re: [charm] [ppl] Charm install Error

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Abhishek TYAGI <atyagiaa AT connect.ust.hk>
  • To: Jim Phillips <jim AT ks.uiuc.edu>
  • Cc: "charm AT cs.illinois.edu" <charm AT cs.illinois.edu>
  • Subject: Re: [charm] [ppl] Charm install Error
  • Date: Mon, 25 Jan 2016 05:02:56 +0000
  • Accept-language: en-US

Hi Jim,

Thank you very much for the help; it was badly needed.

Previously the benchmark time was 10-12 ns/day, and now it is "Info: Initial
time: 1 CPUs 0.190928 s/step 2.20981 days/ns 112.023 MB memory".

After running "nvidia-smi", I can see that the GPUs are now being utilized:
+------------------------------------------------------+
| NVIDIA-SMI 352.39     Driver Version: 352.39         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2090         On   | 0000:09:00.0     Off |                  Off |
| N/A   N/A    P0    79W / 225W |    752MiB /  6143MiB |     42%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2090         On   | 0000:0A:00.0     Off |                  Off |
| N/A   N/A    P0   120W / 225W |    648MiB /  6143MiB |     29%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M2090         On   | 0000:0D:00.0     Off |                  Off |
| N/A   N/A    P0   166W / 225W |    648MiB /  6143MiB |     37%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M2090         On   | 0000:0E:00.0     Off |                  Off |
| N/A   N/A    P0    81W / 225W |    648MiB /  6143MiB |     34%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      5205     C  /usr/local/matlab-R2015b/bin/glnxa64/MATLAB    102MiB |
|    0     52978     C  ...NAMD_2.11_Linux-x86_64-netlrts-CUDA/namd2   157MiB |
|    0     52974     C  ...NAMD_2.11_Linux-x86_64-netlrts-CUDA/namd2   157MiB |
|    0     52970     C  ...NAMD_2.11_Linux-x86_64-netlrts-CUDA/namd2   157MiB |
|    0     52966     C  ...NAMD_2.11_Linux-x86_64-netlrts-CUDA/namd2   157MiB |
|    1     52979     C  ...NAMD_2.11_Linux-x86_64-netlrts-CUDA/namd2   157MiB |
|    1     52975     C  ...NAMD_2.11_Linux-x86_64-netlrts-CUDA/namd2   157MiB |
|    1     52971     C  ...NAMD_2.11_Linux-x86_64-netlrts-CUDA/namd2   157MiB |
|    1     52967     C  ...NAMD_2.11_Linux-x86_64-netlrts-CUDA/namd2   157MiB |
|    2     52976     C  ...NAMD_2.11_Linux-x86_64-netlrts-CUDA/namd2   157MiB |
|    2     52964     C  ...NAMD_2.11_Linux-x86_64-netlrts-CUDA/namd2   157MiB |
|    2     52968     C  ...NAMD_2.11_Linux-x86_64-netlrts-CUDA/namd2   157MiB |
|    2     52972     C  ...NAMD_2.11_Linux-x86_64-netlrts-CUDA/namd2   157MiB |
|    3     52973     C  ...NAMD_2.11_Linux-x86_64-netlrts-CUDA/namd2   157MiB |
|    3     52965     C  ...NAMD_2.11_Linux-x86_64-netlrts-CUDA/namd2   157MiB |
|    3     52977     C  ...NAMD_2.11_Linux-x86_64-netlrts-CUDA/namd2   157MiB |
|    3     52969     C  ...NAMD_2.11_Linux-x86_64-netlrts-CUDA/namd2   157MiB |
+-----------------------------------------------------------------------------+

Do you have any suggestions for increasing GPU performance for the MD? Each
namd2 process only uses 157 MiB of memory, and the volatile GPU-util is
between 35-45%; can I increase this so that the GPUs are used to the maximum?
I think adding the twoAway x/y/z options might increase the speed of the MD
(see the sketch below). Any suggestion would be helpful.
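
A minimal sketch of what those options would look like in a NAMD
configuration file, assuming the standard twoAwayX/Y/Z keywords (whether
they actually help on this node is an open question, not a claim made in
the thread):

    # hypothetical addition to the NAMD .conf file:
    # split patches along each axis to expose more work units per GPU
    twoAwayX  yes
    twoAwayY  yes
    twoAwayZ  yes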

Thanks and regards
Abhi

Abhishek Tyagi
PhD Student
Chemical and Biomolecular Engineering
Hong Kong University of Science and Technology
Clear Water Bay, Hong Kong


________________________________________
From: Jim Phillips
<jim AT ks.uiuc.edu>
Sent: Sunday, January 24, 2016 5:21 AM
To: Abhishek TYAGI
Cc:
charm AT cs.illinois.edu
Subject: Re: [ppl] [charm] Charm install Error

You can use this binary:
http://www.ks.uiuc.edu/~jim/tmp/NAMD_2.11_Linux-x86_64-netlrts-CUDA.tar.gz

You will need to use the new "+devicesperreplica" option so that each
replica only uses one device and they do not all use the same device.

I suggest the following options (assuming you want 16 replicas):

./charmrun ++local +p16 ./namd2 +devicesperreplica 1 +replicas 16 +idlepoll
+pemap 0-15
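
For the four-GPU interactive node described further below, a hedged sketch of
how this launch line might be combined with a REUS configuration file and
per-replica log files (the job0.conf and output/ names are placeholders, not
taken from the original messages):

    # hypothetical launch: 16 replicas on one 16-core, 4-GPU node,
    # so 4 replicas share each device
    ./charmrun ++local +p16 ./namd2 +replicas 16 +devicesperreplica 1 \
        +idlepoll +pemap 0-15 \
        job0.conf +stdout output/%d/job0.%d.log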

Jim


On Sat, 23 Jan 2016, Abhishek TYAGI wrote:

> Dear Jim,
>
> We have InfiniBand, but the queue for it is too long to get a job executed;
> therefore, I have no choice other than to use the interactive mode. The
> cluster information is here
> (https://itsc.ust.hk/services/academic-teaching-support/high-performance-computing/gpu-cluster/access-the-cluster/)
>
> The list of nodes for interactive access is as follows.
> Type: Interactive
> Num of Nodes: 1
> CPU: 2 x Intel E5-2650 (8-core)
> Memory: 64GB DDR3-133
> Coprocessor: 4 x Nvidia Tesla M2090
>
> I had tried to run the REUS on the same node with 16 windows; it shows 10
> ns/day, on CPUs only. It is too slow. I tried many things, but I am not
> able to get the GPUs working together with the CPUs. I tried other versions
> but was not able to run them.
>
> Thanks
>
> Abi
>
>
> Abhishek Tyagi
> PhD Student
> Chemical and Biomolecular Engineering
> Hong Kong University of Science and Technology
> Clear Water Bay, Hong Kong
>
>
> ________________________________________
> From: Jim Phillips
> <jim AT ks.uiuc.edu>
> Sent: Sunday, January 24, 2016 3:12 AM
> To: Abhishek TYAGI
> Cc:
> charm AT cs.illinois.edu
> Subject: Re: [ppl] [charm] Charm install Error
>
> If your cluster has InfiniBand you can use Linux-x86_64-verbs-smp-CUDA.
>
> If you only have ethernet then I'll need to add a new build to the NAMD
> download list.
>
> Jim
>
> On Sat, 23 Jan 2016, Abhishek TYAGI wrote:
>
>> Dear Jim,
>>
>> I am running this version successfully on the GPU cluster; however, I am
>> not able to execute the Linux-x86_64-netlrts (multi-copy algorithms)
>> binaries with the GPUs. They only run on the CPUs, and I am not able to
>> add GPU support to them. Can you suggest how to make it work?
>> The GPU cluster: 4 K20 GPUs and 2 E-series CPUs, 32 cores.
>>
>> Abhishek Tyagi
>> PhD Student
>> Chemical and Biomolecular Engineering
>> Hong Kong University of Science and Technology
>> Clear Water Bay, Hong Kong
>>
>>
>> ________________________________________
>> From: Jim Phillips
>> <jim AT ks.uiuc.edu>
>> Sent: Tuesday, December 22, 2015 12:19 PM
>> To: Abhishek TYAGI
>> Cc:
>> charm AT cs.illinois.edu
>> Subject: Re: [ppl] [charm] Charm install Error
>>
>> You would run NAMD using charmrun. Directions are in the release notes.
>>
>> Jim
>>
>> On Tue, 22 Dec 2015, Abhishek TYAGI wrote:
>>
>>> Dear Jim,
>>>
>>> Thank you for the reply. For the updated version, do you have an idea of
>>> how the tutorial will work? I mean, the tutorial is written for mpirun.
>>>
>>> Thanks and regards
>>> Abhi
>>>
>>>
>>>
>>> ________________________________________
>>> From: Jim Phillips
>>> <jim AT ks.uiuc.edu>
>>> Sent: Tuesday, December 22, 2015 12:17 AM
>>> To: Abhishek TYAGI
>>> Cc:
>>> charm AT cs.illinois.edu
>>> Subject: Re: [ppl] [charm] Charm install Error
>>>
>>> The REUS feature is now available with verbs-linux-x86_64 and
>>> verbs-linux-x86_64-smp (for CUDA) Charm++ builds, so you should be able to
>>> use the pre-built binaries that are available for download (the ones that
>>> mention "multi-copy algorithms" in the comment).
>>>
>>> Wait a few hours and you can get the 2.11 release.
>>>
>>> Jim
>>>
>>>
>>> On Mon, 21 Dec 2015, Abhishek TYAGI wrote:
>>>
>>>> Dear Jim,
>>>>
>>>> For compiling NAMD with MPI I am using the NAMD CVS source code. I want
>>>> to compile the MPI version. The reason for compiling NAMD with MPI is to
>>>> run "A Tutorial on One-dimensional Replica-exchange Umbrella Sampling".
>>>> According to this tutorial, mpirun will be used to execute REUS.
>>>> Therefore, I am compiling this version. The GPU cluster is configured to
>>>> use InfiniBand.
>>>> (https://itsc.ust.hk/services/academic-teaching-support/high-performance-computing/gpu-cluster/hardware/)
>>>>
>>>> The commands I had tried are as follows:
>>>>
>>>> [atyagiaa@login-0 charm-6.7.0]$ env MPICXX=mpicxx ./build charm++ mpi-linux-x86_64 --with-production
>>>>
>>>> [atyagiaa@login-0 charm-6.7.0]$ which mpicxx
>>>> /usr/local/pgi/linux86-64/2013/mpi2/mpich/bin/mpicxx
>>>>
>>>> The details of the charm compilation are attached to this mail.
>>>>
>>>> (For general molecular simulations, I am using the NAMD 2.11b1
>>>> multicore-CUDA build.)
>>>>
>>>> Thanks and regards
>>>> Abhi
>>>>
>>>>
>>>> ________________________________________
>>>> From: Jim Phillips
>>>> <jim AT ks.uiuc.edu>
>>>> Sent: Monday, December 21, 2015 10:02 PM
>>>> To: Abhishek TYAGI
>>>> Cc:
>>>> charm AT cs.illinois.edu
>>>> Subject: Re: [ppl] [charm] Charm install Error
>>>>
>>>> Also, you should be using verbs-smp or net-ibverbs-smp on a GPU cluster.
>>>>
>>>> Jim
>>>>
>>>>
>>>> On Mon, 21 Dec 2015, Phil Miller wrote:
>>>>
>>>>> Hi Abhi,
>>>>>
>>>>> Could you please provide the following extra information:
>>>>> - what underlying compiler does your mpicxx call?
>>>>> - what was the output of the configure script at the beginning of the
>>>>> build
>>>>> process?
>>>>>
>>>>> With that, hopefully we can help resolve this.
>>>>>
>>>>> Phil
>>>>>
>>>>> On Mon, Dec 21, 2015 at 4:08 AM, Abhishek TYAGI
>>>>> <atyagiaa AT connect.ust.hk>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> I am trying to install charm-6.7.0 on the GPU cluster here at our
>>>>>> university, but I get the following error. I tried to change the input
>>>>>> information, but the same error keeps coming. I am trying to compile
>>>>>> NAMD for MPI on the cluster. For your information, mpirun and mpiexec
>>>>>> are already installed on the cluster; as a user I have limited access.
>>>>>> The error is as follows:
>>>>>>
>>>>>>
>>>>>> ../../../../bin/charmc -O -DCMK_OPTIMIZE=1 -o
>>>>>> ../../../../lib/libmoduleCkMulticast.a ckmulticast.o
>>>>>> ar: creating ../../../../lib/libmoduleCkMulticast.a
>>>>>> /bin/cp CkMulticast.decl.h ../../../../include
>>>>>> /bin/cp CkMulticast.def.h ../../../../include
>>>>>> gmake[1]: Leaving directory
>>>>>> `/d4/atyagiaa/NAMD_CVS-2015-12-17_Source/charm-6.7.0/mpi-linux-x86_64/tmp/libs/ck-libs/multicast'
>>>>>> /usr/bin/gmake -C libs/ck-libs/pythonCCS
>>>>>> gmake[1]: Entering directory
>>>>>> `/d4/atyagiaa/NAMD_CVS-2015-12-17_Source/charm-6.7.0/mpi-linux-x86_64/tmp/libs/ck-libs/pythonCCS'
>>>>>> (CHARMINC=../../../../include;. $CHARMINC/conv-config.sh; \
>>>>>> if test "$CMK_BUILD_PYTHON" != ""; then (/usr/bin/gmake conditional
>>>>>> OPTS='-O -DCMK_OPTIMIZE=1 ' || exit 1); fi)
>>>>>> gmake[2]: Entering directory
>>>>>> `/d4/atyagiaa/NAMD_CVS-2015-12-17_Source/charm-6.7.0/mpi-linux-x86_64/tmp/libs/ck-libs/pythonCCS'
>>>>>> gmake[2]: Nothing to be done for `conditional'.
>>>>>> gmake[2]: Leaving directory
>>>>>> `/d4/atyagiaa/NAMD_CVS-2015-12-17_Source/charm-6.7.0/mpi-linux-x86_64/tmp/libs/ck-libs/pythonCCS'
>>>>>> gmake[1]: Leaving directory
>>>>>> `/d4/atyagiaa/NAMD_CVS-2015-12-17_Source/charm-6.7.0/mpi-linux-x86_64/tmp/libs/ck-libs/pythonCCS'
>>>>>> /usr/bin/gmake -C libs/ck-libs/io
>>>>>> gmake[1]: Entering directory
>>>>>> `/d4/atyagiaa/NAMD_CVS-2015-12-17_Source/charm-6.7.0/mpi-linux-x86_64/tmp/libs/ck-libs/io'
>>>>>> gmake[1]: Nothing to be done for `all'.
>>>>>> gmake[1]: Leaving directory
>>>>>> `/d4/atyagiaa/NAMD_CVS-2015-12-17_Source/charm-6.7.0/mpi-linux-x86_64/tmp/libs/ck-libs/io'
>>>>>> /usr/bin/gmake -C libs/ck-libs/ckloop
>>>>>> gmake[1]: Entering directory
>>>>>> `/d4/atyagiaa/NAMD_CVS-2015-12-17_Source/charm-6.7.0/mpi-linux-x86_64/tmp/libs/ck-libs/ckloop'
>>>>>> ../../../../bin/charmc -O -DCMK_OPTIMIZE=1 -lpthread
>>>>>> -I../../../../tmp -o
>>>>>> CkLoop.o CkLoop.C
>>>>>> "CkLoop.h", line 105: error: identifier "__sync_add_and_fetch" is
>>>>>> undefined
>>>>>> return __sync_add_and_fetch(&curChunkIdx, 1);
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.h", line 119: error: identifier "__sync_add_and_fetch" is
>>>>>> undefined
>>>>>> __sync_add_and_fetch(&finishFlag, counter);
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.h", line 264: warning: statement is unreachable
>>>>>> return NULL;
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 18: error: identifier "__thread" is undefined
>>>>>> static __thread pthread_cond_t thdCondition; //the signal var of each
>>>>>> pthread to be notified
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 18: error: "pthread_cond_t" has already been declared
>>>>>> in
>>>>>> the
>>>>>> current scope
>>>>>> static __thread pthread_cond_t thdCondition; //the signal var of each
>>>>>> pthread to be notified
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 18: error: expected a ";"
>>>>>> static __thread pthread_cond_t thdCondition; //the signal var of each
>>>>>> pthread to be notified
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 19: error: identifier "__thread" is undefined
>>>>>> static __thread pthread_mutex_t thdLock; //the lock associated with
>>>>>> the
>>>>>> condition variables
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 19: error: "pthread_mutex_t" has already been
>>>>>> declared in
>>>>>> the
>>>>>> current scope
>>>>>> static __thread pthread_mutex_t thdLock; //the lock associated with
>>>>>> the
>>>>>> condition variables
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 19: error: expected a ";"
>>>>>> static __thread pthread_mutex_t thdLock; //the lock associated with
>>>>>> the
>>>>>> condition variables
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 27: error: variable "pthread_mutex_t" is not a type
>>>>>> name
>>>>>> static pthread_mutex_t **allLocks = NULL;
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 28: error: variable "pthread_cond_t" is not a type
>>>>>> name
>>>>>> static pthread_cond_t **allConds = NULL;
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 64: error: identifier "thdLock" is undefined
>>>>>> pthread_mutex_init(&thdLock, NULL);
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 65: error: identifier "thdCondition" is undefined
>>>>>> pthread_cond_init(&thdCondition, NULL);
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 70: error: identifier "__sync_add_and_fetch" is
>>>>>> undefined
>>>>>> __sync_add_and_fetch(&gCrtCnt, 1);
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 93: warning: missing return statement at end of
>>>>>> non-void
>>>>>> function "ndhThreadWork"
>>>>>> }
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 98: error: expected an expression
>>>>>> allLocks = (pthread_mutex_t **)malloc(sizeof(void *)*numThreads);
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 98: error: expected a ";"
>>>>>> allLocks = (pthread_mutex_t **)malloc(sizeof(void *)*numThreads);
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 99: error: expected an expression
>>>>>> allConds = (pthread_cond_t **)malloc(sizeof(void *)*numThreads);
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 99: error: expected a ";"
>>>>>> allConds = (pthread_cond_t **)malloc(sizeof(void *)*numThreads);
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.def.h", line 88: warning: variable "newmsg" was declared but
>>>>>> never
>>>>>> referenced
>>>>>> CharmNotifyMsg *newmsg = (CharmNotifyMsg *)this;
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.def.h", line 123: warning: variable "newmsg" was declared but
>>>>>> never
>>>>>> referenced
>>>>>> HelperNotifyMsg *newmsg = (HelperNotifyMsg *)this;
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.def.h", line 158: warning: variable "newmsg" was declared but
>>>>>> never
>>>>>> referenced
>>>>>> DestroyNotifyMsg *newmsg = (DestroyNotifyMsg *)this;
>>>>>> ^
>>>>>>
>>>>>> "CkLoop.C", line 38: warning: function "HelperOnCore" was declared but
>>>>>> never
>>>>>> referenced
>>>>>> static int HelperOnCore() {
>>>>>> ^
>>>>>>
>>>>>> 17 errors detected in the compilation of "CkLoop.C".
>>>>>> Fatal Error by charmc in directory
>>>>>> /d4/atyagiaa/NAMD_CVS-2015-12-17_Source/charm-6.7.0/mpi-linux-x86_64/tmp/libs/ck-libs/ckloop
>>>>>> Command mpicxx -DCMK_GFORTRAN -I../../../../bin/../include
>>>>>> -D__CHARMC__=1 -DCMK_OPTIMIZE=1 -I../../../../tmp -O -c CkLoop.C -o
>>>>>> CkLoop.o returned error code 2
>>>>>> charmc exiting...
>>>>>> gmake[1]: *** [CkLoop.o] Error 1
>>>>>> gmake[1]: Leaving directory
>>>>>> `/d4/atyagiaa/NAMD_CVS-2015-12-17_Source/charm-6.7.0/mpi-linux-x86_64/tmp/libs/ck-libs/ckloop'
>>>>>> gmake: *** [ckloop] Error 2
>>>>>>
>>>>>> Please suggest what I should do.
>>>>>>
>>>>>>
>>>>>> Thanks in advance
>>>>>>
>>>>>> Abhi
>>>>>>
>>>>>>
>>>>>> Abhishek Tyagi
>>>>>>
>>>>>> PhD Student
>>>>>>
>>>>>> Chemical and Biomolecular Engineering
>>>>>>
>>>>>> Hong Kong University of Science and Technology
>>>>>>
>>>>>> Clear Water Bay, Hong Kong
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>
>


