- From: Michael Robson <mprobson AT illinois.edu>
- To: "Dokania, Harshit" <hdokani2 AT illinois.edu>
- Cc: "ppl-accel AT cs.uiuc.edu" <ppl-accel AT cs.uiuc.edu>
- Subject: [ppl-accel] Profiling OpenAtom on Taub
- Date: Tue, 2 Jun 2015 16:30:18 -0400
- List-archive: <http://lists.cs.uiuc.edu/pipermail/ppl-accel/>
- List-id: <ppl-accel.cs.uiuc.edu>
Hey Harshit,
I've spent the past few days getting OpenAtom running on the campus cluster so that I can profile your changes and see where we can improve performance. I ran into a couple of roadblocks during compilation, mainly pointing both Charm++ and OpenAtom at where CUDA actually lives (/usr/local/cuda/6.5) instead of the assumed /usr/local/cuda. I got past most of those by explicitly specifying the correct CUDA location, although I'd be curious whether you had to make similar changes. I am using your branches on both Charm++ (harshit) and OpenAtom (harshit-gpu); however, I had to go back one commit on your OpenAtom branch because the latest one was giving me trouble.
After finally getting OpenAtom compiled, I ran make test in the OpenAtom directory inside a job that requested a K40m and loaded CUDA 6.5. However, I was met with the following segfault:
make[1]: Entering directory `/scratch/users/mprobson/openatom/build-O3'
make[1]: Nothing to be done for `compile'.
make[1]: Leaving directory `/scratch/users/mprobson/openatom/build-O3'
=========== Build results are in the build directory: ./build-O3
make[1]: Entering directory `/scratch/users/mprobson/openatom/build-O3/test-output/regression'
Running regression test ees-nl0l1: EES: off for nonlocals; on for locals; ...
/bin/sh: line 1: 39409 Segmentation fault ../../OpenAtom ../../../data/water_32M_10Ry/regression/cpaimd_config.p1 ../../../data/water_32M_10Ry/regression/water.input.min.ees-nl0l1 2>&1 > op-ees-nl0l1-p1.log
make[1]: *** [op-ees-nl0l1-p1.log] Error 139
make[1]: Leaving directory `/scratch/users/mprobson/openatom/build-O3/test-output/regression'
make: *** [test-regr] Error 2
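As a side note on reading the log above: the `Error 139` that make reports matches the segfault message, since POSIX shells report 128 + N when a command is killed by signal N, and 139 - 128 = 11. A minimal sketch of that arithmetic follows; the helper name `describe_exit_status` is made up for illustration, and the signal numbering assumes Linux.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status such as the one make prints.

    Hypothetical helper for illustration only. Statuses above 128 are
    treated as "killed by signal (status - 128)", which is the usual
    shell convention but not a guarantee for every program.
    """
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            return f"unknown signal {status - 128}"
    return f"exit code {status}"


# The regression target failed with "Error 139" in the log above.
print(describe_exit_status(139))  # SIGSEGV on Linux
```

The outer `Error 2` is just make's own exit status for a failed sub-make, so it carries no extra information about the crash.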
I've already downloaded the water test set and placed it in the data directory. I'm unsure how to proceed at this point, especially since the turnaround between submitting a job and getting results back (even for a single GPU node on Taub) makes debugging by hand slow. Do you have any suggestions for changes I need to make to get this whole setup working?
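Given the job turnaround, one thing that might save a round trip is a quick check at the top of the job script that the paths the run depends on (the CUDA install directory, the regression inputs) actually exist on the compute node. This is only a sketch under that assumption: `missing_paths` is a hypothetical helper, the paths are passed on the command line rather than hard-coded, and it would not by itself explain the segfault.

```python
import os
import sys


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on this node.

    Hypothetical helper for illustration; it only checks existence,
    not readability or contents.
    """
    return [p for p in paths if not os.path.exists(p)]


# Paths to verify are given as command-line arguments, for example
# the CUDA directory and the two regression input files.
for path in missing_paths(sys.argv[1:]):
    print(f"missing: {path}")
```

Run with no arguments it prints nothing; any `missing:` line in the job output would point at a setup problem before OpenAtom is even launched.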
Thanks,
Michael