ppl-accel - [ppl-accel] Profiling OpenAtom on Taub

ppl-accel AT lists.cs.illinois.edu

Subject: Ppl-accel mailing list

  • From: Michael Robson <mprobson AT illinois.edu>
  • To: "Dokania, Harshit" <hdokani2 AT illinois.edu>
  • Cc: "ppl-accel AT cs.uiuc.edu" <ppl-accel AT cs.uiuc.edu>
  • Subject: [ppl-accel] Profiling OpenAtom on Taub
  • Date: Tue, 2 Jun 2015 16:30:18 -0400
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/ppl-accel/>
  • List-id: <ppl-accel.cs.uiuc.edu>

Hey Harshit,

I've spent the past few days getting OpenAtom running on the campus cluster so that I can profile your changes and see where we can improve performance. I ran into a couple of roadblocks during compilation, mainly related to pointing both Charm++ and OpenAtom at where CUDA actually lives (/usr/local/cuda/6.5) instead of the assumed /usr/local/cuda. I was able to overcome most of those by explicitly pointing at the correct CUDA location, although I would be curious whether you had to make similar changes.

I am using your branches on both Charm (harshit) and OpenAtom (harshit-gpu); however, I had to go back one commit on your OpenAtom branch because the latest one was giving me trouble. After finally getting OpenAtom compiled, I ran make test in the OpenAtom directory inside a job that requested a K40m and loaded CUDA 6.5. However, I was met with the following segfault:

make[1]: Entering directory `/scratch/users/mprobson/openatom/build-O3'
make[1]: Nothing to be done for `compile'.
make[1]: Leaving directory `/scratch/users/mprobson/openatom/build-O3'
=========== Build results are in the build directory: ./build-O3
make[1]: Entering directory `/scratch/users/mprobson/openatom/build-O3/test-output/regression'
Running regression test ees-nl0l1: EES: off for nonlocals; on for locals; .../bin/sh: line 1: 39409 Segmentation fault      ../../OpenAtom ../../../data/water_32M_10Ry/regression/cpaimd_config.p1 ../../../data/water_32M_10Ry/regression/water.input.min.ees-nl0l1 2>&1 > op-ees-nl0l1-p1.log
make[1]: *** [op-ees-nl0l1-p1.log] Error 139 
make[1]: Leaving directory `/scratch/users/mprobson/openatom/build-O3/test-output/regression'
make: *** [test-regr] Error 2
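
As a side note on reading the log above: the "Error 139" that make reports is the exit status of the shell that ran the test, and by shell convention a status above 128 means the process was killed by signal number (status - 128). The following is a minimal sketch of that decoding; the helper name `describe_exit_status` is hypothetical and is not part of OpenAtom or Charm++.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Translate a shell exit status into a readable description.

    Shells report 128 + N when a child process dies from signal N.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            return f"killed by unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited with status {status}"


# The status make reported for the regression test
print(describe_exit_status(139))
```

On Linux, 139 - 128 = 11, which is SIGSEGV, matching the "Segmentation fault" message printed by /bin/sh. The trailing "Error 2" is simply make's own status for a failed recipe.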

I've already downloaded the water test set and placed it in the data directory. I'm unsure how to proceed at this point, especially since the latency between submitting a job and getting results back (even for a single GPU node on Taub) makes debugging by hand slow and tedious. So I wanted to ask whether you have any suggestions for changes I need to make to get this whole setup working.
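
Since each failed job costs a full queue wait, one cheap pre-flight check is to confirm on the node that the CUDA toolkit really is where the build was told to look. The sketch below is only an illustration under stated assumptions: the function `find_cuda_root` is hypothetical, the `CUDA_HOME` environment variable is a common convention and not something OpenAtom or Charm++ is known to read, and the two paths are the ones mentioned earlier in this message.

```python
import os
from typing import Iterable, Optional


def find_cuda_root(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate directory that contains an executable bin/nvcc."""
    for root in candidates:
        nvcc = os.path.join(root, "bin", "nvcc")
        if os.path.isfile(nvcc) and os.access(nvcc, os.X_OK):
            return root
    return None


# Candidate locations (assumptions): an optional CUDA_HOME override,
# then the two paths mentioned in this message.
candidates = [
    os.environ.get("CUDA_HOME", ""),
    "/usr/local/cuda/6.5",
    "/usr/local/cuda",
]
root = find_cuda_root(c for c in candidates if c)
print(root if root else "no CUDA toolkit found in candidate paths")
```

Running this at the top of the job script, before the build or test step, would fail fast if the module environment and the hard-coded path disagree.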

Thanks,
Michael

