
ppl-accel - Re: [ppl-accel] 5/9 Accel Meeting Minutes



  • From: Michael Robson <mprobson AT illinois.edu>
  • To: Lukasz Wesolowski <wesolwsk AT illinois.edu>
  • Cc: ppl-accel AT cs.uiuc.edu
  • Subject: Re: [ppl-accel] 5/9 Accel Meeting Minutes
  • Date: Mon, 19 May 2014 10:37:58 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/ppl-accel/>
  • List-id: <ppl-accel.cs.uiuc.edu>

Hey All,

My primary task was profiling, with Projections, the benchmarks we ran heterogeneously, to determine why they underperformed. Unfortunately, I ran into some trouble with libraries missing on the Phi. I think I've solved the problem, but I don't yet have Projections logs for all the codes. I will have these soon; in the meantime there isn't much to report on my end.

Sincerely,
Michael

On May 19, 2014 7:47 AM, "Wesolowski, Lukasz" <wesolwsk AT illinois.edu> wrote:
I believe we were planning to have a meeting/telecon today at 11 am. Let's have everyone send a quick update on progress since the last meeting. We will meet if there are significant new results or issues.

In particular, here are some of the action items from the last meeting:

1. OpenAtom GPU runs: profiling and experiments on larger data sets (Eric)
2. Projections/performance analysis of application runs on the Xeon Phi (Ronak and Michael)
3. Multiple ++ppn support
4. Heterogeneous load balancing: determining how well it is currently supported by Charm++ LB strategies

On my end, I looked at cuBLAS to see if it can be supported in GPU Manager. As Ronak mentioned last time, cuBLAS now allows specifying a CUDA stream in which the operations should complete. I noticed that there are custom cuBLAS functions for transferring data to and from the GPU, so support for that would have to be explicitly added in GPU Manager. Overall, adding cuBLAS support to GPU Manager looks doable.
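To make the stream point concrete, here is a rough sketch (untested, and the function name and buffer parameters are illustrative, not GPU Manager's API) of how a cuBLAS v2 call and its transfers can be tied to a single CUDA stream, using cuBLAS's own async transfer helpers:

```cuda
// Sketch: run a matrix-vector multiply and its host<->device transfers
// entirely within one CUDA stream. Assumes device buffers dA, dx, dy
// were allocated beforehand and n is the (square) matrix dimension.
#include <cublas_v2.h>
#include <cuda_runtime.h>

void sgemv_in_stream(cublasHandle_t handle, cudaStream_t stream, int n,
                     const float *hA, const float *hx, float *hy,
                     float *dA, float *dx, float *dy) {
    const float alpha = 1.0f, beta = 0.0f;
    // Subsequent cuBLAS calls on this handle are issued into `stream`.
    cublasSetStream(handle, stream);
    // cuBLAS's transfer functions (column-major layout), async variants.
    cublasSetMatrixAsync(n, n, sizeof(float), hA, n, dA, n, stream);
    cublasSetVectorAsync(n, sizeof(float), hx, 1, dx, 1, stream);
    cublasSgemv(handle, CUBLAS_OP_N, n, n, &alpha, dA, n, dx, 1,
                &beta, dy, 1);
    cublasGetVectorAsync(n, sizeof(float), dy, 1, hy, 1, stream);
    // Caller synchronizes the stream when the result is needed.
}
```

Since the transfers go through cublasSet/GetMatrixAsync rather than plain cudaMemcpyAsync, this is the part GPU Manager would need to learn about explicitly.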

Lukasz


On Sat, May 10, 2014 at 12:58 AM, Ronak Buch <rabuch2 AT illinois.edu> wrote:
Attendees: Ronak, Eric M., Michael, Lukasz

Overview of various accelerator tools that exist in Charm++

Eric is using GPUs for OpenAtom, but testing so far has been on a very small data set; it's not clear whether we are getting good performance, since the timing was coarse and the input was small.  Currently it uses cuBLAS, so it makes synchronous kernel calls.

GPU Manager:

  • Lukasz is currently maintaining it; he plans to write documentation for it and fix stability issues as they arise.
  • Task-based library: instead of treating GPU operations in isolation, it groups the transfer to the device, the computation, and the transfer back as a single unit, and offloads the whole thing.
  • Using it keeps the CPU from idling while the GPU is working.
  • One key aspect is that it maintains its own memory pool of pinned memory for GPU transfers.  Otherwise, trying to allocate pinned memory while a kernel is executing will block.
  • Not sure whether overlapping communication with computation has changed in more recent versions of CUDA.
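The pinned-memory-pool point above can be sketched in a few lines (a minimal illustration, not GPU Manager's actual design; the class and method names are made up, and plain malloc stands in for cudaMallocHost so the sketch is self-contained):

```cpp
// Minimal pinned-memory pool sketch: all buffers are allocated up front,
// so acquiring one while a kernel is running never calls the (blocking)
// pinned allocator. In real use, malloc/free below would be
// cudaMallocHost/cudaFreeHost.
#include <cstdlib>
#include <vector>

class PinnedPool {
public:
    PinnedPool(std::size_t bufSize, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i)
            free_.push_back(std::malloc(bufSize));  // cudaMallocHost in real use
    }
    ~PinnedPool() {
        for (void *p : free_) std::free(p);         // cudaFreeHost in real use
    }
    // Hand out a pre-allocated buffer; nullptr if the pool is exhausted.
    void *acquire() {
        if (free_.empty()) return nullptr;
        void *p = free_.back();
        free_.pop_back();
        return p;
    }
    // Return a buffer to the pool for reuse.
    void release(void *p) { free_.push_back(p); }
    std::size_t available() const { return free_.size(); }
private:
    std::vector<void *> free_;
};
```

The point of the pool is that acquire() and release() only touch a free list, so they are safe to call at any time, whereas a fresh pinned allocation can stall until in-flight GPU work completes.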
Lukasz thinks the Offload API would be a better fit for the Xeon Phi than GPU Manager.

Ronak and Michael worked on heterogeneous runs with the Xeon Phi; performance is rather slow.

We should take Dave's thesis work (AEMs) and see how useful it is for various applications.  We should also look at G-Charm (according to Lukasz, its techniques are basically the same as Kunzman's), though there seems to be no code available for it.

Sanjay's TODOs:
  1. Read Dave Kunzman's thesis
  2. Run Projections or other performance monitoring tools on Xeon Phi applications
  3. Add multiple ++ppn (SMP) for Xeon Phi.

_______________________________________________
ppl-accel mailing list
ppl-accel AT cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/ppl-accel




