charm - Re: [charm] [ppl] Using TAU with AMPI

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Xoan Carlos Pardo Martinez <xoanpardo AT gmail.com>
  • To: Chee Wai Lee <cheewai1972 AT gmail.com>
  • Cc: charm AT cs.uiuc.edu
  • Subject: Re: [charm] [ppl] Using TAU with AMPI
  • Date: Fri, 29 Jun 2012 12:13:04 +0200
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hi Chee,

I'm very grateful to you for your time and assistance.

From your explanations I see my initial supposition was too simplistic. 

If it is of any help, I'm running my tests with the "setcpuaffinity" and "pemap" options because I'm not interested (at the moment) in
thread migration; on the contrary, I want my MPI processes to be tied to specific CPU threads.

In the next couple of weeks I'm attending a full-day course, so I'm not going to have time to test anything more. Take your time to test
your ideas ;-D and thanks a lot again!!!!

Best regards,

Xoán C. Pardo


On 27/06/2012, at 10:04, Chee Wai Lee wrote:

Hi,

I have thought a little about the problem and will need to spend some more time testing out my ideas over the weekend before I can tell you for sure if what you need can be done with TAU.

Just a brief overview of my thoughts so far:

1. TAU can acquire the AMPI rank each time an AMPI event is encountered through Charm++.
2. TAU has a "context" construct which is so far never really used.
3. Combining this with a regular Charm++ process rank, it may be possible to generate profiles of the form:

profile.<charm rank>.<AMPI rank>.0

4. Each profile of the above form should represent the work done by each AMPI processor object.
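
As a minimal sketch of the naming scheme in point 3, the file name could be assembled as below. The helper make_profile_name is hypothetical (it is not part of TAU or Charm++), and the trailing 0 is simply copied from the pattern above; how TAU would actually fill in its context field is still an open question in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of TAU or Charm++: builds the file name
 * "profile.<charm rank>.<AMPI rank>.0" proposed in the list above.
 * Returns 0 on success, -1 on a bad argument or if buf is too small. */
int make_profile_name(char *buf, size_t len, int charm_rank, int ampi_rank)
{
    if (buf == NULL || charm_rank < 0 || ampi_rank < 0)
        return -1;

    /* snprintf returns the length it wanted to write, so a value >= len
     * means the name was truncated. */
    int n = snprintf(buf, len, "profile.%d.%d.0", charm_rank, ampi_rank);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

Called with a 64-byte buffer, Charm++ rank 2 and AMPI rank 5, this fills the buffer with profile.2.5.0.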

The main question for me is what performance information gets captured in those profiles and how they are presented. The answers to those questions I will have to try to figure out over the weekend.

Thanks!

Best Regards,
Chee Wai

On Jun 24, 2012, at 9:40 PM, Chee Wai Lee wrote:

Hi Xoan,

Good to hear from you again!

Some clarifications about Charm++/TAU with respect to MPI and AMPI:

Your understanding of MPI wrapping by TAU is correct. However, making TAU invoke AMPI_* in place of PMPI_* in its MPI wrappers is not the right thing to do. The reason is that AMPI invocations are (in Charm++ terminology) handled in a "split-phase" fashion. I believe this is still (PPL folk please correct me if I am wrong) the current implementation model for AMPI:

a. AMPI user code executes in its own user-thread of execution (AMPI rank).
b. AMPI_Send invocations put out Charm++ messages to be delivered.
c. AMPI_Recv invocations suspend the local user-thread, allowing any pending Charm++ messages to be delivered to the local process. The NEXT user-thread (AMPI rank) that gets executed on the SAME process is NOT necessarily the same as the one that just suspended itself as a result of the AMPI_Recv call. This is because the number of AMPI ranks per Charm++ process can be >> 1.

The trouble with the direct wrapping of AMPI by TAU is that the above model is NOT what TAU expects! When wrapping MPI calls, TAU expects each MPI rank to own its own process. For example, TAU does not expect more than one MPI_Init invocation to happen in the same process space. Specifically, in case c) above, TAU's MPI wrapper would enter the following illegal state:

MPI_Recv wrapper
...
TAU_PROFILE_START()
...
AMPI_Recv() // suspends the user-thread!!! But we are still in the same process space!!!
// Now all sorts of things happen in the Charm++ runtime, including going into IDLE mode which can trigger other TAU events, violating TAU re-entrancy rules.
...
TAU_PROFILE_STOP() // By the time we get here, it is most probably a complete mess if AMPI ranks per process >> 1.
...
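
The interleaving above can be reproduced with a toy model. This is only a sketch under a strong assumption: that the profiler keeps one stack of open timers per process and expects start/stop calls to nest. TimerStack, timer_start, timer_stop and replay_interleaving are hypothetical names and do not reflect TAU's real internals.

```c
#include <assert.h>

#define MAX_DEPTH 16

/* Toy stand-in for a per-process stack of open timers (assumption:
 * the profiler expects start/stop calls to be properly nested). */
typedef struct {
    int owner[MAX_DEPTH];   /* AMPI rank that opened each timer */
    int depth;              /* number of timers currently open */
} TimerStack;

void timer_start(TimerStack *s, int rank)
{
    assert(s->depth < MAX_DEPTH);
    s->owner[s->depth++] = rank;
}

/* Pops the innermost timer. Returns 1 if it was opened by the same rank
 * that is now stopping it, 0 if the nesting has been violated. */
int timer_stop(TimerStack *s, int rank)
{
    assert(s->depth > 0);
    return s->owner[--s->depth] == rank;
}

/* Two AMPI ranks share one process. Each enters the Recv wrapper and
 * suspends; they then resume in the order they suspended. Returns the
 * number of stop calls that hit a timer opened by the other rank. */
int replay_interleaving(void)
{
    TimerStack s = { {0}, 0 };
    int violations = 0;

    timer_start(&s, 0);                 /* rank 0: start, then Recv suspends */
    timer_start(&s, 1);                 /* rank 1: scheduled next, same story */
    violations += !timer_stop(&s, 0);   /* rank 0 resumes, pops rank 1's timer */
    violations += !timer_stop(&s, 1);   /* rank 1 resumes, pops rank 0's timer */
    return violations;
}
```

With only one rank per process the start and stop calls always pair up, which is why the problem appears only when several AMPI ranks share a process.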

Normally, for Charm++/TAU integration, TAU relies on information provided by the Charm++ callback interface to figure out which process rank it ought to record information for. Unfortunately, with AMPI, an AMPI (and hence user) rank does not map directly into anything that TAU currently understands. TAU's thread construct is probably the closest abstraction to AMPI ranks. Unfortunately, this will not capture scenarios where an AMPI rank migrates from one process to another. TAU threads are tied to their host processes. TAU will support threads coming into and out of existence (e.g., OMP threads and Pthreads) on a process, but will have some trouble mapping persistent but migrating entities like AMPI ranks.

Please give me a couple of days to consider what would be good low-hanging fruit to get you what you want. I think there is a reasonable way to do this, but it will involve some (hopefully minor) changes to the way Charm++ and TAU are integrated.

Right now, this is what I think is the best information you can currently get out of Charm++/TAU:

1. build AMPI with MPI layer.
2. build TAU with MPI support.
3. integrate Charm++ with TAU in the usual way.
4. build your AMPI code with the options "-tracemode Tau -no-trace-mpi"

In this mode, ALL the AMPI ranks that have activity on the same Charm++ process are recorded into the same profile by TAU.

The other option is to just make use of the native Charm++ performance tool Projections. It is a trace-based tool and it will record information that identifies individual AMPI ranks. Last I knew, Projections profiles were still generated by accumulating time distinguished by Entry Method. Perhaps the Charm++ folk might know if there is a way to accumulate time by Charm++ Object (and by extension, AMPI ranks). If not, I'm sure someone within the group can be convinced to implement such a display relatively quickly.

Best Regards,
Chee Wai

Charm++ Group: Please read and follow up. If Projections does not already support what he wants, I believe there are some gaps with respect to AMPI performance visualization that could be filled.

On Jun 21, 2012, at 9:22 AM, Xoan Carlos Pardo Martinez wrote:

Hi Chee,

Sorry for the long silence on my side.

During this time I've been doing many tests with Charm++, AMPI and TAU trying to understand the best way to make the measurements I'm interested in.
At first I was a bit disoriented by your comment about TAU MPI wrapper calls: "As a default, TAU wraps MPI calls. In the case of attempting to measure
AMPI performance, we do not want to do that (it is actually unclear which MPI layer gets wrapped in this case) but instead interpret the calls according
to what the Charm++ runtime tells TAU." I suspected that if I wanted AMPI measurements comparable with other MPI implementations from the application
point of view, I had to use the TAU MPI wrapping interface over Charm++ compiled without any underlying MPI support. So I decided to try that approach.

The TAU MPI wrapper library expects the underlying MPI implementation to provide the PMPI interface (the MPI profiling interface).
Most MPI functions in the wrapper library have implementations like this:

int  MPI_fun( ... )
{
  int  returnVal;
  int typesize;
  
  TAU_PROFILE_TIMER(tautimer, "MPI_fun()",  " ", TAU_MESSAGE);
  TAU_PROFILE_START(tautimer);

// ... some TAU stuff here ...

  returnVal = PMPI_fun( ... );

// ... some TAU stuff here ...

  TAU_PROFILE_STOP(tautimer);
  return returnVal;
}

That is, they are calls to PMPI functions surrounded by TAU code that starts and stops event tracing. Unfortunately AMPI doesn't provide a PMPI interface, so
I had to modify ampi.h in the src/libs/ck-libs/ampi Charm++ directory. For every MPI function I defined the following:

#ifdef TAU_MPI
   #define PMPI_fun AMPI_fun
   int MPI_fun( ... );
#else
   #define MPI_fun AMPI_fun
#endif
int AMPI_fun( ... );

This way, when compiling with TAU, PMPI functions are renamed as AMPI functions, and TAU will be tracing AMPI function calls.
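
The renaming can be checked in isolation with a self-contained sketch. Everything here is a stand-in: AMPI_fun is a dummy function rather than a real AMPI entry point, the counters replace the TAU timer calls, and TAU_MPI is defined in the file only to select the wrapped branch. It shows how the names resolve and nothing more; it says nothing about what happens when the wrapped call suspends the user-thread.

```c
#include <assert.h>

#define TAU_MPI 1   /* assumption: set here to select the wrapped build */

int AMPI_fun(int x);            /* dummy stand-in for an AMPI function */

#ifdef TAU_MPI
   #define PMPI_fun AMPI_fun    /* the wrapper's PMPI call becomes AMPI */
   int MPI_fun(int x);          /* MPI_fun is supplied by the wrapper */
#else
   #define MPI_fun AMPI_fun     /* plain AMPI build: MPI_fun is AMPI_fun */
#endif

int wrapper_calls = 0;          /* counts entries into the wrapper */
int ampi_calls = 0;             /* counts entries into the dummy AMPI_fun */

int AMPI_fun(int x)
{
    ampi_calls++;
    return x + 1;
}

#ifdef TAU_MPI
/* Stand-in for a wrapper in the style shown earlier in this message. */
int MPI_fun(int x)
{
    int returnVal;

    wrapper_calls++;            /* where TAU_PROFILE_START would go */
    returnVal = PMPI_fun(x);    /* expands to AMPI_fun(x) */
    /* where TAU_PROFILE_STOP would go */
    return returnVal;
}
#endif

/* Application code only ever names MPI_fun. */
int call_from_application(int x)
{
    return MPI_fun(x);
}
```

With TAU_MPI defined, one call from the application passes through the wrapper once and reaches AMPI_fun once. Without it, MPI_fun expands straight to AMPI_fun and the wrapper is never compiled.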

To test the modifications I followed these steps (it really took LOTS of tests before it worked):

1. Build Charm++ without underlying MPI support
2. Build AMPI (without defining TAU_MPI)
3. Build TAU using AMPI and -TRACE and -MPITRACE options and without using any thread library.
4. Build Charm++ Tau tracing support using the TAU libs and makefile created in step 3
5. Build an example MPI application (PI calculation) with TAU instrumentation and AMPI. I linked it with the -tracemode Tau option.

The good news is that it works. I can execute the application and get the TAU trace files with performance information about the MPI function calls.
The bad news is that from time to time execution crashes for different reasons. Sometimes I get error messages from Charm++ and other times I get signal
exceptions apparently caused by TAU. I've also noticed that executing with the ++ppn option (with a value different from 1) doesn't work at all.

For me it's really hard to know what could be causing this behaviour. I wonder if anyone from the Charm (or TAU) team could give me any assistance
if I provide debug information from the failed executions.

Best regards
----
Xoan C. Pardo
Computer Architecture Group
University of A Coruña - Spain



On 13/04/2012, at 19:18, Chee Wai Lee wrote:

Hi Xoan,

I do not believe anyone has attempted to use TAU profiling (via the Charm++-TAU measurement interface) on AMPI so far. I should be able to help you navigate through some of the issues and perhaps adapt the framework to enable what you desire if some functionality is missing.

The way TAU interfaces with the Charm++ runtime to measure the performance of Charm++ events (on top of which AMPI is built) is through callback hooks provided by the runtime system. You will need to follow steps similar to the following:

http://www.nic.uoregon.edu/tau-wiki/Guide:NAMDTAU

The only requirement on the configuration and building of TAU is that it should match the Charm++ build as closely as possible (the architecture, the compilers and the use of any underlying MPI layers). In your case, the use of "-no-trace-mpi" is critical. As a default, TAU wraps MPI calls. In the case of attempting to measure AMPI performance, we do not want to do that (it is actually unclear which MPI layer gets wrapped in this case) but instead interpret the calls according to what the Charm++ runtime tells TAU.

You may contact me via skype if you need any interactive assistance on the matter. What you are doing is new as far as I know and might require some work to get going correctly. Thanks!

Best Regards,
Chee Wai Lee
University of Oregon

On Apr 10, 2012, at 6:23 AM, Xoan Carlos Pardo Martinez wrote:

Hi,

I'm interested in making some performance measurements using different MPI implementations. I'm using TAU as the profiling tool, and AMPI is one of the implementations I'm interested in.

My question is about the right way to configure/compile TAU to use it with AMPI. Before building the Tau target in CHARM, is it necessary to compile and configure TAU to use the commands/headers/libraries in the AMPI directory? If so, is it also necessary to specify the -charm building option to use CHARM threads in TAU?

Thanks for your assistance

Best regards
----
Xoan C. Pardo
Computer Architecture Group
University of A Coruña - Spain
_______________________________________________
charm mailing list
charm AT cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/charm

_______________________________________________
ppl mailing list
ppl AT cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/ppl








