charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

List archive

[charm] FW: Obtaining Charm++ profile

From: Rob Van der Wijngaart <robv AT nvidia.com>
To: "charm AT cs.illinois.edu" <charm AT cs.illinois.edu>
Subject: [charm] FW: Obtaining Charm++ profile
Date: Fri, 15 Jun 2018 21:59:04 +0000
Accept-language: en-US
Authentication-results: illinois.edu; spf=softfail smtp.mailfrom=robv AT nvidia.com; dmarc=none header.from=nvidia.com
Msip_labels: MSIP_Label_6b558183-044c-4105-8d9c-cea02a2a3d86_Enabled=True; MSIP_Label_6b558183-044c-4105-8d9c-cea02a2a3d86_SiteId=43083d15-7273-40c1-b7db-39efd9ccc17a; MSIP_Label_6b558183-044c-4105-8d9c-cea02a2a3d86_Owner=robv AT nvidia.com; MSIP_Label_6b558183-044c-4105-8d9c-cea02a2a3d86_SetDate=2018-06-14T18:02:40.1106954Z; MSIP_Label_6b558183-044c-4105-8d9c-cea02a2a3d86_Name=Unrestricted; MSIP_Label_6b558183-044c-4105-8d9c-cea02a2a3d86_Application=Microsoft Azure Information Protection; MSIP_Label_6b558183-044c-4105-8d9c-cea02a2a3d86_Extended_MSFT_Method=Automatic; Sensitivity=Unrestricted

Hello all,

I hope you don’t mind a few follow-up questions, after Ronak’s very useful input.

If in a PE’s log there are BEGIN_PROCESSING and END_PROCESSING events for a certain entry method without intervening CREATION events, does that mean that the entry method does not send explicit messages, but uses parameter marshalling for the communication? In this case, is it possible to determine how much time was spent on communication and how much on computation?
I understand that in the CREATION records (with a “1” in the first field) the fifth field is the sequence number of the event on that PE. When I search for all creation records on a specific PE, the fifth field is indeed monotonically increasing. However, some sequence numbers are missing. Are there other events (not CREATION) that make up for the missing sequence numbers? I did some quick search through the log, and it didn’t look like it.
Is it correct to interpret the time stamps of corresponding CREATION and START_PROCESSING events on different PEs as the time a send operation returned on the source and the time the receive operation returned on the target? If so, are the time stamps globally consistent, so that I can compare stamps on different PEs to determine how it took from send to receive a message? Also, if my interpretation is correct, there should be a 1-1 correspondence between CREATION and START_PROCESSING events. Would there also be a matching END_PROCESSING event on the receiving PE in this case, and, if so, what would be the meaning of it?
If I use SDAG methods in my code, I have essentially a very-long-running entry method, right (sorry, it has been a few years since I implemented Stencil in Charm++, and I don’t remember all the right terminology)? Should I then expect many CREATION events (message sends) between the beginning and end of the main entry method?

Here are a few details I found out about the extremely simple Stencil workload from the Parallel Research Kernels, which I ran like this: ./charmrun +p16 ./stencil 9 1000 4.This means 9 iterations (plus one not included in the timing but included in the trace), a grid of 1000x1000 points, and an overdecomposition factor of 4, so a chare array of size 8x8, divided among 16 PEs. I scanned the traces for event type 1, 2, and 3, and recorded the most frequently occurring. The code is iterative, and I can see that some event counts are independent of the number of iterations, so not really interesting. The top recurring ones are as follows for PE 0 (there are some differences among the PEs, no doubt caused by the fact that some chares do not have topological neighbors in some coordinate direction, because they work on a part of the grid adjacent to the domain boundary):

Entry method	#creation events	#begin_processing events	#end_processing events
Stencil_begin_iteration	0	40	40
Stencil_compute	0	40	40
Stencil_processGhosts	0	110	110
SDAG_RTS	0	126	126
receiveGhosts(ghostMsg* impl_msg)	110	110	110

Does this make sense to you? As you can see, only receiveGhosts has creation events, and is has as many of those as it has begin_processing and end_processing events. Of the high-frequency methods, the only one I did not explicitly name is SDAG_RTS. A typical trace record for the being_processing event related to SDAG_RTS is this:

2 65535 3 188894 -1 0 0 0 0 2 0 0 157140

For all records the datum after the time stamp is -1. I don’t know how to interpret that. Also, according to Ronak’s earlier message, the seventh item in this record signifies message length, but in all records of this type of SDAG_RTS the seventh field is zero. Should I simply ignore these events as non-communicating?

Thanks for any help you can offer!

Rob

From: Ronak Buch <rabuch2 AT illinois.edu>
Sent: Wednesday, June 13, 2018 2:48 PM
To: Rob Van der Wijngaart <robv AT nvidia.com>
Cc: charm AT cs.illinois.edu
Subject: Re: [charm] Obtaining Charm++ profile

Hi Rob,

If I'm understanding what you're trying to do, I think you can do it all using the Projections logs themselves. There are CREATION events that represent message sends in the logs; these events are lines beginning with 1. The third field on this line represents the entry ID of the message (that is to say, what method will be executed on the target), the fourth field the timestamp. The fifth and sixth fields are the event and PE, respectively. These can be matched against BEGIN_PROCESSING lines to see where the triggering message came from.

For example, this CREATION line on PE 1:

1 19 135 303111 2 1 112 0

corresponds to this BEGIN_PROCESSING on PE 2:

2 4 135 303136 2 1 112 0 3 127966

You can see that the two fields after the timestamp match exactly (2 1). These mean that this creation was the second "event" on PE 1 and that the execution was triggered by the second "event" on PE 1, respectively.

This is the mechanism that Projections uses to determine dependencies. If you do a linear read through every log file, CREATION events that occur between BEGIN_PROCESSING and END_PROCESSING events are attributed to that entry method. We use this to construct a map from every message in the application to its sending entry method and another map from every entry method to a list of the messages it sends. We can then use these to do both forward and backward dependency analysis.

To answer your other questions, I think logging events in _processHandler should work, but that will only give you receive events and might be too low level to easily capture all the information you want.

As far as not getting output, did you enable the ARRAY_DEBUG_OUTPUT flag? Doing so should give you some debug output.

-Ronak

On Wed, Jun 13, 2018 at 11:12 AM, Rob Van der Wijngaart <robv AT nvidia.com> wrote:

Hello Team,

I am guessing that Ronak is out of the office. Could someone else take a stab at answering my questions? Thanks much!

Rob

From: Rob Van der Wijngaart
Sent: Wednesday, June 13, 2018 8:21 AM
To: 'Ronak Buch' <rabuch2 AT illinois.edu>
Cc: 'charm AT cs.illinois.edu' <charm AT cs.illinois.edu>
Subject: RE: [charm] Obtaining Charm++ profile

To follow up some more, I edited the block I mentioned as below, but when I run my code, which uses Chare arrays, I do not get any debug output. Is that to be expected? Thanks.

Rob

#if ARRAY_DEBUG_OUTPUT

#   define DEB(x) /*CkPrintf x*/ //General debug messages

#   define DEBI(x) /*CkPrintf x*/ //Index debug messages

#   define DEBC(x) /*CkPrintf x*/ //Construction debug messages

#   define DEBS(x) CkPrintf x //Send/recv/broadcast debug messages

#   define DEBM(x) /*CkPrintf x*/ //Migration debug messages

#   define DEBL(x) /*CkPrintf x*/ //Load balancing debug messages

#   define DEBK(x) /*CkPrintf x*/ //Spring Cleaning debug messages

#   define DEBB(x) /*CkPrintf x*/ //Broadcast debug messages

#   define AA "ArrayBOC on %d: "

#   define AB ,CkMyPe()

#   define DEBUG(x) x

#else

#   define DEB(X) /*CkPrintf x*/

#   define DEBI(X) /*CkPrintf x*/

#   define DEBC(X) /*CkPrintf x*/

#   define DEBS(x) /*CkPrintf x*/

#   define DEBM(X) /*CkPrintf x*/

#   define DEBL(X) /*CkPrintf x*/

#   define DEBK(x) /*CkPrintf x*/

#   define DEBB(x) /*CkPrintf x*/

#   define str(x) /**/

#   define DEBUG(x)

#endif

From: Rob Van der Wijngaart
Sent: Wednesday, June 13, 2018 7:58 AM
To: 'Ronak Buch' <rabuch2 AT illinois.edu>
Cc: 'charm AT cs.illinois.edu' <charm AT cs.illinois.edu>
Subject: RE: [charm] Obtaining Charm++ profile

Hi Ronak,

To follow up on yesterday’s message, I noticed at the top of ck-core/ckarray.C that it is possible to enable certain messages by editing the block shown below. I will experiment with that now, but have a few questions:

DEBS(x) appears to control debug IO for send/recv/broadcast, but DEBB(x) also controls debug IO for broadcasts. Is the latter simply a subset of the former?
I don’t see a similar debug block in ck.C. Is it possible to obtain similar functionality, though?

Thanks!

Rob

#define ARRAY_DEBUG_OUTPUT 0

#if ARRAY_DEBUG_OUTPUT

#   define DEB(x) CkPrintf x //General debug messages

#   define DEBI(x) CkPrintf x //Index debug messages

#   define DEBC(x) CkPrintf x //Construction debug messages

#   define DEBS(x) CkPrintf x //Send/recv/broadcast debug messages

#   define DEBM(x) CkPrintf x //Migration debug messages

#   define DEBL(x) CkPrintf x //Load balancing debug messages

#   define DEBK(x) CkPrintf x //Spring Cleaning debug messages

#   define DEBB(x) CkPrintf x //Broadcast debug messages

#   define AA "ArrayBOC on %d: "

#   define AB ,CkMyPe()

#   define DEBUG(x) x

#else

#   define DEB(X) /*CkPrintf x*/

#   define DEBI(X) /*CkPrintf x*/

#   define DEBC(X) /*CkPrintf x*/

#   define DEBS(x) /*CkPrintf x*/

#   define DEBM(X) /*CkPrintf x*/

#   define DEBL(X) /*CkPrintf x*/

#   define DEBK(x) /*CkPrintf x*/

#   define DEBB(x) /*CkPrintf x*/

#   define str(x) /**/

#   define DEBUG(x)

#endif

From: Rob Van der Wijngaart
Sent: Tuesday, June 12, 2018 9:23 PM
To: 'Ronak Buch' <rabuch2 AT illinois.edu>
Cc: 'charm AT cs.illinois.edu' <charm AT cs.illinois.edu>
Subject: RE: [charm] Obtaining Charm++ profile

Hi Ronak,

I’ve played around a bit with the trace, but it still does not give me quite what I need. I am trying to build a dependency graph, which requires, besides the time stamp and the source of the event, also the target. I am looking at file src/ck-core/ck.C, trying to decide where to insert the requisite print statements. Will I capture all communication events by instrumenting the big case statement in function _processHandler? Thanks!

Rob

From: Rob Van der Wijngaart
Sent: Wednesday, June 6, 2018 8:51 AM
To: 'Ronak Buch' <rabuch2 AT illinois.edu>
Cc: charm AT cs.illinois.edu
Subject: RE: [charm] Obtaining Charm++ profile

OK, thanks, Ronak. I’ll see how this works out once I’ve processed the graph info of my application.

Rob

From: Ronak Buch <rabuch2 AT illinois.edu>
Sent: Tuesday, June 5, 2018 4:05 PM

To: Rob Van der Wijngaart <robv AT nvidia.com>
Cc: charm AT cs.illinois.edu
Subject: Re: [charm] Obtaining Charm++ profile

Yes, you can basically ignore the rest of the fields (they're some mixture of ordering, IDs, performance counters, etc.). The trailing zeros are some of these fields and can usually be ignored.

dummy_thread_ep is basically a catchall for things that are traced that the runtime doesn't have enough information for to attribute to something else. It can be meaningful, but it could be a lot of different things depending on what exactly your program is doing, so it's hard for me to make any conclusions without more information. Usually looking at the bookending events that have meaningful attributions is enough to figure out what's going on in those sections.

On Tue, Jun 5, 2018 at 5:51 PM, Rob Van der Wijngaart <robv AT nvidia.com> wrote:

Thanks much, Ronak!

Below is a representative snippet from one of my log files.

The lines that start with 2 or 3 have 8 and 13 fields each. I guess I can ignore the fields that you’re not describing. A lot of the records that start with 2 or 3 refer to ENTRY CHARE 0, which refers to dummy_thread_ep, according to the sts file (the line says: “ENTRY CHARE 0 dummy_thread_ep 0 0”). Is that a meaningful mapping, and do the trailing zeroes have any significance? Thanks.

Rob

3 0 355 14898768 569 16 0 11638933

2 4 0 14898768 2950 14 0 0 14 369 0 0 0

3 0 0 14898864 2950 14 0 11638933

14 14898866 14

15 14901654 14

2 5 355 14901657 893 16 100 0 0 0 0 0 11641931

1 4 0 14901667 2951 14 0 0

3 0 355 14901669 893 16 0 11641931

2 5 355 14901669 900 16 100 0 0 0 0 0 11641931

1 4 0 14901679 2952 14 0 0

3 0 355 14901681 900 16 0 11641931

2 4 0 14901681 2952 14 0 0 14 2118 0 0 0

3 0 0 14901772 2952 14 0 11641931

2 5 355 14901775 954 16 100 0 0 0 0 0 11641931

1 4 0 14901787 2953 14 0 0

3 0 355 14901787 954 16 0 11641931

2 4 0 14901789 2953 14 0 0 14 1958 0 0 0

3 0 0 14901878 2953 14 0 11641931

2 5 355 14901880 1019 16 100 0 0 0 0 0 11641931

1 4 0 14901890 2954 14 0 0

3 0 355 14901894 1019 16 0 11641931

2 5 355 14901895 1006 16 100 0 0 0 0 0 11641931

1 4 0 14901906 2955 14 0 0

3 0 355 14901906 1006 16 0 11641931

2 4 0 14901908 2955 14 0 0 14 1653 0 0 0

3 0 0 14901993 2955 14 0 11641931

2 4 0 14901995 2955 14 0 0 14 1349 0 0 0

3 0 0 14902085 2955 14 0 11641931

2 4 0 14902086 2955 14 0 0 14 1796 0 0 0

3 0 0 14902175 2955 14 0 11641931

14 14902176 14

15 14902230 14

From: Ronak Buch <rabuch2 AT illinois.edu>
Sent: Tuesday, June 5, 2018 3:09 PM

To: Rob Van der Wijngaart <robv AT nvidia.com>
Cc: charm AT cs.illinois.edu
Subject: Re: [charm] Obtaining Charm++ profile

No, we don't have a good description of the record format right now other than the source code that generates it in Charm (src/ck-perf/trace-projections.C) and the source code that parses it in Projections (GenericLogReader.java). I'll provide a brief summary below, but if there's something specific that you want, I can explain it.

In general, each line in the Projections log file represents a specific event. The ones that you're probably interested in are those beginning with 2 or 3, which represent BEGIN_PROCESSING and END_PROCESSING events (the start and end of entry methods). The third field in those lines represents the entry ID, which can be converted a function name by using the sts file (in which ENTRY CHARE lines provide a mapping between an entry ID and the actual name of the entry method). The fourth field represents the timestamp of the event (so the difference between the two values in this position for corresponding BEGIN_PROCESSING and END_PROCESSING events is the total time of the entry method). The sixth value is the PE, which indicates which PE served as that event's source. The seventh field for BEGIN_PROCESSING events is the message length.

Hopefully that helps.

Thanks,

Ronak

On Tue, Jun 5, 2018 at 4:38 PM, Rob Van der Wijngaart <robv AT nvidia.com> wrote:

OK, I got my .log, .sts, and .projrc files for my run! Next question, if you don’t mind. I know of Projections, but I’d like to use my own graph analysis tools to parse the log files. Is there a description of the records in the .log and .sts files? Thanks!

Rob

From: Rob Van der Wijngaart
Sent: Tuesday, June 5, 2018 2:11 PM
To: 'Ronak Buch' <rabuch2 AT illinois.edu>
Cc: charm AT cs.illinois.edu
Subject: RE: [charm] Obtaining Charm++ profile

Thanks, Ronak!

Rob

From: Ronak Buch <rabuch2 AT illinois.edu>
Sent: Tuesday, June 5, 2018 2:05 PM
To: Rob Van der Wijngaart <robv AT nvidia.com>
Cc: charm AT cs.illinois.edu
Subject: Re: [charm] Obtaining Charm++ profile

You built Charm++ with the correct flag ("--enable-tracing"). The issue is that the application, when linked, needs the "-tracemode projections" flag.

Specifically, if you're building NAMD (as you seem to be), you can just run "make projections," which includes the "-tracemode projections" flag already.

Thanks,

Ronak

On Tue, Jun 5, 2018 at 12:47 PM, Rob Van der Wijngaart <robv AT nvidia.com> wrote:

Hello Team,

I am trying to obtain a profile of a NAMD run using Charm++. I built Charm++ this way:

./build charm++ multicore-linux64 gcc --with-production --enable-tracing -j40 -tracemode summary

According to the man page I should now automatically obtain a profile if I run the application like this:

./namd-2.13-multicore-cuda +traceroot $PWD +p32 +setcpuaffinity +pemap 0-31 +devices 0,1,2,3 stmv/stmv.namd

However, I do not see any profile. If I then add +sumonly to the command line, I get this error message:

WARNING: +sumonly is a command line argument beginning with a '+' but was not parsed by the RTS.

If any of the above arguments were intended for the RTS you may need to recompile Charm++ with different options.

So it appears I did not build Charm++ properly to support tracing. Can you tell me how to do it right?

Second, I am actually interesting in doing projections, not obtaining a summary profile, but I do not understand this instruction in the manual:

Link time option: -tracemode projections

After all, I only issue a build command to create the runtime, with no explicit link calls issued. Can you advise?
This concerns v6.8.2, which I obtained as a tar ball.

Thanks,
Rob

This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.

RE: [charm] Obtaining Charm++ profile, (continued)