
charm - Re: [charm] messages not being received

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Lukasz Wesolowski <wesolwsk AT illinois.edu>
  • To: Robert Steinke <rsteinke AT uwyo.edu>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] messages not being received
  • Date: Tue, 7 Oct 2014 13:36:40 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hi Bob,

Since the number of messages sent in your code is high, you may find
that communication overhead degrades performance. If that proves to be
the case, aggregation of messages may be a possible solution. As Prof.
Kale suggested, you can use TRAM for that purpose
(http://charm.cs.illinois.edu/manuals/html/libraries/5.html).

If you are interested, I can point you at some examples.

Lukasz
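TRAM's own interface is documented at the link above and is not reproduced here. As a rough, library-independent illustration of what message aggregation means, the sketch below buffers small items per destination and hands a whole batch to a callback once a buffer fills, so that many small sends become a few larger ones. The class name `Aggregator`, the `capacity` parameter, and the flush callback are hypothetical; in a real Charm++ program the callback is where one batched message would be sent.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, library-independent sketch of message aggregation.
// This is NOT TRAM's API: it only shows the idea of batching small items
// per destination and sending one larger batch instead of many small messages.
template <typename Item>
class Aggregator {
public:
    // Called with (destination, batch) whenever a batch is ready to send.
    using FlushFn = std::function<void(std::size_t, const std::vector<Item>&)>;

    Aggregator(std::size_t numDestinations, std::size_t capacity, FlushFn flush)
        : buffers_(numDestinations), capacity_(capacity), flush_(std::move(flush)) {
        assert(capacity_ > 0);
    }

    // Queue one item; send the destination's batch as soon as it is full.
    void submit(std::size_t dest, const Item& item) {
        assert(dest < buffers_.size());
        buffers_[dest].push_back(item);
        if (buffers_[dest].size() >= capacity_) {
            flushOne(dest);
        }
    }

    // Send any partially filled batches (e.g. at the end of a phase).
    void flushAll() {
        for (std::size_t d = 0; d < buffers_.size(); ++d) {
            if (!buffers_[d].empty()) {
                flushOne(d);
            }
        }
    }

private:
    void flushOne(std::size_t dest) {
        flush_(dest, buffers_[dest]);
        buffers_[dest].clear();
    }

    std::vector<std::vector<Item>> buffers_;  // one pending batch per destination
    std::size_t capacity_;                     // items per batch before sending
    FlushFn flush_;                            // where a real program would send
};
```

For example, with `capacity` set to 3, seven items submitted for one destination leave as three batches (3, 3, and a final 1 after `flushAll()`) rather than seven separate messages. The trade-off is latency: items wait in a buffer until it fills or is flushed, which is why a final flush at the end of a phase matters.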

On Tue, Oct 7, 2014 at 12:42 PM, Robert Steinke <rsteinke AT uwyo.edu> wrote:
> I figured out the problem. There are duplicates in the neighbor lists, so
> when I received an initialization message from a duplicated neighbor, I only
> marked the first matching entry as initialized, and the element never got
> past the neighbor-initialization SDAG code. Thanks for the suggestions. They
> helped me think about where it could be getting hung up.
>
> Bob
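The failure described above can be reproduced outside Charm++ in a few lines. This sketch assumes, hypothetically, that each element keeps its neighbor list plus a parallel array of "initialized" flags and waits until every flag is set. When a neighbor ID appears twice but that neighbor sends only one initialization message, marking only the first matching entry leaves the duplicate unset forever. Removing duplicates when the list is loaded (here with `std::sort` and `std::unique`) is one straightforward fix; `NeighborState`, `markFirstMatch`, and `dedupNeighbors` are illustrative names, not code from the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-element bookkeeping: one "initialized" flag per neighbor entry.
struct NeighborState {
    std::vector<int> neighbors;
    std::vector<bool> initialized;

    explicit NeighborState(std::vector<int> n)
        : neighbors(std::move(n)), initialized(neighbors.size(), false) {}

    // Buggy pattern: an init message from `sender` marks only the FIRST
    // entry equal to `sender`, so a duplicate entry is never marked.
    void markFirstMatch(int sender) {
        for (std::size_t i = 0; i < neighbors.size(); ++i) {
            if (neighbors[i] == sender) {
                initialized[i] = true;
                return;
            }
        }
    }

    // The element may proceed only when every entry has been marked.
    bool allInitialized() const {
        return std::all_of(initialized.begin(), initialized.end(),
                           [](bool b) { return b; });
    }
};

// One possible fix, applied when the neighbor list is loaded:
// remove duplicate IDs (note: this also sorts the list).
std::vector<int> dedupNeighbors(std::vector<int> n) {
    std::sort(n.begin(), n.end());
    n.erase(std::unique(n.begin(), n.end()), n.end());
    return n;
}
```

With the neighbor list `{3, 7, 7}` and one init message each from 3 and 7, `markFirstMatch` leaves the third entry unset, so `allInitialized()` stays false; after `dedupNeighbors` the list is `{3, 7}` and the check passes. If neighbor order matters elsewhere in the code, a deduplication that preserves first occurrences (for example, one that tracks seen IDs in a `std::unordered_set`) would be the safer choice.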
>
>
> On 10/06/2014 06:11 PM, Lukasz Wesolowski wrote:
>>
>> Hi Bob,
>>
>> Please tell us for which entry method in your code the message is not
>> being received.
>>
>> Also, if you could answer the following questions, I think they would
>> help in tracking down the problem:
>>
>> 1. Have you verified that the missing messages are actually getting
>> sent (i.e. that all the elements that should receive a message are
>> actually being sent one)?
>> 2. If the target entry method corresponds to an SDAG when clause, is
>> it possible that the message has been received at the level of the
>> runtime system, but the corresponding when statement is not being
>> reached (e.g. due to unsatisfied when clauses earlier in the code)?
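Point 2 is easy to underestimate, so here is a toy model of it in plain C++. It is not Charm++ or SDAG code, and all names are hypothetical. It assumes an object whose control flow first waits for `expectedInits` initialization messages and only then consumes "work" messages. A work message that arrives early is accepted and buffered, loosely mirroring how SDAG holds a message for a `when` clause that has not been reached yet, but it is never processed if the earlier wait is never satisfied.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy model of in-order "when" clauses; not actual Charm++/SDAG code.
class ToyChare {
public:
    explicit ToyChare(int expectedInits) : expectedInits_(expectedInits) {
        assert(expectedInits >= 0);
    }

    // Loosely corresponds to the first "when": count init messages.
    void receiveInit() {
        ++initsSeen_;
        drain();
    }

    // Loosely corresponds to a later "when": the message is delivered
    // (buffered) immediately, but only processed once control reaches it.
    void receiveWork(int value) {
        buffered_.push_back(value);
        drain();
    }

    std::size_t bufferedCount() const { return buffered_.size(); }
    const std::vector<int>& processed() const { return processed_; }

private:
    void drain() {
        if (initsSeen_ < expectedInits_) {
            return;  // still stuck waiting in the first clause
        }
        while (!buffered_.empty()) {
            processed_.push_back(buffered_.front());
            buffered_.pop_front();
        }
    }

    int expectedInits_;
    int initsSeen_ = 0;
    std::deque<int> buffered_;    // received but not yet handled
    std::vector<int> processed_;  // handled by the "later clause"
};
```

For instance, with `expectedInits` set to 3, a work message that arrives after only two init messages stays in the buffer: from the runtime's point of view it was delivered, yet the application code never handles it. That is the kind of hang reported at the top of this thread.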
>>
>> While I would not rule out the possibility of messages getting lost, I
>> do not think it is very likely.
>>
>> Lukasz
>>
>> On Mon, Oct 6, 2014 at 6:08 PM, Robert Steinke <rsteinke AT uwyo.edu> wrote:
>>>
>>> I've been working on my problem where I send messages to an entire chare
>>> array and some messages don't arrive.
>>>
>>> I've been trying to create a minimal example that exhibits the problem.
>>> I've gotten down to about 2000 lines of code. I can't see any bugs in my
>>> code. Would anyone be willing to take a look at it or try to debug it on
>>> your system?
>>>
>>> I am running on CentOS 6 with the newly released Charm++ 6.6.0. The build
>>> is mpi-linux-x86_64, and the MPI is mpich-3.0.1. The problem shows up when
>>> I run on only one processing element (PE). I haven't tried it on more.
>>>
>>> The problem depends on a neighbor graph that is read in from a file. At the
>>> start, each chare initializes itself and then sends an initialization
>>> message to its neighbors. These messages all arrive, but when I try to send
>>> a subsequent message to all elements of the chare array, some elements
>>> don't receive it. If I use hardcoded neighbor relationships, such as
>>> connecting each element to the ones numerically before and after it, the
>>> problem doesn't occur. But when I use the neighbor graph from the file that
>>> I want to use, the problem occurs. The problem is not caused by reading
>>> from the file: I can read the file, then overwrite the neighbor values with
>>> hardcoded ones, and the problem doesn't occur.
>>>
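Since hardcoded neighbors worked and the file-based graph did not, a quick sanity check of the graph before it is used could narrow things down. The sketch below assumes an adjacency-list layout (`graph[i]` lists the neighbor IDs of element `i`) and, as a further assumption, a protocol in which each element waits for one initialization message from every neighbor it lists. Under that assumption, a repeated ID (the problem eventually found in this thread) or a one-way link (i lists j, but j does not list i) would each leave some element waiting. `Graph`, `linearNeighbors`, `elementsWithDuplicates`, and `elementsWithOneWayLinks` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // graph[i] = neighbor IDs of element i

// "Numerically before and after" neighbors, without wrap-around.
Graph linearNeighbors(int n) {
    assert(n >= 0);
    Graph g(n);
    for (int i = 0; i < n; ++i) {
        if (i > 0) g[i].push_back(i - 1);
        if (i + 1 < n) g[i].push_back(i + 1);
    }
    return g;
}

// Elements whose neighbor list repeats an ID.
std::vector<int> elementsWithDuplicates(const Graph& g) {
    std::vector<int> bad;
    for (std::size_t i = 0; i < g.size(); ++i) {
        std::vector<int> sorted = g[i];
        std::sort(sorted.begin(), sorted.end());
        if (std::adjacent_find(sorted.begin(), sorted.end()) != sorted.end()) {
            bad.push_back(static_cast<int>(i));
        }
    }
    return bad;
}

// Elements i that list a neighbor j which is out of range or does not list i back.
std::vector<int> elementsWithOneWayLinks(const Graph& g) {
    std::vector<int> bad;
    for (std::size_t i = 0; i < g.size(); ++i) {
        for (int j : g[i]) {
            bool outOfRange = j < 0 || static_cast<std::size_t>(j) >= g.size();
            if (outOfRange ||
                std::find(g[j].begin(), g[j].end(), static_cast<int>(i)) == g[j].end()) {
                bad.push_back(static_cast<int>(i));
                break;  // one bad link is enough to flag element i
            }
        }
    }
    return bad;
}
```

On a graph built by `linearNeighbors`, both checks come back empty, which would be consistent with the hardcoded case working. A file-based graph in which element 0 lists element 1 twice would be flagged by `elementsWithDuplicates`.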
>>> I've attached the code, but the file with the neighbor relationships is a
>>> 6GB netCDF file. I can send it to whoever is willing to work on the
>>> problem. You will need to have the NetCDF library to link against my
>>> code.
>>>
>>> Thanks,
>>> Bob Steinke
>>>
>>>
>>>
>>>
>>> On 10/03/2014 03:46 PM, Robert Steinke wrote:
>>>>
>>>> I'm having a problem with my charm application.
>>>>
>>>> Before I get into the problem: I tried to use the ccs_tools Charm++
>>>> debugger, but haven't been able to get it working yet. I read in the
>>>> manual that it only works for net-* versions of Charm++, and I am running
>>>> on an mpi-* version. Getting my code to run on a net-* version started to
>>>> turn into a real mess; for example, I'm using the parallel version of the
>>>> NetCDF library, which requires MPI. I could probably get it running on a
>>>> net-* version, but my first question is whether that's the right road to go
>>>> down. Is the ccs_tools debugger likely to be useful for solving this
>>>> problem, or is there something else I can do?
>>>>
>>>> Here's the problem:
>>>>
>>>> In an entry method of one object, I have a loop that sends out messages to
>>>> every element of a chare array. I'm sending an individual message to each
>>>> object in a loop, not a broadcast through the array proxy, because I need
>>>> to send different parameters to each object, like this:
>>>>
>>>>     for (ii = 0; ii < proxySize; ii++)
>>>>     {
>>>>         proxy[ii].message(parameters[ii]);
>>>>     }
>>>>
>>>> When proxySize is large and I send a lot of messages (about 37,000), a
>>>> couple percent of them never arrive. The missing messages are scattered
>>>> around the array. When I send a small number of messages, they all arrive.
>>>>
>>>> Has anyone experienced something like this before?
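Even without a debugger, simple bookkeeping can show which elements never got the message. This sketch assumes, hypothetically, that each receiving element records its own index (for example by printing it or contributing it to a reduction) and that those indices are collected in one place; `missingRecipients` then lists the indices that were sent a message but never reported receiving it.

```cpp
#include <cassert>
#include <vector>

// Given the array size and the indices that reported receiving the message,
// return the indices that never did (hypothetical debugging helper).
std::vector<int> missingRecipients(int proxySize, const std::vector<int>& receivedIndices) {
    assert(proxySize >= 0);
    std::vector<bool> seen(proxySize, false);
    for (int idx : receivedIndices) {
        if (idx >= 0 && idx < proxySize) {
            seen[idx] = true;  // ignore out-of-range reports
        }
    }
    std::vector<int> missing;
    for (int ii = 0; ii < proxySize; ++ii) {
        if (!seen[ii]) {
            missing.push_back(ii);
        }
    }
    return missing;
}
```

For example, `missingRecipients(6, {0, 2, 3, 5})` returns `{1, 4}`. Cross-referencing those indices with what each element was waiting for at the time would help separate the two cases Lukasz raises in his questions: a message that was never sent versus one that arrived but whose `when` clause was never reached.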
>>>>
>>>> I was hoping that the ccs_tools debugger could show me things like the
>>>> queued messages, so that I could see messages being sent and received and
>>>> tell whether this is really a problem with Charm++ not delivering messages
>>>> or whether I'm doing something wrong. Is this something that ccs_tools
>>>> could show me?
>>>>
>>>> Thanks for the help,
>>>>
>>>> Bob Steinke
>>>>
>>>> _______________________________________________
>>>> charm mailing list
>>>> charm AT cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/charm
>>>
>>>
>



