charm - Re: [charm] messages not being received

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Lukasz Wesolowski <wesolwsk AT illinois.edu>
  • To: Robert Steinke <rsteinke AT uwyo.edu>
  • Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] messages not being received
  • Date: Mon, 6 Oct 2014 19:11:20 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hi Bob,

Please tell us for which entry method in your code the message is not
being received.

Also, if you could answer the following questions, I think they would
help in tracking down the problem:

1. Have you verified that the missing messages are actually getting
sent (i.e. that all the elements that should receive a message are
actually being sent one)?
2. If the target entry method corresponds to an SDAG when clause, is
it possible that the message has been received at the level of the
runtime system, but the corresponding when statement is not being
reached (e.g. due to unsatisfied when clauses earlier in the code)?
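One way to check question 1 is to count sends and receives per array element and compare the two. The following is a minimal sketch in plain C++, not Charm++ code: `DeliveryTally` and its method names are hypothetical, and it assumes the sender and all receivers run in the same process (as in the single-process run described in this thread), so that one shared tally is visible to both sides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping helper; it is not part of Charm++.
// It counts, per array element, how many messages were sent and received.
struct DeliveryTally {
    std::vector<int> sent;
    std::vector<int> received;

    explicit DeliveryTally(std::size_t numElements)
        : sent(numElements, 0), received(numElements, 0) {}

    // Call next to the send, e.g. just before proxy[ii].message(...).
    void recordSend(std::size_t element) {
        assert(element < sent.size());
        ++sent[element];
    }

    // Call at the top of the receiving entry method.
    void recordReceive(std::size_t element) {
        assert(element < received.size());
        ++received[element];
    }

    // Elements that were never sent a message: this points at the sending loop.
    std::vector<std::size_t> neverSent() const {
        std::vector<std::size_t> result;
        for (std::size_t i = 0; i < sent.size(); ++i) {
            if (sent[i] == 0) result.push_back(i);
        }
        return result;
    }

    // Elements with more sends than receives: sent, but not delivered so far.
    std::vector<std::size_t> undelivered() const {
        std::vector<std::size_t> result;
        for (std::size_t i = 0; i < sent.size(); ++i) {
            if (received[i] < sent[i]) result.push_back(i);
        }
        return result;
    }
};
```

If `neverSent()` is non-empty once the sending loop has finished, the problem is on the sending side. If only `undelivered()` is non-empty after the program has gone quiet, the messages were sent but the receiving entry method never ran for those elements.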

While I would not rule out the possibility of messages getting lost, I
do not think it is very likely.
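To illustrate question 2, here is a toy model of how a message can look lost while it is only waiting for its when clause. This is a sketch under simplifying assumptions, not the Charm++ implementation: `WhenBufferModel`, `arrive`, `reachWhen`, and `pending` are hypothetical names, and the payload is reduced to an `int`.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <string>

// Toy model only; it does not reproduce how Charm++ implements SDAG.
class WhenBufferModel {
public:
    // The runtime has delivered a message for the named entry method.
    void arrive(const std::string& entry, int payload) {
        buffered_[entry].push(payload);
    }

    // Control flow has reached a when clause for the named entry method.
    // Returns true and fills payload if a buffered message was consumed.
    bool reachWhen(const std::string& entry, int& payload) {
        auto it = buffered_.find(entry);
        if (it == buffered_.end() || it->second.empty()) return false;
        payload = it->second.front();
        it->second.pop();
        return true;
    }

    // Messages that have arrived but whose when clause was never reached.
    std::size_t pending(const std::string& entry) const {
        auto it = buffered_.find(entry);
        return it == buffered_.end() ? 0 : it->second.size();
    }

private:
    std::map<std::string, std::queue<int>> buffered_;
};
```

In this model, a message for `message` that arrives while the element is still waiting at an earlier, unsatisfied when clause stays in `pending("message")` indefinitely. From the application's point of view it never arrived, even though nothing was dropped.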

Lukasz

On Mon, Oct 6, 2014 at 6:08 PM, Robert Steinke <rsteinke AT uwyo.edu> wrote:
> I've been working on my problem where I send messages to an entire chare
> array and some messages don't arrive.
>
> I've been trying to create a minimal example that exhibits the problem.
> I've gotten down to about 2000 lines of code. I can't see any bugs in my
> code. Would anyone be willing to take a look at it or try to debug it on
> your system?
>
> I am running on CentOS6 with the newly released Charm 6.6.0. The build is
> mpi-linux-x86_64, and the MPI is mpich-3.0.1. The problem shows up when I
> run on only one processing element. I haven't tried it on more.
>
> The problem depends on a neighbor graph that is read in from a file. At the
> start, each chare initializes itself and then sends an initialization
> message to its neighbors. These messages all arrive, but when I try to send
> a subsequent message to all elements of the chare array some elements don't
> receive it. If I use hardcoded neighbor relationships (for example, each
> element connected to the ones numerically before and after it), the problem
> doesn't occur. But when I use the neighbor graph from the file, which is
> the one I want to use, the problem occurs. The problem is not caused by
> reading from the file: I can read the file and then overwrite the neighbor
> values with hardcoded ones, and the problem doesn't occur.
>
> I've attached the code, but the file with the neighbor relationships is a
> 6GB netCDF file. I can send it to whoever is willing to work on the
> problem. You will need to have the NetCDF library to link against my code.
>
> Thanks,
> Bob Steinke
>
>
>
>
> On 10/03/2014 03:46 PM, Robert Steinke wrote:
>>
>> I'm having a problem with my charm application.
>>
>> Before I get into the problem, I tried to use the ccs_tools charm
>> debugger, but haven't been able to yet. I read in the manual that it only
>> works for net-* versions of charm, and I am running on an mpi-* version.
>> The process of getting my code to run on a net-* version started to turn
>> into a real mess. For example I'm using the parallel version of the NetCDF
>> library that requires MPI. I could probably get it running on a net-*
>> version, but my first question is whether that's the right road to be going
>> down. Is it likely the ccs_tools debugger will be useful for solving this
>> problem, or is there something else I can do?
>>
>> Here's the problem:
>>
>> In an entry method of one object I have a loop that sends out messages to
>> every element of a chare array. I'm sending an individual message to each
>> object in a loop, not a broadcast through the array proxy, because I need
>> to send different parameters to each object. Like this:
>>
>> for (ii = 0; ii < proxySize; ii++)
>> {
>>     proxy[ii].message(parameters[ii]);
>> }
>>
>> When proxySize is large and I send a lot of messages (about 37,000) a
>> couple percent of them never arrive. The missing messages are scattered
>> around the array. When I send a small number of messages they all arrive.
>>
>> Has anyone experienced something like this before?
>>
>> I was hoping that the ccs_tools debugger would be able to do things like
>> show me the queued messages, so I can see messages being sent and received
>> and tell whether this is really a problem with charm not delivering
>> messages or whether I'm doing something wrong. Is this something that
>> ccs_tools could show me?
>>
>> Thanks for the help,
>>
>> Bob Steinke
>>
>> _______________________________________________
>> charm mailing list
>> charm AT cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/charm
