charm - Re: [charm] messages not being received

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Robert Steinke <rsteinke AT uwyo.edu>
  • To: <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] messages not being received
  • Date: Mon, 6 Oct 2014 17:08:36 -0600
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

I've been working on my problem where I send messages to an entire chare array and some messages don't arrive.

I've been trying to create a minimal example that exhibits the problem. I've gotten down to about 2000 lines of code. I can't see any bugs in my code. Would anyone be willing to take a look at it or try to debug it on your system?

I am running on CentOS 6 with the newly released Charm 6.6.0. The build is mpi-linux-x86_64, and the MPI is mpich-3.0.1. The problem shows up when I run on only one processing element. I haven't tried it on more.

The problem depends on a neighbor graph that is read in from a file. At the start, each chare initializes itself and then sends an initialization message to its neighbors. These messages all arrive, but when I then try to send a subsequent message to all elements of the chare array, some elements don't receive it. If I use hardcoded neighbor relationships, such as connecting each element to the ones numerically before and after it, the problem doesn't occur. When I use the neighbor graph from the file, which is the one I want to use, the problem occurs. The problem is not caused by reading the file itself: if I read the file and then overwrite the neighbor values with hardcoded ones, the problem doesn't occur.
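
For illustration, the hardcoded neighbor relationship described above could look like the sketch below. This is not taken from the attached code: the function name `hardcodedNeighbors` is hypothetical, and it assumes that the first and last elements have only one neighbor each (no wrap-around), which the message does not specify.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not from the attached code: returns the hardcoded
// neighbors of element `index` in an array of `numElements` elements, i.e.
// the elements numerically before and after it.
// Assumption: no wrap-around, so the end elements get a single neighbor.
std::vector<int> hardcodedNeighbors(int index, int numElements)
{
  assert(0 <= index && index < numElements);

  std::vector<int> neighbors;

  if (index > 0)
  {
    neighbors.push_back(index - 1); // element numerically before
  }

  if (index + 1 < numElements)
  {
    neighbors.push_back(index + 1); // element numerically after
  }

  return neighbors;
}
```

Swapping a simple graph like this in for the values read from the file is what makes the problem disappear in the description above, which suggests the trigger is the structure of the real graph rather than the file I/O.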

I've attached the code, but the file with the neighbor relationships is a 6 GB NetCDF file. I can send it to whoever is willing to work on the problem. You will need the NetCDF library to link my code.

Thanks,
Bob Steinke



On 10/03/2014 03:46 PM, Robert Steinke wrote:
I'm having a problem with my charm application.

Before I get into the problem: I tried to use the ccs_tools charm debugger, but haven't been able to yet. I read in the manual that it only works for net-* versions of charm, and I am running on an mpi-* version. Getting my code to run on a net-* version started to turn into a real mess. For example, I'm using the parallel version of the NetCDF library, which requires MPI. I could probably get it running on a net-* version, but my first question is whether that's the right road to go down. Is the ccs_tools debugger likely to be useful for solving this problem, or is there something else I can do?

Here's the problem:

In an entry method of one object I have a loop that sends out messages to every element of a chare array. I'm sending an individual message to each object in a loop, not a broadcast through the array proxy, because I need to send different parameters to each object. Like this:

for (ii = 0; ii < proxySize; ii++)
{
  proxy[ii].message(parameters[ii]);
}

When proxySize is large and I send a lot of messages (about 37,000), a couple of percent of them never arrive. The missing messages are scattered around the array. When I send a small number of messages, they all arrive.

Has anyone experienced something like this before?

I was hoping that the ccs_tools debugger could show me things like the queued messages, so that I could watch messages being sent and received and tell whether charm is really failing to deliver messages or I'm doing something wrong. Is this something that ccs_tools could show me?
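
Without the debugger, one rough way to see which elements never received a message is to keep a per-index receive count and list the indices still at zero. The sketch below is plain C++ bookkeeping, not a Charm++ API; the class `DeliveryLog` and its methods are hypothetical names. It assumes a run in a single process, so that every array element can reach the same log object, and that the receiving entry method calls `markReceived` with its own array index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping class, not part of Charm++.
// Assumption: all elements live in one process and share this object.
class DeliveryLog
{
public:
  explicit DeliveryLog(std::size_t numElements) : received_(numElements, 0) {}

  // Call from the receiving entry method with the element's array index.
  void markReceived(std::size_t index)
  {
    assert(index < received_.size());
    ++received_[index];
  }

  // Returns the indices that have not received any message so far,
  // in increasing order.
  std::vector<std::size_t> missing() const
  {
    std::vector<std::size_t> result;

    for (std::size_t ii = 0; ii < received_.size(); ++ii)
    {
      if (received_[ii] == 0)
      {
        result.push_back(ii);
      }
    }

    return result;
  }

private:
  std::vector<int> received_; // number of messages seen per element
};
```

Calling `missing()` once the program appears to have gone idle would show whether the lost messages follow a pattern in the neighbor graph. A count above one for any index would point to duplicate sends rather than dropped ones.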

Thanks for the help,

Bob Steinke


Attachment: charm_error.tar.gz
Description: GNU Zip compressed data



