
charm - Re: [charm] Debugging Race Conditions

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Eric Bohm <ebohm AT illinois.edu>
  • To: <charm AT cs.uiuc.edu>
  • Subject: Re: [charm] Debugging Race Conditions
  • Date: Tue, 19 Aug 2014 09:46:30 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Race conditions are one of the most difficult kinds of bug to unravel.

The record/replay feature is designed to help under these conditions. A run with +record creates files that record the exact order of events. A run with +replay replays execution from the event log created by a +record run. That way, one can issue multiple runs with +record until the sought-after condition occurs, and then +replay that run within a debugger.
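The "record until it fails, then replay" loop can be automated. The Python below is a minimal sketch, not part of Charm++. It rests on three assumptions: the Charm++ abort makes charmrun exit with a non-zero status; each +record run overwrites the previous run's event logs, so the logs left behind belong to the failing run; and +record/+replay are passed as ordinary program arguments. The function names and the example command line are hypothetical.

```python
import subprocess
from typing import List, Optional, Sequence


def record_until_failure(cmd: Sequence[str], max_runs: int = 50) -> Optional[int]:
    """Rerun `cmd` with +record appended until a run exits non-zero.

    Returns the 1-based attempt number of the first failing run, or None
    if all `max_runs` attempts exited cleanly.

    Assumption: each +record run overwrites the previous run's event logs,
    so after a failure the logs on disk describe the failing run.
    """
    for attempt in range(1, max_runs + 1):
        # Non-zero exit status is taken to mean "the race was hit".
        result = subprocess.run([*cmd, "+record"])
        if result.returncode != 0:
            return attempt
    return None


def replay_command(cmd: Sequence[str]) -> List[str]:
    """Build the matching +replay invocation for the same command line."""
    return [*cmd, "+replay"]
```

For example, with a hypothetical `base = ["./charmrun", "+p4", "./pgm"]`, calling `record_until_failure(base)` keeps recording until the abort reproduces. `replay_command(base)` then gives the invocation to rerun under a debugger, for example with charmrun's ++debug option. Stopping at the first failure matters here: if the logs are overwritten as assumed, one more clean run would discard the interleaving you were looking for.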

Can you get ++debug to work for any charm program? When applicable, it relies on ssh and X forwarding working correctly, so sometimes this issue can be resolved by adding this line to your .ssh/config:
ForwardX11 yes

or by adding -X to the ssh command line in the nodelist file.
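Before retrying ++debug, it can help to check the local prerequisites mentioned in this thread: $DISPLAY set, xterm on the PATH, and a "ForwardX11 yes" line in the ssh config. The Python below is a hypothetical helper sketching that check. It inspects only the machine running charmrun and cannot tell whether X forwarding actually works on the compute nodes.

```python
import re
import shutil
from typing import List, Mapping, Optional

# Matches "ForwardX11 yes" or "ForwardX11=yes" at the start of a line
# (ssh_config keywords are case-insensitive).
_FORWARD_X11 = re.compile(r"^\s*ForwardX11(\s+|\s*=\s*)yes\b",
                          re.IGNORECASE | re.MULTILINE)


def debug_prereq_problems(env: Mapping[str, str],
                          ssh_config_text: Optional[str] = None) -> List[str]:
    """Return human-readable problems that may stop ++debug opening xterms.

    Hypothetical helper: checks DISPLAY and xterm against the given
    environment, and optionally scans the text of an ssh config file.
    """
    problems = []
    if not env.get("DISPLAY"):
        problems.append("DISPLAY is not set")
    # An empty PATH makes shutil.which return None rather than falling
    # back to the current process's PATH.
    if shutil.which("xterm", path=env.get("PATH", "")) is None:
        problems.append("xterm is not on PATH")
    if ssh_config_text is not None and not _FORWARD_X11.search(ssh_config_text):
        problems.append("ssh config has no 'ForwardX11 yes' line")
    return problems
```

A typical call would pass `os.environ` and the contents of `~/.ssh/config`. An empty list means these particular checks passed, not that ++debug is guaranteed to work.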

On 08/18/2014 02:51 PM, Robert Bird wrote:
Hey all

I've got a (rare) race condition whereby a charm element is inserted twice (according to the error in the stack trace when Charm aborts).

I can only get this to happen in parallel, with random message queues, so I'm having a hard time tracking it down.

Is there an obvious way to debug race conditions such as this? 

I've tried to use ++debug in order to get reliable access to the trace in gdb, but it doesn't seem to launch quite as expected. I get debug prints about the threads at the start and the program runs, but no xterm window appears, nor does it wait for my input to start. As far as I can tell I meet all the requirements: I can spawn X windows, $DISPLAY is set, and xterm is in my path.

Any obvious pointers/hints? Especially about a general method for tracking down race conditions.

Thanks
Bob
NB:  During a quick chat with Phil Miller he mentioned +record, does this allow me to record it in parallel, then replay on a serial gdb? 

--
Robert Bird http://go.warwick.ac.uk/robertbird
+44 (0)24 7652 2863 CS202, High Performance Lab Department of Computer Science University of Warwick


_______________________________________________
charm mailing list
charm AT cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/charm




