charm - [charm] Fault Tolerance Documentation

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: "Wang, Felix Y." <wang65 AT llnl.gov>
  • To: "charm AT cs.illinois.edu" <charm AT cs.illinois.edu>
  • Subject: [charm] Fault Tolerance Documentation
  • Date: Tue, 24 Jul 2012 15:28:04 -0700
  • Accept-language: en-US
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Hello PPL,

I'm a summer intern at LLNL working on a port of the LULESH proxy application to Charm++, and over the past few days I have started adding fault-tolerance constructs (checkpoints and restarts). Unfortunately, the documentation available online is rather sparse, and it does not point to any good examples of checkpointing and restarting as actually used in a program. Fortunately, I was able to meet with Xiang to discuss how to go about the implementation, and she pointed me to some example code and explained, among other necessary items, how to build Charm++ so that it incorporates these constructs in the first place.

Please take this email as a request for a more comprehensive manual section on the fault-tolerance aspects of Charm++. A section on, or a link to, a tutorial, like the one for the PUPers, would also be helpful.

Thanks,

--- Felix


