
charm - [charm] Fault tolerant Jacobi



  • From: Kiril Dichev <K.Dichev AT qub.ac.uk>
  • To: charm AT lists.cs.illinois.edu
  • Subject: [charm] Fault tolerant Jacobi
  • Date: Fri, 20 Jul 2018 16:15:44 +0100

Hello,

I am a new user of Charm++ and AMPI.

I’ve done some research on fault tolerance in MPI over the last year, and I see some nice ways to couple it with AMPI (happy to explain if anyone is interested). I have used a Jacobi solver before, so it would be convenient to use the same one with AMPI to get going. I am especially interested in testing the parallel-recovery capabilities presented in work such as this one, which covers Jacobi among other codes: https://repositoriotec.tec.ac.cr/bitstream/handle/2238/7150/Using%20Migratable%20Objects%20to%20Enhance%20Fault%20Tolerance%20Schemes%20in%20Supercomputers.pdf?sequence=1&isAllowed=y


However, I am not sure where to begin. I pulled the official Charm++ repo, which contains some MPI Jacobi code in tests/ampi/jacobi3d. In particular, it also includes some kill files, which a very old tutorial says can be used to specify failure scenarios for PEs. However, the +pkill_file option no longer seems to exist, so that tutorial is outdated, and I don’t know whether the code itself is up to date either.
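For reference, the setup the old tutorial describes looked roughly like the following. Both the two-column kill-file format (PE number, kill time in seconds) and the +pkill_file flag are taken from that outdated tutorial, so they are assumptions that may well not match current Charm++; this just shows the kind of failure-injection experiment I am after:

```shell
# Hypothetical kill file, per the old tutorial: each line names a PE
# and the time (in seconds) at which that PE should be killed.
cat > kill.txt <<'EOF'
1 10.0
3 25.0
EOF

# Old-style launch (the flag appears to be gone now; shown only to
# illustrate the intent):
#   ./charmrun ./jacobi3d +p4 +pkill_file kill.txt

echo "kill file written: $(grep -c '' kill.txt) entries"
```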

On the other hand, according to the documentation in the main repo, there is a repository here:
ssh://charm.cs.illinois.edu:9418/benchmarks/ampi-benchmarks

… which I can’t access, although it apparently also has Jacobi codes that run with AMPI. Maybe that is the one I need? If so, can I use it even though I’m not affiliated with any US institution?

Any help identifying the up-to-date Jacobi + AMPI code would be much appreciated, as would any guidance on how to experiment with parallel recovery via migration.


Regards,
Kiril Dichev



