charm - [charm] Fault Tolerance

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: <alberto.ortiz09 AT gmail.com>
  • To: charm AT lists.cs.illinois.edu
  • Subject: [charm] Fault Tolerance
  • Date: Thu, 16 Feb 2017 11:40:46 -0600

Hi,

I am using AMPI on a Zynq cluster in which each Zynq has a dual-core ARM
processor. Currently I am using 3 MicroZed boards (each one has a Zynq
device). I chose AMPI from the start instead of OpenMPI because it provides
the user with fault tolerance, adaptability and resilience.

The problem I have is that I don't know how to use or activate its fault
tolerance. I am programming in C using MPI and compiling the programs with
ampicc. The fault tolerance test I would like to try is to have the 3 devices
running a task and then reboot or unplug one of them, expecting AMPI to
redistribute the threads that were started on the unplugged device to the
working devices. I don't know whether this kind of fault tolerance is
implemented, nor how to take advantage of the fault tolerance that is
implemented.

Another thing I would like to ask is whether AMPI supports run-time load
balancing. For example, if I were to multiply 10 big matrices and one node
finished its task before the others, how could I implement run-time load
balancing so that the idle node takes over work from the overloaded nodes?

Thank you in advance for the continuous support,
Alberto.

