
charm - Re: [charm] ParFUM Crashes w/ Parallel Partitioning

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Phil Miller <phil AT hpccharm.com>
  • To: kevin.mueller AT utas.utc.com
  • Cc: charm <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] ParFUM Crashes w/ Parallel Partitioning
  • Date: Thu, 27 Oct 2016 11:53:55 -0500

Hi Kevin,

Thanks again for bringing this bug back to our attention. It turns out that the bug wasn't too hard to fix in the ParFUM codebase. The fix will be integrated soon and will ship in our upcoming 6.8 release.

Actually benefiting from the fixed code will take a bit of work for now, though. We hope to have these issues hammered out in the near future. Unless you're currently working with very large meshes that require parallel partitioning, the easiest path for the moment may be to use the default serial partitioning without overdecomposition (i.e. don't set +vp, or set it equal to the number of MPI processes). In that case, you can basically skip reading the rest of this message.
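
A launch that keeps exactly one AMPI virtual processor per MPI process might look like the minimal sketch below. It assumes the MPI-layer build is started directly with MPICH's mpirun and that the application binary is passed in by the caller; the helper name run_without_overdecomposition is hypothetical, not part of Charm++.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): run a ParFUM/AMPI application with the
# default serial partitioning and no overdecomposition, i.e. +vp equal to the
# number of MPI processes. ASSUMPTION: the MPI-layer build is started via mpirun.
run_without_overdecomposition() {
  local nprocs="$1" app="$2"
  shift 2
  # Reject anything that is not a positive integer process count.
  if ! [[ "$nprocs" =~ ^[1-9][0-9]*$ ]]; then
    echo "error: process count must be a positive integer, got '$nprocs'" >&2
    return 1
  fi
  # One virtual processor per MPI process means no overdecomposition.
  # Any remaining arguments are passed through to the application.
  mpirun -np "$nprocs" "$app" "+vp${nprocs}" "$@"
}
```

For example, `run_without_overdecomposition 4 ./myapp` (with ./myapp standing in for your ParFUM executable) launches 4 processes with +vp4. Leaving +vp off entirely should have the same effect.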

Excess technical details follow:

To integrate with ParFUM, ParMetis has to be compiled against Adaptive MPI (AMPI), our implementation of MPI built on top of Charm++. Instructions for this are below. The Charm++ system used to bundle ParMetis to automate this, but it had to be stripped out because of the restrictive license terms under which UMN normally distributes ParMetis (no redistribution, no commercial use).

Charmworks (the commercial development/support organization for Charm++) is working on arranging with UMN to sublicense ParMetis for our users, but that will take a bit of time. If you have a ParMetis commercial license from UMN already, then this isn't a concern for you. If not, we'd be happy to bundle that in with a commercial ParFUM license if you want to put this into production use.

Here's how to get ParMetis working with ParFUM as patched to fix bug #422:

1. Build AMPI as you previously built ParFUM, e.g.:

./build AMPI mpi-linux-x86_64-ifort

2. Point an environment variable at the build directory

export AMPI_DIR=/path/to/charm-v.6.7.1/mpi-linux-x86_64-ifort

3. Configure and build ParMetis in its directory

make config cc=$AMPI_DIR/bin/ampicc cxx=$AMPI_DIR/bin/ampicxx prefix=$AMPI_DIR ; make ; make install

4. Compile ParFUM against the just-built ParMetis

make -C $AMPI_DIR/tmp/libs/ck-libs ParFUM

5. Compile your application using ParFUM as before
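
Steps 1-4 can be chained into a single helper, sketched below under stated assumptions: the first argument is the top of the Charm++ source tree, the second is an unpacked ParMetis 4.0.3 source tree, and the optional third is the same build target used for ParFUM (defaulting to the one in the example above). The function name build_parfum_with_parmetis is hypothetical. Step 5 is left to your existing application build.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper) chaining steps 1-4 above.
# ASSUMPTIONS: $1 = Charm++ source tree, $2 = ParMetis 4.0.3 source tree,
# $3 (optional) = build target used for ParFUM.
build_parfum_with_parmetis() {
  local charm_src="$1" parmetis_src="$2"
  local target="${3:-mpi-linux-x86_64-ifort}"

  # Step 1: build AMPI the same way ParFUM was built before.
  (cd "$charm_src" && ./build AMPI "$target") || return 1

  # Step 2: point AMPI_DIR at the resulting build directory.
  export AMPI_DIR="$charm_src/$target"

  # Step 3: configure, build and install ParMetis with the AMPI compiler
  # wrappers, installing into the AMPI build directory. '&&' stops at the
  # first failure instead of carrying on the way ';' would.
  (cd "$parmetis_src" &&
    make config cc="$AMPI_DIR/bin/ampicc" cxx="$AMPI_DIR/bin/ampicxx" \
      prefix="$AMPI_DIR" &&
    make &&
    make install) || return 1

  # Step 4: rebuild ParFUM against the just-installed ParMetis.
  make -C "$AMPI_DIR/tmp/libs/ck-libs" ParFUM
}
```

Using `&&` rather than `;` means a failed ParMetis configure does not silently fall through to building ParFUM against a stale or missing library.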

Even more excessive technical details, to get overdecomposition and load balancing working today:

Additionally, running ParFUM's ParMetis-based parallel partitioning with overdecomposition (an argument to +vp greater than the number of processes) requires a slightly patched version of serial Metis, which ParMetis calls. Again, we used to bundle this with the Charm++ system, but had to remove it due to licensing issues. Since then, the license for Metis has changed to a more liberal open-source license, so we'll soon be re-incorporating our modified version of Metis into the overall Charm++ distribution. For the moment, I'd recommend going ahead with the rest of your application work; we'll get that worked out in the next release. If you want to pick it up immediately, I can send you our patch for Metis as well.
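
The overdecomposition condition above can be enforced at launch time, as in the minimal sketch below. It assumes the +Parfum_parallel_partition runtime flag exactly as given in the bug report quoted in this thread, an MPI-layer build started with mpirun, and a hypothetical opt-in variable PATCHED_METIS that the user sets once the patched Metis is installed. The helper name run_parallel_partition is also hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): launch with ParFUM's ParMetis-based
# parallel partitioning, refusing overdecomposition (+vp greater than the MPI
# process count) unless the caller confirms the patched serial Metis is in use.
# ASSUMPTIONS: +Parfum_parallel_partition is taken from the quoted bug report;
# PATCHED_METIS is a hypothetical opt-in variable.
run_parallel_partition() {
  local nprocs="$1" nvp="$2" app="$3"
  shift 3
  local n
  # Both counts must be positive integers.
  for n in "$nprocs" "$nvp"; do
    if ! [[ "$n" =~ ^[1-9][0-9]*$ ]]; then
      echo "error: expected a positive integer, got '$n'" >&2
      return 1
    fi
  done
  # Overdecomposition needs the patched Metis, which is not currently bundled.
  if (( nvp > nprocs )) && [[ "${PATCHED_METIS:-0}" != 1 ]]; then
    echo "error: +vp$nvp over $nprocs processes needs the patched Metis" >&2
    return 1
  fi
  mpirun -np "$nprocs" "$app" "+vp${nvp}" +Parfum_parallel_partition "$@"
}
```

Failing early here is cheaper than discovering the problem partway through partitioning a large mesh.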

If anyone on the list has read this far and hasn't been scared off, you're a brave soul. Charmworks is interested in hiring Applications Engineers who like working on these sorts of problems; please contact me.

Phil


On Thu, Oct 13, 2016 at 12:14 PM, <kevin.mueller AT utas.utc.com> wrote:
ParFUM crashes when attempting to partition a mesh using FEM_Partition_Mode=2
in the source or adding +Parfum_parallel_partition.

Crash message is as follows:
[0] Memory usage on vp 0 at the begining of partition 86002304
master -> number of elements 515777
master -> ptrcount 515777 indcount 2063108 sizeof(MSA1DINT) 56
sizeof(MSA1DINTLIST) 56 memory 104175088

This appears to duplicate Bug #422, and I have reproduced the same error in
that bug report using the 'simple2D' example as well.

https://charm.cs.illinois.edu/redmine/issues/422

Is there a fix or work-around for this issue?

Build Environment as follows:
Red Hat Enterprise Linux Workstation release 7.2 (Maipo)
ParMetis v4.0.3
Charm++ 6.7.1
MPICH v3.2
ifort 2011.2.137



