
charm - Re: [charm] Charm 6.2.2 Release Candidate Ready

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Tom Quinn <trq AT astro.washington.edu>
  • To: Phil Miller <mille121 AT illinois.edu>
  • Cc: Charm Mailing List <charm AT cs.illinois.edu>, gzheng AT illinois.edu
  • Subject: Re: [charm] Charm 6.2.2 Release Candidate Ready
  • Date: Fri, 3 Sep 2010 21:09:57 -0700 (PDT)
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

I retried my failed restart with the latest development version, and I got:

Main(CkMigrateMessage) called
Group TraceControlPointsBOC is not yet capable of migration.
...
------------- Processor 282 Exiting: Caught Signal ------------
Signal: segmentation violation
Suggestion: Try running with '++debug', or linking with '-memory paranoid' (memory paranoid requires '+netpoll' at runtime).
Group TraceControlPointsBOC is not yet capable of migration.
[282] Stack Traceback:
[282:0] /lib64/libc.so.6 [0x2aaaab9b5dc0]
[282:1] _Z22CkPupArrayElementsDataRN3PUP2erEi+0x37f [0x637d7b]



Tom Quinn       Astronomy, University of Washington
Internet:       trq AT astro.washington.edu
Phone:          206-685-9009

On Tue, 31 Aug 2010, Phil Miller wrote:

On Tue, Aug 31, 2010 at 20:05, Gengbin Zheng <zhenggb AT gmail.com> wrote:

This could have something to do with the recent change to the default array
mapping. Only yesterday I noticed this bug in checkpoint/restart, and I
asked Abhinav to fix it.
Phil, is that fix in the release candidate?

http://charm.cs.illinois.edu/cgi-bin/gitweb2.cgi?p=charm.git;a=log;h=refs/tags/charm-6.2.2-pre3

Yes, it is. It would be really nice to get an explicit test for this
bug into the tree.


Gengbin

On Tue, Aug 31, 2010 at 6:00 PM, Tom Quinn <trq AT astro.washington.edu> wrote:

I'm still having problems with restarts from checkpoints. The symptom
seems to be that the restart entry method executes fine until it calls a
proxy broadcast: only 4055 of the 4096 elements receive the broadcast, and
ChaNGa dies soon afterwards. The checkpoint being restarted from has 5 empty
"arr_*.dat" files. This is running the net-linux-x86_64-ibverbs-icc build on
512 cores of the NASA Pleiades machine.

On Mon, 30 Aug 2010, Phil Miller wrote:

I've just incorporated an additional bugfix applicable to checkpoint/restart.
It's in the repository, tagged as charm-6.2.2-pre3.

On Mon, Aug 30, 2010 at 12:52, Phil Miller <mille121 AT illinois.edu> wrote:
A release candidate of Charm 6.2.2 is available from the Git
repository (git checkout charm-6.2.2-pre2), with the following changes
from 6.2.1:

FEM: prepend string to timestamp files for compatibility with newer ParaView versions
Chare Array default mapping: fix bugs on late insertion and apparent imbalance
Makefile: Avoid accidental wrong generation of pup_f.f90
Chkpt: account for processors with 0 objects
Add a test for restarting on a smaller number of PEs.
Fix default array map in case checkpoint/restart happens on different processors. The binsize has to be updated.
icc: Statically link Intel's libraries on all versions >=9
CkMulticast Reductions: Set result message reference number to userFlag
Reductions: Set the reference number on result messages to the userFlag
NetFEM: add prefix to timestep filenames for ParaView 3.x
Fall back on gethostname if a node has multiple IP addresses
bluegenep: Update compilers
bluegenep: fix path for XLF
CPU affinity on mpi-crayxt-smp: correct getXTNodeID calls for SMP
mpi-crayxt-smp: fix cputopology to account for multiple cores per node
xlC: update linker flags for blueprint
xlC 64-bit: link with big library TOC
trace-summary: Bugfixes
Change timestep filename format to work with ParaView 3.x
CPU Affinity +pemap: fix a buffer overflow bug.
Docs: note slowness in CPU topology gathering from DNS issues
xlc: don't pass archaic -qstaticinline
Socket Routines: use getifaddrs (when available) for getting local IP.
configure: check if cp -p works
LAPI: Only copy argv in PEs that are not rank 0 in their process
configure: Drop archaic test on C library version
Fix the reading of processor lists after +pemap block addition
Increment version number to 6.2.2


Collectively, these improve support for various architectures, fix
bugs in some newer features, and offer some minor improvements for
testing and development work.

Please test this code and post any discovered problems to the list.
Barring any new issues, it will be released as 6.2.2 in the next day
or two.


_______________________________________________
charm mailing list
charm AT cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/charm
