charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

Re: [charm] Reduction on std::vector

From: Jozsef Bakosi <jbakosi AT lanl.gov>
To: Sam White <white67 AT illinois.edu>
Cc: "charm AT lists.cs.illinois.edu" <charm AT lists.cs.illinois.edu>
Subject: Re: [charm] Reduction on std::vector
Date: Thu, 3 May 2018 15:33:16 -0600

I see. Can this be a problem with asynchronous logic? Is it possible that
multiple contributions somehow append (or shrink) the array during reduction?
I doubt it. I could imagine this only with (buggy or wrong) user-defined
reductions.
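
To make that intuition concrete, here is a minimal standalone sketch (plain
C++, not Charm++ code) of an element-wise sum that combines contributions
without ever changing the array length. The name elementwise_sum is
hypothetical, and I am only assuming that a built-in reducer such as
CkReduction::sum_double is length-preserving in this way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: combine equally sized contributions element by element.
// The result always has the same length as each contribution.
std::vector<double> elementwise_sum(
    const std::vector<std::vector<double>>& contribs) {
  assert(!contribs.empty());
  std::vector<double> result(contribs.front().size(), 0.0);
  for (const auto& c : contribs) {
    // A length mismatch here would point at a bug on the contributing side.
    assert(c.size() == result.size());
    for (std::size_t i = 0; i < c.size(); ++i) result[i] += c[i];
  }
  return result;
}
```

A user-defined reducer that resized its output would be the only place where
the length could change in a scheme like this.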

Another difference I had is that n was not an int but a std::size_t (an
unsigned long) on both the sending and the receiving side. In other words, I
did:

std::array< double, 2 > sum{{ 0.0, 0.0 }};
contribute( sum.size()*sizeof(double), sum.data(), ... )

which feeds an unsigned long as the first argument, and

entry [reductiontarget] void minstat( double d[n], std::size_t n );

void minstat( tk::real* d, std::size_t n );

which receives an unsigned long, but I bet that in Charm++-land that array size
is an int, correct? Can that type mismatch be a cause of this error?
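
For reference, the arithmetic I am asking about looks like this in isolation.
This is only a sketch in plain C++: the helpers count_from_bytes and bytes_sent
are hypothetical, and whether the runtime really derives n from the message
length and stores it in an int is my assumption, not something I have
confirmed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: derive an element count from a byte count and narrow
// it to int, as a receiver with an `int n` parameter would have to do.
int count_from_bytes(std::size_t nbytes) {
  assert(nbytes % sizeof(double) == 0);  // must be a whole number of doubles
  return static_cast<int>(nbytes / sizeof(double));
}

// Byte count as computed on the sending side for the 2-element array.
std::size_t bytes_sent() {
  std::array<double, 2> sum{{0.0, 0.0}};
  return sum.size() * sizeof(double);
}
```

For a count as small as 2, a value conversion between std::size_t and int is
lossless, so a plain narrowing by itself would not change n.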

Thanks,
J

On 05.03.2018 16:15, Sam White wrote:
> I don't think we've ever encountered that issue. Our per-commit and nightly
> tests all run charm/examples/charm++/reductions/typed_reduction, which has
> [reductiontarget] entry methods with the following signatures:
>
> entry [reductiontarget] void typed_array_ints_done(int results[n], int n);
>
> entry [reductiontarget] void typed_indiv_ints_done(int x, int y, int z);
>
> entry [reductiontarget] void typed_array_doubles_done(int n, double results[n]);
>
> Each reduction target method contains assertions that the results are
> correct, and I don't remember ever seeing a failure in those tests.
>
> -Sam
>
> On Thu, May 3, 2018 at 3:58 PM, Jozsef Bakosi <jbakosi AT lanl.gov> wrote:
>
> > Another question: Have you guys ever encountered a problem with a
> > reduction that used the target in the form of
> >
> > entry [reductiontarget] void minstat(double min[n], int n);
> >
> > that was contributed to with the absolutely certainly correct size array as
> >
> > std::array< double, 2 > sum{{ 0.0, 0.0 }};
> > contribute( sum.size()*sizeof(double), sum.data(),
> > CkReduction::sum_double, cb );
> >
> > yet it had the wrong value of n in the reduction target? Very occasionally
> > (e.g., 1 of 50K builds), this is triggered for me (by an assert), and I have
> > no clue how to even begin thinking about this.
> >
> > For now, I'm replacing my reduction target to take a list of doubles, as in
> >
> > entry [reductiontarget] void minstat(double min1, double min2);
> >
> > instead of the 1st version because n is not large, but I wonder if I am just
> > sweeping this problem under the rug by not even being able to test for the
> > number of doubles received. (I.e., if I still have this problem, now min2
> > might just not be valid at all, which will just give the wrong result and an
> > error later.)
> >
> > This is very curious...any clue is appreciated.
> >
> > J
> >
> > On 05.03.2018 14:41, Jozsef Bakosi wrote:
> > > Thanks, that makes sense. Thanks for the help.
> > > J
> > >
> > > On 05.03.2018 15:35, Sam White wrote:
> > > > [reductiontarget] entry methods are not type-safe, and you can't overload
> > > > them currently. We have a related issue to this open right now:
> > > > https://charm.cs.illinois.edu/redmine/issues/1700
> > > >
> > > > There is no difference between the following:
> > > > 1. entry [reductiontarget] void minstat(double min[n], int n);
> > > > 2. entry [reductiontarget] void minstat(int n, double min[n]);
> > > >
> > > > -Sam
> > > >
> > > > On Thu, May 3, 2018 at 3:16 PM, Jozsef Bakosi <jbakosi AT lanl.gov> wrote:
> > > >
> > > > > Cool, thanks, Sam. Both versions work.
> > > > >
> > > > > However, I am now curious about the 2nd one: If I can contribute a
> > > > > std::vector with a size determined at runtime to a reduction, how can you
> > > > > guys find the right overload at compile-time for the target using the 2nd
> > > > > version? That seems fishy to me, or at least not type-safe.
> > > > >
> > > > > In particular, it also appears to work with
> > > > >
> > > > > entry [reductiontarget] void minstat(double min1, double min2, double min3);
> > > > >
> > > > > And if I use min3 I risk formatting the internet? ;-)
> > > > >
> > > > > Another question: I have so far been using
> > > > >
> > > > > entry [reductiontarget] void minstat(double min[n], int n);
> > > > >
> > > > > but
> > > > >
> > > > > entry [reductiontarget] void minstat(int n, double min[n]);
> > > > >
> > > > > also works. Is there a reason to prefer one over the other? Are they just
> > > > > overloads of the same?
> > > > >
> > > > > J
> > > > >
> > > > > On 05.03.2018 14:45, Sam White wrote:
> > > > > > The [reductiontarget] entry method still needs to take a C-style array of N
> > > > > > doubles as its argument or separate doubles, despite now being able to
> > > > > > contribute a std::vector directly to the reduction. So it should look like
> > > > > > this:
> > > > > >
> > > > > > entry [reductiontarget] void minstat(int n, double min[n]);
> > > > > >
> > > > > > or this:
> > > > > >
> > > > > > entry [reductiontarget] void minstat(double min1, double min2);
> > > > > >
> > > > > >
> > > > > > We'll update our documentation to make this clear.
> > > > > >
> > > > > > -Sam
> > > > > >
> > > > > > On Thu, May 3, 2018 at 2:31 PM, Jozsef Bakosi <jbakosi AT lanl.gov> wrote:
> > > > > >
> > > > > > > Hi folks,
> > > > > > >
> > > > > > > I'm trying to do a reduction on a std::vector:
> > > > > > >
> > > > > > > std::vector< double > min{ 1.0, 1.0 };
> > > > > > >
> > > > > > > contribute( min, CkReduction::min_double,
> > > > > > > CkCallback(CkReductionTarget(Target,minstat), target) );
> > > > > > >
> > > > > > > with
> > > > > > >
> > > > > > > chare Target {
> > > > > > > entry [reductiontarget] void minstat( std::vector< double > d );
> > > > > > > }
> > > > > > >
> > > > > > > class Target : public CBase_Target {
> > > > > > > public:
> > > > > > > void minstat( std::vector< double > d ) {
> > > > > > > ...
> > > > > > > }
> > > > > > > }
> > > > > > >
> > > > > > > However, I get:
> > > > > > >
> > > > > > > > 161: ------------- Processor 0 Exiting: Called CmiAbort ------------
> > > > > > > 161: Reason: Unhandled C++ exception in user code.
> > > > > > > 161:
> > > > > > > 161: [0] Stack Traceback:
> > > > > > > 161: [0:0] CmiAbortHelper+0xc1 [0x7fffe75ccfc1]
> > > > > > > 161: [0:1] +0x7148 [0x7fffe75ca148]
> > > > > > > 161: [0:2] +0x17cb88 [0x7fffe7d59b88]
> > > > > > > 161: [0:3] +0x17cb69 [0x7fffe7d59b69]
> > > > > > > 161: [0:4] +0x22763 [0x7fffe5536763]
> > > > > > > 161: [0:5] +0x25516 [0x7fffe5539516]
> > > > > > > 161: [0:6] +0x254af [0x7fffe55394af]
> > > > > > > > 161: [0:7] std::__1::__vector_base_common<true>::__throw_length_error() const+0x44 [0x7fffe57e00a4]
> > > > > > > > 161: [0:8] std::__1::vector<double, std::__1::allocator<double> >::__append(unsigned long)+0x139 [0x5aaf59]
> > > > > > > > 161: [0:9] std::__1::vector<double, std::__1::allocator<double> >::resize(unsigned long)+0x7b [0x5aa9db]
> > > > > > > > 161: [0:10] void PUP::operator|<double>(PUP::er&, std::__1::vector<double, std::__1::allocator<double> >&)+0x44 [0x5aa884]
> > > > > > > > 161: [0:11] _ZN3PUP6detailorINSt3__16vectorIdNS2_9allocatorIdEEEEEEvRNS_2erERNS0_21TemporaryObjectHolderIT_Xsr3std16is_constructibleINS_11reconstructESA_EE5valueEEE+0x1d [0x5b204d]
> > > > > > > > 161: [0:12] inciter::CkIndex_Transporter::_call_redn_wrapper_minstat_marshall11(void*, void*)+0x80 [0x7ffff649d6f0]
> > > > > > > 161: [0:13] CkDeliverMessageFree+0x41 [0x7fffe7d73b21]
> > > > > > > 161: [0:14] +0x19763b [0x7fffe7d7463b]
> > > > > > > 161: [0:15] +0x19ecfb [0x7fffe7d7bcfb]
> > > > > > > 161: [0:16] +0x199496 [0x7fffe7d76496]
> > > > > > > > 161: [0:17] _processHandler(void*, CkCoreState*)+0x245 [0x7fffe7d75a85]
> > > > > > > 161: [0:18] CmiHandleMessage+0x97 [0x7fffe735faa7]
> > > > > > > 161: [0:19] CsdScheduleForever+0xc4 [0x7fffe735fdd4]
> > > > > > > 161: [0:20] CsdScheduler+0x1a [0x7fffe735fafa]
> > > > > > > 161: [0:21] +0x9c91 [0x7fffe75ccc91]
> > > > > > > 161: [0:22] ConverseInit+0x6c3 [0x7fffe75cc543]
> > > > > > > 161: [0:23] main+0x45 [0x7fffe825e755]
> > > > > > > 161: [0:24] __libc_start_main+0xe7 [0x7fffe4d45a87]
> > > > > > > 161: [0:25] [0x4bd00a]
> > > > > > >
> > > > > > > I also tried with
> > > > > > > > entry [reductiontarget] void minstat( const std::vector< double >& d );
> > > > > > > and
> > > > > > > void minstat( const std::vector< double >& d ) {
> > > > > > >
> > > > > > > > yielding the same problem. Can someone enlighten me on how to reduce a
> > > > > > > > std::vector?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jozsef
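
For anyone landing on this thread later: with the array-style target from
Sam's first reply, the reduced values can be copied back into a std::vector
inside the target. A minimal sketch in plain C++ follows. The helper to_vector
is hypothetical and only mirrors the (int n, double min[n]) signature; it is
not part of Charm++.

```cpp
#include <cassert>
#include <vector>

// Hypothetical body of a target declared in the .ci file as
//   entry [reductiontarget] void minstat(int n, double min[n]);
// Copies the n reduced values into a std::vector for downstream code.
std::vector<double> to_vector(int n, const double* min) {
  assert(n >= 0);
  return std::vector<double>(min, min + n);
}
```

Checking n against the expected size before the copy keeps the kind of
assertion discussed above in place.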
