charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

[charm] All-to-all or redn+bcast

From: Jozsef Bakosi <jbakosi AT lanl.gov>
To: charm AT lists.cs.illinois.edu
Subject: [charm] All-to-all or redn+bcast
Date: Fri, 14 May 2021 09:51:30 -0600

Hi folks,

I would like your expert opinion on the following.

We have an all-to-all that computes the minimum of a single scalar real
value among many chares, intended to run at large scale. It is our only
synchronization point within a time step.

I wonder if replacing the single all-to-all with a reduction + broadcast
targeting each chare may allow for more overlap. I believe a single
all-to-all is implemented as a redn+bcast to/from a single chare, and
the complexity of what I'm suggesting is probably worse, but it still
seems worth asking.

In code, with DG being a chare array, I'm suggesting replacing

  contribute( sizeof(double), &mindt, CkReduction::min_double,
              CkCallback(CkReductionTarget(DG,solve), thisProxy) );

with

  for all DG chares i
    contribute( sizeof(double), &mindt, CkReduction::min_double,
                CkCallback(CkReductionTarget(DG,solve), thisProxy[i]) );
  end

Would this allow for more overlap by removing the global sync, or would
I throw the baby out with the bathwater because I would be replacing the
log(n) algorithmic/parallel complexity with n due to the for loop?
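
To put rough numbers on the log(n) versus n question, here is a
back-of-the-envelope cost model in plain C++. It is not Charm++ code and
does not reflect how the runtime actually schedules reductions. It
assumes an idealized k-ary spanning tree with one node per chare, and
the names Cost, treeDepth, reduceThenBroadcast and oneReductionPerTarget
are all made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost model (an assumption, not the Charm++ implementation):
// every chare is one node of a complete k-ary spanning tree.
struct Cost {
  std::uint64_t messages;  // total number of messages sent
  std::uint64_t depth;     // message hops on the critical path
};

// Depth (in edges) of the smallest complete k-ary tree holding n nodes,
// i.e. the smallest d with 1 + k + ... + k^d >= n.
std::uint64_t treeDepth(std::uint64_t n, std::uint64_t k) {
  assert(n >= 1 && k >= 1);
  std::uint64_t depth = 0, covered = 1, level = 1;
  while (covered < n) {
    level *= k;        // nodes on the next tree level
    covered += level;  // nodes covered so far
    ++depth;
  }
  return depth;
}

// One reduction up to the root, then one broadcast back down the tree.
Cost reduceThenBroadcast(std::uint64_t n, std::uint64_t k) {
  const std::uint64_t d = treeDepth(n, k);
  return { 2 * (n - 1), 2 * d };
}

// n separate reductions, each followed by a single point-to-point
// delivery of the result to one target element. Contention between the
// n reductions sharing the same tree is ignored.
Cost oneReductionPerTarget(std::uint64_t n, std::uint64_t k) {
  const std::uint64_t d = treeDepth(n, k);
  return { n * (n - 1) + n, d + 1 };
}
```

Under these assumptions, for n = 8 and k = 2, reduceThenBroadcast gives
14 messages over 6 hops, while oneReductionPerTarget gives 64 messages
over 4 hops. The critical path is shorter in the model, but the message
count grows as n^2 rather than n, and the model ignores that the n
reductions would compete for the same tree.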

Thanks,
Jozsef
--
Jozsef Bakosi, PhD, LANL CCS-2, o:505-665-0950, c:505-695-4523

[charm] All-to-all or redn+bcast, Jozsef Bakosi, 05/14/2021
- Re: [charm] All-to-all or redn+bcast, Eric Mikida, 05/17/2021
  - Re: [charm] [EXTERNAL] Re: All-to-all or redn+bcast, Jozsef Bakosi, 05/17/2021