
charm - Re: [charm] Randomized queue

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Jozsef Bakosi <jbakosi AT lanl.gov>
  • To: charm AT cs.illinois.edu
  • Subject: Re: [charm] Randomized queue
  • Date: Tue, 11 Jul 2017 12:19:38 -0600
  • Authentication-results: illinois.edu; spf=softfail smtp.mailfrom=jbakosi AT lanl.gov

Follow-up:

By resorting to my favorite debugging technique, blind experimentation ;-), I
have found that if I label as [expedited] some of the entry methods whose
calls I *think* might be in the queues at the same time, all my regression
tests that exercise the asynchronous logic in question run successfully.
These tests run the same problem on 1, 2, 3, 4, 5, 6, 7, and 8 PEs with no
virtualization, on the same numbers of PEs with some virtualization, and in
yet another series with a lot of virtualization, i.e., overdecomposition:
altogether 8x3=24 tests.

Though this appears to work, it does not necessarily give me a warm feeling,
as I am shooting in the dark with close to zero understanding of what I'm
doing.

I do have a question too: Does the [expedited] entry method attribute actually
have an effect even with randomized queues? It appears so, but then I guess
the randomized queues are not purely random ...
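
To make the question concrete, here is a toy model in plain C++ of a scheduler
that picks normal messages at random but serves a separate expedited lane
first. It is only a sketch under the assumption that [expedited] messages
bypass the (randomized) queue; it is not the Charm++ scheduler, and
ToyScheduler, Message, and drain are hypothetical names made up for the
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Toy model only: NOT the Charm++ runtime. All names are hypothetical.
struct Message {
    std::string entry;  // name of the entry method to invoke
    bool expedited;     // true if the entry method is marked [expedited]
};

class ToyScheduler {
public:
    explicit ToyScheduler(unsigned seed) : rng_(seed) {}

    // Assumption: expedited messages go to their own FIFO lane and never
    // enter the randomized queue.
    void enqueue(Message m) {
        if (m.expedited) fast_.push_back(std::move(m));
        else             normal_.push_back(std::move(m));
    }

    bool empty() const { return fast_.empty() && normal_.empty(); }

    // Expedited messages are served first, in FIFO order; otherwise a
    // uniformly random message is taken from the normal queue.
    Message next() {
        assert(!empty());
        if (!fast_.empty()) {
            Message m = std::move(fast_.front());
            fast_.pop_front();
            return m;
        }
        std::uniform_int_distribution<std::size_t> pick(0, normal_.size() - 1);
        const std::size_t i = pick(rng_);
        Message m = std::move(normal_[i]);
        // Fill the hole with the last element (skip if i is the last one).
        if (i + 1 != normal_.size()) normal_[i] = std::move(normal_.back());
        normal_.pop_back();
        return m;
    }

private:
    std::mt19937 rng_;
    std::deque<Message> fast_;     // expedited lane, FIFO
    std::vector<Message> normal_;  // randomized queue
};

// Run the scheduler until it is empty and return the delivery order.
std::vector<std::string> drain(ToyScheduler& s) {
    std::vector<std::string> order;
    while (!s.empty()) order.push_back(s.next().entry);
    return order;
}
```

In this model, an expedited message enqueued among normal ones is always
delivered first, so labelling a method [expedited] restores a fixed order for
that method even though the rest of the queue stays random. That would be
consistent with the tests passing, but it would also mean the randomized run
no longer exercises the orderings involving that method.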

Jozsef

On 07.11.2017 09:40, Jozsef Bakosi wrote:
> Hi folks,
>
> I'm trying to use randomized queues to test the correctness of my
> asynchronous logic. In a section of my code, a set of chare array elements
> makes ckLocalBranch()-> calls to group branches on their PE, which then
> start point-to-point communication with each other to exchange information.
> From the array elements, I start two sets of these ckLocalBranch()-> calls
> one after the other, and both of these paths of execution end up in a
> similar communication pattern, using point-to-point entry method calls
> among multiple group branches to different entry methods.
>
> This logic is correct with the default queue, but causes a hang with
> randomized queues. My guess is that, because the randomized queue schedules
> messages randomly, some entry methods are requested (i.e., put in the
> queue) but never end up being called. I verified with simple printouts that
> this is the case: the call is made correctly with, e.g.,
> Group::thisProxy[ otherPE ].fn(), but fn() on otherPE does not always get
> called, i.e., it is called only on some of the PEs, not on all of them as
> it should be.
>
> My questions:
>
> 1. Can I expect two simultaneous sets of point-to-point entry method calls
> (targeting different entry methods) on the same group to perform correctly
> with randomized queues?
>
> 2. I have tried charmdebug, but I only get
>
> ParDebug> [...]/charmrun +p4 [...]/pgm <opts> +cpd +DebugDisplay 127.0.1.1:0.0 ++server +DebugSuspend
> ParDebug> Error executing [...]/charmrun +p4 [...]/pgm <opts> +cpd +DebugDisplay 127.0.1.1:0.0 ++server +DebugSuspend
>
> The charmdebug documentation says: "charmdebug command line launching only
> works on net-*, netlrts-*, and verbs-* builds of Charm++". Does charmdebug
> work with the MPI backend?
>
> 3. Is there a way to look at the queues somehow?
>
> Thanks,
> Jozsef
> --
> Jozsef Bakosi
> Computational Physics and Methods (CCS-2)
> Los Alamos National Laboratory
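
The pattern in the quoted message (two sets of point-to-point calls to
different entry methods on the same group branch) can be checked for order
dependence outside the runtime with a small stand-alone harness. The sketch
below is plain C++, not Charm++ code; Branch, recvA, recvB, and
completes_under_shuffle are hypothetical names, and the count-based completion
test is only one possible way to make such logic order-independent, not
necessarily what the code under discussion does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for one group branch; NOT Charm++ code.
enum class Kind { A, B };  // two different entry methods on the same branch

class Branch {
public:
    explicit Branch(std::size_t peers) : peers_(peers) {}

    // Stand-ins for two entry methods. Each one only counts arrivals, so
    // neither relies on the other having run first.
    void recvA() { ++gotA_; }
    void recvB() { ++gotB_; }

    // Complete only when every peer has delivered both kinds of message.
    bool complete() const { return gotA_ == peers_ && gotB_ == peers_; }

private:
    std::size_t peers_;
    std::size_t gotA_ = 0;
    std::size_t gotB_ = 0;
};

// Deliver `peers` messages of each kind in a seed-dependent random order
// and report whether the branch reaches completion.
bool completes_under_shuffle(std::size_t peers, unsigned seed) {
    std::vector<Kind> inbox;
    inbox.insert(inbox.end(), peers, Kind::A);
    inbox.insert(inbox.end(), peers, Kind::B);

    std::mt19937 rng(seed);
    std::shuffle(inbox.begin(), inbox.end(), rng);

    Branch b(peers);
    for (Kind k : inbox) {
        assert(!b.complete());  // must not report completion early
        if (k == Kind::A) b.recvA();
        else              b.recvB();
    }
    return b.complete();
}
```

Calling completes_under_shuffle for many seeds mimics what a randomized queue
does to arrival order; a handler that silently assumes A arrives before B
would fail for some seeds, while the counting version should not.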


