
charm - Re: [charm] Deadlock detection

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system



  • From: Jozsef Bakosi <jbakosi AT lanl.gov>
  • To: Vinicius Freitas <vinicius.mct.freitas AT gmail.com>, charm <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] Deadlock detection
  • Date: Tue, 26 Jun 2018 18:41:30 -0600

Thanks, Vinicius,

I also thought about quiescence detection, but the manual says:

"In Charm++, quiescence is defined as the state in which no processor is
executing an entry point, no messages are awaiting processing, and there are
no messages in-flight."

Does a deadlock qualify as quiescence? I think that in a deadlock, messages
may still be awaiting processing, i.e., entry methods are waiting to be
called, but for some reason they are never called. In other words, would
quiescence be detected at all in a deadlock?
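
To make the question concrete, here is a toy model in plain C++ that applies
the three conditions from the manual literally. It is only a sketch: ToyPe,
ToySystem, isQuiescent, and makeStalledSystem are hypothetical names, not
Charm++ API, and the model says nothing about how the real runtime counts
messages. It only shows that if the deadlock comes from messages that were
never sent, a literal reading of the definition would report quiescence,
whereas messages sitting in a queue would not.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Toy model only: this is NOT the Charm++ runtime or its API.
struct ToyPe {
    std::deque<int> queue;            // messages awaiting processing on this PE
    bool executing = false;           // true while an entry point is running
    std::size_t expectedNeverSent = 0; // messages chares here wait for, but nobody sent
};

struct ToySystem {
    std::vector<ToyPe> pes;
    std::size_t inFlight = 0;         // messages sent but not yet enqueued anywhere
};

// Applies the manual's three conditions literally. Note that it never looks
// at expectedNeverSent: the definition talks about messages that exist, not
// about messages an application is still hoping for.
bool isQuiescent(const ToySystem& s) {
    if (s.inFlight != 0) return false;
    for (const ToyPe& pe : s.pes) {
        if (pe.executing || !pe.queue.empty()) return false;
    }
    return true;
}

// Hypothetical logic error: each of two PEs hosts a chare that waits for a
// message the other side never sends. All queues are empty, nothing runs.
ToySystem makeStalledSystem() {
    ToySystem s;
    s.pes.resize(2);
    s.pes[0].expectedNeverSent = 1;
    s.pes[1].expectedNeverSent = 1;
    return s;
}
```

In this model, isQuiescent(makeStalledSystem()) is true, while pushing a
single message onto any queue makes it false. Whether my deadlocks look like
the first case or the second is exactly what I am unsure about.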

Also, even if the answer is yes, what kind of information do I get at the
application level once my function is called after quiescence? How would
that help me debug what led to that function call?
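
One thing I could imagine doing on the application side is keeping my own
record of what each chare is waiting for and printing it from whatever
function is called at quiescence. The sketch below is plain C++ and purely
hypothetical: PendingRegistry and its methods are names I made up, not
something Charm++ provides, and in a real run each PE would presumably hold
its own copy and the reports would have to be gathered somehow.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical application-side bookkeeping; none of these names are Charm++ API.
class PendingRegistry {
public:
    // Call when a chare starts waiting for a message with the given label.
    void expect(int chareId, const std::string& label) {
        pending_[chareId].insert(label);
    }

    // Call at the top of the entry method that receives that message.
    void arrived(int chareId, const std::string& label) {
        auto it = pending_.find(chareId);
        if (it == pending_.end()) return;
        it->second.erase(label);
        if (it->second.empty()) pending_.erase(it);
    }

    // True when no chare is waiting for anything.
    bool empty() const { return pending_.empty(); }

    // One line per outstanding wait, ordered by chare id and then label.
    std::string report() const {
        std::ostringstream out;
        for (const auto& entry : pending_) {
            for (const auto& label : entry.second) {
                out << "chare " << entry.first
                    << " still waiting for " << label << '\n';
            }
        }
        return out.str();
    }

private:
    std::map<int, std::set<std::string>> pending_;
};
```

If the function called at quiescence finds the registry non-empty, the report
would at least name the chares and message labels that never arrived. This
costs extra bookkeeping in every entry method, so I would prefer a tool in
the runtime if one exists.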

Thanks,
Jozsef

On 06.26.2018 21:06, Vinicius Freitas wrote:
> Jozsef,
>
> Charm has a quiescence detection mechanism that might help you. When you
> start quiescence detection in Charm, you also declare which method it
> will reduce to (and which every chare will then execute), and that method
> is triggered only once NOTHING is happening in the system. Would this help
> you in any way?
>
> Vinicius F.
>
> On Tue, 26 Jun 2018 at 20:04, Jozsef Bakosi <jbakosi AT lanl.gov> wrote:
>
> > Hi folks,
> >
> > From time to time I run into asynchronous logic errors (that I'm pretty
> > sure are my fault) that non-deterministically produce deadlocks without
> > an error message.
> >
> > I wonder what tools I have available that I can use to diagnose such
> > problems.
> >
> > It would be great if I could somehow detect from the runtime system that
> > some messages are being waited on but nothing is actually happening. In
> > that case I could dump some messages and their labels/ids/etc., which
> > would help identify at least which entry methods are waiting for
> > messages, or something similar.
> >
> > What do you suggest? How do you deal with such errors?
> >
> > Thanks,
> > Jozsef


