maude-help AT lists.cs.illinois.edu

Subject: Maude-help mailing list

List archive

[Maude-help] fast parsing

From: Chucky Ellison <celliso2 AT illinois.edu>
To: maude-help AT cs.uiuc.edu
Subject: [Maude-help] fast parsing
Date: Wed, 9 Jun 2010 20:41:56 -0500
List-archive: <http://lists.cs.uiuc.edu/pipermail/maude-help>
List-id: <maude-help.cs.uiuc.edu>

I'm trying to parse abstract syntax trees for very large programs (I
use Maude to perform operations on the trees after parsed). I have an
abstract syntax tree syntax in Maude that I can change. I also have
complete control over the actual data that gets given to Maude (I use
an external parser and custom maude-pretty-printer). However, I need
to know how to design the Maude syntax and input for fast parsing. I
should also add that I'm aware that it's not just the PARSING that
takes time, but the creation of the data structures that wind up
holding the terms. I want to also minimize this compilation time.

I've experimented by using a perl program to generate Maude terms, but
I have gotten very inconsistent results. I've tried all sorts of
experiments, but I've included one with this e-mail. I don't
understand why m2.maude is so much slower to parse (compile) than
m1.maude. Both lists have the same number of elements, but m2 is not
uniformly distributed, and is less randomly ordered. m2 will speed up
significantly if the elements are sorted or reverse-sorted. This kind
of thing makes any of my parsing test results questionable---there is
clearly a whole lot going on behind the scenes.

I'll now mention a few ideas I have in mind, but please let me know if
there are other issues that are more important speed-wise. At this
point I'm ready to believe that using operators that start with
lower-case letters is faster than upper case :)

Are there any particular operators to avoid? For example
op seq : K K -> K
vs
op __ : K K -> K

Should I avoid mixfix operators entirely? Or, is it okay to have
mixfix operators as long as I give it to Maude in prefix?
__(a, b)
vs
(a b).

There are elements in the syntax that can be considered very long
lists (for examples, lists of statements). Should I actually have
these be assoc in the syntax? Should I fully paren the elements?
Should I left-assoc or right-assoc if I paren? Also, gathering etc.?
I thought that fully parenthesizing some of these lists would speed
things up, but after trying it, it seemed to actually slow it down.
For example, I have a particular program that takes 50s when given
flat and parsed with [assoc], but 100s when parsed with no attributes
and fully parenthesized as (X (X X)).

Would it be a good idea to generate medium sized constants that
rewrite to portions of the AST? Then none of the input terms would be
huge. E.g.,
op tree : -> K .
eq tree = <one big tree> .
vs
ops tree left right : -> K .
eq tree = left right .
eq left = <smaller tree> .
eq left = <smaller tree> .
or, similarly for lists
eq list = <one big list>
vs
eq list = left right .

Having a new constant for every subterm seems like it would be
counterproductive, but perhaps this technique in moderation could
help?

At the risk of repeating myself, is there set of guidelines, both for
designing a syntax and crafting the actual input, to maximize parsing
speed? Remember, because I'm generating the input to Maude, and have
control over the syntax, pretty much any solution works as long as it
makes Maude's parsing faster.

Any help would be appreciated,
-Chucky

Attachment: m2.maude
Description: Binary data

Attachment: m1.maude
Description: Binary data

[Maude-help] fast parsing, Chucky Ellison, 06/09/2010