Skip to Content.
Sympa Menu

k-user - Re: [[K-user] ] Differences between Token and Lexer

k-user AT lists.cs.illinois.edu

Subject: K-user mailing list

List archive

Re: [[K-user] ] Differences between Token and Lexer


Chronological Thread 
  • From: Radu Mereuta <headness13 AT gmail.com>
  • To: Eric Huber <echuber2 AT illinois.edu>
  • Cc: k-user AT lists.cs.illinois.edu
  • Subject: Re: [[K-user] ] Differences between Token and Lexer
  • Date: Tue, 31 May 2016 00:16:03 +0300

I guess the best place to look at right now, is the GitHub wiki: https://github.com/kframework/k/wiki
Specifically this page: https://github.com/kframework/k/wiki/Syntax-specification-for-the-new-parser

We are still in the process of cleaning up some old predefined .k files, but the newest ones can be found at /include/builtin/
You can find there all the predefined syntax, operations, and defaults.

Other sources of documentation might be outdated, but those files are used right now for compilation (and are being tested), so they are the (barebone) reference for now.

Radu


On Mon, May 30, 2016 at 11:55 PM, Eric Huber <echuber2 AT illinois.edu> wrote:

Thank you for the clarification. Actually it's K4 that I have been playing around with, so I'm trying to understand the pieces of documentation that I can find and reading the examples packaged with the distribution, which is where I found this note. Is there a rundown of K4's lexer syntax anywhere?

Many thanks,
Eric Huber

On May 30, 2016 3:14 PM, "Radu Mereuta" <headness13 AT gmail.com> wrote:
Hi Eric,

Token is exactly like Lexer, with the addition that it tries to infer a follow restriction automatically (nothing too fancy, just look at the regex, and if it has a * or + group at the end, add a follow rule -/- with that group). This works in simple cases like Identifiers, but on more complex regular expressions, like MODNAME above, it requires user attention.

I'm assuming you are using K3.6, which uses SDF as a parser generator, so the behavior is manifested from there.
The follow restrictions are useful in cases like the following:
syntax Exp ::= Exp Exp | Id
syntax Id ::= Lexer{[a-zA-Z][a-zA-Z0-9]*}
A simple input like "ab", with the rules above, will be ambiguous, resulting in the following parses: '__(Id("a"), Id("b")) and Id("ab").
Adding the follow restriction:
syntax Id -/- [a-zA-Z0-9]
will tell the parser not to generate the Id("a") because it is followed by a 'b'. So it forces the parser to continue parsing greedily.
Eventually, Id will stop parsing at the first whitespace, punctuation or <eof>, and the follow restriction will be respected in that case.

I hope this clarifies things a bit.

If you are interested in more things about K, we are working on a new version, K4, which has a completely redesigned lexer,
which allows for more control, compactness and brevity.

Radu
K dev


On Fri, May 27, 2016 at 8:44 AM, Eric Huber <echuber2 AT illinois.edu> wrote:
Hi, I'm wondering if I understand correctly the differences between Token and Lexer. Lexer is not greedy, so it must be given the follow rule with -/- to show when it may NOT stop collecting characters for one token, correct? Whereas Token is greedy and can not use -/- rules.

Also, this detail from ref-manual.k seems counter-intuitive:
  The construction {<char set> "<separator>"}+ will match 'A', 'A-A',
  'A-A-A', but not 'A-'.

Doesn't the syntax suggest that the separator character must follow each character? I take it that the only reason this works is because in the particular example:

syntax MODNAME ::= Lexer{"#"?{[a-z0-9A-Z\_]+ "-"}+}
syntax MODNAME -/- [a-z0-9A-Z\-\_]

it is using Lexer with the construction, so it is not a greedy selection, and the follow exclusion rule says that - may not appear at the end. But the line in the documentation makes it sound like a special construction to specify ranges with a separator.

Thank you,
Eric Huber





Archive powered by MHonArc 2.6.16.

Top of Page