Help with Lexer/Parser Internals Please!


John Hind
Help Guys!

I wonder if anyone can offer me some help with a powerpatch to the Lua
lexer/parser (lparser.c)?

I have long published a powerpatch which adds "syntax sugar" to default a
missing value in a constructor list to Boolean true, intended for
tables implementing sets, on the analogy of the sugar for lists:

t = {["cat"], ["dog"]}  -- Shortcut for t = {["cat"] = true, ["dog"] = true}

I've always wanted to extend this to:

t = {[cat],[dog]}

So, similarly to keys for methods, you can drop the quotes if the string is
a valid Lua name.

Whatever you think of this as a language innovation (which we can discuss
later when it exists as a powerpatch) at this point I just need
help implementing it. I'm modifying the function recfield in lparser.c. The
problem is I need to look two tokens ahead (the closing ']' and
then NOT '='). I tried the following test with luaX_lookahead:

/* Input -> [name]} */
int x1 = ls->t.token;         /* x1 = TK_NAME */
int x2 = luaX_lookahead(ls);  /* x2 = ']' */
int x3 = luaX_lookahead(ls);  /* x3 = '}' */
luaX_next(ls);
int x4 = ls->t.token;         /* x4 = '}'; expected ']' */

It works as expected with a single luaX_lookahead. With two in a row, each
call returns the correct token, but the second call has the side effect of
moving the parse point forward one token (is this a bug?).

Searching the Lua code base, luaX_lookahead is only used in one place and
then only for a single-token look ahead, so if it is a bug it has no
effect in standard Lua.

Is it possible to achieve this? Either I would need to be able to look two
tokens forward without moving the parse point, or I'd need to be
able to back up two tokens and re-parse once I'd hit (or not hit) the '='
token.

Anyone managed anything like this before?

- John
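
A note on the side effect described above: it is what a lexer with a single lookahead slot would produce. The toy model below is not llex.c; the ToyLexer type, the toy_* functions and the one-character token codes ('N' standing in for a name token) are all invented for illustration. It assumes that a peek stores its token in one slot and that advancing consumes that slot, so a second peek overwrites the first peeked token before the parser ever sees it.

```c
#include <assert.h>
#include <stddef.h>

enum { TOK_EOS = -1 };  /* "no token" marker (hypothetical stand-in for TK_EOS) */

typedef struct {
  const int *input;  /* pre-tokenized input, terminated by TOK_EOS */
  size_t pos;        /* index of the next unread token */
  int current;       /* token the parser is looking at */
  int lookahead;     /* the single lookahead slot; TOK_EOS when empty */
} ToyLexer;

/* Read one raw token; keeps returning TOK_EOS once the input is exhausted. */
int toy_read(ToyLexer *lx) {
  int tok = lx->input[lx->pos];
  if (tok != TOK_EOS) lx->pos++;
  return tok;
}

void toy_init(ToyLexer *lx, const int *input) {
  assert(input != NULL);
  lx->input = input;
  lx->pos = 0;
  lx->lookahead = TOK_EOS;
  lx->current = toy_read(lx);
}

/* Advance: use the buffered lookahead if there is one, else read afresh. */
void toy_next(ToyLexer *lx) {
  if (lx->lookahead != TOK_EOS) {
    lx->current = lx->lookahead;
    lx->lookahead = TOK_EOS;
  } else {
    lx->current = toy_read(lx);
  }
}

/* Peek: reads into the single slot, overwriting whatever was stored there. */
int toy_lookahead(ToyLexer *lx) {
  lx->lookahead = toy_read(lx);
  return lx->lookahead;
}

/* Replay the test from the post on the input  name ] }  with the given
   number of peeks, then advance once and report the current token. */
int toy_replay(int peeks) {
  static const int input[] = { 'N', ']', '}', TOK_EOS };
  ToyLexer lx;
  toy_init(&lx, input);
  for (int i = 0; i < peeks; i++)
    toy_lookahead(&lx);
  toy_next(&lx);
  return lx.current;
}
```

Under these assumptions toy_replay(1) leaves ']' as the current token while toy_replay(2) leaves '}', matching x4 in the test above. The ']' is not skipped by the advance; it was discarded by the second peek.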





Re: Help with Lexer/Parser Internals Please!

Luiz Henrique de Figueiredo
> The problem is I need to look two tokens ahead

The standard Lua lexer llex.c does not support two lookaheads.
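
One way to get a second token of lookahead is to replace the single slot with a small queue. The sketch below is a standalone toy, not a patch to llex.c: PeekLexer, peek_at, peek_next and is_bare_set_key are hypothetical names, tokens are plain ints ('N' standing in for a name token), and a real version would presumably have to queue whole tokens including their semantic values.

```c
#include <assert.h>
#include <stddef.h>

enum { TOK_EOS = -1, PEEK_MAX = 2 };  /* end marker; maximum peek distance */

typedef struct {
  const int *input;     /* pre-tokenized input, terminated by TOK_EOS */
  size_t pos;           /* index of the next unread token */
  int current;          /* token the parser is looking at */
  int queue[PEEK_MAX];  /* tokens already read ahead, oldest first */
  int count;            /* how many queue entries are in use */
} PeekLexer;

/* Read one raw token; keeps returning TOK_EOS once the input is exhausted. */
int peek_read(PeekLexer *lx) {
  int tok = lx->input[lx->pos];
  if (tok != TOK_EOS) lx->pos++;
  return tok;
}

void peek_init(PeekLexer *lx, const int *input) {
  assert(input != NULL);
  lx->input = input;
  lx->pos = 0;
  lx->count = 0;
  lx->current = peek_read(lx);
}

/* Return the n-th token after the current one without advancing. */
int peek_at(PeekLexer *lx, int n) {
  assert(n >= 1 && n <= PEEK_MAX);
  while (lx->count < n)
    lx->queue[lx->count++] = peek_read(lx);
  return lx->queue[n - 1];
}

/* Advance one token, draining the queue before reading fresh input. */
void peek_next(PeekLexer *lx) {
  if (lx->count > 0) {
    lx->current = lx->queue[0];
    for (int i = 1; i < lx->count; i++)
      lx->queue[i - 1] = lx->queue[i];
    lx->count--;
  } else {
    lx->current = peek_read(lx);
  }
}

/* With the current token just after '[': is this  name ]  not followed by '='? */
int is_bare_set_key(PeekLexer *lx) {
  return lx->current == 'N' && peek_at(lx, 1) == ']' && peek_at(lx, 2) != '=';
}
```

With the current token just after '[', is_bare_set_key answers the question from the original post (a name, then ']', then anything but '=') and leaves the parse point where it was, because peeked tokens stay queued until peek_next hands them out in order.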


Re: Help with Lexer/Parser Internals Please!

Frédéric van der Plancke
In reply to this post by John Hind
On 23/09/2015 16:30, John Hind wrote:

> Help Guys!
>
> I wonder if anyone can offer me some help with a powerpatch to the Lua
> lexer/parser (lparser.c)?
>
> I have long published a powerpatch which adds "syntax sugar" to default a
> missing value in a constructor list to Boolean true, intended for
> tables implementing sets, on the analogy of the sugar for lists:
>
> t = {["cat"], ["dog"]}  -- Shortcut for t = {["cat"] = true, ["dog"] = true}
>
> I've always wanted to extend this to:
>
> t = {[cat],[dog]}
>
> So, similarly to keys for methods, you can drop the quotes if the string is
> a valid Lua name.
>
> Whatever you think of this as a language innovation (which we can discuss
> later when it exists as a powerpatch)
This looks dangerous.
What is

cat = "tiger"
t = {[cat], [dog]}

supposed to do? Is your cat a "tiger" or just an ordinary "cat"?

Frederic



Re: Help with Lexer/Parser Internals Please!

Javier Guerra Giraldez
In reply to this post by John Hind
On Wed, Sep 23, 2015 at 9:30 AM, John Hind <[hidden email]> wrote:
> t = {["cat"], ["dog"]}  -- Shortcut for t = {["cat"] = true, ["dog"] = true}
>
> I've always wanted to extend this to:
>
> t = {[cat],[dog]}
>
> So, similarly to keys for methods, you can drop the quotes if the string is
> a valid Lua name.

if the idea of the patch is that omitted values default to `true`,
then i'd say that the short form should be

t = { cat, dog }

because in normal Lua

t = { cat=true, dog=true }

is equivalent to

t = { ['cat']=true, ['dog']=true }

so if the `=true` part is optional, then you would be able to say either

t = { ['cat'], ['dog'] }

or

t = { cat, dog }

and the form {[cat], [dog]} would use the values of the variables `cat`
and `dog` as the keys.




of course, i personally don't think that sets are special enough, and
usually just do

function Set(t)
    local o = {}
    for _, v in ipairs(t) do
        o[v] = true
    end
    return o
end

t = Set{'cat', 'dog'}

seems short enough and very readable


--
Javier


Re: Help with Lexer/Parser Internals Please!

Luiz Henrique de Figueiredo
> t = Set{'cat', 'dog'}
>
> seems short enough and very readable

Or simply
        t = Set'cat, dog'
where
        function Set(s)
                local t={}
                for k in s:gmatch("%w+") do
                        t[k]=true
                end
                return t
        end

Adapt the pattern "%w+" as needed.


Re: Help with Lexer/Parser Internals Please!

Coda Highland
In reply to this post by Javier Guerra Giraldez
On Wed, Sep 23, 2015 at 9:23 AM, Javier Guerra Giraldez
<[hidden email]> wrote:

> On Wed, Sep 23, 2015 at 9:30 AM, John Hind <[hidden email]> wrote:
>> t = {["cat"], ["dog"]}  -- Shortcut for t = {["cat"] = true, ["dog"] = true}
>>
>> I've always wanted to extend this to:
>>
>> t = {[cat],[dog]}
>>
>> So, similarly to keys for methods, you can drop the quotes if the string is
>> a valid Lua name.
>
> if the idea of the patch is that omitted values default to `true`,
> then i'd say that the short form should be
>
> t = { cat, dog }
>
> because in normal Lua
>
> t = { cat=true, dog=true }
>
> is equivalent to
>
> t = { ['cat']=true, ['dog']=true }
>
> so if the `=true` part is optional, then you would be able to say either
>
> t = { ['cat'], ['dog'] }
>
> or
>
> t = { cat, dog }
>
> and the form {[cat], [dog]} would try to use the variables `cat` and `dog`

The form { cat, dog } is already well-defined (it creates a
two-element list containing the contents of the variables cat and
dog). The form should be { ['cat'], ['dog'] }.

/s/ Adam