parser hacking: stringification

Sven Olsen
Hi guys,

This is another post reporting on a parser hack I've been playing around with.  I've been trying to keep my Lua hacking to a sensible minimum; at this point, the only piece of non-standard sugar I'm really attached to is Peter Shook's 'in' shorthand.  Peter's is one of the few language tweaks that really does seem to succeed in simplifying a lot of my code without also obfuscating it.  

That said, sometimes I can't resist the urge to try making little improvements, and one of the hacks I've been playing with is starting to feel like it may be worth a list post, so here we go:

In my own scripts, I often find that I need to create tables that copy some selection of variables out of the current scope.  For example:

{
  star=star,
  planet=planet,
  year=year,
}

To make those tables a bit less verbose and typo prone, I added an extension to the syntax sugar for fields, one which makes 

{ ..star, ..planet, ..year} 

shorthand for 

{star=star, planet=planet, year=year}

I'm not sure what to call this type of shorthand -- and I'm having a hard time thinking of a similar feature in any other language.   The operator it most reminds me of is the "stringification" allowed in C's preprocessor.  But perhaps some of the more knowledgeable members of the list can suggest a better analogy :)

Anyhow, once I'd finished hacking the table construction code, I couldn't resist taking it a little further, and adding a version of my "stringification" shorthand to function arguments.  The idea here is to simplify function calls like:

   draw_point(x,y,"color",color,"size",size,"alpha",alpha)

By allowing them to be abbreviated as:

   draw_point(x,y,..color,..size,..alpha)
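For reference, here is a minimal sketch of the kind of receiver this calling convention assumes. The body of draw_point below is hypothetical (the post never shows it); it simply collects the trailing name/value pairs into an options table, which is what the expanded call would hand it.

```lua
-- Hypothetical receiver: the original post does not show draw_point's body.
-- Everything after x and y is read as alternating name/value pairs.
local function draw_point(x, y, ...)
  local opts = {}
  for i = 1, select("#", ...), 2 do
    -- select(i, ...) returns every argument from position i onward;
    -- only the first two are kept here.
    local name, value = select(i, ...)
    opts[name] = value
  end
  return { x = x, y = y, opts = opts }
end

local color, size, alpha = "red", 3, 0.5
-- What draw_point(x,y,..color,..size,..alpha) would expand to:
local p = draw_point(10, 20, "color", color, "size", size, "alpha", alpha)
assert(p.opts.color == "red" and p.opts.size == 3 and p.opts.alpha == 0.5)
```

The sketch works in stock Lua 5.1 and later; the shorthand only changes how the call site is written, not what the function receives.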

Both cases of the "stringification" shorthand are easy to hack into lparser.c -- together they're around a 50 line diff.  My implementation is admittedly crude -- it works by referencing the current token's seminfo string.  An interesting side effect of the approach is that the shorthand can sometimes give useful results when it's applied to more complex expressions.  For example 

{..planet, ..planet.star} becomes shorthand for {planet=planet,star=planet.star} and

draw_point(x,y,..hash[color]) becomes shorthand for draw_point(x,y,"color",hash[color])

I should perhaps throw a syntax error in such cases, but, it's a useful enough feature that I'm tempted to leave the code as it is.  
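The "last token wins" behaviour in those two examples can be approximated in plain Lua. The helper below, last_name, is a hypothetical illustration and not part of the patch: it returns the last identifier-like word in an expression string, which matches the key the hack picks above. It knows nothing about numbers, string literals or keywords, so treat it as a rough model of what the lexer's seminfo holds rather than a faithful one.

```lua
-- Hypothetical helper: find the last identifier-like word in an expression.
-- This only models the examples in the post; it is not the parser's logic.
local function last_name(expr)
  local last
  for name in expr:gmatch("[%a_][%w_]*") do
    last = name
  end
  return last
end

assert(last_name("planet") == "planet")
assert(last_name("planet.star") == "star")   -- {..planet.star} -> star=planet.star
assert(last_name("hash[color]") == "color")  -- ..hash[color] -> "color",hash[color]
```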

As always, if anyone's interested, I'll take a shot at packaging the diff as a powerpatch.

-Sven

Re: parser hacking: stringification

spir ☣
On 16/11/2012 03:36, Sven Olsen wrote:

> { ..star, ..planet, ..year}
>
> shorthand for
>
> {star=star, planet=planet, year=year}
>
> I'm not sure what to call this type of shorthand -- and I'm having a hard
> time thinking of a similar feature in any other languages.
> [...]
>     draw_point(x,y,..color,..size,..alpha)

There's something like your version for params in OCaml (I don't remember the
details; however, it's a bit more complicated there, as they introduce a kind of
virtual param between formal and actual ones). There are also languages which
make such automatic ids for custom types to avoid idiotic code like
    Point.init = function(p, x, y)
       p.x = x
       p.y = y
    end
which I find close to your point (sic).
I'd call that "auto-id" since it's about providing auto-matic ids from the id of
the passed variable it-self (auto).

denis

Re: parser hacking: stringification

Sven Olsen
Hrm, so I'm starting to doubt that a patch file is the best way of sharing this hack.  My hunch is that most people who might want it are probably already using hacked versions of lparser.c, so, a machine readable diff isn't going to be ideal :) 

But the code is easy enough to talk through, so here's a quick implementation-level description.

The entry point for the table shorthand is in lparser.c : field(), where you can hook into the table constructor by adding a switch case:

static void field (LexState *ls, struct ConsControl *cc) {
  /* field -> listfield | recfield */
  switch(ls->t.token) {
    case TK_CONCAT: {
      luaX_next(ls);
      table_shorthand(ls,cc);
      break;
    }

Implementing the shorthand is then just a matter of writing table_shorthand(), a modified version of recfield() that gets both the key and the value from the expression following '..'.   Such a modification is fairly easy, once you realize that the lexer stores semantic info for the most recent string literal, name, or numeric constant inside ls->t.seminfo.  

In the case that the most recent seminfo is a numeric constant, we probably want to throw a syntax error.  It's unclear if {..77} should be interpreted as {[77]=77}, or {["77"]=77}, and, neither interpretation seems likely to be that useful in practice.  

But, if the most recent stored seminfo is a string, it's probably a sensible choice for our key. 
    
/* a quick shorthand hack that transforms {..f} to {f=f}. */
static void table_shorthand (LexState *ls, struct ConsControl *cc) {
  expdesc key, val;
  FuncState *fs = ls->fs;
  int reg;
  int rkkey;
  checklimit(fs, cc->nh, MAX_INT, "items in a constructor");
  cc->nh++;
  expr(ls, &val);
  reg = ls->fs->freereg;
  if(ls->t.seminfo.ts) {
    codestring(ls, &key, ls->t.seminfo.ts);
  }
  else {
    luaX_syntaxerror(ls, ".. shorthand used with a non-string expression.");
  }
  rkkey = luaK_exp2RK(fs, &key);
  luaK_codeABC(fs, OP_SETTABLE, cc->t->u.info, rkkey, luaK_exp2RK(fs, &val));
  fs->freereg = reg;  /* free registers */
}
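The {..77} ambiguity mentioned above is real because Lua treats the number 77 and the string "77" as different table keys. A quick check in stock Lua, with no patch involved:

```lua
-- Number and string keys never collide in a Lua table.
local t = { [77] = "number key", ["77"] = "string key" }
assert(t[77] == "number key")
assert(t["77"] == "string key")
```

So whichever key the shorthand picked for a numeric constant, half of the plausible readers would be surprised by it.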

Implementing the function shorthand is a little trickier.  Rather than calling expr() from explist(), we will call dup_expr(), an expr() wrapper that will check for and handle the .. shorthand.  However, the implementation of dup_expr() is necessarily ugly, because we won't be ready to insert the seminfo string into the call stack until after we're finished parsing the expression, yet the string has to land in the register just before the expression's value.  As I understand Lua's bytecode generation, the best way to do this is by reserving a spot before we start the expression parse, and then using a kludge to convince luaK_exp2nextreg to write the string into the empty slot.

/* another parser hack.  this one turns foo(..bar) into foo("bar",bar). */
static void dup_expr(LexState *ls, expdesc *v, int *np) {
  int dup = testnext(ls,TK_CONCAT); 
  FuncState *fs = ls->fs;
  int reg;
  if(dup) {
    reg=fs->freereg;
    luaK_reserveregs(fs, 1);
  }
  expr(ls,v);
  if(dup) {
    if(ls->t.seminfo.ts) {
      int old_free = fs->freereg;
      expdesc varname;
      codestring(ls, &varname, ls->t.seminfo.ts);
      /* trick luaK_exp2nextreg into writing to the */
      /* previously reserved register.  i believe this is safe.. */
      fs->freereg=reg;
      luaK_exp2nextreg(fs, &varname);
      fs->freereg=old_free;
      (*np)++;
    }
    else luaX_syntaxerror(ls, "stringification shorthand used on a non-string expression.");
  }
}

static int explist (LexState *ls, expdesc *v) {
  /* explist -> expr { `,' expr } */
  int n = 1;  /* at least one expression */
  dup_expr(ls, v,&n);
  while (testnext(ls, ',')) {
    luaK_exp2nextreg(ls->fs, v);
    dup_expr(ls, v,&n);
    n++;
  }
  return n;
}

I should note that my own parser is based on the 5.2 source -- but, I suspect that both hacks would also work with 5.1.  If anyone tries it, let me know whether you have any success :)

-Sven

Re: parser hacking: stringification

Sven Olsen

> In the case that the most recent seminfo is a numeric constant, we probably want to throw a syntax error.  It's unclear if {..77} should be interpreted as {[77]=77}, or {["77"]=77}, and, neither interpretation seems likely to be that useful in practice.


Though now that I think about it, having 
   print(..f(i)) 
be valid, while throwing an error on
   print(..f(1))
is a little weird.  I'm pretty certain that, in the cases where we encounter an undefined seminfo.ts, seminfo.r must be holding a numeric constant.  So replacing the codestring() call with

   init_exp(&key, VKNUM, 0);
   key.u.nval = ls->t.seminfo.r;

should be a fairly reasonable way of handling the situation.

But, as I said earlier, I'm really not convinced that any kind of support for complex expressions is a good idea -- throwing an error anytime the expression ends up parsing more than a single name token would feel cleaner.

-Sven

Re: parser hacking: stringification

Luiz Henrique de Figueiredo
In reply to this post by Sven Olsen
> To make those tables a bit less verbose and typo prone, I added an
> extension to the syntax sugar for fields, one which makes
>
> { ..star, ..planet, ..year}
>
> shorthand for
>
> {star=star, planet=planet, year=year}

If you can use another syntax, like say !star, then this can be easily done
with a token filter, without the need to dive into the parser or produce a
patch.

Re: parser hacking: stringification

Robert Virding
In reply to this post by Sven Olsen
I think that the ease of adding this syntactic change is really irrelevant to whether it should be included. I personally think that it leads to unclear and cryptic code. Also, it is not consistent: the same syntax means different things in different places. Which is a Bad Thing.

Robert

Re: parser hacking: stringification

Sven Olsen
In reply to this post by Luiz Henrique de Figueiredo

> If you can use another syntax, like say !star, then this can be easily done
> with a token filter, without the need to dive into the parser or produce a
> patch.

If we were going to do a version of this shorthand as a token filter, we'd also probably want to follow Robert's suggestion, and use different operators for the table and function cases.  But yes, I'd think a { *star } or draw_point(x,y,!color) style shorthand would be a simple (and arguably useful?) use case for a token filter.  The token filtering approach would also have some error handling advantages over a parser patch, as it would make it easy to limit the valid arguments to single name tokens, whereas my patch currently does weird things when it hits a complex expression.

On the other hand, a parser patch is a bit faster -- and, for myself, at least, has the advantage of already being implemented :)
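A real token filter needs the token-filter patch, whose API is not shown in this thread, so the sketch below swaps in plain text substitution with string.gsub as a rough stand-in. The function name expand_shorthand and the '@' marker for the table case are assumptions made for the example ('*' is avoided because a text substitution cannot tell '*star' from a multiplication). It only accepts single names, and it will happily rewrite markers inside string literals and comments, which an actual token filter would not.

```lua
-- Hypothetical text-level stand-in for a token filter (not the real patch).
local NAME = "[%a_][%w_]*"

local function expand_shorthand(src)
  -- @name  ->  name = name       (table-field form)
  src = src:gsub("@(" .. NAME .. ")", "%1 = %1")
  -- !name  ->  "name", name      (function-argument form)
  src = src:gsub("!(" .. NAME .. ")", '"%1", %1')
  return src
end

-- loadstring exists in Lua 5.1; load accepts strings from 5.2 onward.
local load_chunk = loadstring or load

local run = assert(load_chunk(expand_shorthand([[
  local star, planet = "Sol", "Earth"
  return { @star, @planet }
]])))

local t = run()
assert(t.star == "Sol" and t.planet == "Earth")
assert(expand_shorthand("draw_point(x, y, !color)")
       == 'draw_point(x, y, "color", color)')
```

Because the pattern only matches a marker followed directly by a name, something like !hash[color] expands using just hash, which is one way of getting the "single name tokens only" restriction discussed above.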

-Sven
Reply | Threaded
Open this post in threaded view
|

Re: parser hacking: stringification

Sven Olsen
In reply to this post by Robert Virding

> I think that the ease of adding this syntactic change is really irrelevant to whether it should be included. I personally think that it leads to unclear and cryptic code. Also, it is not consistent: the same syntax means different things in different places. Which is a Bad Thing.

Yes, there are certainly good reasons to stay away from this hack :)

The strongest argument against it, I think, is that there's really not much precedent for this kind of shorthand in any other language.  So any programmer who comes across it, no matter how experienced, is going to be confused.  And that's never a good thing :)

But, in my own scripts at least, the shorthand is simplifying a lot of situations where I'd otherwise be writing highly repetitive code.  And after living with it for a couple of months, I'm finding it clearer than vanilla Lua.  My hope, when I first wrote it up, was that the shorthand would have an obfuscation/simplification tradeoff similar to +='s.  So while it would certainly be obscure for people who weren't used to it, it would have a chance of clarifying things for people who were.  To that end, I think it works better if it's implemented using a relatively large, multi-character token.  {.star} or {!star} are both easy to misread as {star}.  But {..star} is fairly obviously some kind of shorthand.  And I like that it's evocative of '...', as the semantics remind me of an 'et cetera'.

That said, if I had to pick a single addition to make to the 5.2 Lua spec, this wouldn't be it.  I'd go with either Peter's table unpack semantic, or possibly an increment operator.  But, there's a small community of people on the list who enjoy playing around with parser hacks, and for their benefit, I thought the idea might still be worth posting :)

-Sven

Re: parser hacking: stringification

Isaac Dupree-2
In reply to this post by Sven Olsen
On 11/15/2012 09:36 PM, Sven Olsen wrote:

> { ..star, ..planet, ..year}
>
> shorthand for
>
> {star=star, planet=planet, year=year}
>
> I'm not sure what to call this type of shorthand -- and I'm having a
> hard time thinking of a similar feature in any other languages.   The
> operator it most reminds me of is the "stringification" allowed in C's
> preprocessor.  But perhaps some of the more knowledgeable members of the
> list can suggest a better analogy :)

Haskell has a syntax for this:
http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#record-puns

-Isaac

Re: parser hacking: stringification

Sven Olsen

> Haskell has a syntax for this:
> http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#record-puns

Clearly, Haskell is awesome :)

Thanks Isaac.

-Sven

PS: I'm not certain I like the name though.  "Record Puns" doesn't give you much clue about what the syntax does. But it's still good to know there is, in fact, a precedent.