introducing luaSuper

6 messages

introducing luaSuper

Asko Kauppi

The work on luaSuper is approaching release, and I would like to share some of it here, although the "code complete" stage is still a few days away (hopefully no more than that).

The concept is to make the 'luas' command a syntax-modifiable version of the 'lua' interpreter. This is how it goes:

luas -s select demo.lua # loads demo.lua with syntax mods from 'luas_select.so'

or, one can make demo.lua load the syntax mods automatically, by having the following shebang line:

	#!/.../luas -s select

Some points differentiating luaSuper from regular token filtering and from MetaLua:

	- made in C++ (about 2500 lines, 4 .cpp files)
	- parsing speed should be similar to the regular Lua parser, _even_ when the syntax is modified
	- multiple syntax mods are usable at the same time
	- different source files can each use their own set of syntax mods

Performance and ease of making syntax modifications have been the design goals. As a sample, below is what it takes to make a "select" syntax mod, built with:

	g++ -bundle luas_select.cpp -o luas_select.so

I will hopefully get the code ready during next week. It would be motivating to get some feedback, ideas, and questions already now.

-asko


/*
 * SELECT token filter                   Copyright 2006-07 Asko Kauppi
 *
 * Simplifies the 'select' usage a bit, making it look less like a function:
 *
 *      #...        -> select('#',...)  = number of values in ...
 *      ...[n]      -> (select(n,...))  = value #n of ...
 *
 * 'select(n,...)' is still used explicitly when wanting all values from 'n'
 * onwards.
 *
 * License: MIT/Lua5 (see LICENSE)
 */

#include "plugin.hpp"

/*
 * Called by the luaSuper parser for "#..."
 */
static bool CATCH1( PluginParser& p )
{
    p.replace( 0,2, "select", "(", "\"#\"", ",", "...", ")" );
    return true;
}

/*
 * Called by the luaSuper parser for "...[exp]"
 */
static bool CATCH2( PluginParser& p )
{
    p.replace( 0,2, "(", "select", "(" );
    p.replace( -1,1, ",", "...", ")", ")" );
    return true;
}

/*
 * Override the normal syntax:
 *
 *    ["exp"]= { opt(unop),
 *               alt{ "nil", "false", "true", "<number>", "<string>", "...",
 *                    { "{", opt(fieldlist), "}" },
 *                    { "function", funcbody },
 *                    prefixexp },
 *               opt( binop, exp ) }
 *      -->
 *    ["exp"]= { opt( ent( opt(unop),
 *                         alt{ { "#", "..." },
 *                              { "...", opt{ "[", exp, "]" } } },
 *                         opt(binop,exp) ) ),
 *               original }
 *
 * Note: Not repeating the original syntax makes us play nice with potential
 *       other plugins. We would otherwise let the original path be considered
 *       first, but "..." within it would make a match, preventing the
 *       "...[exp]" extension from working. Alas, we need to be first.
 *
 *       Other plugins may further extend the 'exp' syntax, appending
 *       or prepending their modifications, without troubling us or the
 *       original syntax (unless, of course, the syntaxes themselves clash).
 */
void syntax_select( PluginSyntax& syntax ) {

    // Note: use 'syntax["exp"]' to get the _current_ syntax; using 'ref()'
    //       would make a loop! However, for "...[exp]" we can use 'ref()'
    //       since it's within an 'alt' case. Ref allows our syntax to be
    //       recursively applied, and further changes to 'exp' to apply also
    //       within our syntax.

    const EntBase *unop= ref("unop");
    const EntBase *binop= ref("binop");
    const EntBase *exp= ref("exp");

    syntax["exp"]= opt( ent( opt(unop),
                             alt( "#", ent( "...", CATCH1 ),
                                  "...", ent( "[", exp, "]", CATCH2 ) ),
                             opt(binop,exp)
                            ),
                        syntax["exp"]   // earlier exp ('ref' would make a loop!)
                      );
}
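
For reference, the rewrites this plugin performs correspond to the following plain Lua (standard 'select' semantics; this sketch is mine, not part of luaSuper):

	-- What the mod's "#..." and "...[n]" forms expand to in plain Lua:
	local function test(...)
	  local n = select('#', ...)      -- "#..."   -> select('#', ...)
	  local second = (select(2, ...)) -- "...[2]" -> (select(2, ...)): the extra
	                                  -- parentheses truncate to a single value
	  return n, second
	end

	print(test("a", "b", "c"))        -- prints: 3   b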




Re: introducing luaSuper

David Manura
Asko Kauppi writes:
> The work on luaSuper is approaching release, and I would like to  
> share some of it here

I tend to agree with what Fabien wrote below in the "Lua is not skin deep"
thread in favor of syntax mods being expressed in Lua rather than C/C++ for the
practical reason of reducing maintenance costs.  Ease of use is one of your
stated design goals after all.  Still, there are some advantages to patching the
parser in C, and maybe your approach could reduce the complexity of that.

Fabien writes:
> I'm obviously biased [toward] metalua here, but it seems to me
> that source parsing can always be taken out of the critical
> optimization path, so it ought to be dealt with in Lua rather
> than in portable assembler, if it is intended to be tweaked by
> users. C maintenance costs are much higher than lua's, so it
> makes sense to patch the VM, but not the compiler IMO. 

Asko Kauppi writes:
> This is how it goes:
> 	luas -s select demo.lua	# loads demo.lua with syntax mods from  
> 'luas_select.so'
> or, one can make demo.lua load the syntax mods automatically, by  
> having the following shebang line:
> 	#!/.../luas -s select

Here, we need to compile each syntax mod to a shared object in a platform
dependent way.  Then we need to specify it on the command line or edit a
system-dependent absolute path in the script.  It would seem to me that each
module should itself know which syntax mod(s) it needs, possibly specified with
a compiler directive, and the interpreter should know how to find them:

  -- mycode.lua
  #use select
  local oc = require "othercode"
  function test(...)
    return #..., ...[1], oc()
  end
  print(test(2,3,4))
  print(test(2,3,4))

  -- othercode.lua
  #use incrementops
  local x = 0
  return function()
    x += 5
    return x
  end

  $ LUA_CPATH=... luas mycode.lua

> - made in C++ (about 2500 lines, 4 .cpp files)
> - parsing speed should be similar to regular Lua parser,
>   _even_ when the syntax is modified...
> Performance and ease of making syntax modifications have been the
> design goals

Could the syntax mod descriptions be written in Lua, using Lua as a data
description language, but processed in C/C++ (like what LPeg does), while still
maintaining performance?  Those writing code in Lua must have already accepted
the performance of Lua.
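
Such a Lua-based description might look roughly like this. This is a purely hypothetical sketch (the 'opt'/'alt'/'ent'/'ref' names are borrowed from the C++ plugin above; no such Lua binding actually exists in luaSuper), just to suggest that the declarative part ports naturally:

	-- Hypothetical: the "select" mod expressed as a Lua data description,
	-- to be compiled once by a C/C++ core (the way LPeg compiles patterns).
	return {
	  exp = opt( ent( opt( ref "unop" ),
	                  alt( "#", ent( "...", "CATCH1" ),
	                       "...", ent( "[", ref "exp", "]", "CATCH2" ) ),
	                  opt( ref "binop", ref "exp" ) ),
	             original "exp" )   -- fall back to the unmodified 'exp' rule
	}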



Re: introducing luaSuper

Asko Kauppi

mycode.lua:

#!.../luas -s select
  local oc = require "othercode"
  function test(...)
    return #..., ...[1], oc()
  end
  print(test(2,3,4))
  print(test(2,3,4))

othercode.lua:

#!.../luas -s incrementops
  local x = 0
  return function()
    x += 5
    return x
  end

When running these files, luaSuper actually reads the shebang line and loads the -s syntax mods for that file (and only for that file). This way, we don't have to define any new "meta-require" mechanisms. Loading of automatic -l libraries could be done the same way, but they seem to be on the way out anyway, now that 'require' is there.

The LUA_SPATH environment variable can be used to point to where syntax mod units are placed, with the same syntax that LUA_PATH and LUA_CPATH already use.
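
As an illustration of the convention (the file names and directories below are made up; only the "?" template syntax is taken from LUA_PATH/LUA_CPATH):

```shell
# Hypothetical example: '?' would be replaced by the syntax mod name,
# mirroring how LUA_PATH/LUA_CPATH templates work.
export LUA_SPATH="./luas_?.so;/usr/local/lib/luas/luas_?.so"
# With '-s select', luas would then look for ./luas_select.so,
# then /usr/local/lib/luas/luas_select.so.
```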

> Could the syntax mod descriptions be written in Lua, using Lua as a data
> description language, but processed in C/C++ (like what LPeg does), while
> still maintaining performance? Those writing code in Lua must have already
> accepted the performance of Lua.

I've had a working Lua setup for this for about a year, but the parsing performance was never good enough for it to be taken "seriously". Then someone asked why the syntax extensions couldn't be made in C, and of course they can. I don't see much added value in making syntax mods in Lua; the C code does have access to the Lua state, however, which is important for certain kinds of mods (macros etc.).

-asko


David Manura wrote on 3.11.2007 at 4:00.




Re: introducing luaSuper

steve donovan
Asko:
> parsing performance was never good enough to take it "seriously".
> Then someone mentioned why couldn't the syntax extensions be made in
> C, and of course they can. I don't see much added value in making
> syntax mods in Lua;

The cool thing about token filters is that you are using the Lua lexer
to do the hard stuff. Raw Lua is not good at tokenizing; its string
operations don't match the need for operating on a stream of
characters. With token filters I'm finding very respectable
performance with Lua-based syntax extensions; in fact, I had to
generate some artificial tests to notice any delay at all.
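
For contrast, a naive pure-Lua lexer (my own sketch, not Steve's code) shows why hand-rolling this is clumsy compared to reusing the built-in lexer:

	-- Naive pure-Lua lexer sketch: splits source into names, integers and
	-- single-char symbols. It re-scans the string per token and handles none
	-- of Lua's real lexical rules (long strings, comments, escapes), all of
	-- which the built-in lexer gives token filters for free.
	local function tokens(src)
	  local pos = 1
	  return function()
	    pos = src:match("^%s*()", pos)                 -- skip whitespace
	    if pos > #src then return nil end
	    local tok, after = src:match("^([%a_][%w_]*)()", pos)  -- name/keyword
	    if not tok then
	      tok, after = src:match("^(%d+)()", pos)      -- number (integers only)
	    end
	    if not tok then
	      tok, after = src:match("^(.)()", pos)        -- any single character
	    end
	    pos = after
	    return tok
	  end
	end

	for t in tokens("local x = x + 42") do io.write(t, " ") end
	-- local x = x + 42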

steve d.


Re: introducing luaSuper

Fabien-3
In reply to this post by Asko Kauppi
On 11/4/07, Asko Kauppi <[hidden email]> wrote:
> > Could the syntax mod descriptions be written in Lua, using Lua as a
> > data description language, but processed in C/C++ (like what LPeg does),
> > while still maintaining performance?  Those writing code in Lua must have
> > already accepted the performance of Lua.
>
> I've had a working Lua setup about this for about a year, but the
> parsing performance was never good enough to take it "seriously".
> Then someone mentioned why couldn't the syntax extensions be made in
> C, and of course they can. I don't see much added value in making
> syntax mods in Lua; the C code does have access to the Lua state
> however, which is important for certain kinds of mods (macros etc.).

* What are the actual cases where you found Lua-based parsing to be too slow?

* Is this unacceptable lack of parsing speed observed in general circumstances (several metaprogramming tools, several kinds of extensions, used for several target programs), or has it been merely observed while using token filters at what they aren't designed for (reshaping/extending the syntax in depth)?

* In these problematic cases, why wasn't pre-compilation a sensible option?

* Most importantly, if you can't see a substantial productivity gain by working in Lua rather than C, why are you using Lua at all? Unless you're sticking to surface syntax mods that would be addressed very adequately by token filters, compilation is the poster child of problems benefiting from high level language features. That's why most of ICFP contest challenges are about compilation...

> parsing performance was never good enough to take it "seriously".

I would bet that what isn't taken seriously isn't the parsing performance (who still cares about that on multicore platforms? Even when targeting embedded devices, your development platform has several times more horsepower than required). Fiddling with "skin-deep" syntax does more harm than good 99% of the time: speaking the same language as your fellow developers is far more important than having your "end" keywords replaced by braces or significant indentation, or being able to increment with "++". So not only will developers not accept additional clutter in their compilation chain to handle such details, they'll also fly away from a platform that neglects human interoperability at such a hard-to-get-wrong level.

Code maintenance is mainly about guessing what your predecessors meant while coding, so gratuitous idiosyncrasies are a time bomb you leave to your successors. New syntaxes are only legitimate when they support new ways of thinking. If you don't feel like they deserve a chapter of conceptual explanations, they're probably not worth it.


Re: introducing luaSuper

Asko Kauppi

You might be right. I was merely expecting that something a magnitude or so slower than the regular parser (I did run tests with the Lua implementation I had about a year ago) would never be considered seriously as an alternative to the built-in parser. Maybe I've been wrong, and should have given that version more publicity.

Anyways, it seems best now to just get the code out and let people play with it if they want. Our approaches are different, and maybe all this is good for the next phase of Lua development.

-asko



Fabien wrote on 5.11.2007 at 1:12.