Are Lua's own lexer/parser suitable for writing a syntax highlighter?

Steven Degutis
Are Lua's llex.h and lparser.h files suitable for me to use to build a
structure that I can then traverse to do my syntax highlighting in a
rich text field?

If so, is there any documentation anywhere on using functionality
within these files for this purpose?

Note: This is for educational reasons, so while I could just use a
third party library, it defeats the point.

-Steven

Re: Are Lua's own lexer/parser suitable for writing a syntax highlighter?

Luiz Henrique de Figueiredo
> Are Lua's llex.h and lparser.h files suitable for me to use to build a
> structure that I can then traverse to do my syntax highlighting in a
> rich text field?

For syntax highlighting, you don't need the parser, just the lexer, right?

> If so, is there any documentation anywhere on using functionality
> within these files for this purpose?

There is no official documentation except the source.

The only interface that should interest you is llex and the SemInfo
struct. To see how these work, search the archives for "proxy.c" and
"token filter". See for instance
        http://lua-users.org/lists/lua-l/2012-03/msg00309.html

See also my recent rewrite of my lstrip as a token filter:
        http://lua-users.org/lists/lua-l/2014-07/msg00058.html

The code in lstrip should be a good place to start. The only problem is
that you won't be able to run Lua programs since the token filter in
lstrip eats all tokens and the parser sees an empty stream.

You may use a global variable to switch off the token filter and then
allow your app to run Lua programs. This should be easy to do.
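
A minimal sketch of such a switch, not lstrip's actual code: every name
here (RawLexer, raw_next, filtered_next, token_filter_enabled,
tokens_seen, TOK_EOS) is hypothetical, and a fixed array of integers
stands in for the real lexer. With the flag on, the filter consumes every
token and the caller sees an empty stream; with the flag off, tokens pass
through untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOK_EOS = 0 };  /* hypothetical end-of-stream marker */

/* Stand-in for the real lexer: hands out tokens from a fixed array
   that must end with TOK_EOS. */
typedef struct {
    const int *tokens;
    size_t pos;
} RawLexer;

int raw_next(RawLexer *lx) {
    assert(lx != NULL && lx->tokens != NULL);
    int t = lx->tokens[lx->pos];
    if (t != TOK_EOS) lx->pos++;   /* never advance past the end marker */
    return t;
}

/* The global switch: true while highlighting, false while running code. */
bool token_filter_enabled = true;

/* Counts the tokens the filter has inspected (the "highlighter" side). */
size_t tokens_seen = 0;

/* What the parser would call instead of the raw lexer. */
int filtered_next(RawLexer *lx) {
    if (!token_filter_enabled)
        return raw_next(lx);       /* pass-through: the parser sees real tokens */
    while (raw_next(lx) != TOK_EOS)
        tokens_seen++;             /* a highlighter would inspect each token here */
    return TOK_EOS;                /* the parser sees an empty stream */
}
```

Because the flag is global, it applies to every chunk loaded while it is
set, so it has to be cleared before the application runs Lua code.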

Feel free to contact me about all this.
--lhf

Re: Are Lua's own lexer/parser suitable for writing a syntax highlighter?

Peng Zhicheng
In reply to this post by Steven Degutis
On 09/10/2014 03:21 AM, Steven Degutis wrote:

> Are Lua's llex.h and lparser.h files suitable for me to use to build a
> structure that I can then traverse to do my syntax highlighting in a
> rich text field?
>
> If so, is there any documentation anywhere on using functionality
> within these files for this purpose?
>
> Note: This is for educational reasons, so while I could just use a
> third party library, it defeats the point.
>
> -Steven
>
I read the source code a few times. I feel the lexer should be easy to
adapt for such usage, but the parser is not, IMHO.

The Lua parser is tightly coupled to the VM code generator: it emits the
corresponding VM instructions as soon as it has recognized a syntactic
construct, and it does not build an intermediate data structure such as
an AST.

On the other hand, Lua's parser is well written and not hard to
understand, so I suppose it would not be difficult to write your own
parsing code to suit your needs.

Re: Are Lua's own lexer/parser suitable for writing a syntax highlighter?

Steven Degutis
In reply to this post by Luiz Henrique de Figueiredo
> For syntax highlighting, you don't need the parser, just the lexer, right?

I was under the impression that the parser just gives me more
information about the file, so that I can color things more
specifically, e.g. it may recognize an identifier as being a local
variable or a global, so I can color them differently.

But perhaps this is not the case. I have not heavily looked into the
source code of Lua's parser or lexer yet.

> The only interface that should interest you is llex and the SemInfo
> struct. To see how these work, search the archives for "proxy.c" and
> "token filter". See for instance
>         http://lua-users.org/lists/lua-l/2012-03/msg00309.html
>
> See also my recent rewrite of my lstrip as a token filter:
>         http://lua-users.org/lists/lua-l/2014-07/msg00058.html
>
> The code in lstrip should be a good place to start.

Thank you! I will read your references.

> The only problem is that you won't be able to run Lua programs
> since the token filter in lstrip eats all tokens and the parser sees
> an empty stream.
>
> You may use a global variable to switch off the token filter and then
> allow your app to run Lua programs. This should be easy to do.

Would creating a new private lua_State for my syntax highlighter be
a reasonable way to work around this?

-Steven

Re: Are Lua's own lexer/parser suitable for writing a syntax highlighter?

Luiz Henrique de Figueiredo
> I was under the impression that the parser just gives me more
> information about the file, so that I can color things more
> specifically, e.g. it may recognize an identifier as being a local
> variable or a global, so I can color them differently.

Oh, right, you need a parser for that. But you may be able to hook into
the parser to get this info. See
        http://lua-users.org/lists/lua-l/2011-06/msg00091.html
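
To illustrate the kind of bookkeeping that local/global coloring needs,
here is a minimal sketch that assumes only Lua's scoping rule: a name is
local if an enclosing block has declared it, and global otherwise. It is
not code from lparser.c; Scopes, enter_block, leave_block, declare_local
and is_local are hypothetical names, and the caller is assumed to invoke
them as it recognizes blocks and `local` declarations.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCALS 64
#define MAX_DEPTH  16

typedef struct {
    const char *names[MAX_LOCALS];  /* active local names, innermost last */
    int nlocals;
    int block_start[MAX_DEPTH];     /* value of nlocals when each open block began */
    int depth;
} Scopes;

/* Call when a block opens (function body, do, while, for, if, ...). */
void enter_block(Scopes *s) {
    assert(s->depth < MAX_DEPTH);
    s->block_start[s->depth++] = s->nlocals;
}

/* Call when the block closes: its locals go out of scope. */
void leave_block(Scopes *s) {
    assert(s->depth > 0);
    s->nlocals = s->block_start[--s->depth];
}

/* Call for each name introduced by `local`, a parameter list or a for loop. */
void declare_local(Scopes *s, const char *name) {
    assert(s->nlocals < MAX_LOCALS);
    s->names[s->nlocals++] = name;
}

/* True if name currently refers to a local; otherwise it is a global. */
bool is_local(const Scopes *s, const char *name) {
    for (int i = s->nlocals - 1; i >= 0; i--)
        if (strcmp(s->names[i], name) == 0)
            return true;
    return false;
}
```

A highlighter would call is_local() for each identifier token and pick
the color accordingly.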
 
> Would creating a new private lua_State for my syntax highlighter be
> a reasonable way to work around this?

No. You'd have to use a patched Lua and this will affect all Lua programs
that you load. You really need to turn off the token filter globally.

Re: Are Lua's own lexer/parser suitable for writing a syntax highlighter?

KHMan
In reply to this post by Steven Degutis
On 9/10/2014 3:21 AM, Steven Degutis wrote:
> Are Lua's llex.h and lparser.h files suitable for me to use to build a
> structure that I can then traverse to do my syntax highlighting in a
> rich text field?
>
> If so, is there any documentation anywhere on using functionality
> within these files for this purpose?
>
> Note: This is for educational reasons, so while I could just use a
> third party library, it defeats the point.

If you just convert the lexer, then you can classify many types of
tokens and highlight accordingly. The result will be roughly
equivalent to the kind of output that SciTE or Notepad++
(Scintilla-based editors) provides. (For smarter editors this may
not be enough.)

I have done one using a Lua 5.1 llex more-or-less work-alike,
heavily modified to use Lua shortcuts rather than C-style coding.
The text-to-HTML function clocks in at 208 SLOC, plus an
additional small function to generate some HTML entities.
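
For illustration only, here is a sketch of the HTML side of such a
converter in C. It is not the Lua code described above: Out, out_write,
out_puts, out_escaped and out_span are hypothetical names, and the CSS
class for each token type is assumed to be chosen by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fixed-capacity output buffer (hypothetical helper type for this sketch). */
typedef struct {
    char *buf;
    size_t cap;   /* total capacity, including the terminating NUL */
    size_t len;   /* characters currently stored */
} Out;

/* Append n bytes of s, silently truncating when the buffer is full. */
void out_write(Out *o, const char *s, size_t n) {
    assert(o != NULL && o->buf != NULL && o->cap > 0);
    for (size_t i = 0; i < n && o->len + 1 < o->cap; i++)
        o->buf[o->len++] = s[i];
    o->buf[o->len] = '\0';
}

void out_puts(Out *o, const char *s) { out_write(o, s, strlen(s)); }

/* Append s[0..n), replacing the characters HTML treats specially. */
void out_escaped(Out *o, const char *s, size_t n) {
    for (size_t i = 0; i < n; i++) {
        switch (s[i]) {
        case '&': out_puts(o, "&amp;");   break;
        case '<': out_puts(o, "&lt;");    break;
        case '>': out_puts(o, "&gt;");    break;
        case '"': out_puts(o, "&quot;");  break;
        default:  out_write(o, &s[i], 1); break;
        }
    }
}

/* Wrap one token's text in a span whose class names the token type. */
void out_span(Out *o, const char *css_class, const char *s, size_t n) {
    out_puts(o, "<span class=\"");
    out_puts(o, css_class);
    out_puts(o, "\">");
    out_escaped(o, s, n);
    out_puts(o, "</span>");
}
```

A caller would walk the token stream and call out_span() once per token,
passing a class such as "keyword", "string" or "comment". Truncation can
cut an entity in half, so real code would grow the buffer instead.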

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia


Re: Are Lua's own lexer/parser suitable for writing a syntax highlighter?

Jean-Luc Jumpertz-2
In reply to this post by Steven Degutis

On 10 Sept. 2014, at 04:43, Steven Degutis <[hidden email]> wrote:

>> For syntax highlighting, you don't need the parser, just the lexer, right?
>
> I was under the impression that the parser just gives me more
> information about the file, so that I can color things more
> specifically, e.g. it may recognize an identifier as being a local
> variable or a global, so I can color them differently.
>
> But perhaps this is not the case. I have not heavily looked into the
> source code of Lua's parser or lexer yet.

If you need more information than the lexer provides, you may be interested in LuaSyntaxer, available at https://bitbucket.org/jean_luc/luasyntaxer

Extract from the README:

> Lua Syntaxer adds syntax analyzing capabilities to Lua 5.2 at the C API level.
>
> Syntax analysis is callback-based. It is performed in a lua_State by calling a single function, lua_parser(). This function takes a notifyfunction callback parameter, which is called each time the parser has discovered significant syntax information in the analyzed Lua source code chunk.
> The internal parser is closely based on Lua's own syntax parser, lparser.c. This provides a good level of confidence that the Lua syntax structure reported by Lua Syntaxer will be identical to the interpretation of the program by the Lua bytecode compiler.
>
> Lua Syntaxer does not build the AST for the analyzed code chunk. It is intended to be a low-level utility on top of which programmers can build an AST with the appropriate structure matching their own needs. As such, Lua Syntaxer can be used for implementing a Lua syntax-aware text editor, a code static analysis tool …

If you want to start from the official lexer, you will need to make at least the following changes (you can see the corresponding code in the llex.h and llex.m files in LuaSyntaxer):
- add a comment token, TK_COMMENT, to the defined tokens and report comment tokens from the function llex() instead of just skipping the comments;
- associate a character range with each returned token, which is handier for syntax highlighting than just a line number. :-)
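
The two changes above can be sketched with a tiny hand-written scanner.
This is an illustration only, not Lua's llex.c and not LuaSyntaxer's
code: apart from TK_COMMENT, every name (Token, TokenKind, next_token,
the other TK_ constants) is hypothetical. It recognizes only `--` line
comments, names, simple numbers and short quoted strings; long brackets
and multi-character operators are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TK_EOS, TK_COMMENT, TK_NAME, TK_NUMBER, TK_STRING, TK_OTHER } TokenKind;

typedef struct {
    TokenKind kind;
    size_t start;  /* byte offset of the token's first character */
    size_t end;    /* byte offset one past its last character */
} Token;

/* Scan one token from the NUL-terminated src at *pos and advance *pos. */
Token next_token(const char *src, size_t *pos) {
    assert(src != NULL && pos != NULL);
    size_t i = *pos;
    while (src[i] != '\0' && isspace((unsigned char)src[i]))
        i++;                                   /* skip whitespace */
    Token t = { TK_EOS, i, i };
    unsigned char c = (unsigned char)src[i];
    if (c == '\0') {
        /* end of input: return TK_EOS with an empty range */
    } else if (c == '-' && src[i + 1] == '-') {
        t.kind = TK_COMMENT;                   /* reported, not skipped */
        while (src[i] != '\0' && src[i] != '\n')
            i++;
    } else if (isalpha(c) || c == '_') {
        t.kind = TK_NAME;                      /* keywords are not separated here */
        while (isalnum((unsigned char)src[i]) || src[i] == '_')
            i++;
    } else if (isdigit(c)) {
        t.kind = TK_NUMBER;                    /* loose: digits, letters and dots */
        while (isalnum((unsigned char)src[i]) || src[i] == '.')
            i++;
    } else if (c == '"' || c == '\'') {
        t.kind = TK_STRING;
        i++;                                   /* opening quote */
        while (src[i] != '\0' && src[i] != '\n' && (unsigned char)src[i] != c) {
            if (src[i] == '\\' && src[i + 1] != '\0')
                i++;                           /* skip the escaped character */
            i++;
        }
        if ((unsigned char)src[i] == c)
            i++;                               /* closing quote, if present */
    } else {
        t.kind = TK_OTHER;                     /* single-character punctuation */
        i++;
    }
    t.end = i;
    *pos = i;
    return t;
}
```

A highlighter can call next_token() in a loop until it returns TK_EOS and
apply a style to the range [start, end) of each token in the text field.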

Jean-Luc