OR, quantifier support in Lua patterns

Sai Manoj Kumar Yadlapati
Hi all,

Lua supports its own version of regular expression matching.
But it has no support for alternation with | (the pipe symbol) or for bounded quantifiers such as a{1,5}, meaning that a can occur anywhere from 1 to 5 times.

Both of these are present in PCRE. I am curious to know why they are not supported. Were they left out intentionally, or were they never considered?

Thanks
Sai Manoj
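
For a single character or character class, a bounded quantifier can be spelled out by hand in a Lua pattern: a{1,5} becomes aa?a?a?a?, because Lua's ? makes the preceding single-character class optional. A minimal sketch in C that generates such an expansion is shown here. The name expand_bounded is a hypothetical helper, not part of Lua, and the trick does not cover alternation or quantified groups, which Lua patterns cannot express this way.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: writes the Lua-pattern expansion of item{min,max}
 * into out, e.g. ("a", 1, 5) gives "aa?a?a?a?".
 * item must be a single Lua pattern item such as "a", "%d" or "[abc]".
 * Returns the length written, or -1 on bad arguments or a too-small buffer.
 */
static int expand_bounded(const char *item, int min, int max,
                          char *out, size_t outsz)
{
    size_t n = strlen(item);
    size_t need, pos = 0;
    int i;

    if (n == 0 || min < 0 || max < min)
        return -1;

    /* max copies of item, one '?' per optional copy, plus the terminator */
    need = (size_t)max * n + (size_t)(max - min) + 1;
    if (need > outsz)
        return -1;

    for (i = 0; i < max; i++) {
        memcpy(out + pos, item, n);
        pos += n;
        if (i >= min)          /* copies beyond the minimum are optional */
            out[pos++] = '?';
    }
    out[pos] = '\0';
    return (int)pos;
}
```

Calling expand_bounded("a", 1, 5, buf, sizeof buf) fills buf with aa?a?a?a?, which could then be anchored with ^ and $ and passed to string.find or string.match.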

Re: OR, quantifier support in Lua patterns

Andrew Gierth
>>>>> "Sai" == Sai Manoj Kumar Yadlapati <[hidden email]> writes:

 Sai> Hi all,
 Sai> Lua supports its own version of regular expression matching.

Well, to be precise it supports a pattern-matching function that falls a
long way short of regular expressions.

 Sai> But it doesn't have the | (pipe symbol) support and the quantifier
 Sai> support - a{1,5} meaning a can occur anywhere from 1 to 5 times.

 Sai> Both of these are present in PCRE. I am curious to know why these
 Sai> are not supported. Is it not supported intentionally or was it
 Sai> never considered?

Maybe this answers your question:

% size liblua-5.3.so libpcre.so
    text   data   bss      dec       hex   filename
  236048   6457     0   242505   0x3b349   liblua-5.3.so
  483084   1237   152   484473   0x76479   libpcre.so

i.e. PCRE is about twice the size of the entirety of Lua. (Even a
relatively minimal POSIX regexp implementation would be 2.5 times the
size of the Lua string library - ~50kB vs. ~20kB on my system.)

You can use LPEG instead (which is even more powerful than regular
expressions though has a bit of a learning curve), or if you're not
worried about size then there's a Lua binding for PCRE.

--
Andrew.

Re: OR, quantifier support in Lua patterns

Jim
In reply to this post by Sai Manoj Kumar Yadlapati
On Thu, Sep 27, 2018 at 6:40 AM Sai Manoj Kumar Yadlapati
<[hidden email]> wrote:
> Lua supports its own version of regular expression matching.
> But it doesn't have the | (pipe symbol) support and the quantifier support - a{1,5} meaning a can occur anywhere from 1 to 5 times.
>
> Both of these are present in PCRE. I am curious to know why these are not supported. Is it not supported intentionally or was it
> never considered?

this is a very useful and often needed feature to add to Lua's builtin patterns.
i would also like to see it added; it is already on my wishlist of items
that should be added to Lua.

"|" grouping of alternatives is not only present in PCRE, but also in POSIX regex,
which is contained in the libc of unix systems and hence usable without the need
for extra libs like PCRE.
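
As an illustration of the POSIX route, here is a minimal sketch in C that uses regcomp() and regexec() from <regex.h> with REG_EXTENDED, which enables both alternation and interval expressions such as a{1,5}. The wrapper name ere_matches is hypothetical, and the sketch assumes a system that actually ships <regex.h>.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical wrapper around POSIX extended regular expressions.
 * Returns 1 if text matches pattern, 0 if it does not (or if matching
 * fails at run time), and -1 if the pattern does not compile.
 */
static int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, no capture offsets */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, ere_matches("^(cat|dog)$", "dog") exercises alternation and ere_matches("^a{1,5}$", "aaa") exercises the bounded quantifier from the original question.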

Re: OR, quantifier support in Lua patterns

Lorenzo Donati-3
In reply to this post by Sai Manoj Kumar Yadlapati
On 27/09/2018 06:39, Sai Manoj Kumar Yadlapati wrote:

> Hi all,
>
> Lua supports its own version of regular expression matching.
> But it doesn't have the | (pipe symbol) support and the quantifier support
> - a{1,5} meaning a can occur anywhere from 1 to 5 times.
>
> Both of these are present in PCRE. I am curious to know why these are not
> supported. Is it not supported intentionally or was it never considered?
>
> Thanks
> Sai Manoj
>

To reinforce what Andrew said in his reply: please note that Lua
patterns are NOT regular expressions. That is, they don't have the same
expressive power as regexes, and that's /by design/. The goal was/is to
keep Lua's size small.

I can't say if implementing alternation (i.e. that OR operator) will
increase Lua size by much, but I suspect it will.

-- Lorenzo

Re: OR, quantifier support in Lua patterns

Jim
On 9/29/18, Lorenzo Donati <[hidden email]> wrote:
> that's /by design/. The goal was/is to keep Lua size small.
>
> I can't say if implementing alternation (i.e. that OR operator) will
> increase Lua size by much, but I suspect it will.

how come squirrel has them ?
does that make squirrel NOT small ?
or has that more to do with it being written in c++ ?
(which was an unnecessary mistake imo)

btw: squirrel separated the interpreter and its std libs
into 2 different c libs which looks like a good idea to me.

Re: OR, quantifier support in Lua patterns

Lorenzo Donati-3
On 30/09/2018 19:25, Jim wrote:

> On 9/29/18, Lorenzo Donati <[hidden email]> wrote:
>> that's /by design/. The goal was/is to keep Lua size small.
>>
>> I can't say if implementing alternation (i.e. that OR operator) will
>> increase Lua size by much, but I suspect it will.
>
> how come squirrel has them ?
> does that make squirrel NOT small ?
> or has that more to do with it being written in c++ ?
> (which was an unnecessary mistake imo)
>

I don't know squirrel, so I can't say. Anyway, did you compare the size
(both source and executable) of Lua with those of squirrel?

If squirrel can implement a full regex engine in less space than Lua, it
could be worth pointing that out to Lua team.

On the other hand I suspect squirrel regex engine may be implemented
using C++ regex classes, so the implementation could be very terse in
squirrel source. Moreover even executable size could be smaller because
the object code of the C++ regex engine could reside in some
system/platform DLL, if not linked statically into the squirrel interpreter.

Keep in mind that Lua is written in very portable C (almost all C89,
some few parts C99) and its pattern facility is built into the source,
so it can be compiled on any system with a barebone C compiler. Lua
/can/ be compiled as C++, but it doesn't use any C++ library facility
that is not also in a C library.

If squirrel regex implementation relies on C++-specific libraries,
comparing it to Lua is not actually fair: you should compare it against
Lua /together with/ a regex engine binding, like a PCRE binding, instead.

> btw: squirrel separated the interpreter and its std libs
> into 2 different c libs which looks like a good idea to me.
>
>

If you really don't need some Lua library you can compile a version of
the Lua interpreter with some of them disabled. This is not done by
default because Lua is primarily an engine to be embedded in some custom
C code, the so-called "C application" (this is by design), so a C
programmer can choose which libraries to include in the compilation anyway.

Moreover, if Lua is compiled as a DLL on a PC-class machine, the C
application can be small by simply linking to Lua dynamically.

If you really need Lua code statically linked to your C code, then you
can customize what parts of Lua you really need anyway.

The standard interpreter is just a very lightweight C application that
happens to embed a Lua engine. Lua "the language" wasn't designed to be
run only in the context of a command line interpreter.


Re: OR, quantifier support in Lua patterns

Jim
On 10/1/18, Lorenzo Donati <[hidden email]> wrote:
> I don't know squirrel, so I can't say. Anyway, did you compare the size
> (both source and executable) of Lua with those of squirrel?
well, yes, squirrel(-lang.org) is indeed bigger:
static libs:
interpreter/vm lib:
-rw-r--r-- 1 root root 523K May 27  2016 /usr/local/lib/libsquirrel_static.a
its std lib (which is in a separate lib):
-rw-r--r-- 1 root root 138K May 27  2016 /usr/local/lib/libsqstdlib_static.a
vs:
-rw-r--r-- 1 root root 423K Jul 24 22:52 /usr/local/lib64/liblua.a
for lua interpreter + stdlib

binaries:
-rwxr-xr-x 1 root root  20K May 27  2016 /usr/local/bin/sq*
-rwxr-xr-x 1 root root 415K May 27  2016 /usr/local/bin/sq_static*
(the interpreter/vm bin is also used for compiling bytecode, no
separate binary necessary)

vs lua:
-rwxr-xr-x 1 root root 213K Jul 24 22:14 /usr/local/bin/lua*
-rwxr-xr-x 1 root root 144K Jul 24 22:14 /usr/local/bin/luac*

> On the other hand I suspect squirrel regex engine may be implemented
> using C++ regex classes, so the implementation could be very terse in
> squirrel source.
indeed, i had not considered this as i am no c++ user and try to avoid
the crap at all costs, but under windows heavy usage of c++ seems to be
the norm.

but the squirrel author has implemented a "tiny regex lib" in ansi C
(T-Rex is a minimalistic regular expression library written in ANSI C)
because he "couldn't find any free regular expression library that wasn't huge
and bloated, while most of the time he needed just basic functionalities"
as he wrote on
http://www.demichelis.net/default.aspx?content=projects&template=projects

a quick look into the squirrel sources reveals that he has implemented the
squirrel regex functions in sqstdlib/sqstdrex.cpp (663 lines) in procedural
c style without use of any c++ stdlib regex helper classes.

so does that really bloat and make squirrel BIG in any way ?
or is it just the usual cheap excuse as in "we cant bloat lua with binary/octal
integer literals" (which everyone else has of course) but have hex integer literals,
since it does NOT bloat the language in any way and can't be done with tonumber().

i am really tired of always the same lamenting.
we would not use lua if we had not already written thousands of lines of binding
c code (which was a very stupid decision we bitterly regret by now).
the only reason that stopped us from using squirrel in the first place was
that it is written in c++ (with all the dependencies that introduces without any gain).
if squirrel could be rewritten in c we would use it instantly and port all the
c binding code to it.

the squirrel c api is also much better designed.

> Keep in mind that Lua is written in very portable C (almost all C89,
> some few parts C99) and its pattern facility is built into the source,
> so it can be compiled on any system with a barebone C compiler. Lua
> /can/ be compiled as C++, but it doesn't use any C++ library facility
> that is not also in a C library.
well, the usual lamento.
on unix you get posix regex and LOTS of other useful functions for FREE
by the c lib (which you have to use anyway) or as direct syscalls.

when building via "make linux" (for instance) you know what platform is used
and what it offers (at least std posix functions).
(you can also check feature macros if you prefer)
so instead of linking against big bloated crap like libreadline use what's
already in the c lib for FREE.
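
A feature-macro check of the kind mentioned here could look roughly like the following sketch. It assumes a system that provides <unistd.h> at all (a plain Windows toolchain may not), and posix_version is a hypothetical helper name. On POSIX systems <unistd.h> defines _POSIX_VERSION to the revision of the standard the implementation claims to support, for example 200809L for POSIX.1-2008.

```c
#include <unistd.h>

/*
 * Hypothetical helper: reports the POSIX revision advertised at compile
 * time, or 0 if the headers do not claim POSIX conformance.
 */
static long posix_version(void)
{
#ifdef _POSIX_VERSION
    return (long)_POSIX_VERSION;
#else
    return 0L;
#endif
}
```

A build could use the same #ifdef to decide whether to compile optional bindings such as a regex or posix table, instead of hard-coding the decision per make target.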

that could provide a table "regex" (or "re") that makes use of the c lib's
extended posix regex (which has to be there as required by the posix std).
btw: are the c++ std regex classes implemented using this c lib support ?

there should be also a table "posix" (or "unix" or just "sys") that contains the
std posix functions (like chdir, mkdir, setenv and the like) and is a metatable of
the "os" table.

that "posix" table should also have the metatable "linux" on Linux which should
contain Linux-only bindings (similar on freebsd, solaris etc)

> If squirrel regex implementation relies on C++-specific libraries,
which is not the case.

> Moreover, if Lua is compiled as a DLL on a PC-class machine, the C
> application can be small by simply linking to Lua dynamically.
> The standard interpreter is just a very lightweight C application that
> happens to embed a Lua engine. Lua "the language" wasn't designed to be
> run only in the context of a command line interpreter.
i totally understand that, but having above additional tables does not cost much
and should also be provided with the possibility of disabling them when they are
not needed (or even harmful) analogous to the tables/modules Lua
already provides.

Re: OR, quantifier support in Lua patterns

Sean Conner
It was thus said that the Great Jim once stated:

>
> but the squirrel author has implemented a "tiny regex lib" in ansi C
> (T-Rex is a minimalistic regular expression library written in ANSI C)
> because he "couldn't find any free regular expression library that wasn't huge
> and bloated, while most of the time he needed just basic functionalities"
> as he wrote on
> http://www.demichelis.net/default.aspx?content=projects&template=projects
>
> a quick look into the squirrel sources reveals that he has implemented the
> squirrel regex functions in sqstdlib/sqstdrex.cpp (663 lines) in procedural
> c style without use of any c++ stdlib regex helper classes.

  Well, one could wrap those regex functions into a Lua module so it's
available for you to use.

> so does that really bloat and make squirrel BIG in any way ? or is it just
> the usual cheap excuse as in "we cant bloat lua with binary/octal integer
> literals" (which anyone else has of course) but have hex integer literals,
> since it does NOT bloat the language in any way and can't be done with
> tonumber().

  That's for Luis and Roberto to answer.  For me, I never needed to use
octal, and I'm past the need for binary literals (I wouldn't mind them, but
I'm not lamenting their lack).

> i am really tired of always the same lamenting. we would not use lua if we
> had not already written thousands of lines of binding c code (which was a
> very stupid decision we bitterly regret by now). the only reason that
> stopped us from using squirrel in the first place was that is written in
> c++ (with all the dependencies that introduces without any gain). if
> squirrel could be rewritten in c we would use it instantly and port all
> the c binding code to it.

  Why was Lua picked in the first place if you now regret it?  Is it the
fact that Lua doesn't have real regex that makes it suck?  Or are there
other factors that make you regret the choice of Lua?

> the squirrel c api is also much better designed.

  Is this the language described by squirrel-lang.org?  Because if so, the
API seems very close to the Lua API (it downright seems Lua influenced the
design from what I can tell).  What is it about the Lua C API that sucks?
Or why is the squirrel one better?

  Because from my brief look, they seem very similar.

> > Keep in mind that Lua is written in very portable C (almost all C89,
> > some few parts C99) and its pattern facility is built into the source,
> > so it can be compiled on any system with a barebone C compiler. Lua
> > /can/ be compiled as C++, but it doesn't use any C++ library facility
> > that is not also in a C library.

> well, the usual  lamento. on unix you get posix regex and LOTS of other
> useful functions for FREE by the c lib (which you have to use anyway) or
> as direct syscalls.

  In some respects yes.  In other respects no. On Linux you need to link
with pthreads if you use that; not so on other systems.  On Solaris you need
to link with nt if you want to use the network API (socket(), bind(),
accept(), etc) but not so with other Unix systems.  On Windows, POSIX isn't
part of the C library (although I could be wrong, but I would find it
surprising).
 
> when building via "make linux" (for instance) you know what platform is
> used and what it offers (at least std posix functions). (you can also
> check feature macros if you prefer) so instead of linking against big
> bloated crap like libreadline use what's already in the c lib for FREE.
>
> that could provide a table "regex" (or "re") that makes use of the c lib's
> extended posix regex (which has to be there as required by the posix std.
> btw: are the c++ std regex classes implemented using this c lib support ?)

  I don't know which Unix you are using, but the ones I've had experience
with never came with regex "for free" (as part of libc).  

> there should be also a table "posix" (or "unix" or just "sys") that
> contains the std posix functions (like chdir, mkdir, setenv and the like)
> and is a metatable of the "os" table.
>
> that "posix" table should also have the metatable "linux" on Linux which
> should contain Linux-only bindings (similar on freebsd, solaris etc)

  There are Lua modules that provide such functionality but I'm guessing you
want those built into the base Lua distribution.  Roberto and Luis have
different priorities; if you agree with them, use Lua.  If you don't, don't
use Lua.

  -spc


Re: OR, quantifier support in Lua patterns

Dirk Laurie-2
On Tue, 2 Oct 2018 at 06:11, Sean Conner <[hidden email]> wrote:
>
> It was thus said that the Great Jim once stated:
<troll_alert>
> > there should be also a table "posix" (or "unix" or just "sys") that
> > contains the std posix functions (like chdir, mkdir, setenv and the like)
> > and is a metatable of the "os" table.
> >
> > that "posix" table should also have the metatable "linux" on Linux which
> > should contain Linux-only bindings (similar on freebsd, solaris etc)
</troll_alert>

"there should be" is not an acceptable way of making a dubious
suggestion more plausible.

> There are Lua modules that provide such functionality but I'm guessing you
> want those built into the base Lua distribution.  Roberto and Luis have
> different priorities; if you agree with them, use Lua.  If you don't, don't
> use Lua.

Or have a standard set of patches that moulds your personal Lua
version to your liking.
E.g. on my machine "lua -l lexer=pl.lexer" is legal.

-- Dirk

Re: OR, quantifier support in Lua patterns

Jim
On 10/2/18, Dirk Laurie <[hidden email]> wrote:
> "there should be" is not an acceptable way of making a
why ? this should be there to make some use of it.

> dubious suggestion more plausible.
how is that dubious ? for me it's very clear.
the example i gave is also very clear and to the point.
could you please explain how it is dubious ?

Re: OR, quantifier support in Lua patterns

Jim
In reply to this post by Sean Conner
On 10/2/18, Sean Conner <[hidden email]> wrote:
> It was thus said that the Great Jim once stated:
                                       ^^^^^^^ why are you trying to
ridicule me ?

>   Well, one could wrap those regex functions into a Lua module so it's
> available for you to use.
sure, that's what we will do.
how about directly using squirrel that has that already available ?

>   That's for Luis and Roberto to answer.  For me, I never needed to use
> octal, and I'm past the need for binary literals (I wouldn't mind them, but
> I'm not lamenting their lack).
we make heavy use of octal integer literals and that's absolutely nothing
extraordinary or exotic in any way.

>   Why was Lua picked in the first place if you now regret it?  Is is the
> fact that Lua doesn't have real regex that makes it suck?  Or are there
> other factors that make you regret the choice of Lua?
we thought it could be used as a scripting language, akin to perl, python, ruby
and all the others.

we failed to recognize that it was only designed for authors that use it as
a config language for their c program to avoid inventing one of their own.
Lua is obviously not designed to run scripts that do any real work.
it's just a toy, a study of how a config language could look, it's not made
for scripting like say perl or other scripting languages,
it simply was not made for this and thus obviously is not up for the task.
we thought having an interpreter around was for interpreting scripts as the
perl interpreter for instance does.

it's a really poor design that changes with every release because something
is broken again or poorly thought out.
for instance: how dumb and clueless must one be to release a language that
has only floating point arithmetic at a time when 386, 486SX and other cpus
without an fpu were in wide usage ?
we dont use any floating point arithmetic btw., integer is enough for us.
was Lua designed as a new fortran that could also be used as a config language ?
and now look how many decades (!!!!) it took until someone decided that
integer arithmetic and bitwise ops weren't that bad.

now look at the regex module, it doesnt work with lua5.3, so we dont have
even a binding for posix regex (that would suffice for us, no need for PCRE),
so we have to implement our own.

have a look at lua rocks that uses a collection of unmaintained garbage
lying and rotting around for years, abandoned by the authors since they
stopped using lua for obvious reasons.
that crap is not even usable on linux, and its a total mess on solaris.
imagine perl, python or anyone else would deliver such poor tooling ...

>   Is this the language described by squirrel-lang.org?
well, obviously, mr. genius, that's why i mentioned it.

> Because if so, the
> API seems very close to the Lua API (it downright seems Lua influenced the
> design from what I can tell).  What is it about the Lua C API that sucks?
> Or why is the squirrel one better?

squirrel was based on Lua, the author tried to fix some of the main problems.
a brief look at its api is not enough, read and compare it point by point with
lua's api, then use it in some code and you will understand what i am talking about.

>   Because from my brief look, they seem very similar.
from a brief look lua might also look like a scripting language, but a brief look
is not enough. use the squirrel api and you will see what i mean.

>   In some respects yes.  In other respects no. On Linux you need to link
> with pthreads if you use that; not so on other systems.  On Solaris you
> need
> to link with nt if you want to use the network API (socket(), bind(),
> accept(), etc) but not so with other Unix systems.  On Windows, POSIX isn't
> part of the C library (although I could be wrong, but I would find it
> surprising).

#include <unistd.h>
#include <lua.h>

/* push the real user id of the calling process */
static int Sgetuid ( lua_State * const L )
{
   /* getuid() always succeeds, as required by posix */
   lua_pushinteger ( L, (lua_Integer) getuid () ) ;
   return 1 ;
}

thats the same on linux, solaris, aix, all of the bsds, and even crap
like macos X.
same for geteuid(), get(g)id(), get(p)pid(), umask(), fork() to name
a few. how does that complicate anything, how exactly does that
bring in pthreads, i missed the point, could you make this more clear
and enlighten us a bit ?

>   I don't know which Unix you are using, but the ones I've had experience
> with never came with regex "for free" (as part of libc).
well posix requires regex to be found in <regex.h>, so every unix has
them, from aix to the bsds, its in the libc that one has to use anyway.
so its there for FREE on all relevant unix platforms.

>   There are Lua modules that provide such functionality but I'm guessing
and dont work for any recent lua version ? like the regex module ?
> you want those built into the base Lua distribution.
exactly.

> different priorities; if you agree with them, use Lua.  If you don't, don't
> use Lua.
exactly.

i figured out recently that the ruby c api is quite usable by hand, though i
dont like ruby and its oo style that is forced on all its users very much.
the advantage is that it is a scripting language not just a config lang,
and scripting is what we are doing.
so we started porting our c bindings and helper functions to ruby as an
interim workaround, but thats not the final solution.

in the meantime we had a look at several script languages with a usable
c api, from older ones like forth (fth), (regina) rexx, tcl, to javascript
implementations like duktape and mujs to newer inventions like ring-lang.net
but so far we have not found anything that pleases us and in the long run
we will have to implement our own solution as no existing tool is up to the
task (at least we did not find one).

that was exactly what we tried to avoid as we were not interested in
inventing the next extension lang (that only we and no one else will use)
and reinvent the wheel.

Re: OR, quantifier support in Lua patterns

Sean Conner
It was thus said that the Great Jim once stated:
> On 10/2/18, Sean Conner <[hidden email]> wrote:
> > It was thus said that the Great Jim once stated:
>                                        ^^^^^^^ why are you trying to
> ridicule me ?

  I've been using that opening line in email since the early 90s (you
can check other messages I've sent on this list to see it in use), and this
is the first time it has received a negative response.

> >   Well, one could wrap those regex functions into a Lua module so it's
> > available for you to use.
>
> sure, that's what we will do. how about directly using squirrel that has
> that already available ?

  Because this is the *Lua* mailing list, not the *Squirrel* mailing list?

> >   Why was Lua picked in the first place if you now regret it?  Is is the
> > fact that Lua doesn't have real regex that makes it suck?  Or are there
> > other factors that make you regret the choice of Lua?
>
> we thought it could be used as a scripting language, akin to perl, python,
> ruby and all the others.
>
> we failed to recognize that it was only designed for authors that use it
> as a config language for their c program to avoid inventing one of their
> own. Lua is obviously not designed to run scripts that do any real work.

  That's news to me, since I use Lua to process SIP messages for Verizon
Wireless.  It currently handles around 60,000,000 messages per day without
issue (and we expect that level to rise 10-fold over the next year).

> it's just a toy, a study of how a config language could look, it's not
> made for scripting like say perl or other scripting languages, it simply
> was not made for this and thus obviously is not up for the task. we
> thought having an interpreter around was for interpreting scripts as the
> perl interpreter for instance does.

  That would also be news to wireshark users, as Lua is used there.  It's
also used in multiple online games as a scripting language.  Oh, and Redis
also uses Lua for scripting.  Guess we're all deluding ourselves into
thinking Lua is a programming language.

> it's a really poor design that changes with every release because something
> is broken again or poorly thought out.
> for instance: how dumb and clueless must one be to release a language that
> has only floating point arithmetic at a time when 386, 486SX and other cpus
> without an fpu were in wide usage ?

  Netscape did the exact same thing with Javascript back in the 90s, and it
still only supports floating point.  It's not necessarily a *bad* design
choice, given at the time systems were 32-bit and one can easily do 52-bit
integer arithmetic with IEEE-754 floating point (I know there are systems
with non-IEEE-754 floating point but they tend to be rare, or were designed
prior to 1985 when the IEEE-754 standard was released).  Doing that means you
only have one numeric type to support.  It's only with the rise in 64-bit
CPUs that such a design becomes problematic, which is why it was changed for Lua
5.3.
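
The integer range of a double can be checked directly. The sketch below assumes that double is an IEEE-754 binary64 value (true on common platforms, but not guaranteed by the C standard), and survives_double is a hypothetical helper name. With a 52-bit stored fraction plus the implicit leading bit, every integer of magnitude up to 2^53 is exactly representable, and 2^53 + 1 is the first positive integer that is not, which is why 64-bit integers no longer fit.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if n is unchanged after a round trip
 * through double, 0 if the conversion loses information.
 * Assumes double is IEEE-754 binary64.
 */
static int survives_double(int64_t n)
{
    double d = (double)n;

    /* 2^63 is outside the int64_t range; converting it back would be undefined */
    if (d >= 9223372036854775808.0)
        return 0;

    return (int64_t)d == n;
}
```

Under that assumption, 32-bit integers always round-trip, while values just above 2^53 start to collide with their neighbours.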

> >   Is this the language described by squirrel-lang.org?
> well, obviously, mr. genius, that's why i mentioned it.

  I wanted to make sure I had the right references.

> > Because if so, the
> > API seems very close to the Lua API (it downright seems Lua influenced the
> > design from what I can tell).  What is it about the Lua C API that sucks?
> > Or why is the squirrel one better?
>
> squirrel was based on Lua, the author tried to fix some of the main problems.
> a brief look on its api is not enough, read and compare it point by point with
> lua's api, then use it in some code and you will understand what i am about.
>
> >   Because from my brief look, they seem very similar.
> from a brief look lua might also look like a scripting language, but a
> brief look
> is not enough. use the squirrel api and you will see what i mean.
>
> >   In some respects yes.  In other respects no. On Linux you need to link
> > with pthreads if you use that; not so on other systems.  On Solaris you
> > need
> > to link with nt if you want to use the network API (socket(), bind(),
> > accept(), etc) but not so with other Unix systems.  On Windows, POSIX isn't
> > part of the C library (although I could be wrong, but I would find it
> > surprising).
>
> #include <unistd.h>
>
> static int Sgetuid ( lua_State * const L )
> {
>    /* getuid() always succeeds, as required by posix */
>    lua_pushinteger ( L, getuid () ) ;
>    return 1 ;
> }
>
> thats the same on linux, solaris, aix, all of the bsds, and even crap
> like macos X.
> same for geteuid(), get(g)id(), get(p)pid(), umask(), fork() to name
> a few. how does that complicate anything, how exactly does that
> bring in pthreads, i missed the point, could you make this more clear
> and enlighten us a bit ?

  Okay, round two.  If I have a program that makes use of pthreads, on
Solaris it comes "for free" (your terms) in libc.  On Linux, the pthreads
API is NOT in libc, so it's not "for free" in that regard---you have to
link with libpthread.

  If I have a program that uses socket(), bind(), accept(), listen(), etc.
(the Berkeley sockets API), those calls come "for free" on Linux---they're
part of libc.  On Solaris, they are not "for free"---they are not part of
libc and you are required to link against libnt.

  On Windows, you don't even get get*id(), umask(), fork() or wait() AT ALL!
Windows does not natively support POSIX.

> >   I don't know which Unix you are using, but the ones I've had
> > experience with never came with regex "for free" (as part of libc).

> well posix requires regex to be found in <regex.h>, so every unix has
> them, from aix to the bsds, its in the libc that one has to use anyway. so
> its there for FREE on all relevant unix platforms.

  Again, not always so in my experience.

> >   There are Lua modules that provide such functionality but I'm guessing
> and dont work for any recent lua version ? like the regex module ?
> > you want those built into the base Lua distribution.
> exactly.
>
> > different priorities; if you agree with them, use Lua.  If you don't, don't
> > use Lua.
> exactly,.

  -spc

Re: OR, quantifier support in Lua patterns

Russell Haley


On Tue, Oct 2, 2018 at 4:44 PM Sean Conner <[hidden email]> wrote:
It was thus said that the Great Jim once stated:
> On 10/2/18, Sean Conner <[hidden email]> wrote:
> > It was thus said that the Great Jim once stated:
>                                        ^^^^^^^ why are you trying to
> ridicule me ?

  I've been using that opening line in email since the early 90s (you
can check other messages I've sent on this list to see it in use), and this
is the first time it has received a negative response.

I was devastated when I realized you answered everyone like that. I thought someone had finally recognized my brilliance. ;)

Russ

> >   Well, one could wrap those regex functions into a Lua module so it's
> > available for you to use.
>
> sure, that's what we will do. how about directly using squirrel that has
> that already available ?

  Because this is the *Lua* mailing list, not the *Squirrel* mailing list?

> >   Why was Lua picked in the first place if you now regret it?  Is is the
> > fact that Lua doesn't have real regex that makes it suck?  Or are there
> > other factors that make you regret the choice of Lua?
>
> we thougt it could be used as a scripting language, akin to perl, python,
> ruby and all the others.
>
> we failed to recognize that it was only designed for authors that use it
> as a config language for their c program to avoid inventing one of their
> own. Lua is obviously not designed to run scripts that do any real work.

  That's news to me, since I use Lua to process SIP messages for Verizon
Wireless.  It currently handles around 60,000,000 messages per day without
issue (and we expect that level to rise 10-fold over the next year).

> it's just a toy, a study of how a config language could look, it's not
> made for scripting like say perl or other scripting languages, it simply
> was not made for this and thus obviously is not up for the task. we
> thought having an interpreter around was for interpreting scripts as the
> perl interpreter for instance does.

  That would also be news to wireshark users, as Lua is used there.  It's
also used in multiple online games as a scripting language.  Oh, and Redis
also uses Lua for scripting.  Guess we're all deluding ourselves into
thinking Lua is a programming language.

> it's a really poor design that changes with every release because something
> is broken again or poorly thought out.
> for instance: how dumb and clueless must one be to release a language that
> has only floating point arithmetic at a time when 386, 486SX and other cpus
> without an fpu were in wide usage ?

  Netscape did the exact same thing with Javascript back in the 90s, and it
still only supports floating point.  It's not necessarily a *bad* design
choice, given at the time systems were 32-bit and one can easily do 52-bit
integer arithmetic with IEEE-754 floating point (I know there are systems
with non-IEEE-754 floating point but they tend to be rare, or were designed
prior to 1985 when the IEEE-754 standard was released).  Doing that means you
only have one numeric type to support.  It's only with the rise in 64-bit
CPUs that such a design becomes problematic and why it was changed for Lua
5.3.
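
As a rough illustration of the point about integer arithmetic in IEEE-754 doubles (a sketch added for clarity, not part of the original message): a double carries a 53-bit significand, so whole numbers stay exact up to 2^53, which comfortably covers 32-bit integer work. The variable names are made up for the example.

```lua
-- Sketch: the exact-integer range of an IEEE-754 double.
-- 2^53 is a float in every Lua version (the ^ operator yields a float in 5.3+ as well).
local limit = 2^53

-- Just below the limit, neighbouring integers are still distinguishable.
local exact_below = (limit - 1) ~= limit

-- At the limit, adding 1 is lost to rounding.
local rounds_at_limit = (limit + 1) == limit

-- A product that overflows 16 bits but fits in 32 bits is still exact.
local product = 65535 * 65537  -- (2^16 - 1) * (2^16 + 1) = 2^32 - 1

print(exact_below, rounds_at_limit, product == 4294967295)
```

With a single numeric type the interpreter has only one arithmetic path to support; the trade-off only starts to hurt once values above 2^53 (for example 64-bit identifiers) need to be handled exactly.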

> >   Is this the language described by squirrel-lang.org?
> well, obviously, mr. genius, that's why i mentioned it.

  I wanted to make sure I had the right references.

> > Because if so, the
> > API seems very close to the Lua API (it downright seems Lua influenced the
> > design from what I can tell).  What is it about the Lua C API that sucks?
> > Or why is the squirrel one better?
>
> squirrel was based on Lua, the author tried to fix some of the main problems.
> a brief look on its api is not enough, read and compare it point by point with
> lua's api, then use it in some code and you will understand what i am about.
>
> >   Because from my brief look, they seem very similar.
> from a brief look lua might also look like a scripting language, but a
> brief look
> is not enough. use the squirrel api and you will see what i mean.
>
> >   In some respects yes.  In other respects no. On Linux you need to link
> > with pthreads if you use that; not so on other systems.  On Solaris you
> > need
> > to link with nt if you want to use the network API (socket(), bind(),
> > accept(), etc) but not so with other Unix systems.  On Windows, POSIX isn't
> > part of the C library (although I could be wrong, but I would find it
> > surprising).
>
> #include <unistd.h>
>
> static int Sgetuid ( lua_State * const L )
> {
>    /* getuid() always succeeds, as required by posix */
>    lua_pushinteger ( L, getuid () ) ;
>    return 1 ;
> }
>
> thats the same on linux, solaris, aix, all of the bsds, and even crap
> like macos X.
> same for geteuid(), get(g)id(), get(p)pid(), umask(), fork() to name
> a few. how does that complicate anything, how exactly does that
> bring in pthreads, i missed the point, could you make this more clear
> and enlighten us a bit ?

  Okay, round two.  If I have a program that makes use of pthreads, on
Solaris it comes "for free" (your terms) in libc.  On Linux, the pthreads
API is NOT in libc, so it's not "for free" in that regard---you have to
link with libpthread.

  If I have a program that uses socket(), bind(), accept(), listen(), etc.
(the Berkeley sockets API), those calls come "for free" on Linux---they're
part of libc.  On Solaris, they are not "for free"---they are not part of
libc and you are required to link against libnt.

  On Windows, you don't even get get*id(), umask(), fork() or wait() AT ALL!
Windows does not natively support POSIX.

> >   I don't know which Unix you are using, but the ones I've had
> > experience with never came with regex "for free" (as part of libc).

> well posix requires regex to be found in <regex.h>, so every unix has
> them, from aix to the bsds, its in the libc that one has to use anyway. so
> its there for FREE on all relevant unix platforms.

  Again, not always so in my experience.

> >   There are Lua modules that provide such functionality but I'm guessing
> and dont work for any recent lua version ? like the regex module ?
> > you want those built into the base Lua distribution.
> exactly.
>
> > different priorities; if you agree with them, use Lua.  If you don't, don't
> > use Lua.
> exactly.

  -spc

Re: OR, quantifier support in Lua patterns

Paul E. Merrell, J.D.
Jim said:

> we thought it could be used as a scripting language, akin to perl, python, ruby
> and all the others.

> we failed to recognize that it was only designed for authors that use it as
> a config language for their c program to avoid inventing one of their own.
> Lua is obviously not designed to run scripts that do any real work.
> it's just a toy, a study of how a config language could look, it's not made
> for scripting like say perl or other scripting languages,
> it simply was not made for this and thus obviously is not up for the task.
> we thought having an interpreter around was for interpreting scripts as the
> perl interpreter for instance does.

Gee, I must have dreamed that I wrote hundreds of extending scripts in
Lua for the NoteCase Pro outliner over the last several years and that
all these other developers embed Lua as a scripting engine for their
users. <https://sites.google.com/site/marbux/home/where-lua-is-used>

:-)

> it's a really poor design that changes with every release because something
> is broken again or poorly thought out.

NoteCase Pro has upgraded to the latest Lua version with each release
since v. 5.1. In all that time, I've only needed slight tweaks in
three scripts because of Lua changes. YMMV.

Best regards,

Paul

Re: OR, quantifier support in Lua patterns

Roberto Ierusalimschy
In reply to this post by Sean Conner
Please, don't feed the trolls.

Thanks,

-- Roberto

Re: OR, quantifier support in Lua patterns

Gé Weijers
In reply to this post by Sean Conner

On Tue, Oct 2, 2018 at 4:44 PM Sean Conner <[hidden email]> wrote:
  On Windows, you don't even get get*id(), umask(), fork() or wait() AT ALL!
Windows does not natively support POSIX.

Well, now we have this on Windows 10:


 

Re: OR, quantifier support in Lua patterns

Tim Hill
In reply to this post by Jim


On Oct 2, 2018, at 3:57 PM, Jim <[hidden email]> wrote:


in the meantime we had a look at several script languages with a usable
c api, from older ones like forth (fth), (regina) rexx, tcl, to java script
implementations like duktape and mujs to newer inventions like ring-lang.net
but so far we have not found anything that pleases us and in the long run
we will have to implement our own solution as no existing tool is up to the
task (at least we did not find one).

Well it looks to me like you are looking for perfection. When you have completed your perfect language please let us know so we can all start using it.

ALL languages are compromises and balance conflicting requirements. You are of course free to disagree with the decisions made for Lua, but that doesn't make them “crap”, it just means the Lua authors and you disagree on those compromises.

—Tim

Re: OR, quantifier support in Lua patterns

Lorenzo Donati-3
In reply to this post by Sean Conner
On 03/10/2018 01:44, Sean Conner wrote:

> It was thus said that the Great Jim once stated:
>> On 10/2/18, Sean Conner <[hidden email]> wrote:
>>> It was thus said that the Great Jim once stated:
>>                                        ^^^^^^^ why are you trying to
>> ridicule me ?
>
>   I've been using that opening line in email since the early 90s (you can
> check other messages I've sent on this list to see it in use), and this is
> the first time it has received a negative response.
>

It has always brought me memories of Arthurian or "Tolkienian" sagas.
Nice literary touch in our cold, harsh world of bits and bytes! ;-)

[...]

>   -spc
>
>


Re: OR, quantifier support in Lua patterns

nobody
In reply to this post by Sai Manoj Kumar Yadlapati
On 2018-09-27 06:39, Sai Manoj Kumar Yadlapati wrote:
> Hi all,
>
> Lua supports its own version of regular expression matching. But it
> doesn't have the | (pipe symbol) support and the quantifier support -
> a{1,5} meaning a can occur anywhere from 1 to 5 times.
>
> Both of these are present in PCRE. I am curious to know why these are
> not supported. Is it not supported intentionally or was it never
> considered?

There's been plenty said on this already, but one thing is missing:  In
Lua, the substitution doesn't have to be a string – it can be a table or
even a function.  (I *think* that's not possible with PCRE – never used
it, just looked at the manpages.)  What this means is that your pattern
only needs to be an approximate pre-filter and doesn't have to match
_exactly_ what you want.  Some examples…

Matching a bunch of fixed words? (a common use of | in REs), e. g.
/(TODO|NYI|BUG)/FIXME/ – do a fuzzy match (say, "%u%u%u+" a.k.a.
("%u"):rep(3).."+" or even just "%u+"), do the details with the table.

   str:gsub( "%u+", { TODO = "FIXME", NYI = "FIXME", BUG = "FIXME" } )

(nil means leave as-is, only if there's a value the match actually gets
substituted.  So any other matches don't matter.)
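
To make the table trick concrete, here is a small sketch of it as a complete script. The helper name `mark_fixme` and the sample text are made up for illustration; only `string.gsub` with a table argument comes from the standard library.

```lua
-- Sketch: emulate the alternation (TODO|NYI|BUG) with a loose pattern plus a table.
local replacements = { TODO = "FIXME", NYI = "FIXME", BUG = "FIXME" }

-- Hypothetical helper: rewrite the listed markers, leave everything else alone.
local function mark_fixme(text)
  -- "%u+" matches every run of uppercase letters; a run is replaced only
  -- when it is a key in the table, otherwise the original text is kept.
  -- The extra parentheses drop gsub's second result (the match count).
  return (text:gsub("%u+", replacements))
end

print(mark_fixme("TODO: parse args; NYI in Lua; BUG 42; OK"))
```

Here "L" (from "Lua") and "OK" are also matched by "%u+", but since the table has no entry for them they pass through untouched. One difference from a real alternation: the lookup uses the whole uppercase run as the key, so a word like "TODOS" would not be rewritten.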

A table isn't enough?  Use a function.  It can do arbitrary filtering
and decide not to do anything (return nil), it can recursively match on
the match, ... so just do the same approximation trick. The a{1,5}
example can probably be done by matching "a+" and then checking the
match length in the function (it could even split the match internally
and then treat it as multiple consecutive matches… but that might get
too complicated, so…)
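
One possible reading of the a{1,5} idea, as a sketch: match "a+" and let Lua code decide what the run length means. Both helper names are hypothetical, and the second one assumes the replacement is a plain string rather than a pattern-style template.

```lua
-- Sketch: emulate the quantifier a{1,5} by matching "a+" and checking lengths.

-- Hypothetical helper: true if s consists of one to five 'a' characters,
-- i.e. what ^a{1,5}$ would accept in a regex engine.
local function is_a_1_to_5(s)
  local run = s:match("^a+$")
  return run ~= nil and #run <= 5
end

-- Hypothetical helper: replace every a{1,5} match with repl (a plain string).
local function gsub_a_1_to_5(s, repl)
  return (s:gsub("a+", function(run)
    -- A greedy a{1,5} would consume a run of n 'a's in ceil(n / 5)
    -- consecutive matches, so emit that many replacements.
    local matches = math.ceil(#run / 5)
    return repl:rep(matches)
  end))
end

print(is_a_1_to_5("aaa"), is_a_1_to_5("aaaaaa"))
print(gsub_a_1_to_5("baaab aaaaaaa", "X"))
```

The second helper takes the "split the match internally" route in its simplest form: it never builds the individual sub-matches, it only counts how many there would be.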

There's also LPeg, which does whole grammars, can produce arbitrary
structured data, happily does the same "run matches through a function"
trick, etc. ...and if I counted correctly, it's still ~12% the size of
PCRE. (Lol.)

-- nobody

Re: OR, quantifier support in Lua patterns

Victor Krapivensky
In reply to this post by Lorenzo Donati-3
On Sat, Sep 29, 2018 at 10:38:30AM +0200, Lorenzo Donati wrote:
> On 27/09/2018 06:39, Sai Manoj Kumar Yadlapati wrote:
> > Hi all,
> >
> > Lua supports its own version of regular expression matching.
> > But it doesn't have the | (pipe symbol) support and the quantifier support
> > - a{1,5} meaning a can occur anywhere from 1 to 5 times.
> >
> > Both of these are present in PCRE. I am curious to know why these are not
> > supported. Is it not supported intentionally or was it never considered?

> >
> > Thanks
> > Sai Manoj
> >
>
> To reinforce what Andrew said in his reply: please note that Lua patterns
> are NOT regular expressions. That is they haven't got the same expressive
> power as regexes, and that's /by design/. The goal was/is to keep Lua size
> small.
>
> I can't say if implementing alternation (i.e. that OR operator) will
> increase Lua size by much, but I suspect it will.

I am not sure why everybody seems to believe that a (non-PCRE) regular
expression engine has to be complex. See
https://swtch.com/~rsc/regexp/regexp1.html for an implementation in less than
400 lines of C (probably less rewritten in a "modern" style).

>
> -- Lorenzo
>
