[ANN] Lua 5.2.1 (work1) now available


[ANN] Lua 5.2.1 (work1) now available

Luiz Henrique de Figueiredo
Lua 5.2.1 (work1) is now available at
        http://www.lua.org/work/lua-5.2.1-work1.tar.gz

MD5 bc6b953ee54b7af31f4bc1f2df05fe8f  -
SHA1 6d9c63f615d2e03b3565e0064599d24aaac49096  -

Lua 5.2.1 introduces better handling of string collisions based on a
random seed. This work version is meant to let the community assess
the usefulness and the effectiveness of this experimental feature.

The complete diffs from Lua 5.2.0 to 5.2.1 are available at
        http://www.lua.org/work/diffs-lua-5.2.0-lua-5.2.1-work1.txt

We thank everyone for their feedback on Lua 5.2 till now.

All feedback welcome. Thanks.
--lhf



Re: [ANN] Lua 5.2.1 (work1) now available

liam mail
On 21 March 2012 18:37, Luiz Henrique de Figueiredo
<[hidden email]> wrote:

> Lua 5.2.1 (work1) is now available at
>        http://www.lua.org/work/lua-5.2.1-work1.tar.gz
>
> MD5     bc6b953ee54b7af31f4bc1f2df05fe8f  -
> SHA1    6d9c63f615d2e03b3565e0064599d24aaac49096  -
>
> Lua 5.2.1 introduces better handling of string collisions based on a
> random seed. This work version is meant to let the community assess
> the usefulness and the effectiveness of this experimental feature.
>
> The complete diffs from Lua 5.2.0 to 5.2.1 are available at
>        http://www.lua.org/work/diffs-lua-5.2.0-lua-5.2.1-work1.txt
>
> We thank everyone for their feedback on Lua 5.2 till now.
>
> All feedback welcome. Thanks.
> --lhf
>
>


Are there any tests for these experimental changes?

Liam


Re: [ANN] Lua 5.2.1 (work1) now available

liam mail
On 21 March 2012 20:11, liam mail <[hidden email]> wrote:

> On 21 March 2012 18:37, Luiz Henrique de Figueiredo
> <[hidden email]> wrote:
>> Lua 5.2.1 (work1) is now available at
>>        http://www.lua.org/work/lua-5.2.1-work1.tar.gz
>>
>> MD5     bc6b953ee54b7af31f4bc1f2df05fe8f  -
>> SHA1    6d9c63f615d2e03b3565e0064599d24aaac49096  -
>>
>> Lua 5.2.1 introduces better handling of string collisions based on a
>> random seed. This work version is meant to let the community assess
>> the usefulness and the effectiveness of this experimental feature.
>>
>> The complete diffs from Lua 5.2.0 to 5.2.1 are available at
>>        http://www.lua.org/work/diffs-lua-5.2.0-lua-5.2.1-work1.txt
>>
>> We thank everyone for their feedback on Lua 5.2 till now.
>>
>> All feedback welcome. Thanks.
>> --lhf
>>
>>
>
>
> Are there any tests for these experimental changes?
>
> Liam

I should add that I realise these are internal changes yet do not know
where the test suite is for the released 5.2 version.

Liam


Re: [ANN] Lua 5.2.1 (work1) now available

Roberto Ierusalimschy
> > Are there any tests for these experimental changes?
> >
> > Liam
>
> I should add that I realise these are internal changes yet do not know
> where the test suite is for the released 5.2 version.

The tests are the same as for 5.2.0. However, what we are most
interested in is the performance of this new version against 5.2.0 in
real programs (with respect to string manipulation).

The main change in this experimental version is that only strings
smaller than a certain limit get internalized. That can slightly slow
down programs that use long strings as keys, but on the other hand it
can slightly speed up programs that use long strings for other
purposes. The default limit is 32 bytes for 32-bit machines and 64
bytes for 64-bit machines:

#define LUA_MAXSHORTLEN         (8 * sizeof(void*))

It may be interesting to try different definitions (e.g., 32 bytes
for all architectures).
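
As a quick check of the arithmetic behind that default, here is a minimal sketch. The helper names default_short_limit and is_short_string are hypothetical, and treating a string whose length equals the limit as "short" is an assumption, not something stated in this thread.

```c
#include <stddef.h>

/* Hypothetical helper: the value of (8 * sizeof(void*)) on a machine
   whose pointers are pointer_size bytes wide. */
size_t default_short_limit(size_t pointer_size)
{
    return 8 * pointer_size;
}

/* Hypothetical helper: would a string of this length be interned?
   ASSUMPTION: a length equal to the limit still counts as short. */
int is_short_string(size_t len, size_t limit)
{
    return len <= limit;
}
```

With 4-byte pointers the limit comes out as 32 bytes and with 8-byte pointers as 64 bytes, so a 40-byte key would be interned on a 64-bit build but not on a 32-bit one.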

-- Roberto



Re: [ANN] Lua 5.2.1 (work1) now available

Miles Bader-2
Roberto Ierusalimschy <[hidden email]> writes:
> The main change in this experimental version is that only strings
> smaller than a certain limit get internalized. That can slightly slow
> down programs that use long strings as keys, but on the other hand it
> can slightly speed up programs that use long strings for other
> purposes. The default limit is 32 bytes for 32-bit machines and 64
> bytes for 64-bit machines:

I imagine that one problem is going to be programs that use
memoization, in which case the keys are often pretty arbitrary
(whereas "typical" use of string keys probably favors short strings)
_and_ expected to be fast.  This seems to be a pretty popular
technique in Lua...

-miles

--
/\ /\
(^.^)
(")")
*This is the cute kitty virus, please copy this into your sig so it can spread.


Re: [ANN] Lua 5.2.1 (work1) now available

Roberto Ierusalimschy
> I imagine that one problem is going to be programs that use
> memoization, in which case the keys are often pretty arbitrary
> (whereas "typical" use of string keys probably favors short strings)
> _and_ expected to be fast.  this seems to be a pretty popular
> technique in Lua...

Unlike short strings, long strings rarely are literals in the program.
People tend not to write t.thisIsAVeryLongStringButBarelyLongEnough.
Long strings usually are produced somehow, through computations or
input. So, the overhead of the indexing may be diluted (and
partially compensated) by all these other costs.

Of course, my answer (and your message) is just speculation. That is
why we would like to see the behavior in real programs (not artificial
benchmarks).

Moreover, a small overhead may be acceptable as a price for solving
the "hash complexity attack" (that people will worry about despite all
contrary evidence).
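
To make the idea of hashing "based on a random seed" concrete, here is a minimal sketch. The function seeded_hash and its mixing step are illustrative assumptions and are not claimed to be the code in work1; the point is only that the same bytes hash differently under different seeds, so colliding keys cannot be computed in advance without knowing the seed.

```c
#include <stddef.h>

/* Illustrative seeded string hash (an assumption, not Lua's actual
   function). The seed is mixed into the initial state, so the hash of
   a given string depends on a value an attacker does not know. */
unsigned int seeded_hash(const char *s, size_t len, unsigned int seed)
{
    unsigned int h = seed ^ (unsigned int)len;
    size_t i;
    for (i = 0; i < len; i++)
        h ^= (h << 5) + (h >> 2) + (unsigned char)s[i];
    return h;
}
```

In a real interpreter the seed would be chosen once per state from some unpredictable source; here it is simply a parameter.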

-- Roberto


Re: [ANN] Lua 5.2.1 (work1) now available

liam mail
In reply to this post by Roberto Ierusalimschy
On 21 March 2012 20:44, Roberto Ierusalimschy <[hidden email]> wrote:
>> > Are there any tests for these experimental changes?
>> >
>> > Liam
>>
>> I should add that I realise these are internal changes yet do not know
>> where the test suite is for the released 5.2 version.
>
> The tests are the same as for 5.2.0.

Yes, but I do not know where these live; as I said, I "do not know
where the test suite is for the released 5.2 version".

Could someone please send me a link to the rc versions of 5.2.0 from
rc2 onward, with test suites?  I am trying to track down a bug in my
own code which is present in 5.2.0 yet not in rc2, which is the latest
rc I have.

Thanks

Liam


Re: [ANN] Lua 5.2.1 (work1) now available

Luiz Henrique de Figueiredo
> Could someone please send me a link of rc versions of 5.2.0 from rc2
> with test suites?

All work versions are available at
        http://www.lua.org/work/old/

The test suite for 5.2 is available at
        http://www.lua.org/tests/5.2/

It's the same for all rcs.


Re: [ANN] Lua 5.2.1 (work1) now available

liam mail
On 21 March 2012 22:38, Luiz Henrique de Figueiredo
<[hidden email]> wrote:
> http://www.lua.org/work/old/

Thank you. I am sure there is a link somewhere on the site to these
pages, I just could not find it.

Liam


Re: [ANN] Lua 5.2.1 (work1) now available

Luiz Henrique de Figueiredo
> > http://www.lua.org/work/old/
>
> Thank you. I am sure there is a link somewhere on the site to these
> pages, I just could not find it.

There is no link, no. But now you know. :-)


Re: [ANN] Lua 5.2.1 (work1) now available

Enrico Colombini
In reply to this post by Roberto Ierusalimschy
On 21/03/2012 21.44, Roberto Ierusalimschy wrote:
> #define LUA_MAXSHORTLEN         (8 * sizeof(void*))

Shouldn't it perhaps be better to put it in luaconf.h? Or is it in
llimits.h because it's supposed to be changed very rarely?

A second point: does "should not be larger than 255" mean there is no
way to disable it and restore the previous behavior?
Maybe "0=all strings are interned" could be useful in case some program
had trouble with the new system (e.g. I can think of some sort of text
processing, such as a translation helper that uses chunks of already
translated text as table indexes).
Assuming no underlying change or new optimization prevents this, of course.

--
   Enrico


Re: [ANN] Lua 5.2.1 (work1) now available

Hans Hagen
On 22-3-2012 09:13, Enrico Colombini wrote:

> On 21/03/2012 21.44, Roberto Ierusalimschy wrote:
>> #define LUA_MAXSHORTLEN (8 * sizeof(void*))
>
> Shouldn't it perhaps be better to put it in luaconf.h? Or is it in
> llimits.h because it's supposed to be changed very rarely?
>
> A second point: does "should not be larger than 255" mean there is no
> way to disable it and restore the previous behavior?
> Maybe "0=all strings are interned" could be useful in case some program
> had trouble with the new system (e.g. I can think of some sort of text
> processing, such as a translation helper that uses chunks of already
> translated text as table indexes).
> Assuming no underlying change or new optimization prevents this, of course.

Indeed. I'm pretty sure that I use very long strings as indices all over
the place. But does this patch indeed prevent long strings from being
unique indices? For me the interning of strings is one of the charms of Lua.

Hans


-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------


Re: [ANN] Lua 5.2.1 (work1) now available

Enrico Colombini
On 22/03/2012 9.42, Hans Hagen wrote:
> Indeed. I'm pretty sure that I use very long strings as indices all over
> the place. But does this patch indeed prevent long strings from being
> unique indices? For me the interning of strings is one of the charms of Lua.

As this change could have a (potentially significantly) different impact
on different programs, maybe it could be useful to be able to
enable/disable it at program launch (or Lua state creation) time.

I'm especially thinking about Lua used as system-wide generic language,
as opposed to ad-hoc tools that can be configured and recompiled as needed.
Not all applications are at risk from key collision abuse; it seems a
pity to limit the power of an interesting and useful feature of the
language.

--
   Enrico


Re: [ANN] Lua 5.2.1 (work1) now available

Rebel Neurofog
In reply to this post by Luiz Henrique de Figueiredo
> Lua 5.2.1 (work1) is now available at
>        http://www.lua.org/work/lua-5.2.1-work1.tar.gz

First, there are some disadvantages, I suppose:
1. No more simple pointer comparison in C to test string equality.
The following example shows why:
#include "lua.h"
#include "lauxlib.h"
#include <stdio.h>
#include <string.h>  /* memset */

int main (void)
{
    lua_State *L = luaL_newstate ();
    char big_string[1024];
    int i;
    /* build a 1023-character string, well above the short-string limit */
    memset (big_string, 'c', 1023);
    big_string[1023] = '\0';
    /* each push of the same long string may now return a different pointer */
    for (i = 0; i < 10; i++)
        fprintf (stderr, "%p - this is supposed to be the same\n",
                 (void *) lua_pushstring (L, big_string));
    lua_close (L);
    return 0;
}

Simple pointer comparison is needed for fast resolution of string
constants to C enums. The trick is very fast: the string is pushed into
the Lua registry, and the returned pointer is saved in a binary
key-matching structure created right after lua_newstate (). The pointer
to that key-matching structure is retrieved from the allocator user
data with lua_getallocf (). I may write such a library if someone is
interested (I'm planning to implement the method in my project but
haven't got to it yet).
2. Such a change may break reliance on fast comparison of huge strings
(say, a check for buffer equality in a text editor). So it may heavily
alter optimization approaches by shifting bottlenecks significantly.

Here's my vision:
1. Lua strings are currently perfect and should not be changed.
Experienced Lua programmers have a detailed understanding of where and
how Lua strings should be used (e.g., "table.concat ()") and where
they shouldn't.
2. Situations with byte arrays (especially mutable ones) are usually
handled by creating project-specific userdata-based modules.
Here's my way, for example:
http://devel.nomrhis.net/Client_API_reference/modules/bytea
Of course it's not designed for mainstream Lua.
4. I think there should be a standard mainstream Lua module
(sandbox-safe like 'string', 'math' and 'table') for processing
mutable byte arrays with a fixed length given at creation time.
5. It is probably possible to implement reallocation of the data block
given by lua_newuserdata ().
Although it may cause problems with multithreaded Lua...

Conclusion:
Lua strings have significant advantages which shouldn't be broken by
an intention to create an absolutely universal data type for all things.
Instead, a standard mutable byte array should be implemented on top of
userdata.


Re: [ANN] Lua 5.2.1 (work1) now available

Xavier Wang
2012/3/22 Rebel Neurofog <[hidden email]>:

>> Lua 5.2.1 (work1) is now available at
>>        http://www.lua.org/work/lua-5.2.1-work1.tar.gz
>
> First, there are some disadvantages, I suppose:
> 1. No more simple pointer comparison in C to test string equality.
> Following example shows why:
> #include "lua.h"
> #include "lauxlib.h"
> #include <stdio.h>
>
> int main ()
> {
>    lua_State *L = luaL_newstate ();
>    char big_string[1024];
>    memset (big_string, 'c', 1023);
>    big_string[1023] = '\0';
>    int i;
>    for (i = 0; i < 10; i++)
>        fprintf (stderr, "%p - this is supposed to be the same\n",
> lua_pushstring (L, big_string));
>    return 0;
> }
>
> Simple pointer comparison is needed for fast string constant to C enum
> resolving.
> This is very fast trick when string is pushed into Lua registry and
> returned pointer
> is saved into binary key matching structure created right after lua_newstate ().
> Pointer to that key matching structure is resolved after getting
> allocator user data
> with lua_getallocf (). I may write such library if someone is
> interested (I'm planning
> to implement the method in my project but didn't get to this yet).
> 2. Such change may break reliance on fast comparisons of huge string
> (say check for buffer equality in text editor). So this may hardly modify
> optimization approaches by shifting bottle necks significantly.
>
> Here's my vision:
> 1. Lua strings are currently perfect and should not be changed.
> Experienced Lua programmers have detailed understanding where and how
> (e. g., "table.concat ()")
> Lua strings should be used and where shouldn't.
> 2. Situations with byte arrays (especially mutable) are usually worked out by
> creating project-specific user-data based modules to work with.
> Here's my way for example:
> http://devel.nomrhis.net/Client_API_reference/modules/bytea
> Of course it's not designed for mainstream Lua.
> 4. I think there should be standard common mainstream Lua module
> (and also sandbox-safe as 'string', 'math' and 'table') for processing
> mutable byte array
> with fixed length given at creation time.
> 5. It is probably possible to implement reallocation of data block
> given by lua_newuserdata ().
> Although it may give problems with multithreaded Lua...
>
> Conclusion:
> Lua strings have significant advantages which shouldn't be broken by
> an intention to create an absolutely universal data type for all things.
> Instead, a standard mutable byte array should be implemented on top of
> userdata.
>


Yes, I don't like this solution either. I think adding a new mutable
string type would be better. The original string acts just like a
Symbol, and the new string is a byte array. That would provide a
mechanism for implementing safe programs such as an HTTP server.

The new string can be based on userdata or be a new type of string.
Making a new string type on top of work1 is easy: keep all strings as
usual, and make the standard library return the new string type. Lua
can offer a C API to create and manipulate the new string type. Or just
add a lua_pushbuffer and a new type LUA_TBUFFER and leave the rest as
usual, and make a new 'buffer' module to modify the buffer type, just
like my buffer module ( http://github.com/starwing/lbuffer ).

A buffer can have a hash field: if you compare buffers or use a buffer
as a table key, just calculate a hash for the buffer, using a random
seed over several bytes of the buffer (but not all). Buffer pointers
are not the same, and you cannot compare buffers by pointer.
lua_tobuffer will return char* rather than const char*.

Adding a buffer type to work1 is not hard. It's worth a try :-)
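
The idea of hashing only several bytes of a buffer can be sketched as follows. The function buffer_hash, its stride rule, and its mixing step are all assumptions made for illustration; nothing here is taken from work1 or from lbuffer.

```c
#include <stddef.h>

/* Illustrative sampled hash (an assumption, not an existing API).
   Buffers of 32 bytes or more are sampled with a stride greater than
   one, so the cost stays bounded, but bytes that fall between samples
   do not influence the result. */
unsigned int buffer_hash(const char *buf, size_t len, unsigned int seed)
{
    unsigned int h = seed ^ (unsigned int)len;
    size_t step = (len >> 5) + 1;   /* stride 1 for buffers under 32 bytes */
    size_t i;
    for (i = len; i >= step; i -= step)
        h ^= (h << 5) + (h >> 2) + (unsigned char)buf[i - 1];
    return h;
}
```

The trade-off is that two buffers differing only in unsampled bytes always collide, so equality still needs a full comparison after the hashes match.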


Re: [ANN] Lua 5.2.1 (work1) now available

Roberto Ierusalimschy
In reply to this post by Enrico Colombini
> I'm especially thinking about Lua used as system-wide generic
> language, as opposed to ad-hoc tools that can be configured and
> recompiled as needed.
> Not all applications are at risk from key collision abuse; it seems
> a pity to limit the power of an interesting and useful feature of
> the language.

What feature is being limited? How?

-- Roberto


Re: [ANN] Lua 5.2.1 (work1) now available

Roberto Ierusalimschy
In reply to this post by Hans Hagen
> Indeed. I'm pretty sure that I use very long strings as index all
> over the place. But does this patch indeed prevent long strings to
> be unique indices? For me the interning of strings is one of the
> charms of Lua.

This patch does not change anything in the semantics of Lua. The
interning of strings always has been and continues to be invisible to
the programmer. It only affects performance, in both directions.
(There is a cost for interning a string.)
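
A toy model may help show why equality semantics stay the same while pointer identity does not. Everything below (toy_newstring, toy_equal, the fixed-size pool, and the MAXSHORT cutoff) is a hypothetical sketch, not the work1 implementation, and it never frees memory.

```c
#include <stdlib.h>
#include <string.h>

#define MAXSHORT  32   /* hypothetical cutoff between short and long */
#define POOL_SIZE 64   /* toy capacity of the intern pool */

static char  *pool[POOL_SIZE];
static size_t pool_len[POOL_SIZE];
static size_t pool_used = 0;

static char *copy_bytes(const char *s, size_t len)
{
    char *p = malloc(len + 1);
    if (p == NULL) return NULL;
    memcpy(p, s, len);
    p[len] = '\0';
    return p;
}

/* Short strings are interned (one canonical copy per content);
   long strings always get a fresh copy. */
const char *toy_newstring(const char *s, size_t len)
{
    size_t i;
    if (len > MAXSHORT) return copy_bytes(s, len);
    for (i = 0; i < pool_used; i++)
        if (pool_len[i] == len && memcmp(pool[i], s, len) == 0)
            return pool[i];
    if (pool_used == POOL_SIZE) return NULL;
    pool[pool_used] = copy_bytes(s, len);
    if (pool[pool_used] == NULL) return NULL;
    pool_len[pool_used] = len;
    return pool[pool_used++];
}

/* Equality as a program sees it: identical for both kinds of string. */
int toy_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    if (a == b) return 1;              /* same object */
    if (alen != blen) return 0;
    if (alen <= MAXSHORT) return 0;    /* interned: distinct pointers differ */
    return memcmp(a, b, alen) == 0;    /* long: compare contents */
}
```

Two short strings with the same content share one pointer, while two equal long strings have different pointers yet still compare equal, at the cost of a memcmp.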

So, the question is: do the savings from not interning some strings
compensate for the losses in performance when indexing some strings? The
only way that I see to answer that is profiling real programs.

-- Roberto


Re: [ANN] Lua 5.2.1 (work1) now available

Enrico Colombini
In reply to this post by Roberto Ierusalimschy
On 22/03/2012 16.44, Roberto Ierusalimschy wrote:
> What feature is being limited? How?

The ability to efficiently use long strings as table keys (assuming I
understood correctly the implications of not being able to restore the
previous behavior, not even by changing a header and recompiling).
Not all programs need long string keys to be efficient (i.e. interned),
but for those needing it (now or in the future) it would be useful to
retain the choice.

--
   Enrico


Re: [ANN] Lua 5.2.1 (work1) now available

Roberto Ierusalimschy
In reply to this post by Enrico Colombini
> On 21/03/2012 21.44, Roberto Ierusalimschy wrote:
> >#define LUA_MAXSHORTLEN         (8 * sizeof(void*))
>
> Shouldn't it perhaps be better to put it in luaconf.h? Or is it in
> llimits.h because it's supposed to be changed very rarely?

We can change that. (Anyway, that declaration should be guarded
by an #if !defined(LUA_MAXSHORTLEN).)
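
A guarded definition along those lines might look like the sketch below. The accessor effective_short_limit is a hypothetical helper added only so the resulting value can be inspected, and the compiler flag mentioned in the comment is an example of how a build could supply its own value, not a documented option.

```c
#include <stddef.h>

/* Default limit, used only when the build did not already define one
   (for instance with a compiler flag such as -DLUA_MAXSHORTLEN=32). */
#if !defined(LUA_MAXSHORTLEN)
#define LUA_MAXSHORTLEN         (8 * sizeof(void*))
#endif

/* Hypothetical accessor: reports the limit this build ended up with. */
size_t effective_short_limit(void)
{
    return (size_t)LUA_MAXSHORTLEN;
}
```

With the guard in place, the value can be changed per build without editing the header.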


> A second point: does "should not be larger than 255" mean there is
> no way to disable it and restore the previous behavior?

The complete part is "should not be larger than 255, TO ALLOW FUTURE
CHANGES". At least for now it can be any value.

-- Roberto


Re: [ANN] Lua 5.2.1 (work1) now available

Enrico Colombini
In reply to this post by Roberto Ierusalimschy
On 22/03/2012 16.57, Roberto Ierusalimschy wrote:
> This patch does not change anything in the semantics of Lua. The
> interning of strings always has been and continues to be invisible to
> the programmer. It only affects performance, in both directions.
> (There is a cost for interning a string.)

Yes, that is clear.

> So, the question is: do the savings from not interning some strings
> compensate for the losses in performance when indexing some strings? The
> only way that I see to answer that is profiling real programs.

That's a valid point and something that certainly needs to be done; I'm
looking at the matter from a slightly different viewpoint: many things
are configurable in Lua (either at compile time or at runtime) to
achieve better efficiency for a particular purpose; it would be useful
to still have the possibility to intern all strings.

--
   Enrico
