Lua 5.4-work1 with first class arrays (implementation & benchmarks)

classic Classic list List threaded Threaded
57 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Petri Häkkinen
Hello,

I've been burning some midnight oil lately. Anyway, here are the results of my experiment with adding arrays to Lua. The experiment demonstrates significant performance increase in some cases, O(1) implementation for array length operator and how it fixes the issue with holes. The performance of tables is not affected negatively by the patch.

The text is best read directly from Github because of email formatting issues:

I sincerely hope this work is useful for the development of Lua in the future.

Any feedback, critique, questions, comments etc. are welcome!

Petri

--

This is a experimental patch for Lua 5.4-work1 that adds first class support for arrays. The goal of the patch is to not be a production ready solution -- there are surely several bugs with the implementation. For example, the Lua C API and metatables have not been tested at all. It should be merely treated as a proof of concept that could be used to evaluate the general feasibility of arrays in Lua.

Arrays are implemented as a subtype of tables. In fact, the implementation reuses the table data structures inside Lua VM. Internally an array is a table with only integer keys starting at index 1 and all key-values are stored in the array part of the table data structure.

An array does not suffer from the problem with holes, a well known issue with Lua tables and the '#' length operator, because an array always has an explicit size. Insertions to an array potentially enlarges the array, so that if a new key is outside the bounds of the array, the array is resized to fit the new element. To shrink an array, the code has to be explicit about it by using the table.resize() or table.remove() functions. An array can be considered to be a dense table (as opposed to sparse regular Lua tables). 

Elements of an array are always stored sequentially in memory, so value retrieval and setting happen in constant time, provided that the array access is within the bounds of the array. The '#' length operator for arrays is O(1). Together these have positive performance implications compared to regular tables.

The implementation has been carefully designed to not negatively affect the performance of regular Lua tables (see benchmarks and implementation notes at the end).

-- arrays are constructed using [...] syntax
local a = [1, 2, 3, [4, 5]]

-- an array can contain nils without problems
local a = [1, nil, 2, 3, nil]
print(#a) --> 5

-- caveat: array indices can only be positive (non-zero) integers
local a = []
a[-1] = 1 --> error: invalid array index
a.foo = 1 --> error: invalid array index

-- caveat: array are dense by nature and array constructors don't support named or sparse elements
-- use tables if you need them
local a = [1, [3] = true] --> syntax error
local a = [1, b = true] --> syntax error

-- arrays grow to fit new keys automatically
local a = []
a[10] = true
print(#a) --> 10

-- table.insert() works on arrays as well as regular tables
local a = [1, 2]
table.insert(a, 3) --> a = [1, 2, 3]

-- setting a value to nil never resizes an array
local a = [1,2,3]
print(#a) --> 3
a[3] = nil
print(#a) --> 3

-- table.resize() can be used to explicitly resize an array
local a = []
table.resize(a, 10)
print(#a) --> 10

-- table.remove() also resizes the array
local a = [1, 2, 3]
table.remove(a, 1) --> a = [2, 3]

-- table.pack() is not needed for a version of Lua with arrays, as nils  
-- can be stored naturally in an array
local a = table.pack(1, 2, 3) -- not needed
local a = [1, nil, 3] -- use this instead

-- table.unpack() works with arrays as you'd expect
table.unpack([1, nil, 3]) --> 1, nil, 3

-- there is no difference between pairs() and ipairs(); both iterate all integer keys of an array in numeric order
for i,v in ipairs([1,nil,3,nil]) do print(i, v) end
--> (1 1) (2 nil) (3 3) (4 nil)

-- syntactic sugar for function call syntax f{...} -> f({...}) does not have
-- an equivalent for arrays because the grammar would be ambiguous
local a = fun[1] -- indexing table 'fun'
local a = fun([1]) -- call function 'fun' with array as an argument


Benchmarks

A number of benchmarks stressing table and array operations were run on unmodified and patched version of Lua 5.4. The benchmarks were repeated 4 times and the best result was chosen in each case out of these 4 runs. See the following table. In the "Unmodified Tables" column are the results for unmodified Lua 5.4-work1. The "Patched Tables" column shows the results of the benchmarks for patched Lua but still using regular tables. The "Patches Arrays" column shows the results for the benchmarks when using arrays.

The benchmarks show that there is no noticeable performance difference between the unmodified and patched version of Lua when using tables. The small differences in execution times vary from run to run because of varying system load, base address randomization of the Lua executable and other factors. The first two columns show that the addition of an array subtype for tables does not negatively impact the performance of regular tables. When using arrays, "Push" and "Length" benchmarks show significant performance increases, and "Insert", "Scatter-Write" and "Scatter-read" are also clearly faster. The major performance increase of "Push" and "Length" is mainly due to the O(1) implementation of the '#' operator.

Benchmark        Unmodified    Patched   Patched
                 Tables        Tables    Arrays

Insert           0.352s        0.359s    0.319s
Write            0.470s        0.478s    0.456s
Read             0.429s        0.423s    0.420s
Push             2.268s        2.323s    0.419s
Scatter-write    0.217s        0.203s    0.180s
Scatter-read     0.200s        0.208s    0.170s
Length           1.999s        1.969s    0.124s



About the benchmarks

The benchmarks were run on macOS 10.13.3 with a Intel Core i7 CPU running at 2.3 GHz and with 8 GB 1600 MHz DDR3 RAM. For benchmarking Lua API checks were disabled and the code was compiled with the same version of the C compiler using the same options (-O2). If you want to repeat the benchmarks, the unmodified Lua sources can be found in this repository in the "unmodified" branch.

Insert: Starting with an empty table, this does a large number of table/array insertions to sequential indices. The point of this benchmark is to test the efficiency of growing the table/array data structure.

Write/read: These benchmarks stress setting and getting existing indices of a table/array.

Push: Does a large number of inserts using the "t[#t+1] = value" Lua idiom.

Scatter-write/scatter-read: Does a large number of random accesses to a table/array.

Length: Tests the performance of the '#' operator.

The code for benchmarks can be found in tests/benchmark1.lua.



Summary and conclusions

The main contributions of this work are:

* The addition of a new high performance array subtype for tables, which does not negatively affect the performance of regular tables (within measurement precision).

* The arrays don't suffer from the issue with holes (also see Opinions below how the hole issue can be resolved in the future for regular tables).

* "[...]" syntax for arrays.

* Arrays have been implemented in a way that is fully backwards compatible, meaning that old Lua programs can be run without modification with the patched Lua interpreter.

The work shows that Lua could benefit from an array subtype for tables.



Opinions

From a purely theoretical point of view, the inclusion of a new subtype adds some weight to the language, but from a more pragmatic view, the increased performance and fixing the hole issue overweights the theoretical issue. Moving forward, if arrays would be adopted to mainstream Lua, the '#' operator could be deprecated for tables and eventually made an array only feature. Of course, the '#' could still be implemented using a metatable for tables if needed. This would be the final nail in coffin for the hole issue.

With the addition of arrays another small wart in the language, the 'n' field of the table returned by table.pack(), could be eliminated by changing table.pack() to return an array. Since the array has a well defined length, the 'n' field would not be required. This would however break backwards compatibility, so this could be an opt-in feature at first.

In addition to the other benefits, in my opinion the inclusion of an array type would clarify the intent of Lua code, because the code explicitly uses the "[..]" syntax when creating an array.



Implementation details

The implementation adds two new fields to the Table structure: "truearray" and "sizeused". True array is a single boolean that does not increase the memory consumption of Table struct, because of C struct packing and alignment rules. "Sizeused" is used to track the used size of the array as reported by '#', and it adds 4 or 8 bytes (depending whether Lua is compiled as a 32 or 64-bit application) to the size of the struct, but this does not seem to affect the CPU cache hit ratio or slow down the interpreter.

The implementation has been carefully designed to not increase the size of the main VM loop, which could negatively affect performance. Particularly no new opcodes have been added to the VM. Incrementing "sizeused" when setting array elements is implemented without branching and CPU cache usage has also been taken into account.

Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Roberto Ierusalimschy
> Any feedback, critique, questions, comments etc. are welcome!

I am particularly puzzled by this result:

> Benchmark        Unmodified    Patched   Patched
>                  Tables        Tables    Arrays
> Scatter-read     0.200s        0.208s    0.170s

If I undestood correctly the code, the Patched Arrays implementation
uses exactly the same code as Unmodified Tables for reads. Can you
explain this difference?

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Petri Häkkinen

On 22 Mar 2018, at 20.29, Roberto Ierusalimschy <[hidden email]> wrote:

>> Any feedback, critique, questions, comments etc. are welcome!
>
> I am particularly puzzled by this result:
>
>> Benchmark        Unmodified    Patched   Patched
>>                 Tables        Tables    Arrays
>> Scatter-read     0.200s        0.208s    0.170s
>
> If I undestood correctly the code, the Patched Arrays implementation
> uses exactly the same code as Unmodified Tables for reads. Can you
> explain this difference?

Without diving deeper into it, I would guess that in this case some elements of the table go into the hash part (the data is sparse because of random access pattern in the benchmark). Look ups from the hash map is slower than reading from an array which is always stored linearly.

Petri
Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Petri Häkkinen

> On 22 Mar 2018, at 21.17, Petri Häkkinen <[hidden email]> wrote:
>
>
> On 22 Mar 2018, at 20.29, Roberto Ierusalimschy <[hidden email]> wrote:
>
>>> Any feedback, critique, questions, comments etc. are welcome!
>>
>> I am particularly puzzled by this result:
>>
>>> Benchmark        Unmodified    Patched   Patched
>>>                Tables        Tables    Arrays
>>> Scatter-read     0.200s        0.208s    0.170s
>>
>> If I undestood correctly the code, the Patched Arrays implementation
>> uses exactly the same code as Unmodified Tables for reads. Can you
>> explain this difference?
>
> Without diving deeper into it, I would guess that in this case some elements of the table go into the hash part (the data is sparse because of random access pattern in the benchmark). Look ups from the hash map is slower than reading from an array which is always stored linearly.
>
Clarification: what I mean is that the table has been generated by writing values to random indices in the previous benchmark. I don’t know the details fully how Lua builds the array and hash parts of the table in this case. I’m only assuming some elements would fall out of the array part. Does this explanation make any sense?

Petri
Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Egor Skriptunoff-2
In reply to this post by Petri Häkkinen
On Thu, Mar 22, 2018 at 12:17 PM, Petri Häkkinen wrote:
here are the results of my experiment with adding arrays to Lua.Any feedback, critique, questions, comments etc. are welcome!

Wow!  That's strong arrays!

Will concatenation operator be available out-of-the-box?
[1, nil] .. [nil, 4]  --> [1, nil, nil, 4]

How about default arrays' metatable?
([1, nil, 3]):sub(-2)  --> [nil, 3]
([]):append(42, 33, nil):remove(2)  --> [42, nil]

Now consuming all Lua VM memory is much easier than before :-)
([])[1e8]=0

How to stop writing [[ when I want to create 2D-array?
[ [1], 2] -- ok
[[1], 2] -- syntax error
[[1],[2]] -- syntactically correct, but not a 2D-array! :-)


Could we sometimes create an array without actually creating an array?
function f(...)
   local arr = [...]
   -- if the code of this function doesn't modify arr (and doesn't pass arr somewhere else)
   -- then data should be stored on the stack, no array actually should be created
   -- access to arr[k] and #arr should be emulated
   for k = 1, #arr do print(k, arr[k]) end
end
Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Petri Häkkinen

> On 23 Mar 2018, at 20.59, Egor Skriptunoff <[hidden email]> wrote:
> Wow!  That's strong arrays!
>
> Will concatenation operator be available out-of-the-box?
> [1, nil] .. [nil, 4]  --> [1, nil, nil, 4]

Thanks for the interest in this experimental patch!

First of all, I have no plans to further work or maintain this proof of concept fork of Lua. I made it merely so that others could evaluate the feasibility of having an array type in Lua, and also demonstrate how it could be efficiently implemented (without slowing down regular tables).

I don’t have a strong opinion whether arrays would need a concat operator by default, but since they’re not available for tables either and arrays are basically tables with some special semantics for insertion/removal, I’m not sure concat operator would be consistent to have by default.

> How about default arrays' metatable?
> ([1, nil, 3]):sub(-2)  --> [nil, 3]
> ([]):append(42, 33, nil):remove(2)  --> [42, nil]

I would prefer to have regular functions that work on both tables and arrays. For example table.remove() works on both tables and arrays.

Just a personal preference of mine... I’m not a fan of the object oriented ‘:’ call syntax at all. I think OOP in general is a huge mistake, waste of programmer time and CPU cycles and can easily lead to overengineered code that is hard to understand and maintain.
 
> How to stop writing [[ when I want to create 2D-array?
> [ [1], 2] -- ok
> [[1], 2] -- syntax error
> [[1],[2]] -- syntactically correct, but not a 2D-array! :-)

Yep, this is a caveat, but as you showed, can be worked around by adding a space between ‘[‘ so not a huge problem.

I wish Lua could have chosen another syntax for multiline strings though. For example

"""this is a
multiline string"""

would have been more logical choice and easier to type, but I understand there are issues with escaping inside the string (mental note to myself: check how other languages handle """ as part of the string).

> Could we sometimes create an array without actually creating an array?
> function f(...)
>    local arr = [...]
>    -- if the code of this function doesn't modify arr (and doesn't pass arr somewhere else)
>    -- then data should be stored on the stack, no array actually should be created
>    -- access to arr[k] and #arr should be emulated
>    for k = 1, #arr do print(k, arr[k]) end
> end

This is not really related to arrays (tables could benefit from the same optimization). These kind of code generation optimizations would increase the complexity of Lua implementation greatly and are generally not done for simplicity.

Petri
Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Egor Skriptunoff-2
On Sat, Mar 24, 2018 at 1:46 PM, Petri Häkkinen wrote:

> Will concatenation operator be available out-of-the-box?
> [1, nil] .. [nil, 4]  --> [1, nil, nil, 4]

I don’t have a strong opinion whether arrays would need a concat operator by default, but since they’re not available for tables either and arrays are basically tables with some special semantics for insertion/removal,

I can't imagine possible definition of concatenation for dictionaries.
What should be the result of {a=1, b=2}..{a=3, c=4} ?
But for strong arrays, concatenation is a very natural operation.

 
> [[1],[2]] -- syntactically correct, but not a 2D-array! :-)

Yep, this is a caveat, but as you showed, can be worked around by adding a space between ‘[‘ so not a huge problem.

I wish Lua could have chosen another syntax for multiline strings though. For example

"""this is a
multiline string"""

would have been more logical choice and easier to type, but I understand there are issues with escaping inside the string

Lua string literal syntax has great and very useful ability to create unlimited nesting without escaping:
[[a]]
[=[ b [[a]] b ]=]
[==[ c [=[ b [[a]] b ]=] c ]==]
...
Three-quote syntax doesn't have this feature

But maybe it would be better to deprecate [[ string literal syntax.
Only [=[  [==[  [===[ ... should work for strings.
Otherwise a lot of people would stumble upon matrix-like syntax problem
m = [[c,-s],[s,c]]


Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Petri Häkkinen

On 24 Mar 2018, at 13.35, Egor Skriptunoff <[hidden email]> wrote:

I can't imagine possible definition of concatenation for dictionaries.
What should be the result of {a=1, b=2}..{a=3, c=4} ?
But for strong arrays, concatenation is a very natural operation.

I tend to agree with you. Should be fairly easy to implement too.

would have been more logical choice and easier to type, but I understand there are issues with escaping inside the string

Lua string literal syntax has great and very useful ability to create unlimited nesting without escaping:
[[a]]
[=[ b [[a]] b ]=]
[==[ c [=[ b [[a]] b ]=] c ]==]
...
Three-quote syntax doesn't have this feature

I know that and this was what I was refering to with ”issues with escaping”.

The number of ” could be varied similarly to allow ””” inside the string. 

str = ””””Literal with ””” in it””””

Leading ” in the literal could be handled by having a line break after the opening ”””.
Line break would be ignored ([[ does this too).

But ” at the end of the string still poses a problem... unless perhaps the last line break would be automatically cut from the string if the previous char is a ”?

”””
”Literal string with quotes”
”””

Would produce a string with no line breaks.

Maybe this could work? More likely I mislooked something...

But maybe it would be better to deprecate [[ string literal syntax.
Only [=[  [==[  [===[ ... should work for strings.

But [=[ looks nothing like it has anything to do with strings and is really hard to type on my keyboard. Most EU kbs have buried [ and ] behind Alt Gr...

Petri
Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Petri Häkkinen
Bumping this thread just this once, as I find the lack of interest on this strange...

Is no-one else interested in O(1) implementation for the # operator, native (faster than tables) array implementation and how it can be used to solve the hole issue and issues with pack/unpack?

I think an array type is the right way to go for Lua, considering it solves many issues and also gives a nice perf boost.

Petri
Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Agathoklis D. Chatzimanikas
Hi Petri,

On Tue, Mar 27, at 07:38 Petri Häkkinen wrote:
> Bumping this thread just this once, as I find the lack of interest on this strange...

Silence means interest here :-).

> Is no-one else interested in O(1) implementation for the # operator, native (faster than tables) array implementation and how it can be used to solve the hole issue and issues with pack/unpack?
>
> I think an array type is the right way to go for Lua, considering it solves many issues and also gives a nice perf boost.

It will be implemented of course. The time has come.

> Petri

Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Dibyendu Majumdar
In reply to this post by Petri Häkkinen
On 27 March 2018 at 17:38, Petri Häkkinen <[hidden email]> wrote:
> Bumping this thread just this once, as I find the lack of interest on this strange...
>
> Is no-one else interested in O(1) implementation for the # operator, native (faster than tables) array implementation and how it can be used to solve the hole issue and issues with pack/unpack?
>
> I think an array type is the right way to go for Lua, considering it solves many issues and also gives a nice perf boost.
>

Hi, I think hoping for an array subtype in Lua is probably going to
result in disappointment. The table as the sole data structure is a
defining feature of Lua; not only that, the semantics associated with
table iteration and nils, makes it impossible to introduce such a
subtype without break compatibility with existing programs. Combining
an array type with a singleton 'undefined' value may be an option, but
didn't Roberto say in one of his posts:

> Once 'empty' and 'undef' are values, you break all those properties.

> I think we cannot use a value to represent the absence of a value, no
> matter how many no-value values we add. It is a contradiction. Several
> languages followed this path, always with bad results.

So it seems that the idea of a singleton 'undefined' value is also not
acceptable.

Of course we could see a new Lua language definition in 6.0 that makes
a clean break from 5.x as Hisham suggested. But if I were asked to bet
on it I would probably say that an array subtype will not happen.

Regards
Dibyendu

Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Duane Leslie
In reply to this post by Petri Häkkinen
> On 28 Mar 2018, at 03:38, Petri Häkkinen <[hidden email]> wrote:
>
> Bumping this thread just this once, as I find the lack of interest on this strange...

I'm not interested, I don't think there's enough advantage to justify the effort.  The split between the array part and the hash part for sparse storage is a clever compromise, and if you're only intending to make an array without holes or with very few then you get a standard array anyway.

The issue of length is a quirk of the language but I can't see any generic way to keep a more accurate version without cost (i.e. largest key, but this is hard in the case of deletion).  If you really need a fast accurate length you just store it manually.  I understand this is a regular tripping point for new users, but really it should just be a case of "once burnt, twice shy".

I'm much more interested in seeing tuples added to the language.  I think it solves the parameter packing (arguments and return) issue much more elegantly than arrays, but I'm currently stuck on the issue of key equality in tables (which arrays wouldn't have anyway).  I haven't finished investigating the best options for interning them yet, but I tinker when I have time.

Regards,

Duane.
Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Doug Currie
In reply to this post by Petri Häkkinen
The drawback I noticed was increasing the size of the Table structure. Since the hash part is not used, perhaps there's a way to share the hashtable control fields and "Sizeused" to avoid growing the structure?  I suspect the assessment of the impact of sharing fields on code complexity and performance may be a big job.

Overall I like the idea of an array type better than yet another variation of nil.




On Tue, Mar 27, 2018 at 12:38 PM, Petri Häkkinen <[hidden email]> wrote:
Bumping this thread just this once, as I find the lack of interest on this strange...

Is no-one else interested in O(1) implementation for the # operator, native (faster than tables) array implementation and how it can be used to solve the hole issue and issues with pack/unpack?

I think an array type is the right way to go for Lua, considering it solves many issues and also gives a nice perf boost.

Petri

Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Eric Man-3
Overall I like the idea of an array type better than yet another variation of nil.

Ditto.

On Wed, Mar 28, 2018 at 8:56 AM, Doug Currie <[hidden email]> wrote:
The drawback I noticed was increasing the size of the Table structure. Since the hash part is not used, perhaps there's a way to share the hashtable control fields and "Sizeused" to avoid growing the structure?  I suspect the assessment of the impact of sharing fields on code complexity and performance may be a big job.

Overall I like the idea of an array type better than yet another variation of nil.




On Tue, Mar 27, 2018 at 12:38 PM, Petri Häkkinen <[hidden email]> wrote:
Bumping this thread just this once, as I find the lack of interest on this strange...

Is no-one else interested in O(1) implementation for the # operator, native (faster than tables) array implementation and how it can be used to solve the hole issue and issues with pack/unpack?

I think an array type is the right way to go for Lua, considering it solves many issues and also gives a nice perf boost.

Petri


Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Petri Häkkinen
Thanks to all for the answers! My responses inlined below:

On 28 Mar 2018, at 0.02, Dibyendu Majumdar <[hidden email]> wrote:

> The table as the sole data structure is a
> defining feature of Lua; not only that, the semantics associated with
> table iteration and nils, makes it impossible to introduce such a
> subtype without break compatibility with existing programs.

Tables are a defining feature of Lua... until it is no more so. Similarly single numeric type was a defining feature of Lua, until integers were added. Mind you, “arrays” are still internally tables, just without the hash part and with an explicit length. Conceptually the leap from tables to arrays is small.

I think I've shown how an array subtype can be added without breaking compatibility with existing programs. Can you show how the patch breaks an existing program, please?


On 28 Mar 2018, at 0.56, Duane Leslie <[hidden email]> wrote:

> The issue of length is a quirk of the language but I can't see any generic way to keep a more accurate version without cost

The original email outlined how this would work. The patch implements # in constant time for arrays (note that this does not negatively affect the performance of tables — see benchmarks). Once you have arrays you no longer need # operator for tables. So # for tables could be first deprecated and ultimately removed. At this point you have # which works for arrays, is extremely fast, and does not have issues with holes. Problem solved!

(Naturally ‘#’ could still be implemented using metatables for tables, if so desired.)

Also, I think you are underestimating the value of a fast # operator. For example, a benchmark doing "t[#t+1] = value" runs about 5 times faster with arrays and constant time length operator. And the # operator alone runs about 16 times faster on arrays (on a big dataset; the speed of ‘#’ for tables varies depending on the table size) . That’s a sizable speed difference that should show on real programs, not just benchmarks.


On 28 Mar 2018, at 0.56, Doug Currie <[hidden email]> wrote:

> The drawback I noticed was increasing the size of the Table structure. Since the hash part is not used,
> perhaps there's a way to share the hashtable control fields and "Sizeused" to avoid growing the structure?
> I suspect the assessment of the impact of sharing fields on code complexity and performance may be a big job.

In practice, the impact of increasing the size of Table data structure by 4 or 8 bytes (depending on 32/64 bitness of the app) was discussed in the article. The benchmarks show that it does not negatively affect performance and the increase in memory consumption is very small. For example, a single element in a table needs two TValues for storage (key & value), i.e. 32 bytes per key-value. So the added "sizeused" field increases memory consumption as much as a 1/4th of a key-value.

To get high performance "sizeused" needs to be a separate field in Table. Hashing it into other fields would be almost impossible without destroying the performance, I’m afraid.

Petri


Reply | Threaded
Open this post in threaded view
|

AW: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

michaelflad
In reply to this post by Petri Häkkinen
Hi there,

I've last year chosen Lua to be the host language for my primary work in the
near future (game development) and I'd be more than interested in a better
native array support.
Right now I just use 0 instead of nil for empty spaces and I manually track
the length of all my arrays that might be dynamic in whatever way as I don't
want to pay for a full iteration each time I want to know the size.

Another big downside besides performance and the handling of empty slots is
actually that I have to create my arrays by sequentially filling them after
construction, potentially triggering multiple reallocs and in the end, due
the nature of dynamic arrays, with a lot of unused space. Given I primarily
work on sim/city builder types of games I have thousands of active agents in
my maps constantly creating new arrays for their paths etc. so that's
probably what hurts me most.

That said, at the moment, I use two existing frameworks and for one I don't
have any sources (Corona) and the other one uses LuaJIT so no chance of
getting new features soon anyway. But my usage of Lua goes back about 20
years (3.2) and I hope there's another 20 to come, so I'm still very
interested in the future of the language.

> -----Ursprüngliche Nachricht-----
> Von: [hidden email] <[hidden email]> Im Auftrag
> von Petri Häkkinen
> Gesendet: Dienstag, 27. März 2018 18:38
> An: Lua mailing list <[hidden email]>
> Betreff: Re: Lua 5.4-work1 with first class arrays (implementation &
> benchmarks)
>
> Bumping this thread just this once, as I find the lack of interest on this
> strange...
>
> Is no-one else interested in O(1) implementation for the # operator,
native
> (faster than tables) array implementation and how it can be used to solve
the
> hole issue and issues with pack/unpack?
>
> I think an array type is the right way to go for Lua, considering it
solves many
> issues and also gives a nice perf boost.
>
> Petri


Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Dibyendu Majumdar
In reply to this post by Petri Häkkinen
On 28 March 2018 at 06:00, Petri Häkkinen <[hidden email]> wrote:
> I think I've shown how an array subtype can be added without breaking compatibility with existing programs. Can you show how the patch breaks an existing program, please?
>

Hi, this issue was discussed at length already?

The simplest example is standard iteration using for / pairs().
Existing code does not expect nils to appear. What is problematic is
that you can't tell if some existing code will break - because it is a
semantics thing - syntactically it all looks fine.

Regards

Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Duane Leslie
In reply to this post by Petri Häkkinen


> On 28 Mar 2018, at 16:00, Petri Häkkinen <[hidden email]> wrote:
>
>> On 28 Mar 2018, at 0.56, Duane Leslie <[hidden email]> wrote:
>>
>> The issue of length is a quirk of the language but I can't see any generic way to keep a more accurate version without cost
>
> (Naturally ‘#’ could still be implemented using metatables for tables, if so desired.)

I have to admit I haven't read your code, only the Readme, but I fail to see the difference between your proposed operation and the `.n` field and some very basic metamethods and extending the library functions which you require anyway.  When I spoke about *generic* I meant a length value that was always equal to the largest index in the table.

> Also, I think you are underestimating the value of a fast # operator. For example, a benchmark doing "t[#t+1] = value" runs about 5 times faster with arrays and constant time length operator.

How about compared to using `.n` or the local variable idiom (i.e. `t[next], next = value, next+1`).  `#t` is known to be slow, so where you need performance you cache it or manage it manually.

I don't know if I am convinced that automatically caching this value and having functions to manipulate it is enough of a boon to justify a new primary data type.

Regards,

Duane.
Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

pocomane
In reply to this post by Petri Häkkinen
On Wed, Mar 28, 2018 at 7:00 AM, Petri Häkkinen <[hidden email]> wrote:
> The original email outlined how this would work. The patch implements # in constant time for arrays (note that this does not negatively affect the performance of tables — see benchmarks). Once you have arrays you no longer need # operator for tables. So # for tables could be first deprecated and ultimately removed. At this point you have # which works for arrays, is extremely fast, and does not have issues with holes. Problem solved!

If # for table is removed, I probably will re-implement it by myself.
The point is: I often found the current # behaviour very useful, not
something to fight against. E.g. I can write `t[1+#t]=x` and be sure
that the `x` is added in a free space!

Moreover, about your benchmark, I think that comparing # with the
access to a stucture field is a bit unfair. In lua you will never use
# on tables with holes (when performance is important). I would
compare it against a the common .n idiom, or a full
Pure-Lua-Array-Type solution.

And, finally, a question for all the list: how much in your opinion is
the performance important? A proper implemented pure-lua-array can do
everything in O(1), so the gain of a native solution will be always a
constant factor. If this factor is, let's say 10%, it worth? And 20%?
100%?

Reply | Threaded
Open this post in threaded view
|

Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)

Dirk Laurie-2
2018-03-28 9:35 GMT+02:00 pocomane <[hidden email]>:

> If # for table is removed, I probably will re-implement it by myself.
> The point is: I often found the current # behaviour very useful, not
> something to fight against. E.g. I can write `t[1+#t]=x` and be sure
> that the `x` is added in a free space!

I have often wished that table.border was available so that I could
use it to protect myself against __len metamethods.

I don't see the point of first class arrays whose elements are general
Lua values. You don't get enough extra speed. It optimizes at the
wrong place.

I do see the point of arrays of bytes, or two-byte Unicode characters,
or of floating-point numbers, etc, but those would be userdata, not
tables. Using the same method names as the string and/or table
libraries would of course be a sensible thing to do.

123