Array subtype is conceptually simplest

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Array subtype is conceptually simplest

Dirk Laurie-2
* Every Lua 5.2 program you ever wrote in which somewhere the argument
to 'ipairs' is a table with an __index metamethod, has a chance of
being broken in Lua 5.3, because that metamethod _will_ be invoked in
a way that probably is not the reason why you supplied it. It has
happened three times to me, so far, and the way in which the program
fails has on each occasion been so different that I have not really
learnt by experience. Each time, it takes me several hours of
debugging time.

* Lua 6.0 will very probably have some kind of mechanism to address
the holes-in-tables problem that haunts the table library. With the
present ipairs semantics, that mechanism may well re-break programs
that have been fixed for Lua 5.3.

The 5.4-work1 approach solved a too-ambitious problem: allowing a
significant nil in any field of a table. To fix the problems that new
users have with the table library, it is only necessary to allow that
in the range 1..#tbl.

An array subtype would solve that problem.

1. Own __len, implicit in the subtype, if you set your own __len it is
no longer an array.
2. ipairs iterates from 1 to __len, table functions use __len.
3. Value returned by __len is fixed when the table is created and
can't be changed by assignment, only by functions in the table library
(insert, remove, setlen).

Reply | Threaded
Open this post in threaded view
|

AW: Array subtype is conceptually simplest

michaelflad
> -----Ursprüngliche Nachricht-----
> Von: [hidden email] <[hidden email]> Im Auftrag
> von Dirk Laurie
> Gesendet: Donnerstag, 18. Oktober 2018 09:42
> An: Lua mailing list <[hidden email]>
> Betreff: Array subtype is conceptually simplest
>
> An array subtype would solve that problem.
>
> 1. Own __len, implicit in the subtype, if you set your own __len it is no longer
> an array.
> 2. ipairs iterates from 1 to __len, table functions use __len.
> 3. Value returned by __len is fixed when the table is created and can't be
> changed by assignment, only by functions in the table library (insert, remove,
> setlen).

In addition, (3) would also allow to only allocate the actual required
amount of memory for the array instead of, worst case, use almost
twice of that.


Reply | Threaded
Open this post in threaded view
|

Re: Array subtype is conceptually simplest

Gé Weijers


On Thu, Oct 18, 2018 at 1:16 AM <[hidden email]> wrote:
In addition, (3) would also allow to only allocate the actual required
amount of memory for the array instead of, worst case, use almost
twice of that.


Doubling the size keeps the amount of work required to append an element to the array part of a table from becoming quadratic. Appending an element takes O(1) works amortized

Creating a set of routines that implement the semantics suggested is not that hard. There are many ways to do it. A suggestion: store the length of an 'array' in a table using weak keys:


local array_length = setmetatable({}, {
    __mode = 'k',
    __index = function(t, a)
        return error("table not found in array_length")
    end,
})

Creating an 'array':

local array_mt = {
    __len = function(t) return array_length[t] end,
}

function new_array(n, ...)
    local a = setmetatable({...}, array_mt)
    array_length[a] = n
    return a
end

A replacement for the iterator:

local function apairs(a)
    local n = #a
    local i = 0
    return function()
        if i < n then
            i = i + 1
            return i, a[i]
        end
        return nil
    end
end


Do we really need to expand the language?


--
--

Reply | Threaded
Open this post in threaded view
|

AW: Array subtype is conceptually simplest

michaelflad

An array that doesn’t automatically but intentionally resize is easily extended

into one, with the automatic/current behaviour, and it needs very little additional

code.

But given the current version, you can’t do it the other way around, you’re

forced to waste the resources while there could be a version where it’s optional

and you, as the developer, should be the one knowing which one better fits your

current needs.

Also besides the waste of memory it also can trigger a series of reallocs while you

initially fill your array.

 

And yes, this is one of the very few parts where I think the language could

and should be extended, as it’s such a fundamental part and in (of course

my personal experience) way more than 50% of the situations where an

array is the data struct of choice, it’s actual (max)size is/can be known upfront.

 

 

Von: [hidden email] <[hidden email]> Im Auftrag von Gé Weijers
Gesendet: Samstag, 20. Oktober 2018 02:18
An: [hidden email]
Betreff: Re: Array subtype is conceptually simplest

 

 

On Thu, Oct 18, 2018 at 1:16 AM <[hidden email]> wrote:

In addition, (3) would also allow to only allocate the actual required
amount of memory for the array instead of, worst case, use almost
twice of that.

 

Doubling the size keeps the amount of work required to append an element to the array part of a table from becoming quadratic. Appending an element takes O(1) works amortized

Reply | Threaded
Open this post in threaded view
|

Re: AW: Array subtype is conceptually simplest

Petri Häkkinen

On 20 Oct 2018, at 9.06, <[hidden email]> <[hidden email]> wrote:

An array that doesn’t automatically but intentionally resize is easily extended

into one, with the automatic/current behaviour, and it needs very little additional

code.

But given the current version, you can’t do it the other way around, you’re

forced to waste the resources while there could be a version where it’s optional

and you, as the developer, should be the one knowing which one better fits your

current needs.

Also besides the waste of memory it also can trigger a series of reallocs while you

initially fill your array.

 

And yes, this is one of the very few parts where I think the language could

and should be extended, as it’s such a fundamental part and in (of course

my personal experience) way more than 50% of the situations where an

array is the data struct of choice, it’s actual (max)size is/can be known upfront.


I also think arrays are so common they could benefit from language level support.

In case you missed it, some time ago I wrote a proof of concept patch for array subtype + new syntax for array construction. The patch outlined one way to implement true arrays without making non-array code run any slower. There’s also some benchmarks which show significant perf improvement over regular Lua tables.

There was some discussion on this list about it, check out the archives if you’re interested.

Write up & code here:

https://github.com/petrihakkinen/lua-array

Cheers,

Petri

Reply | Threaded
Open this post in threaded view
|

Re: Array subtype is conceptually simplest

Tim Hill
In reply to this post by Dirk Laurie-2


> On Oct 18, 2018, at 12:41 AM, Dirk Laurie <[hidden email]> wrote:
>
> * Every Lua 5.2 program you ever wrote in which somewhere the argument
> to 'ipairs' is a table with an __index metamethod, has a chance of
> being broken in Lua 5.3, because that metamethod _will_ be invoked in
> a way that probably is not the reason why you supplied it. It has
> happened three times to me, so far, and the way in which the program
> fails has on each occasion been so different that I have not really
> learnt by experience. Each time, it takes me several hours of
> debugging time.
>
> * Lua 6.0 will very probably have some kind of mechanism to address
> the holes-in-tables problem that haunts the table library. With the
> present ipairs semantics, that mechanism may well re-break programs
> that have been fixed for Lua 5.3.
>
> The 5.4-work1 approach solved a too-ambitious problem: allowing a
> significant nil in any field of a table. To fix the problems that new
> users have with the table library, it is only necessary to allow that
> in the range 1..#tbl.
>
> An array subtype would solve that problem.
>
> 1. Own __len, implicit in the subtype, if you set your own __len it is
> no longer an array.
> 2. ipairs iterates from 1 to __len, table functions use __len.
> 3. Value returned by __len is fixed when the table is created and
> can't be changed by assignment, only by functions in the table library
> (insert, remove, setlen).
>

+1 to all this, though I’m not sure why the value of __len must be fixed (nor why the table library is privileged in this regard).

I’d also add a helper function table.setlen() (array.setlen() ???) that sets the length of an array():

t = table.setlen(t, l)
Sets the __len length of table t to l and returns the table. If l is nil or not specified, the length is set to #t. If the table does not have a metatable, an empty one is added automatically(*).

This allows a compact way to create arrays from constructors:

arr = table.setlen({12, 14, 99})

(*) I’m still worried about the expense of a meltable to hold __len. table.setlent() could magically create the necessary metatable if the table does not have one (and would HAVE to if it supports my constructor approach). As Roberto noted, older versions of Lua used a single weak table to hold the lengths of all tables, but this also seems slow and cumbersome. I suggested a dedicated field in the table struct, but others have been concerned this would tax ALL tables.

—Tim




Reply | Threaded
Open this post in threaded view
|

Re: Array subtype is conceptually simplest

Luiz Henrique de Figueiredo
In reply to this post by Dirk Laurie-2
Wouldn't an array *library* be a simpler solution?

Reply | Threaded
Open this post in threaded view
|

Re: Array subtype is conceptually simplest

pocomane
In reply to this post by Dirk Laurie-2
On Thu, Oct 18, 2018 at 9:42 AM Dirk Laurie <[hidden email]> wrote:

>
> An array subtype would solve that problem.
>
> 1. Own __len, implicit in the subtype, if you set your own __len it is
> no longer an array.
> 2. ipairs iterates from 1 to __len, table functions use __len.
> 3. Value returned by __len is fixed when the table is created and
> can't be changed by assignment, only by functions in the table library
> (insert, remove, setlen).
>

Some questions.

- Are you trying to address the problems stated in
http://lua-users.org/lists/lua-l/2018-03/msg00239.html with this
proposal? If yes, can you clarify the  mechanim to solve them?

- Why do you use the "Array subtype" semantic? It seems to me that
your proposal can be applied to ANY table, like a sort of "Protected
.n field". [1]

- At my understanding, your proposal implies that
`atable[1+getlen(atable)] = [[new content]]`, does not allow to
iterate over the 'new contents' in subsequent loops (with iparis or
__len). Is this right?

Note:

[1] I attach a thoughtless-patch that does something similar:
- A length is kept in a new field of the table struct (intialized to 0)
- Len-operator/opcode return it
- The length is updated every time something is written in a positive
integer key (nil also). The maximum index of all the writings is
stored.
- ipairs loops from 1 to this length, also if there are holes
- The table.setlen function can be used to set the value of the new
field. It does not change anything else.
The execution time increases  by ~3% (when updating the len) while the
memory size increases by ~7% (when storing empty tables). I think both
can be improved with more conscientious changes.

Left base folder: C:\WIP\tmp\lua-5.4.0-work2
Right base folder: C:\WIP\tmp\lua-5.4.0-work2-PATCH
--- src\lapi.c 2018-10-23 16:39:34.000000000 +0200
+++ src\lapi.c 2018-10-23 16:39:37.000000000 +0200
@@ -390,13 +390,13 @@
 LUA_API lua_Unsigned lua_rawlen (lua_State *L, int idx) {
   const TValue *o = index2value(L, idx);
   switch (ttypetag(o)) {
     case LUA_TSHRSTR: return tsvalue(o)->shrlen;
     case LUA_TLNGSTR: return tsvalue(o)->u.lnglen;
     case LUA_TUSERDATA: return uvalue(o)->len;
-    case LUA_TTABLE: return luaH_getn(hvalue(o));
+    case LUA_TTABLE: return hvalue(o)->LENPATCH;
     default: return 0;
   }
 }


 LUA_API lua_CFunction lua_tocfunction (lua_State *L, int idx) {
@@ -831,20 +831,20 @@
 LUA_API void lua_rawset (lua_State *L, int idx) {
   Table *t;
   TValue *slot;
   lua_lock(L);
   api_checknelems(L, 2);
   t = gettable(L, idx);
+  if (ttisinteger(s2v(L->top - 2))) LENPATCH_UPDATE(t,
ivalue(s2v(L->top - 2)));
   slot = luaH_set(L, t, s2v(L->top - 2));
   setobj2t(L, slot, s2v(L->top - 1));
   invalidateTMcache(t);
   luaC_barrierback(L, obj2gco(t), s2v(L->top - 1));
   L->top -= 2;
   lua_unlock(L);
 }
-

 LUA_API void lua_rawseti (lua_State *L, int idx, lua_Integer n) {
   Table *t;
   lua_lock(L);
   api_checknelems(L, 1);
   t = gettable(L, idx);
--- src\lbaselib.c 2018-10-23 16:39:34.000000000 +0200
+++ src\lbaselib.c 2018-10-23 16:39:37.000000000 +0200
@@ -260,20 +260,22 @@
     lua_pushvalue(L, 1);  /* argument 'self' to metamethod */
     lua_call(L, 1, 3);  /* get 3 values from metamethod */
   }
   return 3;
 }

+#include "lstate.h"

 /*
 ** Traversal function for 'ipairs'
 */
 static int ipairsaux (lua_State *L) {
   lua_Integer i = luaL_checkinteger(L, 2) + 1;
   lua_pushinteger(L, i);
-  return (lua_geti(L, 1, i) == LUA_TNIL) ? 1 : 2;
+  lua_geti(L, 1, i);
+  return (i > hvalue(s2v(L->ci->func + 1))->LENPATCH ? 1 : 2 );
 }


 /*
 ** 'ipairs' function. Returns 'ipairsaux', given "table", 0.
 ** (The given "table" may not be a table.)
--- src\lobject.h 2018-10-23 16:39:34.000000000 +0200
+++ src\lobject.h 2018-10-23 16:39:37.000000000 +0200
@@ -686,12 +686,13 @@
   unsigned int alimit;  /* "limit" of 'array' array */
   TValue *array;  /* array part */
   Node *node;
   Node *lastfree;  /* any free position is before this position */
   struct Table *metatable;
   GCObject *gclist;
+  unsigned int LENPATCH;
 } Table;


 /*
 ** Macros to manipulate keys inserted in nodes
 */
--- src\ltable.c 2018-10-23 16:39:34.000000000 +0200
+++ src\ltable.c 2018-10-23 16:39:37.000000000 +0200
@@ -580,12 +580,13 @@
   GCObject *o = luaC_newobj(L, LUA_TTABLE, sizeof(Table));
   Table *t = gco2t(o);
   t->metatable = NULL;
   t->flags = cast_byte(~0);
   t->array = NULL;
   t->alimit = 0;
+  t->LENPATCH = 0;
   setnodevector(L, t, 0);
   return t;
 }


 void luaH_free (lua_State *L, Table *t) {
--- src\ltable.h 2018-10-23 16:39:34.000000000 +0200
+++ src\ltable.h 2018-10-23 16:39:37.000000000 +0200
@@ -50,8 +50,9 @@

 #if defined(LUA_DEBUG)
 LUAI_FUNC Node *luaH_mainposition (const Table *t, const TValue *key);
 LUAI_FUNC int luaH_isdummy (const Table *t);
 #endif

+#define LENPATCH_UPDATE(v,k) (v->LENPATCH < k ? v->LENPATCH = k : 0)

 #endif
--- src\ltablib.c 2018-10-23 16:39:34.000000000 +0200
+++ src\ltablib.c 2018-10-23 16:39:37.000000000 +0200
@@ -399,26 +399,38 @@
     lua_settop(L, 2);  /* make sure there are two arguments */
     auxsort(L, 1, (IdxT)n, 0);
   }
   return 0;
 }

+#include "lstate.h"
+
+static int setlen (lua_State *L) {
+  if (!ttistable(s2v(L->ci->func + 1)))
+    return luaL_error(L, "argument #1 must be a table");
+  lua_Integer i = luaL_checkinteger(L, 2);
+  lua_pushinteger(L, i);
+  hvalue(s2v(L->ci->func + 1))->LENPATCH = i;
+  return 0;
+}
+
 /* }====================================================== */


 static const luaL_Reg tab_funcs[] = {
   {"concat", tconcat},
   {"insert", tinsert},
   {"pack", tpack},
   {"unpack", tunpack},
   {"remove", tremove},
   {"move", tmove},
   {"sort", sort},
+  {"setlen", setlen},
   {NULL, NULL}
 };


 LUAMOD_API int luaopen_table (lua_State *L) {
   luaL_newlib(L, tab_funcs);
   return 1;
 }

--- src\lvm.c 2018-10-23 16:39:34.000000000 +0200
+++ src\lvm.c 2018-10-23 16:39:37.000000000 +0200
@@ -589,13 +589,13 @@
   const TValue *tm;
   switch (ttypetag(rb)) {
     case LUA_TTABLE: {
       Table *h = hvalue(rb);
       tm = fasttm(L, h->metatable, TM_LEN);
       if (tm) break;  /* metamethod? break switch to call it */
-      setivalue(s2v(ra), luaH_getn(h));  /* else primitive len */
+      setivalue(s2v(ra), h->LENPATCH);  /* else primitive len */
       return;
     }
     case LUA_TSHRSTR: {
       setivalue(s2v(ra), tsvalue(rb)->shrlen);
       return;
     }
@@ -876,13 +876,12 @@
 }

 #define vmdispatch(o) switch(o)
 #define vmcase(l) case l:
 #define vmbreak break

-
 void luaV_execute (lua_State *L, CallInfo *ci) {
   LClosure *cl;
   TValue *k;
   StkId base;
   const Instruction *pc;
   int trap;
@@ -1034,12 +1033,13 @@
             ? (n = ivalue(rb), luaV_fastgeti(L, s2v(ra), n, slot))
             : luaV_fastget(L, s2v(ra), rb, slot, luaH_get)) {
           luaV_finishfastset(L, s2v(ra), slot, rc);
         }
         else
           Protect(luaV_finishset(L, s2v(ra), rb, rc, slot));
+        if (ttisinteger(rb)) LENPATCH_UPDATE(hvalue(s2v(ra)), ivalue(rb));
         vmbreak;
       }
       vmcase(OP_SETI) {
         const TValue *slot;
         int c = GETARG_B(i);
         TValue *rc = RKC(i);
@@ -1048,12 +1048,13 @@
         }
         else {
           TValue key;
           setivalue(&key, c);
           Protect(luaV_finishset(L, s2v(ra), &key, rc, slot));
         }
+        LENPATCH_UPDATE(hvalue(s2v(ra)), c);
         vmbreak;
       }
       vmcase(OP_SETFIELD) {
         const TValue *slot;
         TValue *rb = KB(i);
         TValue *rc = RKC(i);
@@ -1770,12 +1771,13 @@
           luaH_resizearray(L, h, last);  /* preallocate it at once */
         for (; n > 0; n--) {
           TValue *val = s2v(ra + n);
           setobj2t(L, &h->array[last - 1], val);
           last--;
+          LENPATCH_UPDATE(h, h->LENPATCH + 1);
           luaC_barrierback(L, obj2gco(h), val);
         }
         vmbreak;
       }
       vmcase(OP_CLOSURE) {
         Proto *p = cl->p->p[GETARG_Bx(i)];
         LClosure *ncl = getcached(p, cl->upvals, base);  /* cached closure */

Reply | Threaded
Open this post in threaded view
|

Re: Array subtype is conceptually simplest

Dirk Laurie-2
In reply to this post by Luiz Henrique de Figueiredo
Op Di., 23 Okt. 2018 om 17:56 het Luiz Henrique de Figueiredo
<[hidden email]> geskryf:
>
> Wouldn't an array *library* be a simpler solution?
>

Since you quote no previous post, I assume that you are replying to my
original post, which started:

> Every Lua 5.2 program you ever wrote in which somewhere the argument
> to 'ipairs' is a table with an __index metamethod, has a chance of
> being broken in Lua 5.3, because that metamethod _will_ be invoked in
> a way that probably is not the reason why you supplied it.

So no, an array library would not be a solution at all, unless
accompanied by an "apairs" function in the standard library.

Up to Lua 5.2, 'ipairs' gave you a subset of what 'pairs' gave you.
The simultaneous decision to make it possible for 'ipairs' to deliver
a pair that 'pairs' can't see, and to abolish the __ipairs metamethod
that could have avoided that possibility, was a breaking change.
Linters should warn you against "ipairs" (maybe some of them already
do?)

Reply | Threaded
Open this post in threaded view
|

Re: Array subtype is conceptually simplest

Pierre Chapuis
On Wed, Oct 24, 2018, at 09:53, Dirk Laurie wrote:
> Op Di., 23 Okt. 2018 om 17:56 het Luiz Henrique de Figueiredo
> <[hidden email]> geskryf:
> >
> > Wouldn't an array *library* be a simpler solution?

> So no, an array library would not be a solution at all, unless
> accompanied by an "apairs" function in the standard library.

If the array library is a standard library then such a function could
be included in it (`array.apairs(my_array)`, maybe with a method as
well: `my_array:apairs()`).

To add my (very simple) opinion, I don't care about the features at
all, as long as there is a *standard* way to detect that something
should be treated as a sequence / array / whatever, so we can all
stop using weird codebase-specific tricks when serializing / deserializing
data to / from any format that looks like JSON.

--
Pierre Chapuis

Reply | Threaded
Open this post in threaded view
|

Re: Array subtype is conceptually simplest

Sean Conner
It was thus said that the Great Pierre Chapuis once stated:

> On Wed, Oct 24, 2018, at 09:53, Dirk Laurie wrote:
> > Op Di., 23 Okt. 2018 om 17:56 het Luiz Henrique de Figueiredo
> > <[hidden email]> geskryf:
> > >
> > > Wouldn't an array *library* be a simpler solution?
>
> > So no, an array library would not be a solution at all, unless
> > accompanied by an "apairs" function in the standard library.
>
> If the array library is a standard library then such a function could
> be included in it (`array.apairs(my_array)`, maybe with a method as
> well: `my_array:apairs()`).
>
> To add my (very simple) opinion, I don't care about the features at
> all, as long as there is a *standard* way to detect that something
> should be treated as a sequence / array / whatever, so we can all
> stop using weird codebase-specific tricks when serializing / deserializing
> data to / from any format that looks like JSON.

  Here's how I handle that situation in my CBOR library [1]:

        * If the table has a defined length (that is, #t > 0) it is treated
          as a sequence and stored as a CBOR ARRAY.

        * Otherwise, it is treated as an assiciative array and stored as a
          CBOR MAP

  The user can override this by setting a __tocbor metamethod on the table
and explicitely encoding the table as a CBOR ARRAY or MAP (by calling the
appropriate method directly).  That was the most straightforward method I
found for handling this.

  -spc (The __tocbor is also honored on any userdata passed to it)

[1] https://github.com/spc476/CBOR
        Also available via LuaRocks as org.conman.cbor

Reply | Threaded
Open this post in threaded view
|

Re: Array subtype is conceptually simplest

Pierre Chapuis
On Fri, Nov 2, 2018, at 21:15, Sean Conner wrote:
>   Here's how I handle that situation in my CBOR library [1]:

Sure, there are lots of ways to deal with it. The problem is
they're not obvious and depend on the library.