Re: New array type? (was: 'table' as fallback for tables)

classic Classic list List threaded Threaded
74 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

William Ahern
On Thu, Jul 07, 2016 at 05:01:09PM -0700, Ross Berteig wrote:

> On 7/7/2016 4:05 PM, William Ahern wrote:
> >On Fri, Jul 08, 2016 at 12:36:30AM +0200, Dirk Laurie wrote:
> >>Suppose that t represents the rooms in an hotel. Guests arrive
> >>and leave. You evaluate #t. It tells you where to find an empty
> >>room. You put a guest in that room. Next guest comes. Again
> >>#t finds an empty room. Why is it bad if that room happens to
> >>have a lower number? But it would be bad if the guarantee
> >>that t[#t+1] is empty did not hold.
> >
> >So you've put them in the room. Then what? You can't find them again. You
> >can't do `for i=1,#t do ... end` and expect to find them.
>
> But what if you don't need to remember. In the hotel case, once they have a
> key and are in a room, for the most part the rest of the hotel only cares if
> that room is occupied (clean it during the day, don't assign another guest
> to it, etc.) or not.
>
> Besides, the hotel is itself a fixed size array of floors (often with 13
> missing aka nil) each containing a fixed array of rooms. Extending a hotel
> is a big deal, involving lots of money and other resources, and mediated by
> building permits and inspections.

But the hotel in this case isn't fixed sized, and least not in a way that
you can reliably depend on. You can't use #t to find the size; nor to find
the last occupied room; nor even to reliably find _any_ vacant room within
some fixed size, even where more than half the rooms are empty.

When you check them into the hotel, #t might 100. Immediately after checking
them into room 101, #t might now evaluate to 50 because the insertion caused
a rehash, and the first found frontier might shift backward. So how do you
know to clean rooms 1..101 and not 1..10000 or 1..1000000? You'd have to
keep some accounting on the side to know which rooms are occupied, how many,
etc. But it would be completely superfluous. If you have that accounting,
there are simpler alternatives and so there's little utility in relying
alone on t[#t+1] being an empty slot. The even simpler solution, as I said,
if you just want a place to house an object is by doing t[o] = true.

The only way to keep an array compact when removing arbitrary indexes is to
pop the last index and use it to fill the hole. For insertion you always
insert at the end, in which case you're relying on the array being a proper
sequence. You don't even need auxiliary accounting for that.

> So iterating over all rooms is already an O(n) task, and normally doesn't
> stumble over the missing 13th floor.
>
> >You could remember the index somewhere, but then you've only created a level
> >of indirection. Wherever you store the index you could have kept the object
> >itself. You if you wanted to anchor the object in memory, it would be
> >simpler and more performant to store it as the key.
> >
> >Hmmmm... I guess one use would be for implementating luaL_ref. I'm not sure
> >how useful that would be in Lua code. Maybe I just need to think about it
> >longer.
>
> IIRC, that is exactly how luaL_ref() was implemented. At least in a
> high-order pseudo-code description. The integer returned is used to look up
> the saved object, and needed to be something (like an integer) that would be
> easy to return to C code as plain data. Using #t+1 to get it is cheap, easy,
> and reliable. And looking it up later is also cheap and easy.
>

Right, and that makes sense in C code where you use the integer index to
indirectly reference another object. But the reason that's an elegant
solution is because C can't use a reference to the Lua object directly; the
Lua stack API doesn't permit that, and even if so the garbage collector
couldn't trace such a reference for liveness detection. luaL_ref is an
out-of-the-box solution to anchoring for use by C code, because C code has
unique constraints.

I can't think of a use case where that makes sense in Lua code. You don't
have the same kind of anchoring problems because in Lua code you don't need
such indirection. The only similar situation would be caching or
memoization, but I can't think of a scenario where relying on the #t+1
guarantee alone is either more concise or more performant.

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Dirk Laurie-2
2016-07-08 4:01 GMT+02:00 William Ahern <[hidden email]>:

> I can't think of a scenario where relying on the #t+1
> guarantee alone is either more concise or more performant.

Suppose you have coroutines 'producer' and 'consumer' accessing
the same table. 'consumer' has some capricious way of choosing
an item and removing it. 'producer' has no way of knowing what
'consumer' has been doing and now needs to put a new item in.

    t[#t+1] = newitem

seems to me to be unbeatably concise, and needs only O(log n)
time, but with the present Lua documentation is unsound code.
You can make it sound by sacrificing both conciseness and
performance:

   local k=#t+1; while t[k] do k=k+1 end; t[k]=newitem

which may take O(n) time.

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

steve donovan
On Fri, Jul 8, 2016 at 9:07 AM, Dirk Laurie <[hidden email]> wrote:
>     t[#t+1] = newitem
>
> seems to me to be unbeatably concise, and needs only O(log n)
> time, but with the present Lua documentation is unsound code.

Hm, did not Roberto say that this should always work? Or is the joker
__len? Surely any sensible person would implement __len so that this
still works, or throw an error (fixed size).

(I remember when this was considered not unbeatably concise enough, so
e.g. the 't[#] = newitem' proposal)

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Thomas Jericke
In reply to this post by Dirk Laurie-2
On 07/08/2016 09:07 AM, Dirk Laurie wrote:

> 2016-07-08 4:01 GMT+02:00 William Ahern <[hidden email]>:
>
>> I can't think of a scenario where relying on the #t+1
>> guarantee alone is either more concise or more performant.
> Suppose you have coroutines 'producer' and 'consumer' accessing
> the same table. 'consumer' has some capricious way of choosing
> an item and removing it. 'producer' has no way of knowing what
> 'consumer' has been doing and now needs to put a new item in.
>
>      t[#t+1] = newitem
>
> seems to me to be unbeatably concise, and needs only O(log n)
> time, but with the present Lua documentation is unsound code.
> You can make it sound by sacrificing both conciseness and
> performance:
>
>     local k=#t+1; while t[k] do k=k+1 end; t[k]=newitem
>
> which may take O(n) time.
>
Why not use a binary search in your example code?

Also, while I think your example is legit, I don't think it's such a
common use case that it deserves its own operator.
The current implementation of # could easily just be moved to a function
of the table library.
--
Thomas

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Dirk Laurie-2
2016-07-08 11:51 GMT+02:00 Thomas Jericke <[hidden email]>:

> On 07/08/2016 09:07 AM, Dirk Laurie wrote:
>>
>> 2016-07-08 4:01 GMT+02:00 William Ahern <[hidden email]>:
>>
>>> I can't think of a scenario where relying on the #t+1
>>> guarantee alone is either more concise or more performant.
>>
>> Suppose you have coroutines 'producer' and 'consumer' accessing
>> the same table. 'consumer' has some capricious way of choosing
>> an item and removing it. 'producer' has no way of knowing what
>> 'consumer' has been doing and now needs to put a new item in.
>>
>>      t[#t+1] = newitem
>>
>> seems to me to be unbeatably concise, and needs only O(log n)
>> time, but with the present Lua documentation is unsound code.
>> You can make it sound by sacrificing both conciseness and
>> performance:
>>
>>     local k=#t+1; while t[k] do k=k+1 end; t[k]=newitem
>>
>> which may take O(n) time.
>>

> Why not use a binary search in your example code?

Actually, #t does.

> Also, while I think your example is legit, I don't think it's such a common
> use case that it deserves its own operator.

It happens to _have_ its own operator, but that feature is no
longer documented.

> The current implementation of # could easily just be moved to
> a function of the table library.


Darn! These long threads let things slip under the carpet.

About 30 messages ago, I posted this:

~~~
1. Make the intuitive Lua 5.0 behaviour (to use the largest numeric
key of tbl as #tbl) the default. No presently documented property of
the # operator is violated.

2. Instead of 'n', which offends some people, use a secret key
which a modified 'next' (and therefore 'pairs') will not show.

3. New table functions:

table.frontier(tbl[,k])  Returns the k-th frontier of tbl, that is,
an integer n>=0 such that t[n+1] is nil but either n=0 or t[n]
is not nil. If k is omitted, any frontier of t may be returned.

table.getlength(tbl)  Returns the value, if any, to be returned
by #tbl when no __len metamethod has been specified.

table.setlength(tbl,n) Sets n as the value to be returned by
#tbl when no __len metamethod has been specified.

The current O(log n) but nondeterministic #t is obtained by

   setmetatable(tbl,table.frontier)
~~~

So I agree with you on this point :-)

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Soni "They/Them" L.
In reply to this post by Thomas Jericke


On 08/07/16 06:51 AM, Thomas Jericke wrote:
>
> Also, while I think your example is legit, I don't think it's such a
> common use case that it deserves its own operator.
> The current implementation of # could easily just be moved to a
> function of the table library.

#"a string"

vs

table.len("a string")

> --
> Thomas
>

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.


Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Thomas Jericke
In reply to this post by Dirk Laurie-2

-----Original Message-----

> From: "Dirk Laurie" <[hidden email]>
> To: "Lua mailing list" <[hidden email]>
> Date: 08-07-2016 13:22
> Subject: Re: New array type? (was: 'table' as fallback for tables)
>
> 2016-07-08 11:51 GMT+02:00 Thomas Jericke <[hidden email]>:
> > On 07/08/2016 09:07 AM, Dirk Laurie wrote:
> >>
> >> 2016-07-08 4:01 GMT+02:00 William Ahern <[hidden email]>:
> >>
> >>> I can't think of a scenario where relying on the #t+1
> >>> guarantee alone is either more concise or more performant.
> >>
> >> Suppose you have coroutines 'producer' and 'consumer' accessing
> >> the same table. 'consumer' has some capricious way of choosing
> >> an item and removing it. 'producer' has no way of knowing what
> >> 'consumer' has been doing and now needs to put a new item in.
> >>
> >>      t[#t+1] = newitem
> >>
> >> seems to me to be unbeatably concise, and needs only O(log n)
> >> time, but with the present Lua documentation is unsound code.
> >> You can make it sound by sacrificing both conciseness and
> >> performance:
> >>
> >>     local k=#t+1; while t[k] do k=k+1 end; t[k]=newitem
> >>
> >> which may take O(n) time.
> >>
>
> > Why not use a binary search in your example code?
>
> Actually, #t does.

I am not talking about # but about your example code:

local k=#t+1; while t[k] do k=k+1 end; t[k]=newitem

could be replaced by

local k=#t+1
while t[k] ~= nil do k=k*2 end
local i = k/2
while k - i > 1 do
  local m = (k+i)//2
  if t[m] == nil then k = m else i = m
end
t[k+1]=newitem

This is still O(log n)

(It may has typos in this code)

>
> > Also, while I think your example is legit, I don't think it's such a common
> > use case that it deserves its own operator.
>
> It happens to _have_ its own operator, but that feature is no
> longer documented.

I thought we are talking about hypothetical future Lua versions here.

>
> > The current implementation of # could easily just be moved to
> > a function of the table library.
>
>
> Darn! These long threads let things slip under the carpet.
>

It is really too large for me to remember every post, sorry.

> About 30 messages ago, I posted this:
>
> ~~~
> 1. Make the intuitive Lua 5.0 behaviour (to use the largest numeric
> key of tbl as #tbl) the default. No presently documented property of
> the # operator is violated.
>
> 2. Instead of 'n', which offends some people, use a secret key
> which a modified 'next' (and therefore 'pairs') will not show.
>
> 3. New table functions:
>
> table.frontier(tbl[,k])  Returns the k-th frontier of tbl, that is,
> an integer n>=0 such that t[n+1] is nil but either n=0 or t[n]
> is not nil. If k is omitted, any frontier of t may be returned.
>
> table.getlength(tbl)  Returns the value, if any, to be returned
> by #tbl when no __len metamethod has been specified.
>
> table.setlength(tbl,n) Sets n as the value to be returned by
> #tbl when no __len metamethod has been specified.
>
> The current O(log n) but nondeterministic #t is obtained by
>
>    setmetatable(tbl,table.frontier)
> ~~~
>
> So I agree with you on this point :-)

You even have a nice name for it :-)

--
Thomas



Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Thomas Jericke
In reply to this post by Soni "They/Them" L.
-----Original Message-----

> From: "Soni L." <[hidden email]>
> To: "Lua mailing list" <[hidden email]>
> Date: 08-07-2016 16:43
> Subject: Re: New array type? (was: 'table' as fallback for tables)
>
> On 08/07/16 06:51 AM, Thomas Jericke wrote:
> >
> > Also, while I think your example is legit, I don't think it's such a
> > common use case that it deserves its own operator.
> > The current implementation of # could easily just be moved to a
> > function of the table library.
>
> #"a string"
>
> vs
>
> table.len("a string")
>
> > --
> > Thomas
> >
>
> --
> Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.

I really don't know what you try to say to me :-/
--
Thomas



Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Sean Conner
It was thus said that the Great Thomas Jericke once stated:

> > From: "Soni L." <[hidden email]>
> > On 08/07/16 06:51 AM, Thomas Jericke wrote:
> > >
> > > The current implementation of # could easily just be moved to a
> > > function of the table library.
> >
> > #"a string"
> >
> > vs
> >
> > table.len("a string")
> >
> I really don't know what you try to say to me :-/

  I think Soni is concerned that you were trying to remove the '#' operator
entirely.  

  -spc


Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Thomas Jericke
-----Original Message-----

> From: "Sean Conner" <[hidden email]>
> To: "Lua mailing list" <[hidden email]>
> Date: 09-07-2016 11:17
> Subject: Re: New array type? (was: 'table' as fallback for tables)
>
> It was thus said that the Great Thomas Jericke once stated:
> > > From: "Soni L." <[hidden email]>
> > > On 08/07/16 06:51 AM, Thomas Jericke wrote:
> > > >
> > > > The current implementation of # could easily just be moved to a
> > > > function of the table library.
> > >
> > > #"a string"
> > >
> > > vs
> > >
> > > table.len("a string")
> > >
> > I really don't know what you try to say to me :-/
>
>   I think Soni is concerned that you were trying to remove the '#' operator
> entirely.  
>
>   -spc
>

If that is the case, no, I am not trying to do that.

--
Thomas

 


Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Roberto Ierusalimschy
In reply to this post by Javier Guerra Giraldez
> On 7 July 2016 at 19:20, Roberto Ierusalimschy <[hidden email]> wrote:
> > At least the implementation still guarantees that; it is only a
> > documentation matter...
>
>
> BTW, why was that guarantee removed?

There was a thought that explaining the guarantee made the '#' still
more difficult to be understood. It would be easier to simply say "it
does not work with holes" than trying to explain that it did work with
holes, but not exactly in the way you might expect. Currently, I think
that those that do not understand '#' (and most others) do not bother to
read the manual, so what we say would make no difference to them, but
the guarantee could be useful for those that read the manual.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Dirk Laurie-2
2016-07-11 21:43 GMT+02:00 Roberto Ierusalimschy <[hidden email]>:

>> On 7 July 2016 at 19:20, Roberto Ierusalimschy <[hidden email]> wrote:
>> > At least the implementation still guarantees that; it is only a
>> > documentation matter...
>>
>>
>> BTW, why was that guarantee removed?
>
> There was a thought that explaining the guarantee made the '#' still
> more difficult to be understood. It would be easier to simply say "it
> does not work with holes" than trying to explain that it did work with
> holes, but not exactly in the way you might expect. Currently, I think
> that those that do not understand '#' (and most others) do not bother to
> read the manual, so what we say would make no difference to them, but
> the guarantee could be useful for those that read the manual.

Since we already have an explanation of the concept "frontier" for
a string in §6.4.1, it would be conceptually easiest to make the
present algorithm available under the name table.frontier. Then the
explanation of #t would merely be "If t has no __len metamethod,
#t is the result of the function also supplied as table.frontier(t)."

The documentation of table.frontier is the right place to explain the
subtleties of the procedure. Not only is the language description
kept uncluttered but the table library is the one place where people
really need to reminded about it.

Ir is of course not quite correct to say that t[#t] is not nil but t[#t+1]
is. The truth is that rawget(t,#t) is not nil but rawget(t,#t+1) is.

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Roberto Ierusalimschy
> Ir is of course not quite correct to say that t[#t] is not nil but t[#t+1]
> is. The truth is that rawget(t,#t) is not nil but rawget(t,#t+1) is.

Not exactly. It is easy to redefine 'rawget' :-)  (If the idea is to
be pedantic...)

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Rena
In reply to this post by Dirk Laurie-2

On Jul 12, 2016 12:29 AM, "Dirk Laurie" <[hidden email]> wrote:
>
> 2016-07-11 21:43 GMT+02:00 Roberto Ierusalimschy <[hidden email]>:
> >> On 7 July 2016 at 19:20, Roberto Ierusalimschy <[hidden email]> wrote:
> >> > At least the implementation still guarantees that; it is only a
> >> > documentation matter...
> >>
> >>
> >> BTW, why was that guarantee removed?
> >
> > There was a thought that explaining the guarantee made the '#' still
> > more difficult to be understood. It would be easier to simply say "it
> > does not work with holes" than trying to explain that it did work with
> > holes, but not exactly in the way you might expect. Currently, I think
> > that those that do not understand '#' (and most others) do not bother to
> > read the manual, so what we say would make no difference to them, but
> > the guarantee could be useful for those that read the manual.
>
> Since we already have an explanation of the concept "frontier" for
> a string in §6.4.1, it would be conceptually easiest to make the
> present algorithm available under the name table.frontier. Then the
> explanation of #t would merely be "If t has no __len metamethod,
> #t is the result of the function also supplied as table.frontier(t)."
>
> The documentation of table.frontier is the right place to explain the
> subtleties of the procedure. Not only is the language description
> kept uncluttered but the table library is the one place where people
> really need to reminded about it.
>
> Ir is of course not quite correct to say that t[#t] is not nil but t[#t+1]
> is. The truth is that rawget(t,#t) is not nil but rawget(t,#t+1) is.
>

Unless t is empty.

1234