Re: New array type? (was: 'table' as fallback for tables)

classic Classic list List threaded Threaded
74 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Peter Aronoff
"Soni L." <[hidden email]> wrote:
> You need to think of Lua tables as C strings.

Two comments:

1. I have no idea at this point if this is sarcasm or not. But either way,
   isn’t this tantamount to saying that they are a mistake in the
   language’s design?
2. Your reply style (quote everything and add your replies immediately
   below whatever you’re replying to without any extra space) is pretty
   painful to read. Would you consider adding in some linebreaks, please?
   (As an example, I had to scan this reply three or four times before
   I could find your reply. Trimming down what you’re quoting to the
   relevant bit and adding some whitespace would go a long way.) Thanks for
   considering it.

P
--
We have not been faced with the need to satisfy someone else's
requirements, and for this freedom we are grateful.
    Dennis Ritchie and Ken Thompson, The UNIX Time-Sharing System

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Soni "They/Them" L.


On 07/07/16 09:22 AM, Peter Aronoff wrote:
> "Soni L." <[hidden email]> wrote:
>> You need to think of Lua tables as C strings.
> Two comments:
>
> 1. I have no idea at this point if this is sarcasm or not. But either way,
>     isn’t this tantamount to saying that they are a mistake in the
>     language’s design?

I don't consider C to be a mistake, no. C strings and Lua tables are
both NUL/nil-terminated sequences. The main difference is that the
algorithm for Lua table length is O(log n) depending on the
implementation, while the one for C string length is always O(n).

C strings are good for iteration, but poor for length calculation.
They're also very efficient in terms of space, however this doesn't
apply to Lua tables.

Unlike C strings, Lua tables are a general-purpose mutable data
structure, and this is where nil-terminated really shines. Lua tables
are efficient whether you use them as arrays or as maps, this wouldn't
be possible if they weren't nil-terminated. If they weren't nil-terminated:

- #t would return a number directly. This is the only good point.
- t[#t+1]=v would:
     - check if the key is an integer (1 op)
     - check if the value is not nil (1 op)
     - check if the key is greater than the current length (2 ops, as it
has to retrieve the current length)
     - set the length to the value of the key (1 op)
- t[#t]=nil would:
     - check if the key is an integer (1 op)
     - check if the value is nil (1 op)
     - check if the key is equal to the current length (2 ops)
     - check if how much before the key is empty (O(n))
         - imagine setting t[#t-10] to t[#t-1] to nil, then setting
t[#t] to nil.
     - set the length to the index of the last nonnil followed by nil (1
op, we find this in the previous step)
- t[huge_number]=v would potentially cause Lua to run out of memory.
- table.remove and table.insert would have to do all checks t[#t+1]=v
and t[#t]=nil have to do, as well as being O(n) by themselves, as they
have to shift elements around.

Alternatively, length could be fixed, having to be manually set. That
would be even less intuitive than what we currently have.

Overall, this would either cause a huge slowdown in the general case for
little to no benefit, or be more of a pain than what we currently have.
What we currently have is a pretty good compromise between intuitiveness
and efficiency.

> 2. Your reply style (quote everything and add your replies immediately
>     below whatever you’re replying to without any extra space) is pretty
>     painful to read. Would you consider adding in some linebreaks, please?
>     (As an example, I had to scan this reply three or four times before
>     I could find your reply. Trimming down what you’re quoting to the
>     relevant bit and adding some whitespace would go a long way.) Thanks for
>     considering it.

Sorry about that.

>
> P

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.


Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Peter Aronoff
"Soni L." <[hidden email]> wrote:
> I don't consider C to be a mistake, no.

Thanks for your detailed reply here. I had in mind general complaints about
how difficult C strings have turned out to be in terms of use and security,
but I take your point that they do several things well—especially if we’re
concerned with efficiency. Similarly for Lua tables.  (For whatever it is
worth, I’ve been bit a few times by accidental nils, but for the most part
I’ve found it to be fine. It’s something that I have to keep in mind, but
every language has those areas.)

> > 2. Your reply style (quote everything and add your replies immediately
> >     below whatever you’re replying to without any extra space) is pretty
> >     painful to read.
>
> Sorry about that.

Thank you so much for taking my request in the spirit it was offered.
I really appreciate it.

Best, P
--
We have not been faced with the need to satisfy someone else's
requirements, and for this freedom we are grateful.
    Dennis Ritchie and Ken Thompson, The UNIX Time-Sharing System

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Sean Conner
In reply to this post by Dibyendu Majumdar
It was thus said that the Great Dibyendu Majumdar once stated:
>
> Perhaps the easy solution might be to only support fixed length arrays,
> and thereby allow NILs as valid values.

  For me, would elminate one of the reasons I like using Lua.  This is a
step back towards C, and if I want to program in C, I know where to find it.

  -spc

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Joseph Manning
In reply to this post by steve donovan
On 2016-Jul-07 (Thu) at 09:08 (+0200), steve donovan wrote:

>> It would be cool if #t was O(1), however.

Steve,

   In theory, yes indeed, this would be nice and neat and clean.

   In practice, however, computing #t is blazingly fast.  In tests
   that I did some time ago ( Lua 5.2 ), with t being a sequence of
   10,000 elements, #t was computed 1,000,000 time in just 0.1 seconds.

   And that was on my little 2008-era netbook.

   Fast enough for me :-)

Joseph

------------------------------------------------------------------------
Joseph Manning / Computer Science / UCC Cork Ireland / [hidden email]
------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Soni "They/Them" L.


On 07/07/16 01:15 PM, Joseph Manning wrote:

> On 2016-Jul-07 (Thu) at 09:08 (+0200), steve donovan wrote:
>
>>> It would be cool if #t was O(1), however.
> Steve,
>
>     In theory, yes indeed, this would be nice and neat and clean.
>
>     In practice, however, computing #t is blazingly fast.  In tests
>     that I did some time ago ( Lua 5.2 ), with t being a sequence of
>     10,000 elements, #t was computed 1,000,000 time in just 0.1 seconds.
>
>     And that was on my little 2008-era netbook.
>
>     Fast enough for me :-)
>
> Joseph
>
> ------------------------------------------------------------------------
> Joseph Manning / Computer Science / UCC Cork Ireland / [hidden email]
> ------------------------------------------------------------------------
>
I don't think ppl see the true power of logn.
To put it simply, I don't think ppl see that log2(10000) = 14-ish. (14
ops for 10000 elements)

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.


Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Roberto Ierusalimschy
In reply to this post by Peter Aronoff
> Thanks for your detailed reply here. I had in mind general complaints about
> how difficult C strings have turned out to be in terms of use and security,
> but I take your point that they do several things well—especially if we’re
> concerned with efficiency. [...]

It is very easy to find several flaws in C strings, until you consider
all the alternatives.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Philipp Janda
Am 07.07.2016 um 19:11 schröbte Roberto Ierusalimschy:
>> Thanks for your detailed reply here. I had in mind general complaints about
>> how difficult C strings have turned out to be in terms of use and security,
>> but I take your point that they do several things well—especially if we’re
>> concerned with efficiency. [...]
>
> It is very easy to find several flaws in C strings, until you consider
> all the alternatives.

Pretty much every programming language since C has done a better job
about strings. One obvious example: C++ strings, which can grow,
calculate the size in O(1), and contain NUL characters. You can even
have bounds checking on access ...

Am I the only one who has re-implemented something like C++ strings in C
for storing binary data or for building strings piecemeal?

Anyway, even Lua doesn't use C-style strings for strings.

>
> -- Roberto
>

Philipp




Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Dirk Laurie-2
In reply to this post by Joseph Manning
2016-07-07 18:15 GMT+02:00 Joseph Manning <[hidden email]>:
> On 2016-Jul-07 (Thu) at 09:08 (+0200), steve donovan wrote:
>
>>> It would be cool if #t was O(1), however.
>
> Steve,
>
>    In theory, yes indeed, this would be nice and neat and clean.
>
>    In practice, however, computing #t is blazingly fast.

The problem that people have with # is not its speed, it is
the non-determinism. I.e. fixed size is conceptually easy,
the first frontier is easy, the largest numeric index is easy,
but some unpredicatable unspecified frontier is sophisticated.

The current equivocation, that # is undefined when the table
is not a sequence, sells it short. The property documented
in the Lua 5.1 manual, that t[#t+1] is guaranteed to be an
empty slot, is a useful one, no matter how many newbies
are puzzled by it.

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Soni "They/Them" L.
In reply to this post by Philipp Janda


On 07/07/16 03:00 PM, Philipp Janda wrote:

> Am 07.07.2016 um 19:11 schröbte Roberto Ierusalimschy:
>>> Thanks for your detailed reply here. I had in mind general
>>> complaints about
>>> how difficult C strings have turned out to be in terms of use and
>>> security,
>>> but I take your point that they do several things well—especially if
>>> we’re
>>> concerned with efficiency. [...]
>>
>> It is very easy to find several flaws in C strings, until you consider
>> all the alternatives.
>
> Pretty much every programming language since C has done a better job
> about strings. One obvious example: C++ strings, which can grow,
> calculate the size in O(1), and contain NUL characters. You can even
> have bounds checking on access ...

And cannot be handed over to the OS without first being validated.
Remember exploits involving java and NUL and filenames?

>
> Am I the only one who has re-implemented something like C++ strings in
> C for storing binary data or for building strings piecemeal?
>
> Anyway, even Lua doesn't use C-style strings for strings.
>
>>
>> -- Roberto
>>
>
> Philipp
>
>
>
>

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.


Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Roberto Ierusalimschy
In reply to this post by Dirk Laurie-2
> The current equivocation, that # is undefined when the table
> is not a sequence, sells it short. The property documented
> in the Lua 5.1 manual, that t[#t+1] is guaranteed to be an
> empty slot, is a useful one, no matter how many newbies
> are puzzled by it.

At least the implementation still guarantees that; it is only a
documentation matter...

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Javier Guerra Giraldez
On 7 July 2016 at 19:20, Roberto Ierusalimschy <[hidden email]> wrote:
> At least the implementation still guarantees that; it is only a
> documentation matter...


BTW, why was that guarantee removed?

--
Javier

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Coda Highland
On Thu, Jul 7, 2016 at 11:22 AM, Javier Guerra Giraldez
<[hidden email]> wrote:
> On 7 July 2016 at 19:20, Roberto Ierusalimschy <[hidden email]> wrote:
>> At least the implementation still guarantees that; it is only a
>> documentation matter...
>
>
> BTW, why was that guarantee removed?

Probably to allow __len to override it.

/s/ Adam

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Peter Aronoff
In reply to this post by Philipp Janda
Philipp Janda <[hidden email]> wrote:
> Am I the only one who has re-implemented something like C++ strings in
> C for storing binary data or for building strings piecemeal?

No, I’ve done it as well, and there are countless C string libraries for
this purpose. I wasn’t going to argue with Roberto, but I’m glad you said
something.

P
--
We have not been faced with the need to satisfy someone else's
requirements, and for this freedom we are grateful.
    Dennis Ritchie and Ken Thompson, The UNIX Time-Sharing System

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Philipp Janda
In reply to this post by Soni "They/Them" L.
Am 07.07.2016 um 20:13 schröbte Soni L.:
>
> On 07/07/16 03:00 PM, Philipp Janda wrote:
>>
>> Pretty much every programming language since C has done a better job
>> about strings. One obvious example: C++ strings, which can grow,
>> calculate the size in O(1), and contain NUL characters. You can even
>> have bounds checking on access ...
>
> And cannot be handed over to the OS without first being validated.

How do you hand over an `std::string` to the OS?

> Remember exploits involving java and NUL and filenames?

Not that particular one, but there have been others e.g. in Browsers,
where the displayed URL wasn't the real one, or in certificates, where
the CA didn't check the owner of the real domain ...


Philipp



Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

William Ahern
In reply to this post by Dirk Laurie-2
On Thu, Jul 07, 2016 at 08:06:20PM +0200, Dirk Laurie wrote:

> 2016-07-07 18:15 GMT+02:00 Joseph Manning <[hidden email]>:
> > On 2016-Jul-07 (Thu) at 09:08 (+0200), steve donovan wrote:
> >
> >>> It would be cool if #t was O(1), however.
> >
> > Steve,
> >
> >    In theory, yes indeed, this would be nice and neat and clean.
> >
> >    In practice, however, computing #t is blazingly fast.
>
> The problem that people have with # is not its speed, it is
> the non-determinism. I.e. fixed size is conceptually easy,
> the first frontier is easy, the largest numeric index is easy,
> but some unpredicatable unspecified frontier is sophisticated.

How did "largest numeric index" work when deleting the last index, like when
popping a stack data structure. Did you also assign nil to erase it? If so,
isn't that an inconsistency--assigning nil inside of an array is okay, but
assigning nil to the last index has special truncation behavior? That seems
worse than the current behavior. (I don't mean to imply I have any problems
with the current semantics.)

> The current equivocation, that # is undefined when the table
> is not a sequence, sells it short. The property documented
> in the Lua 5.1 manual, that t[#t+1] is guaranteed to be an
> empty slot, is a useful one, no matter how many newbies
> are puzzled by it.

But for a non-sequence, #t might evaluate to a lesser index after executing
t[#t + 1] = true. What use is the guarantee that t[#t + 1] is empty if you
lose track of that value after the fact?

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

William Ahern
In reply to this post by Soni "They/Them" L.
On Thu, Jul 07, 2016 at 01:21:46PM -0300, Soni L. wrote:

> On 07/07/16 01:15 PM, Joseph Manning wrote:
> >On 2016-Jul-07 (Thu) at 09:08 (+0200), steve donovan wrote:
> >
> >>>It would be cool if #t was O(1), however.
> >Steve,
> >
> >    In theory, yes indeed, this would be nice and neat and clean.
> >
> >    In practice, however, computing #t is blazingly fast.  In tests
> >    that I did some time ago ( Lua 5.2 ), with t being a sequence of
> >    10,000 elements, #t was computed 1,000,000 time in just 0.1 seconds.
> >
> >    And that was on my little 2008-era netbook.
> >
> >    Fast enough for me :-)
> >
> I don't think ppl see the true power of logn.
> To put it simply, I don't think ppl see that log2(10000) = 14-ish. (14 ops
> for 10000 elements)

For large arrays the real penalty is memory latency. I was researching using
SIMD instructions to optimize the binary search and found this post which
explains the problem very well:

  http://databasearchitects.blogspot.com/2015/09/trying-to-speed-up-binary-search.html

TL;DR: For larger arrays the fixed costs go up; in the above experiments, by
a factor of 4. A project could theoretically end up with bifurcated behavior
where evaluating #t on large arrays can be much slower than would be
expected given the performance on smaller arrays.

I don't think it's much of a deal for interpreted Lua, though. I think it'd
only be a worthwhile concern if the operation were JITd, but notably LuaJIT
doesn't bother (AFAICT) with memoization. Frameworks like Torch don't keep
large arrays in Lua tables. And simplicity has significant performance
benefits.


Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Dirk Laurie-2
In reply to this post by William Ahern
2016-07-07 23:36 GMT+02:00 William Ahern <[hidden email]>:
>
> How did "largest numeric index" work when deleting the last index,
> like when popping a stack data structure.

Lua 3.2 to 5.1 offered two array models, both of which allowed
holes.

1. If the table has 'n', that is its length as far for sort,, tinsert and
tremove. The latter two respectively increment and decrement n.
2. If it does not have 'n', the actual largest numeric index is
laboriously calculated by traversing the entire table each time
you call a table function.

> Did you also assign nil to erase it? If so, isn't that an inconsistency

Yes, it is. That's why you should use tremove (table.remove)
instead of assigning nil, since it updates t.n.

> But for a non-sequence, #t might evaluate to a lesser index
> after executing t[#t + 1] = true. What use is the guarantee that
> t[#t + 1] is empty if you lose track of that value after the fact?

Suppose that t represents the rooms in an hotel. Guests arrive
and leave. You evaluate #t. It tells you where to find an empty
room. You put a guest in that room. Next guest comes. Again
#t finds an empty room. Why is it bad if that room happens to
have a lower number? But it would be bad if the guarantee
that t[#t+1] is empty did not hold.

Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

William Ahern
On Fri, Jul 08, 2016 at 12:36:30AM +0200, Dirk Laurie wrote:
> 2016-07-07 23:36 GMT+02:00 William Ahern <[hidden email]>:
<snip>

> > But for a non-sequence, #t might evaluate to a lesser index
> > after executing t[#t + 1] = true. What use is the guarantee that
> > t[#t + 1] is empty if you lose track of that value after the fact?
>
> Suppose that t represents the rooms in an hotel. Guests arrive
> and leave. You evaluate #t. It tells you where to find an empty
> room. You put a guest in that room. Next guest comes. Again
> #t finds an empty room. Why is it bad if that room happens to
> have a lower number? But it would be bad if the guarantee
> that t[#t+1] is empty did not hold.

So you've put them in the room. Then what? You can't find them again. You
can't do `for i=1,#t do ... end` and expect to find them.

You could remember the index somewhere, but then you've only created a level
of indirection. Wherever you store the index you could have kept the object
itself. You if you wanted to anchor the object in memory, it would be
simpler and more performant to store it as the key.

Hmmmm... I guess one use would be for implementating luaL_ref. I'm not sure
how useful that would be in Lua code. Maybe I just need to think about it
longer.


Reply | Threaded
Open this post in threaded view
|

Re: New array type? (was: 'table' as fallback for tables)

Ross Berteig
On 7/7/2016 4:05 PM, William Ahern wrote:

> On Fri, Jul 08, 2016 at 12:36:30AM +0200, Dirk Laurie wrote:
>> Suppose that t represents the rooms in an hotel. Guests arrive
>> and leave. You evaluate #t. It tells you where to find an empty
>> room. You put a guest in that room. Next guest comes. Again
>> #t finds an empty room. Why is it bad if that room happens to
>> have a lower number? But it would be bad if the guarantee
>> that t[#t+1] is empty did not hold.
>
> So you've put them in the room. Then what? You can't find them again. You
> can't do `for i=1,#t do ... end` and expect to find them.

But what if you don't need to remember. In the hotel case, once they
have a key and are in a room, for the most part the rest of the hotel
only cares if that room is occupied (clean it during the day, don't
assign another guest to it, etc.) or not.

Besides, the hotel is itself a fixed size array of floors (often with 13
missing aka nil) each containing a fixed array of rooms. Extending a
hotel is a big deal, involving lots of money and other resources, and
mediated by building permits and inspections.

So iterating over all rooms is already an O(n) task, and normally
doesn't stumble over the missing 13th floor.

> You could remember the index somewhere, but then you've only created a level
> of indirection. Wherever you store the index you could have kept the object
> itself. You if you wanted to anchor the object in memory, it would be
> simpler and more performant to store it as the key.
>
> Hmmmm... I guess one use would be for implementating luaL_ref. I'm not sure
> how useful that would be in Lua code. Maybe I just need to think about it
> longer.

IIRC, that is exactly how luaL_ref() was implemented. At least in a
high-order pseudo-code description. The integer returned is used to look
up the saved object, and needed to be something (like an integer) that
would be easy to return to C code as plain data. Using #t+1 to get it is
cheap, easy, and reliable. And looking it up later is also cheap and easy.

I'd hate to see that feature of # on a plain table be lost.

--
Ross Berteig                               [hidden email]
Cheshire Engineering Corp.           http://www.CheshireEng.com/
+1 626 303 1602

1234