The many lengths of Lua

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

The many lengths of Lua

Soni "They/Them" L.
Lua has many lengths. From string lengths to table lengths, from number
lengths to sequence lengths, from sequence lengths to proper sequence
lengths. They are the many lengths of Lua.

The # operator returns string lengths and table lengths. It is the
standard length operator, and it's what you usually use to get the
length of an object.

The string.len() function returns string lengths. It typechecks the
argument to make sure it's a string, but otherwise returns the same
value as #.

There are various types of table length. Some of them are sequences,
some of them are not. Some have nothing to do with length, but rather
with count.

The simplest length is the one provided by ipairs(). It only iterates
the "proper sequence" part of a table. That is, it iterates the set of
contiguous positive integer keys of a table.[1]
When in doubt, this is the length you should rely on. Don't do `for
i=1,#t`, but instead use `for i,v in ipairs(t)`.

Another simple length is the one provided by pairs(). This is actually a
count. If for every iteration of pairs() you increment a counter, you'll
end up with the number of keys in a table. It is rarely used, but can be
useful sometimes.

If you want a manual table length, the simplest way to do it is probably
to just use an `n` field. While Lua supports this usage, the standard
library doesn't, so you have to deal with it manually. While the
standard library doesn't natively support the `n` field, some functions,
such as table.pack(), may emit it.

Another length option is the highest key of a table. You can get this
length by combining pairs() and math.max(). It can be useful in some
niche applications but it's quite slow, so consider a manual length (see
previous paragraph) instead.

By combining pairs() with type(), you can get the number of non-integer
keys in a table. While this count does exist, I have never seen it used
in practice.

The length operator, #, can also be applied to tables. If your table has
a positive integer key, and there's a smaller possitive integer that is
not a key (i.e. the value associated with it is `nil`), then this
shouldn't be used. When using a table without manual length, this
operator is usually faster than any other method[2], but it does have
the aforementioned drawback. This table length is the only table length
that is unspecified for some tables.

Finally, you can also use your own length algorithm. Use this if you
want fast runtime, but the drawbacks of the length operator make it
unsuitable for your use-case.

And these are the many lengths of Lua!

[1] - I'm not sure how many people know this, but this property
(stopping on first `nil`) is actually described in the manual. That is,
ipairs() on a table with "holes" is actually well-defined.
[2] - The Lua manual doesn't guarantee O(log n) time complexity for the
length operator. If you want guaranteed O(log n) runtime, use your own
length algorithm. This means # could have O(n) time complexity (i.e.
equivalent to ipairs()), or even O(m) where m = the number of keys in
the table (i.e. equivalent to pairs() + math.max()).

PS: Sorry for the wall of text.

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.


Reply | Threaded
Open this post in threaded view
|

Re: The many lengths of Lua

Aapo Talvensaari
On 14 September 2016 at 22:00, Soni L. <[hidden email]> wrote:
The string.len() function returns string lengths. It typechecks the argument to make sure it's a string, but otherwise returns the same value as #.

And then there is also:
Reply | Threaded
Open this post in threaded view
|

Re: The many lengths of Lua

Russell Haley
In reply to this post by Soni "They/Them" L.
Thanks for the great deep dive! I'm going to save this email for reference. 

Russ

Sent from my BlackBerry 10 smartphone on the Virgin Mobile network.
  Original Message  
From: Soni L.
Sent: Wednesday, September 14, 2016 3:00 PM
To: Lua mailing list
Reply To: Lua mailing list
Subject: The many lengths of Lua

Lua has many lengths. From string lengths to table lengths, from number
lengths to sequence lengths, from sequence lengths to proper sequence
lengths. They are the many lengths of Lua.

The # operator returns string lengths and table lengths. It is the
standard length operator, and it's what you usually use to get the
length of an object.

The string.len() function returns string lengths. It typechecks the
argument to make sure it's a string, but otherwise returns the same
value as #.

There are various types of table length. Some of them are sequences,
some of them are not. Some have nothing to do with length, but rather
with count.

The simplest length is the one provided by ipairs(). It only iterates
the "proper sequence" part of a table. That is, it iterates the set of
contiguous positive integer keys of a table.[1]
When in doubt, this is the length you should rely on. Don't do `for
i=1,#t`, but instead use `for i,v in ipairs(t)`.

Another simple length is the one provided by pairs(). This is actually a
count. If for every iteration of pairs() you increment a counter, you'll
end up with the number of keys in a table. It is rarely used, but can be
useful sometimes.

If you want a manual table length, the simplest way to do it is probably
to just use an `n` field. While Lua supports this usage, the standard
library doesn't, so you have to deal with it manually. While the
standard library doesn't natively support the `n` field, some functions,
such as table.pack(), may emit it.

Another length option is the highest key of a table. You can get this
length by combining pairs() and math.max(). It can be useful in some
niche applications but it's quite slow, so consider a manual length (see
previous paragraph) instead.

By combining pairs() with type(), you can get the number of non-integer
keys in a table. While this count does exist, I have never seen it used
in practice.

The length operator, #, can also be applied to tables. If your table has
a positive integer key, and there's a smaller possitive integer that is
not a key (i.e. the value associated with it is `nil`), then this
shouldn't be used. When using a table without manual length, this
operator is usually faster than any other method[2], but it does have
the aforementioned drawback. This table length is the only table length
that is unspecified for some tables.

Finally, you can also use your own length algorithm. Use this if you
want fast runtime, but the drawbacks of the length operator make it
unsuitable for your use-case.

And these are the many lengths of Lua!

[1] - I'm not sure how many people know this, but this property
(stopping on first `nil`) is actually described in the manual. That is,
ipairs() on a table with "holes" is actually well-defined.
[2] - The Lua manual doesn't guarantee O(log n) time complexity for the
length operator. If you want guaranteed O(log n) runtime, use your own
length algorithm. This means # could have O(n) time complexity (i.e.
equivalent to ipairs()), or even O(m) where m = the number of keys in
the table (i.e. equivalent to pairs() + math.max()).

PS: Sorry for the wall of text.

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.



Reply | Threaded
Open this post in threaded view
|

Re: The many lengths of Lua

Steve Litt
In reply to this post by Soni "They/Them" L.
On Wed, 14 Sep 2016 16:00:03 -0300
"Soni L." <[hidden email]> wrote:

> Lua has many lengths. From string lengths to table lengths, from
> number lengths to sequence lengths, from sequence lengths to proper
> sequence lengths. They are the many lengths of Lua.
>
> The # operator returns string lengths and table lengths. It is the
> standard length operator, and it's what you usually use to get the
> length of an object.
>
> The string.len() function returns string lengths. It typechecks the
> argument to make sure it's a string, but otherwise returns the same
> value as #.
>
> There are various types of table length. Some of them are sequences,
> some of them are not. Some have nothing to do with length, but rather
> with count.
>
> The simplest length is the one provided by ipairs(). It only iterates
> the "proper sequence" part of a table. That is, it iterates the set
> of contiguous positive integer keys of a table.[1]
> When in doubt, this is the length you should rely on. Don't do `for
> i=1,#t`, but instead use `for i,v in ipairs(t)`.
>
> Another simple length is the one provided by pairs(). This is
> actually a count. If for every iteration of pairs() you increment a
> counter, you'll end up with the number of keys in a table. It is
> rarely used, but can be useful sometimes.
>
> If you want a manual table length, the simplest way to do it is
> probably to just use an `n` field. While Lua supports this usage, the
> standard library doesn't, so you have to deal with it manually. While
> the standard library doesn't natively support the `n` field, some
> functions, such as table.pack(), may emit it.
>
> Another length option is the highest key of a table. You can get this
> length by combining pairs() and math.max(). It can be useful in some
> niche applications but it's quite slow, so consider a manual length
> (see previous paragraph) instead.
>
> By combining pairs() with type(), you can get the number of
> non-integer keys in a table. While this count does exist, I have
> never seen it used in practice.
>
> The length operator, #, can also be applied to tables. If your table
> has a positive integer key, and there's a smaller possitive integer
> that is not a key (i.e. the value associated with it is `nil`), then
> this shouldn't be used. When using a table without manual length,
> this operator is usually faster than any other method[2], but it does
> have the aforementioned drawback. This table length is the only table
> length that is unspecified for some tables.
>
> Finally, you can also use your own length algorithm. Use this if you
> want fast runtime, but the drawbacks of the length operator make it
> unsuitable for your use-case.
>
> And these are the many lengths of Lua!
>
> [1] - I'm not sure how many people know this, but this property
> (stopping on first `nil`) is actually described in the manual. That
> is, ipairs() on a table with "holes" is actually well-defined.
> [2] - The Lua manual doesn't guarantee O(log n) time complexity for
> the length operator. If you want guaranteed O(log n) runtime, use
> your own length algorithm. This means # could have O(n) time
> complexity (i.e. equivalent to ipairs()), or even O(m) where m = the
> number of keys in the table (i.e. equivalent to pairs() + math.max()).
>
> PS: Sorry for the wall of text.
>


The description and list of lengths given by Soni L above should be
copied, verbatim, into some easy to find and easy to search Lua
documentation. There have been many times I could have used Soni's
description.

SteveT

Reply | Threaded
Open this post in threaded view
|

Re: The many lengths of Lua

Igor Ehrlich
>The simplest length is the one provided by ipairs().
>It only iterates the "proper sequence" part of a table.
>That is, it iterates the >set of contiguous positive integer keys of a table.[1]
>When in doubt, this is the length you should rely on. Don't do `for i=1,#t`, but instead use `for i,v in ipairs(t)`.

From what I've seen so far, term 'rely on' is not completely compatible with Lua, and this is an important thing to know for any lua developer, especially novice one. Let me elaborate a little bit, so you get the idea of what I'm talking about.

It is possible that surrounding code will hijack pairs/ipairs. What should I do if I doubt whether or not the next/ipairs_aux iterators are really what I expect them to be? Will they work as I expect them to work? The safest approach here would've been to re-implement them locally and call local implementations. After that, I'd post these implementations to this list and insist to add them in the next lua release.
Unfortunately, this does not seem to be feasible in pure lua. This might be done using C code, but that seems to be an overkill.

What I really can do, is I can include a number of simple tests in my library code, testing if pairs/ipairs behave like I expect them to behave (with some realistic probability ofc). You know, a couple of simple runners and a dozen test cases. Then I run these tests on the module initialization, and, if they pass, I save the actual function object of next/ipairs_aux (as returned by pairs/ipairs) locally and call it via my own wrapper [1]. Surely, it will add some overhead on library initialization, but I'll be 100% sure on what length I rely on.

But wait, I can simply write some tests. Suddenly, that changes a lot.

Best regards,
Igor A. Ehrlich

[1] I understand that we might go down that road a little bit further, but I'm not sadistic enough, and I assume that the surrounding code does not intervene in the library code itself.

On Thu, Sep 15, 2016 at 7:25 AM, Steve Litt <[hidden email]> wrote:
On Wed, 14 Sep 2016 16:00:03 -0300
"Soni L." <[hidden email]> wrote:

> Lua has many lengths. From string lengths to table lengths, from
> number lengths to sequence lengths, from sequence lengths to proper
> sequence lengths. They are the many lengths of Lua.
>
> The # operator returns string lengths and table lengths. It is the
> standard length operator, and it's what you usually use to get the
> length of an object.
>
> The string.len() function returns string lengths. It typechecks the
> argument to make sure it's a string, but otherwise returns the same
> value as #.
>
> There are various types of table length. Some of them are sequences,
> some of them are not. Some have nothing to do with length, but rather
> with count.
>
> The simplest length is the one provided by ipairs(). It only iterates
> the "proper sequence" part of a table. That is, it iterates the set
> of contiguous positive integer keys of a table.[1]
> When in doubt, this is the length you should rely on. Don't do `for
> i=1,#t`, but instead use `for i,v in ipairs(t)`.
>
> Another simple length is the one provided by pairs(). This is
> actually a count. If for every iteration of pairs() you increment a
> counter, you'll end up with the number of keys in a table. It is
> rarely used, but can be useful sometimes.
>
> If you want a manual table length, the simplest way to do it is
> probably to just use an `n` field. While Lua supports this usage, the
> standard library doesn't, so you have to deal with it manually. While
> the standard library doesn't natively support the `n` field, some
> functions, such as table.pack(), may emit it.
>
> Another length option is the highest key of a table. You can get this
> length by combining pairs() and math.max(). It can be useful in some
> niche applications but it's quite slow, so consider a manual length
> (see previous paragraph) instead.
>
> By combining pairs() with type(), you can get the number of
> non-integer keys in a table. While this count does exist, I have
> never seen it used in practice.
>
> The length operator, #, can also be applied to tables. If your table
> has a positive integer key, and there's a smaller possitive integer
> that is not a key (i.e. the value associated with it is `nil`), then
> this shouldn't be used. When using a table without manual length,
> this operator is usually faster than any other method[2], but it does
> have the aforementioned drawback. This table length is the only table
> length that is unspecified for some tables.
>
> Finally, you can also use your own length algorithm. Use this if you
> want fast runtime, but the drawbacks of the length operator make it
> unsuitable for your use-case.
>
> And these are the many lengths of Lua!
>
> [1] - I'm not sure how many people know this, but this property
> (stopping on first `nil`) is actually described in the manual. That
> is, ipairs() on a table with "holes" is actually well-defined.
> [2] - The Lua manual doesn't guarantee O(log n) time complexity for
> the length operator. If you want guaranteed O(log n) runtime, use
> your own length algorithm. This means # could have O(n) time
> complexity (i.e. equivalent to ipairs()), or even O(m) where m = the
> number of keys in the table (i.e. equivalent to pairs() + math.max()).
>
> PS: Sorry for the wall of text.
>


The description and list of lengths given by Soni L above should be
copied, verbatim, into some easy to find and easy to search Lua
documentation. There have been many times I could have used Soni's
description.

SteveT




--
Best regards,
Igor A. Ehrlich
Reply | Threaded
Open this post in threaded view
|

Re: The many lengths of Lua

Daurnimator
On 15 September 2016 at 18:18, Igor Ehrlich <[hidden email]> wrote:
> It is possible that surrounding code will hijack pairs/ipairs. What should I
> do if I doubt whether or not the next/ipairs_aux iterators are really what I
> expect them to be? Will they work as I expect them to work? The safest
> approach here would've been to re-implement them locally and call local
> implementations. After that, I'd post these implementations to this list and
> insist to add them in the next lua release.

IMO you should assume that if the user changed them, they changed them
for a reason.

e.g. In C code you don't go around checking functions do what they're
documented to do just in case someone used LD_PRELOAD.

Reply | Threaded
Open this post in threaded view
|

Re: The many lengths of Lua

Igor Ehrlich
>IMO you should assume that if the user changed them, they changed them
>for a reason.
The user could've changed them for whatever reason, he might not even know that my library uses them. My task is to ensure that my library works as expected in as many cases as possible.

Here's a simple example. Let's assume that I want to write a library that exports a single function 'table_maxkey', which is dedicated to finding the biggest numeric key value in a table. Test case looks like

local max_idx = table_maxkey({1,2,8}) -- should return '8'

So I prepare this library (we all know what will be inside), write a README.md, post it on Github, and there's this one user that tries to use it, tries to reproduce the case above and gets the return value of "42". Yes, he did what he did with his own hands, but it is my library that works improperly here. I do not comply to my own interface, see?
How do I fix it? Well, I can't write a disclaimer "this library only works properly with default implementations of all library functions", which would've been an all-around safest way to restore the compliance (in case I plan to add something in future, you know), but no one will use such library, right? Here's another idea, I might write "this library only works properly with default implementations of pairs and next library functions". Yes, that works like charm, but again, assume that I support and extend my library, add new interfaces which will eventually depend on all library functions. My disclaimer, once written properly, will probably be bigger than the code itself. Plus, I need either a handful of hand labor or some good automation to support the compliance.

>e.g. In C code you don't go around checking functions do what they're
>documented to do just in case someone used LD_PRELOAD.
I just googled "prevent dll spoofing" and got, quote, "About 77,800 results". Plus the unique content by keywords 'dll shimming', 'dll hijacking' and also linux/bsd/macos-specific content that is not googled by keyword 'dll', see for yourself. I think that's something to start with?

Best regards,
Igor A. Ehrlich
 

On Thu, Sep 15, 2016 at 11:43 AM, Daurnimator <[hidden email]> wrote:
On 15 September 2016 at 18:18, Igor Ehrlich <[hidden email]> wrote:
> It is possible that surrounding code will hijack pairs/ipairs. What should I
> do if I doubt whether or not the next/ipairs_aux iterators are really what I
> expect them to be? Will they work as I expect them to work? The safest
> approach here would've been to re-implement them locally and call local
> implementations. After that, I'd post these implementations to this list and
> insist to add them in the next lua release.

IMO you should assume that if the user changed them, they changed them
for a reason.

e.g. In C code you don't go around checking functions do what they're
documented to do just in case someone used LD_PRELOAD.




--
Best regards,
Igor A. Ehrlich
Reply | Threaded
Open this post in threaded view
|

Re: The many lengths of Lua

Martin
On 16-09-15 02:26 AM, Igor Ehrlich wrote:
> local max_idx = table_maxkey({1,2,8}) -- should return '8'

IMO it should return 3 of have another name.

Personally I do not wish to verify any utility functions. For example

local original_pairs = _G.pairs
_G.pairs = function(...)
  print('pairs_called')
  print_stacktrace()
  return original_pairs(...)
end

or even

_G.pairs = _G.error

And what are you going to do in these cases, refuse to execute? Then
that malicious user will blame your code anyway because it still does
not work.

Reply | Threaded
Open this post in threaded view
|

Re: The many lengths of Lua

Karel Tuma-2
In reply to this post by Soni "They/Them" L.
Excerpts from Soni L.'s message of 2016-09-14 21:00:03 +0200:
> ...

Immortalized.

http://lua-users.org/wiki/LenghtsInLua

Let the edit wars begin (it does not need to be this oblong),
but otherwise a decent writeup.

Reply | Threaded
Open this post in threaded view
|

Re: The many lengths of Lua

Coda Highland
On Thu, Sep 15, 2016 at 12:16 PM, Karel Tuma <[hidden email]> wrote:

> Excerpts from Soni L.'s message of 2016-09-14 21:00:03 +0200:
>> ...
>
> Immortalized.
>
> http://lua-users.org/wiki/LenghtsInLua
>
> Let the edit wars begin (it does not need to be this oblong),
> but otherwise a decent writeup.
>

Let it begin with fixing the spelling of the article name. ;)

/s/ Adam

Reply | Threaded
Open this post in threaded view
|

Re: The many lengths of Lua

Russell Haley
You beat me by 30 seconds!

Russ

Sent from my BlackBerry 10 smartphone on the Virgin Mobile network.
  Original Message  
From: Coda Highland
Sent: Thursday, September 15, 2016 3:19 PM
To: Lua mailing list
Reply To: Lua mailing list
Subject: Re: The many lengths of Lua

On Thu, Sep 15, 2016 at 12:16 PM, Karel Tuma <[hidden email]> wrote:

> Excerpts from Soni L.'s message of 2016-09-14 21:00:03 +0200:
>> ...
>
> Immortalized.
>
> http://lua-users.org/wiki/LenghtsInLua
>
> Let the edit wars begin (it does not need to be this oblong),
> but otherwise a decent writeup.
>

Let it begin with fixing the spelling of the article name. ;)

/s/ Adam


Reply | Threaded
Open this post in threaded view
|

Re: The many lengths of Lua

Peter Aronoff
In reply to this post by Karel Tuma-2
Karel Tuma <[hidden email]> wrote:
> Excerpts from Soni L.'s message of 2016-09-14 21:00:03 +0200:
> > ...
>
> Immortalized.
>
> http://lua-users.org/wiki/LenghtsInLua

For the sake of future Googlers, can we copy that over to this URL?

http://lua-users.org/wiki/LengthsInLua

Sorry, but it’s an unfortunate typo!

Best, P
--
We have not been faced with the need to satisfy someone else's
requirements, and for this freedom we are grateful.
    Dennis Ritchie and Ken Thompson, The UNIX Time-Sharing System

Reply | Threaded
Open this post in threaded view
|

Re: The many lengths of Lua

Thijs Schreijer

> On 15 Sep 2016, at 21:33, Peter Aronoff <[hidden email]> wrote:
>
> Karel Tuma <[hidden email]> wrote:
>> Excerpts from Soni L.'s message of 2016-09-14 21:00:03 +0200:
>>> ...
>>
>> Immortalized.
>>
>> http://lua-users.org/wiki/LenghtsInLua
>
> For the sake of future Googlers, can we copy that over to this URL?
>
> http://lua-users.org/wiki/LengthsInLua
>
> Sorry, but it’s an unfortunate typo!
>
> Best, P

I think the ‘missing’ length would be `select(“#", …)`

Thijs