Bug report in length of simple table

classic Classic list List threaded Threaded
47 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Bug report in length of simple table

Tomas.Lavicka

Program:

   Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio

   > a={}

   > a[1]=1

   > a[2]=nil

   > a[3]=3

   > a[4]=4

   > print(#a)

   4

should returns value 1 but because a[2] is nil, it should return value 1 according book “Programming in Lua (Third edition)” – there is hole in sequence.

 

Tomas Lavicka

Ostrava, Czech Republic

Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Marc Balmer

Am 13.09.2016 um 10:35 schrieb [hidden email]:

Program:
   Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
   > a={}
   > a[1]=1
   > a[2]=nil
   > a[3]=3
   > a[4]=4
   > print(#a)
   4
should returns value 1 but because a[2] is nil, it should return value 1 according book “Programming in Lua (Third edition)” – there is hole in sequence.

No, this is not a bug.  The length operator #, when used on tables, is only defined for sequences.

Above table is not a sequence.  See http://www.lua.org/manual/5.3/manual.html#3.4.7 for details.

btw, "Programming Lua, Fourth Edition" is out, you might consider an "update"...  Especially since
the new edition contains a nice explanation in chapter 5.3 which the length operator on tables
with nil's is such a nice source of confusion...



Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Alexey Malyshev
13.09.2016 11:53, Marc Balmer пишет:

Am 13.09.2016 um 10:35 schrieb [hidden email]:

Program:
   Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
   > a={}
   > a[1]=1
   > a[2]=nil
   > a[3]=3
   > a[4]=4
   > print(#a)
   4
should returns value 1 but because a[2] is nil, it should return value 1 according book “Programming in Lua (Third edition)” – there is hole in sequence.

No, this is not a bug.  The length operator #, when used on tables, is only defined for sequences.

Above table is not a sequence.  See http://www.lua.org/manual/5.3/manual.html#3.4.7 for details.

btw, "Programming Lua, Fourth Edition" is out, you might consider an "update"...  Especially since
the new edition contains a nice explanation in chapter 5.3 which the length operator on tables
with nil's is such a nice source of confusion...



I think this explanation must be in official doc if it's a source of confusion.

print(# {[1] = 10, [2] = 20, [3] = nil, [4] = nil, [5] = 40} ) -- is not sequence ?
print(# {10, 20, nil, nil, 40} ) -- is sequence ?  From doc: Note that a table like {10, 20, nil, 40}  is not a sequence...

output:

2
5

Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Oliver Kroth


Am 13.09.2016 um 11:37 schrieb Malma:

>
> I think this explanation must be in official doc if it's a source of
> confusion.
>
> print(# {[1] = 10, [2] = 20, [3] = nil, [4] = nil, [5] = 40} ) -- is
> not sequence ?
> print(# {10, 20, nil, nil, 40} ) -- is sequence ?  From doc: Note that
> a table like {10, 20, nil, 40}  is not a sequence...
>
> output:
>
> 2
> 5
>
The # operator computes the number of elements of a sequence in a Lua table
# returns a valid result only for sequences.

A Lua table contains a sequence if and only if all integer keys greater
zero are contiguous; i.e. there must be no missing numbers.

If there are gaps, the table contains no sequence.

Th table may contains values for other keys, including 0, negative, or
non-integer numbers, this does not affect the sequence.

Note that this definition is completely independent of the
implementation detail has part / array part; one does not need to care
about that.
It's only about integer keys.

nil can't be stored in a table. Assigning nil to a table key removes the
key. Use another value to tag special (e.g.missing) entries, like false

{10, 20, nil, nil, 40} contains NOT a sequence
{10, 20, false false, 40] contains a sequence
{ [0]="null", 10, 20, 30, a="alpha",  [math.pi] = "pi" } contains a sequence

--
Oliver

Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Duncan Cross
In reply to this post by Tomas.Lavicka
On Tue, Sep 13, 2016 at 9:35 AM,  <[hidden email]> wrote:

> Program:
>
>    Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
>
>    > a={}
>
>    > a[1]=1
>
>    > a[2]=nil
>
>    > a[3]=3
>
>    > a[4]=4
>
>    > print(#a)
>
>    4
>
> should returns value 1 but because a[2] is nil, it should return value 1
> according book “Programming in Lua (Third edition)” – there is hole in
> sequence.
>
>
>
> Tomas Lavicka
>
> Ostrava, Czech Republic

Here's how you should think of it: #a will return ANY non-negative
integer n, where a[n] is not nil and a[n+1] is nil, OR if a[1] is nil,
#a MAY also return 0.

So if there are holes, then that means there are multiple,
equally-valid values for #a, and it should be considered completely
arbitrary which one you will get. (Even if it appears consistent, in
your own experiments.)

It is essentially your responsibility to make sure there is only one
valid value for n, in order to maintain a coherent "length" for a
particular table.

(All of this is only talking about default behavior, ignoring tables
that have the __len metamethod defined.)

-Duncan

Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Peter Aronoff
In reply to this post by Alexey Malyshev
Malma <[hidden email]> wrote:
> I think this explanation must be in official doc if it's a source of
> confusion.

Are you saying that it *should* be in the official documentation or that it
*is* in the official documentation?

> print(# {[1] = 10, [2] = 20, [3] = nil, [4] = nil, [5] = 40} ) -- is not
> sequence ?

Correct. That is not a sequence. As it says in #3.4.7, {10, 20, nil, 40} is
not a sequence “because it has the key 4 but does not have the key 3. In
your table above, there is a key 5, but no key 3 or 4. Hence, not
a sequence.

> print(# {10, 20, nil, nil, 40} ) -- is sequence ?  From doc: Note that
> a table like {10, 20, nil, 40} is not a sequence...

You’re asking, but you also cite the relevant bit of the documentation, so
I think that you know the answer already: {10, 20, nil, nil, 40} is not
a sequence. For the reason given above from the manual.

Here’s one that I consider a bit odd—and I’m testing my own understanding
now. The following *is* a sequence, according to the definitions in Lua’s
manual: {10, 20, nil}. That’s a sequence, and it’s length is 2. It’s
a sequence because there is an n such that “the set of [the table’s]
positive numeric keys is equal to {1..n} for some non-negative integer n.”
And that n (here 2) is it’s length. (Someone please correct me if I’m
wrong.) Presumably this means that {[1] = 10, [2] = 20, [3] = nil} is also
a sequence, although with a more verbose constructor.

The information *is* in the manual, although you are certainly not the
first person to wish that it were either clearer or less brief or both.

P
--
We have not been faced with the need to satisfy someone else's
requirements, and for this freedom we are grateful.
    Dennis Ritchie and Ken Thompson, The UNIX Time-Sharing System

Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Christian N.
Am 13.09.2016 um 15:23 schrieb Peter Aronoff:
> Here’s one that I consider a bit odd—and I’m testing my own understanding
> now. The following *is* a sequence, according to the definitions in Lua’s
> manual: {10, 20, nil}. That’s a sequence, and it’s length is 2. It’s
> a sequence because there is an n such that “the set of [the table’s]
> positive numeric keys is equal to {1..n} for some non-negative integer n.”
> And that n (here 2) is it’s length. (Someone please correct me if I’m
> wrong.) Presumably this means that {[1] = 10, [2] = 20, [3] = nil} is also
> a sequence, although with a more verbose constructor.
>

Since a nil-value is indistinguishable from an unset key in a table, any
trailing nils should be insignificant. That is, {10, 20, nil} is
equivalent to {10, 20}. Using explicit keys should not change that.


‒ Christian

Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Peter Aronoff
"Christian N." <[hidden email]> wrote:
> Since a nil-value is indistinguishable from an unset key in a table, any
> trailing nils should be insignificant. That is, {10, 20, nil} is
> equivalent to {10, 20}. Using explicit keys should not change that.

That’s what I thought, but it’s good to confirm.

Thanks, Peter
--
We have not been faced with the need to satisfy someone else's
requirements, and for this freedom we are grateful.
    Dennis Ritchie and Ken Thompson, The UNIX Time-Sharing System

Reply | Threaded
Open this post in threaded view
|

RE: Bug report in length of simple table

Tomas.Lavicka
In reply to this post by Oliver Kroth


-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Oliver Kroth
Sent: Tuesday, September 13, 2016 12:06
To: [hidden email]
Subject: Re: Bug report in length of simple table



Am 13.09.2016 um 11:37 schrieb Malma:

>
> I think this explanation must be in official doc if it's a source of
> confusion.
>
> print(# {[1] = 10, [2] = 20, [3] = nil, [4] = nil, [5] = 40} ) -- is
> not sequence ?
> print(# {10, 20, nil, nil, 40} ) -- is sequence ?  From doc: Note that
> a table like {10, 20, nil, 40}  is not a sequence...
>
> output:
>
> 2
> 5
>
The # operator computes the number of elements of a sequence in a Lua table # returns a valid result only for sequences.

A Lua table contains a sequence if and only if all integer keys greater zero are contiguous; i.e. there must be no missing numbers.

If there are gaps, the table contains no sequence.

Th table may contains values for other keys, including 0, negative, or non-integer numbers, this does not affect the sequence.

Note that this definition is completely independent of the implementation detail has part / array part; one does not need to care about that.
It's only about integer keys.

nil can't be stored in a table. Assigning nil to a table key removes the key. Use another value to tag special (e.g.missing) entries, like false

{10, 20, nil, nil, 40} contains NOT a sequence {10, 20, false false, 40] contains a sequence { [0]="null", 10, 20, 30, a="alpha",  [math.pi] = "pi" } contains a sequence

--
Oliver

-----------------------------------------------------------------------------------

I am new to lua and I want to understand the language. According book (Programming in Lua, 3rd edition):
"More precisely, a sequence is a table where the numeric keys comprise a set 1, . . . , n for some n." (page 24)
So table {10, nil, 30, 40} contains sequence {10} (n=1), which length is 1. Is there any other definition?
It is very confusing and operator # can be unusable for tables, because in many cases I need to check sequences and I cannot trust this operator. It should return or 1 (book definition) or if holes in sequence break the sequence, it should return 0. Otherwise I will be pushed for every sequence check to check all array fields manually in some kind of loop.

Tomas Lavicka
(please excuse my bad english - I am not english native - but I hope that you understand, what I mean)
Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Coda Highland
On Tue, Sep 13, 2016 at 11:32 PM,  <[hidden email]> wrote:
> I am new to lua and I want to understand the language. According book (Programming in Lua, 3rd edition):
> "More precisely, a sequence is a table where the numeric keys comprise a set 1, . . . , n for some n." (page 24)
> So table {10, nil, 30, 40} contains sequence {10} (n=1), which length is 1. Is there any other definition?
> It is very confusing and operator # can be unusable for tables, because in many cases I need to check sequences and I cannot trust this operator. It should return or 1 (book definition) or if holes in sequence break the sequence, it should return 0. Otherwise I will be pushed for every sequence check to check all array fields manually in some kind of loop.
>
> Tomas Lavicka
> (please excuse my bad english - I am not english native - but I hope that you understand, what I mean)

The definition in PiL 3rd Edition is written in a way that the
interpretation is ambiguous. The official definition in the Lua 5.2
manual clarifies on that to say "We use the term sequence to denote a
table where the set of all positive numeric keys is equal to {1..n}
for some integer n, which is called the length of the sequence (see
§3.4.6)."

That is to say, for a table to be a sequence, not only does it have to
have 1..n be non-nil, but it can't have any other positive numeric
keys.

/s/ Adam

Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Oliver Kroth
In reply to this post by Tomas.Lavicka


Am 14.09.2016 um 08:32 schrieb [hidden email]:

>
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Oliver Kroth
> Sent: Tuesday, September 13, 2016 12:06
> To: [hidden email]
> Subject: Re: Bug report in length of simple table
>
>
>
> Am 13.09.2016 um 11:37 schrieb Malma:
>> I think this explanation must be in official doc if it's a source of
>> confusion.
>>
>> print(# {[1] = 10, [2] = 20, [3] = nil, [4] = nil, [5] = 40} ) -- is
>> not sequence ?
>> print(# {10, 20, nil, nil, 40} ) -- is sequence ?  From doc: Note that
>> a table like {10, 20, nil, 40}  is not a sequence...
>>
>> output:
>>
>> 2
>> 5
>>
> The # operator computes the number of elements of a sequence in a Lua table # returns a valid result only for sequences.
>
> A Lua table contains a sequence if and only if all integer keys greater zero are contiguous; i.e. there must be no missing numbers.
>
> If there are gaps, the table contains no sequence.
>
> Th table may contains values for other keys, including 0, negative, or non-integer numbers, this does not affect the sequence.
>
> Note that this definition is completely independent of the implementation detail has part / array part; one does not need to care about that.
> It's only about integer keys.
>
> nil can't be stored in a table. Assigning nil to a table key removes the key. Use another value to tag special (e.g.missing) entries, like false
>
> {10, 20, nil, nil, 40} contains NOT a sequence {10, 20, false false, 40] contains a sequence { [0]="null", 10, 20, 30, a="alpha",  [math.pi] = "pi" } contains a sequence
>
> --
> Oliver
>
> -----------------------------------------------------------------------------------
>
> I am new to lua and I want to understand the language. According book (Programming in Lua, 3rd edition):
> "More precisely, a sequence is a table where the numeric keys comprise a set 1, . . . , n for some n." (page 24)
> So table {10, nil, 30, 40} contains sequence {10} (n=1), which length is 1. Is there any other definition?
> It is very confusing and operator # can be unusable for tables, because in many cases I need to check sequences and I cannot trust this operator. It should return or 1 (book definition) or if holes in sequence break the sequence, it should return 0. Otherwise I will be pushed for every sequence check to check all array fields manually in some kind of loop.
>
> Tomas Lavicka
> (please excuse my bad english - I am not english native - but I hope that you understand, what I mean)
The table {10, nil, 30, 40} does not contain a sequence in the meaning
of Lua's sequence definition.
For Lua a sequence *all* (integer) numbers up to the highest number 'n'
must be present as key.
The keys 1, 3, and 4 are present, but there is no key-value pair for the
key 2.

nil is not a value that can be stored in a table. Assigning nil to a
table entry deletes the entry from the table.
This is different to, e.g. JavaScript, where one has to explicitly
remove the key from an object.

--
Oliver

Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Jonathan Goble
In reply to this post by Tomas.Lavicka
On Wed, Sep 14, 2016 at 2:32 AM,  <[hidden email]> wrote:
> I am new to lua and I want to understand the language. According book (Programming in Lua, 3rd edition):
> "More precisely, a sequence is a table where the numeric keys comprise a set 1, . . . , n for some n." (page 24)
> So table {10, nil, 30, 40} contains sequence {10} (n=1), which length is 1. Is there any other definition?
> It is very confusing and operator # can be unusable for tables, because in many cases I need to check sequences and I cannot trust this operator. It should return or 1 (book definition) or if holes in sequence break the sequence, it should return 0. Otherwise I will be pushed for every sequence check to check all array fields manually in some kind of loop.

It behaves the way it does for efficiency.

The length algorithm, as I understand it (from previous explanations
on this list), does not and should not check every key-value pair in
the table, as you suggest, to determine whether or not a valid
sequence is present. That would be an O(n) operation, and nobody wants
that. The actual algorithm is (I think) closer to O(log n), and checks
pairs of positive integer keys according to a sensible pattern until
it finds an `n` where `t[n]` is not nil and `t[n+1]` is nil. It then
returns `n` as the length.

Thus, the length of a table is defined if and only it is a sequence,
since with a non-sequence, getting the length could result in any
non-negative integer `n` where `t[n]` is non-nil and `t[n+1]` is nil.
The length operator guarantees only that.

So what exactly is a sequence? A sequence, as you quoted from PiL, is
"a table where the numeric keys comprise a set 1, . . . , n for some
n." "Comprise" means that all of the numeric keys have to participate
in that, but even that is not perfectly accurate. Let me tweak and
expand the wording of PiL to be even more precise:

"A sequence is a table where the set of all positive integer keys that
do not correspond to a nil value is exactly equal to the set {1..n}
where n is the highest positive integer key that does not correspond
with a nil value."

So what is the table {10, nil, 30, 40} (to use your example)? It
cannot be a sequence of length 4, because the key 2 is nil. It also
cannot be a sequence of length 1 because keys 3 and 4 have values.
Specifically, the set of all positive integer keys is {1, 3, 4}, but
the set of all integers {1..n}, where n is the highest integer key in
the table, is {1, 2, 3, 4}. As those set are not identical, it is
therefore not a sequence at all, and the length of the table is
undefined.

The bottom line is that if you want to work with sequences, you CANNOT
use nil as a value in those sequences.


P.S. In the explanation above, I repeatedly stated "positive integer
keys" rather than "numeric keys" or simply "keys". Why? Because (and
this is an important point) zero, negative, non-integer (float) and
non-numeric keys are completely irrelevant to whether a table is a
sequence. When determining whether a table is a sequence, ONLY
positive integer keys matter. All other keys are ignored for that
purpose. So, for example, the table

{10, 20, 30, 40, [math.pi] = 3.14, [0] = 'foo', [-4] = 'bar', ignore = 'me'}

is a valid sequence of length 4.

Reply | Threaded
Open this post in threaded view
|

RE: Bug report in length of simple table

Tomas.Lavicka


-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Jonathan Goble
Sent: Wednesday, September 14, 2016 9:06
To: Lua mailing list
Subject: Re: Bug report in length of simple table

On Wed, Sep 14, 2016 at 2:32 AM,  <[hidden email]> wrote:
> I am new to lua and I want to understand the language. According book (Programming in Lua, 3rd edition):
> "More precisely, a sequence is a table where the numeric keys comprise
> a set 1, . . . , n for some n." (page 24) So table {10, nil, 30, 40} contains sequence {10} (n=1), which length is 1. Is there any other definition?
> It is very confusing and operator # can be unusable for tables, because in many cases I need to check sequences and I cannot trust this operator. It should return or 1 (book definition) or if holes in sequence break the sequence, it should return 0. Otherwise I will be pushed for every sequence check to check all array fields manually in some kind of loop.

It behaves the way it does for efficiency.

The length algorithm, as I understand it (from previous explanations on this list), does not and should not check every key-value pair in the table, as you suggest, to determine whether or not a valid sequence is present. That would be an O(n) operation, and nobody wants that. The actual algorithm is (I think) closer to O(log n), and checks pairs of positive integer keys according to a sensible pattern until it finds an `n` where `t[n]` is not nil and `t[n+1]` is nil. It then returns `n` as the length.

Thus, the length of a table is defined if and only it is a sequence, since with a non-sequence, getting the length could result in any non-negative integer `n` where `t[n]` is non-nil and `t[n+1]` is nil.
The length operator guarantees only that.

So what exactly is a sequence? A sequence, as you quoted from PiL, is "a table where the numeric keys comprise a set 1, . . . , n for some n." "Comprise" means that all of the numeric keys have to participate in that, but even that is not perfectly accurate. Let me tweak and expand the wording of PiL to be even more precise:

"A sequence is a table where the set of all positive integer keys that do not correspond to a nil value is exactly equal to the set {1..n} where n is the highest positive integer key that does not correspond with a nil value."

So what is the table {10, nil, 30, 40} (to use your example)? It cannot be a sequence of length 4, because the key 2 is nil. It also cannot be a sequence of length 1 because keys 3 and 4 have values.
Specifically, the set of all positive integer keys is {1, 3, 4}, but the set of all integers {1..n}, where n is the highest integer key in the table, is {1, 2, 3, 4}. As those set are not identical, it is therefore not a sequence at all, and the length of the table is undefined.

The bottom line is that if you want to work with sequences, you CANNOT use nil as a value in those sequences.


P.S. In the explanation above, I repeatedly stated "positive integer keys" rather than "numeric keys" or simply "keys". Why? Because (and this is an important point) zero, negative, non-integer (float) and non-numeric keys are completely irrelevant to whether a table is a sequence. When determining whether a table is a sequence, ONLY positive integer keys matter. All other keys are ignored for that purpose. So, for example, the table

{10, 20, 30, 40, [math.pi] = 3.14, [0] = 'foo', [-4] = 'bar', ignore = 'me'}

is a valid sequence of length 4.
---------------------------------------------------------------------------------------------------------------------------------------

OK, I get it. Thx 4 explanation. But can you also explain to me next behavior?
> table.unpack{"a",nil,"c"}
a       nil     c
> table.unpack{[1]="a",[2]=nil,[3]="c"}
a

I am still confused with sequences, but I hope I will understand it soon.

Tomas Lavicka
Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Dirk Laurie-2
In reply to this post by Tomas.Lavicka
2016-09-14 8:32 GMT+02:00  <[hidden email]>:


> It is very confusing and operator # can be unusable for tables,
> because in many cases I need to check sequences and I cannot
> trust this operator. It should return or 1 (book definition) or if holes
> in sequence break the sequence, it should return 0. Otherwise
> I will be pushed for every sequence check to check all array fields
> manually in some kind of loop.
>
> Tomas Lavicka
> (please excuse my bad english - I am not english native - but
> I hope that you understand, what I mean)

The word "should" in English can indicate uncertainty, as in
"Trying out something without reading the manual carefully
should normally give the result you expect." It can also describe
what happens in an ideal world postulated by the speaker, as
in "A program written by someone else should do what I want,
not what the author wants".

In the case of the length operator, you are not the first person
or even the hundredth who did not understand what it does
and how to use it properly. We all did. It is in fact the pons asinorum
that tests whether you are ready to travel on to the really subtle
things in Lua. About 10% of us also complained on Lua-L and
most of those also had the attitude that Lua or its documentation
is wrong. Only a select few (I was one) earn from Roberto the
response "Do not use the word 'bug' for something that you do
not understand properly."

Here is some background.

1. There is only one way to find the length of an array in O(1)
time, and that is to store it somewhere and update it when it
changes. In general one needs to know something about the
array in order to be able to do that. For this purpose, Lua offers
the __len metamethod. It says: "Dear Lua, I shall take personal
responsibility for deciding what the size of this array is.
Signed, A. User."

2. There is a way to find the length of an array in O(log n)
time if it is known that the array has the sequence property.
If you use the length operator without a metamethod, you
are saying: "Dear Lua, I know that my array is a sequence.
Please go ahead and find its length using your fast algorithm."

3. As Josh Billings put it, "The trouble with people is not what
they don't know, but what they know and it ain't so." If your
sequence ain't a sequence, #t will give you a frontier, i.e.
a filled position where the next one is empty. So you can
just keep on doing "t[#t+1]=newvalue" and never overwrite
anything already in the array. This behaviour used to be
documented, was removed because it might confuse some
people, but now that it has become clear that people get
confused anyway, we hope that the next Lua release might
restore it, since the implementation has not changed.

4. If you give Lua an array that is not a sequence, but
you thought is was, it is a bug, and the bug is not in Lua.
Preventing or even merely detecting that bug takes O(1)
would slow Lua up for everybody and is not a serious option.

I see a new message from you has just come in. Maybe
you have in the meantime discovered some things for
yourself that I have explained above. I am hitting Send
nevertheless since you will also not be the last person to
write about this.

Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Jonathan Goble
In reply to this post by Tomas.Lavicka
On Wed, Sep 14, 2016 at 3:37 AM,  <[hidden email]> wrote:
> OK, I get it. Thx 4 explanation. But can you also explain to me next behavior?
>> table.unpack{"a",nil,"c"}
> a       nil     c
>> table.unpack{[1]="a",[2]=nil,[3]="c"}
> a
>
> I am still confused with sequences, but I hope I will understand it soon.

table.unpack works only with sequences. As the arguments here are not
sequences, the result of the call is undefined, meaning that different
results can sometimes be obtained from identical arguments (and those
results may or may not be meaningful).

Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Dirk Laurie-2
In reply to this post by Dirk Laurie-2
2016-09-14 9:43 GMT+02:00 Dirk Laurie <[hidden email]>:
> 2016-09-14 8:32 GMT+02:00  <[hidden email]>:

> 4. If you give Lua an array that is not a sequence, but
> you thought is was, it is a bug, and the bug is not in Lua.
> Preventing or even merely detecting that bug takes O(1)
> would slow Lua up for everybody and is not a serious option.

It should be "takes O(n) time and would slow Lua up".

> I see a new message from you has just come in. Maybe
> you have in the meantime discovered some things for
> yourself that I have explained above. I am hitting Send
> nevertheless since you will also not be the last person to
> write about this.

OK, I see that you have in fact progressed quite a lot.

> But can you also explain to me next behavior?
>> table.unpack{"a",nil,"c"}
> a       nil     c
>> table.unpack{[1]="a",[2]=nil,[3]="c"}
a

In general #t in the presence of holes depends not only on
the set of key-value pairs, but also on an an internal state
of a random number generator that drives the hashing
algorithm and is different for each table and for each time
that you start up Lua.

In your example, there is also a difference in the algorithm
used to construct tha table. You can see it using the bytecode
compiler.

$ luac -p -l -l -
table.unpack{"a",nil,"c"}

main <stdin:0,0> (9 instructions at 0x25c3b60)
0+ params, 5 slots, 1 upvalue, 0 locals, 4 constants, 0 functions
    1    [1]    GETTABUP     0 0 -1    ; _ENV "table"
    2    [1]    GETTABLE     0 0 -2    ; "unpack"
    3    [1]    NEWTABLE     1 3 0
    4    [1]    LOADK        2 -3    ; "a"
    5    [1]    LOADNIL      3 0
    6    [1]    LOADK        4 -4    ; "c"
    7    [1]    SETLIST      1 3 1    ; 1
    8    [1]    CALL         0 2 1
    9    [1]    RETURN       0 1
constants (4) for 0x25c3b60:
    1    "table"
    2    "unpack"
    3    "a"
    4    "c"
locals (0) for 0x25c3b60:
upvalues (1) for 0x25c3b60:
    0    _ENV    1    0

The three elements are loaded as a list.

$ luac -p -l -l -
table.unpack{[1]="a",[2]=nil,[3]="c"}

main <stdin:0,0> (8 instructions at 0xe6ab60)
0+ params, 2 slots, 1 upvalue, 0 locals, 8 constants, 0 functions
    1    [1]    GETTABUP     0 0 -1    ; _ENV "table"
    2    [1]    GETTABLE     0 0 -2    ; "unpack"
    3    [1]    NEWTABLE     1 0 3
    4    [1]    SETTABLE     1 -3 -4    ; 1 "a"
    5    [1]    SETTABLE     1 -5 -6    ; 2 nil
    6    [1]    SETTABLE     1 -7 -8    ; 3 "c"
    7    [1]    CALL         0 2 1
    8    [1]    RETURN       0 1
constants (8) for 0xe6ab60:
    1    "table"
    2    "unpack"
    3    1
    4    "a"
    5    2
    6    nil
    7    3
    8    "c"
locals (0) for 0xe6ab60:
upvalues (1) for 0xe6ab60:
    0    _ENV    1    0

The three elements are stored one by one.

Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Dirk Laurie-2
In reply to this post by Jonathan Goble
2016-09-14 10:05 GMT+02:00 Jonathan Goble <[hidden email]>:

> On Wed, Sep 14, 2016 at 3:37 AM,  <[hidden email]> wrote:
>> OK, I get it. Thx 4 explanation. But can you also explain to me next behavior?
>>> table.unpack{"a",nil,"c"}
>> a       nil     c
>>> table.unpack{[1]="a",[2]=nil,[3]="c"}
>> a
>>
>> I am still confused with sequences, but I hope I will understand it soon.
>
> table.unpack works only with sequences. As the arguments here are not
> sequences, the result of the call is undefined, meaning that different
> results can sometimes be obtained from identical arguments (and those
> results may or may not be meaningful).

The issue is lower down: table.unpack uses the length operator
to get the length of the sequence. The difference in the two outputs
is caused by the different behaviour of the length operator.

<advanced>
This turn is caused is caused by the different ways in which the tables
are stored (all in the "array" part in the first case, only one element
in the "array" part in the second case). The details are quite involved
but basically #t will not change as long as you assign to indices that
exist (even if they have value nil)

$ lua
Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
> t={1,nil,nil,nil,nil,nil,nil,nil,nil,10}   -- indices 1 to 10 exist in the "array part"
> #t   -- there is a non-nil at the highest "array part" index
10
> t[5]=5  -- assigning to an existing index does not change #t
> #t
10
> t[12]=12  -- non-existing index triggers resizing of "array part"
> #t     -- [5], [10] and [12] are now in "hash part"
1
</advanced>

Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Roberto Ierusalimschy
In reply to this post by Dirk Laurie-2
> 3. As Josh Billings put it, "The trouble with people is not what
> they don't know, but what they know and it ain't so." If your
> sequence ain't a sequence, #t will give you a frontier, i.e.
> a filled position where the next one is empty. So you can
> just keep on doing "t[#t+1]=newvalue" and never overwrite
> anything already in the array. This behaviour used to be
> documented, was removed because it might confuse some
> people, but now that it has become clear that people get
> confused anyway, we hope that the next Lua release might
> restore it, since the implementation has not changed.

We are planning to restore it. The new explanation would go like this.
(What sounds better, "border" or "frontier"?)

--------------------------------------------------------------------------
A *border* in a table 't' is any non-negative integer such that

   (border == 0 or t[border] ~= nil) and t[border + 1] == nil

That is, a border points to any position in a table where a non-nil value
is followed by a nil value (or to 0, when position 1 is empty).

For instance, the table {1, 2, 3, 4, 5} has only one border, 5.  The
table {1, 2, 3, nil, 5} has two borders, 3 and 5.  The table {nil, 2, 3,
nil, 5, nil} has three borders, 0, 3 and 5. The table {} has one border,
0.

Note that any table has at least one border.
Note also that keys that are not a non-negative integer
do not interfere with the notion of borders.
The table {x=1, y=10} has one border, 1. The table
{[1.1] = 1, [-3] = 1, [0] = 1, 1, 2, 3} also has one border, 3.


A table with exactly one border is called a *sequence*.

The length operator #t returns a border for the table 't'.
When t is a sequence, #t returns its only border, which corresponds to
the intuitive notion of the length of the sequence.
When t is not a sequence, #t can return any of its borders. (The exact
one depends on details of the internal representation of the table.)
--------------------------------------------------------------------------

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Peter Aronoff
In reply to this post by Dirk Laurie-2
Dirk Laurie <[hidden email]> wrote:

> <advanced>
> This turn is caused is caused by the different ways in which the tables
> are stored (all in the "array" part in the first case, only one element
> in the "array" part in the second case). The details are quite involved
> but basically #t will not change as long as you assign to indices that
> exist (even if they have value nil)
>
> $ lua
> Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
> > t={1,nil,nil,nil,nil,nil,nil,nil,nil,10}   -- indices 1 to 10 exist in the "array part"
> > #t   -- there is a non-nil at the highest "array part" index
> 10
> > t[5]=5  -- assigning to an existing index does not change #t
> > #t
> 10
> > t[12]=12  -- non-existing index triggers resizing of "array part"
> > #t     -- [5], [10] and [12] are now in "hash part"
> 1
> </advanced>
 
While we are on this *advanced* subject, I have a question: none of this
“array part” and “hash part” stuff is guaranteed, right? That is, you
shouldn’t rely on it in your code, should you? (Not that it’s immediately
clear to me how you would even rely on it, but just to be clear.)

Thanks, P
--
We have not been faced with the need to satisfy someone else's
requirements, and for this freedom we are grateful.
    Dennis Ritchie and Ken Thompson, The UNIX Time-Sharing System

Reply | Threaded
Open this post in threaded view
|

Re: Bug report in length of simple table

Dirk Laurie-2
In reply to this post by Roberto Ierusalimschy
2016-09-14 16:04 GMT+02:00 Roberto Ierusalimschy <[hidden email]>:

>> 3. As Josh Billings put it, "The trouble with people is not what
>> they don't know, but what they know and it ain't so." If your
>> sequence ain't a sequence, #t will give you a frontier, i.e.
>> a filled position where the next one is empty. So you can
>> just keep on doing "t[#t+1]=newvalue" and never overwrite
>> anything already in the array. This behaviour used to be
>> documented, was removed because it might confuse some
>> people, but now that it has become clear that people get
>> confused anyway, we hope that the next Lua release might
>> restore it, since the implementation has not changed.
>
> We are planning to restore it. The new explanation would go like this.
> (What sounds better, "border" or "frontier"?)
>
> --------------------------------------------------------------------------
> A *border* in a table 't' is any non-negative integer such that
>
>    (border == 0 or t[border] ~= nil) and t[border + 1] == nil
>
> That is, a border points to any position in a table where a non-nil value
> is followed by a nil value (or to 0, when position 1 is empty).
>
> For instance, the table {1, 2, 3, 4, 5} has only one border, 5.  The
> table {1, 2, 3, nil, 5} has two borders, 3 and 5.  The table {nil, 2, 3,
> nil, 5, nil} has three borders, 0, 3 and 5. The table {} has one border,
> 0.
>
> Note that any table has at least one border.
> Note also that keys that are not a non-negative integer
> do not interfere with the notion of borders.
> The table {x=1, y=10} has one border, 1. The table
> {[1.1] = 1, [-3] = 1, [0] = 1, 1, 2, 3} also has one border, 3.
>
>
> A table with exactly one border is called a *sequence*.
>
> The length operator #t returns a border for the table 't'.
> When t is a sequence, #t returns its only border, which corresponds to
> the intuitive notion of the length of the sequence.
> When t is not a sequence, #t can return any of its borders. (The exact
> one depends on details of the internal representation of the table.)
> --------------------------------------------------------------------------

Magnificent!

The word "frontier" is already in use for the %f pattern in strings, which
is strongly related but not the same thing. (For example, %f gives the
first available frontier which # does not). The distinction is different
enough to justify a different word. So I agree with "border".

123