Nil in table proposal

Nil in table proposal

pocomane
Another nil-in-table proposal here. I wanted to post this as a
comment on Dirk's proposal, but eventually I decided to start a
new thread to avoid confusion. Just to catch your interest before a
long post, this is the core of my proposal:

- Two table modes: keep-mode never deletes content; normal-mode
deletes nil content every time it is accessed, even if the access is
only a read.

Now, the explanation...

The main problems I want to address are:

On Wed, Mar 14, 2018 at 7:24 PM, Roberto Ierusalimschy
<[hidden email]> wrote:

> The problems I am trying to solve are these:
>
> 1) A constructor like {x, y, z} should always create a sequence with
> three elements. A constructor like {...} should always get all arguments
> passed to a function. A constructor like {f(x)} should always get
> all results returned by the function. '#' should always work on these
> tables correctly.
>
> 2) A statement like 't[#t + 1] = x' should always add one more element
> at the end of a sequence.
>
> These are what confuse people all the time, these are what start new
> rounds of discussions around '#'.
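
For readers who want to see the two problems concretely, here is a minimal sketch against stock Lua 5.3. The helper names `f` and `count_pairs` are illustrative only. The sketch deliberately asserts nothing about `#` on a table with holes, since the manual only promises that `#` returns some border in that case.

```lua
-- f returns three values, the middle one being nil.
local function f() return 1, nil, 2 end

-- Counts the entries that pairs() actually visits.
local function count_pairs(t)
  local n = 0
  for _ in pairs(t) do n = n + 1 end
  return n
end

local a = {f()}

-- The call produced three results...
assert(select("#", f()) == 3)
-- ...but the table only holds the two non-nil ones.
assert(count_pairs(a) == 2)
assert(a[1] == 1 and a[2] == nil and a[3] == 2)
```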

It seems that for this purpose we need some way to store nil in a table
like any other value. However, what tables really do with nil is a
second-order problem; the main concern is how users interact with table
content. For simple access THERE IS NO DIFFERENCE between a nil stored
in the table and a missing value:

```
local t = {}
local a = {}  -- any non-nil key; an undefined (nil) key would raise an error
t[a] = 1
t[a] = nil
assert(t[a] == nil)
```

Does it mean that nil is in the table? Does it mean that the
value is missing? Who cares? Choose the best semantics for your problem
and go ahead.

But there is actually a place where you can distinguish the two
cases: table iteration (# is a form of table iteration). Practically, I
think the question is: what should the following code print?

```
for k,v in pairs(t) do
  print(v)
end
```

A first idea could be: let it act according to a metatable flag. If
the table is in "normal mode" it will not print anything. If it is
in "keep mode" it will print nil.

However, internally, when should the content actually be deleted?
Well, if a normal-mode table is asked to access a nil value (even just
for a read), it would be consistent to remove it before anything else.
This also automatically gives us the two kinds of iteration.

Note that this is just a direct extension of what already happens: today
Lua removes an entry when nil is assigned to it; I am proposing to also
remove it when nil is read.

Summing up, the proposal is:
- A table can act in two modes, "normal mode" and "keep mode", according
to a flag in its metatable.
- Keep-mode tables store nil and never delete content.
- Normal-mode tables delete content every time a nil is
assigned or read as a stored value.

Some details on how tables work with this proposal:
- Table constructors always put nil inside the table, but then
the table is left in normal-mode. [1]
- When iterating over keep-mode tables, nil values are returned like any
other value.
- When iterating over normal-mode tables, if a nil is found it is
deleted just before the iteration body runs, i.e. the iteration
body never sees a nil value.
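
The rules above can be tried out in plain Lua today. What follows is only a sketch under assumptions that are not part of the proposal: the private sentinel `NIL`, the `mode` and `store` fields, and the helpers `new`, `set`, `get` and `visited_keys` are all hypothetical, and a real implementation would live inside the table code rather than in a wrapper.

```lua
local NIL = {}  -- hypothetical sentinel standing in for a stored nil

local function new(mode)
  return { mode = mode or "normal", store = {} }
end

local function set(t, k, v)
  if v == nil then
    -- keep-mode remembers the nil, normal-mode deletes the entry
    if t.mode == "keep" then t.store[k] = NIL else t.store[k] = nil end
  else
    t.store[k] = v
  end
end

local function get(t, k)
  local v = t.store[k]
  if v == NIL then
    if t.mode == "normal" then t.store[k] = nil end  -- delete on read
    return nil
  end
  return v
end

-- Returns the list of keys an iteration would visit in the current mode.
local function visited_keys(t)
  local keys = {}
  for k, v in pairs(t.store) do
    if v == NIL and t.mode == "normal" then
      t.store[k] = nil  -- clearing an existing field during traversal is allowed
    else
      keys[#keys + 1] = k
    end
  end
  return keys
end

local t = new("keep")
set(t, "a", 1)
set(t, "a", nil)
assert(get(t, "a") == nil)        -- plain access cannot tell the difference
assert(#visited_keys(t) == 1)     -- keep-mode iteration still sees the key

t.mode = "normal"
assert(#visited_keys(t) == 0)     -- normal-mode iteration deletes it instead
assert(next(t.store) == nil)      -- and the entry is really gone
```

The sentinel approach makes the cost visible: a keep-mode entry occupies a slot until some normal-mode access hits it.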

Now, back to the initial issues. With this proposal we still need to
add a call like:

```
local function f() return 1,nil,2 end

local a = {f()}
set_keep_mode(a)

assert(#a == 3)

a[#a+1] = nil
assert(#a == 4)
```

Instead of `set_keep_mode` one can use a constructor function like
`local a = keep{f()}`. In any case, it is a bit verbose, but
acceptable to me. [2]
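
For comparison, here is a sketch of what is possible without any language change (Lua 5.2 and later): `table.pack` records the number of values in the field `n`, so the length can be tracked explicitly instead of relying on `#`. The `append` helper is hypothetical and only shows the bookkeeping a user has to do by hand.

```lua
local function f() return 1, nil, 2 end

-- table.pack stores the number of values in the field "n".
local a = table.pack(f())
assert(a.n == 3)

-- Hypothetical helper: append a value (possibly nil) and keep "n" in sync.
local function append(t, v)
  t.n = t.n + 1
  t[t.n] = v
end

append(a, nil)
assert(a.n == 4)

-- table.unpack with explicit bounds returns all four slots, nils included.
assert(select("#", table.unpack(a, 1, a.n)) == 4)
```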

I would like to try to write a patch, but it seems to me that enabling
both table modes is not trivial in the current Lua source
(LUA_NILINTABLE actually changes the definition of emptiness
everywhere). So, before I start hacking, I would like to know your
opinion.

Pocomane

[1] This is for compatibility. If it was a completely new language,
probably, I would use the keep-mode as the default.

[2]  At least for the # operator, this could be improved with the rule:
"The default # operator never deletes stuff". But it sounds a bit too
much like an exception to me. I do not know if it is worth it.


Re: Nil in table proposal

Roberto Ierusalimschy
> It seems that for this purpose we need some way to store nil in a table
> like any other value. However, what tables really do with nil is a
> second-order problem; the main concern is how users interact with table
> content. For simple access THERE IS NO DIFFERENCE between a nil stored
> in the table and a missing value:
>
> ```
> local t = {}
> local a = {}  -- any non-nil key
> t[a] = 1
> t[a] = nil
> assert(t[a] == nil)
> ```
>
> Does it mean that nil is in the table? Does it mean that the
> value is missing? Who cares? Choose the best semantics for your problem
> and go ahead.
>
> But, there is actually a place where you can distinguish the two
> cases: table iteration (# is a form of table iteration). [...]

This is a nice rational study of the situation. However, there are at least
two other constructions where we can distinguish the two cases:

1) Garbage collection of the corresponding key. In your example, can
'a' be collected? (We can argue that this case follows from iteration:
if 'a' appears in an iteration, it must be kept; otherwise, it could be
removed.)

2) Metamethods: is 't[a]' "empty" from the point of view of calling
metamethods? Should an access to t[a] call __index? Should an assignment
call __newindex? (This is a big difference and independent of
iterations.)
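
For reference, this is how stock Lua 5.3 answers those questions today, where a field whose raw value is nil counts as absent. A minimal sketch; the `log` table and both handlers are illustrative.

```lua
local log = {}

local t = setmetatable({}, {
  __index = function(_, k)
    log[#log + 1] = "index"
    return nil
  end,
  __newindex = function(tbl, k, v)
    log[#log + 1] = "newindex"
    rawset(tbl, k, v)
  end,
})

t[1] = true    -- field absent: __newindex is called
t[1] = false   -- field present: plain assignment
t[1] = nil     -- field present: plain assignment, the field is removed
local _ = t[1] -- raw value is nil: __index is called
t[1] = true    -- field absent again: __newindex is called

assert(table.concat(log, ",") == "newindex,index,newindex")
```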

-- Roberto


Re: Nil in table proposal

Hisham
On 21 March 2018 at 10:17, Roberto Ierusalimschy <[hidden email]> wrote:
> 1) Garbage collection of the corresponding key. In your example, can
> 'a' be collected? (We can argue that this case follows from iteration:
> if 'a' appears in an iteration, it must be kept; otherwise, it could be
> removed.)
>
> 2) Metamethods: is 't[a]' "empty" from the point of view of calling
> metamethods? Should an access to t[a] call __index? Should an assignment
> call __newindex? (This is a big difference and independent of
> iterations.)

Under pocomane's proposal, I would expect the behavior of metamethods
to follow from the table mode: for normal tables, the 5.3 behavior; in
keep-mode, a "stored" nil would be treated as an existing value, so
t[a] set to nil wouldn't trigger metamethods on access or update.

Another nice thing about this proposal is that for every existing Lua 5.x
program there is a statically computable upper bound on the additional
memory that it would use if table constructors become nil-storing
normal-mode tables by default (often with probably no memory increase
at all, due to the delete-on-read/iterate semantics of
normal-mode).

The only downside is again the "JavaScript equality effect" [1]:
`keep{...}` would become the "right way" to collect arguments and the
obvious one would remain the "wrong way". Still an improvement over
select/table.pack, perhaps?

-- Hisham

[1] A new form is introduced to do something with better semantics
(===); the old one can't be removed or updated due to backwards
compatibility (==).


Re: Nil in table proposal

pocomane
In reply to this post by Roberto Ierusalimschy
On Wed, Mar 21, 2018 at 2:17 PM, Roberto Ierusalimschy
<[hidden email]> wrote:

> This is a nice rational study of the situation. However, there are at least
> two other constructions where we can distinguish the two cases:
>
> 1) Garbage collection of the corresponding key. In your example, can
> 'a' be collected? (We can argue that this case follows from iteration:
> if 'a' appears in an iteration, it must be kept; otherwise, it could be
> removed.)
>
> 2) Metamethods: is 't[a]' "empty" from the point of view of calling
> metamethods? Should an access to t[a] call __index? Should an assignment
> call __newindex? (This is a big difference and independent of
> iterations.)

True, and I am actually quite curious to understand what to do with my
proposal in scenario 2). Again, to save you a long post: I think
__index and __newindex should be called every time the content is
missing OR it is just nil.

Let's talk about __newindex. Suppose we use a metatable whose
__newindex just logs the call and dispatches to rawset.

```
local t = setmetatable({}, meta)
set_normal_mode(t)
t[1] = true -- call __newindex
t[1] = nil  -- __newindex not called
t[1] = true -- call __newindex [1]

local t = setmetatable({}, meta)
set_keep_mode(t)
t[1] = true -- call __newindex
t[1] = nil  -- __newindex not called
t[1] = true -- ? [2]

local t = setmetatable({}, meta)
set_keep_mode(t)
t[1] = true -- call __newindex
t[1] = nil  -- __newindex not called
set_normal_mode(t)
t[1] = true -- ? [3]

local t = setmetatable({}, meta)
set_normal_mode(t)
t[1] = true -- call __newindex
t[1] = nil  -- __newindex not called
set_keep_mode(t)
t[1] = true -- call __newindex [4]
```

The only clear case is [1], since it must behave like current Lua.
[4] is also simple to decide: by the rules of the proposal as I stated
them, the content was removed before any set_keep_mode or assignment,
so it must call __newindex.

A good line of reasoning for [2] seems to be: you are in keep-mode, so
nil does not mean "empty", so do not call __newindex.

By the same reasoning, in [3] one should call __newindex as in [1],
simply because the table is in normal-mode.

But these behaviours are a bit inconsistent: if [2] does not call
__newindex, what happened just before [3]? Why does the behaviour change?
It seems that set_normal_mode is not just setting a flag. Does it
immediately remove all the nils? Or, in normal mode, are nils also
removed just before an assignment (of a non-nil value!)? [*]

The main issue I have with all this is that it introduces (other)
explicit differences between keep-mode and normal-mode. I would prefer
to keep the mode an implementation detail as much as possible. In other
words, it breaks the "Empty or Nil? Who cares?" feature.

For these reasons I am more inclined to call __newindex in all the
previous cases. And since similar reasoning applies to
__index, the full rule set could be:

- A table can act in two modes, "normal mode" and "keep mode", according
to a flag in its metatable.
- Keep-mode tables store nil and never delete content.
- Normal-mode tables delete content every time a nil is
assigned or read as a stored value.
- __index and __newindex are always called if the content is missing
or it is nil, regardless of the table mode.
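
The last rule can be sketched in plain Lua to check that it gives the same answer in both modes. Everything here is an assumption made for illustration: the sentinel `NIL`, the wrapper fields, and the helpers `new`, `raw_set`, `set` and `count_calls` are hypothetical, and the handler plays the role of a logging __newindex that dispatches to rawset.

```lua
local NIL = {}  -- hypothetical sentinel standing in for a stored nil

local function new(mode, newindex)
  return { mode = mode, store = {}, newindex = newindex }
end

-- Stores without consulting the handler, like rawset.
local function raw_set(t, k, v)
  if v == nil then
    if t.mode == "keep" then t.store[k] = NIL else t.store[k] = nil end
  else
    t.store[k] = v
  end
end

-- Calls the handler when the slot is missing OR holds a stored nil,
-- whatever the mode is.
local function set(t, k, v)
  local cur = t.store[k]
  if (cur == nil or cur == NIL) and t.newindex then
    return t.newindex(t, k, v)
  end
  raw_set(t, k, v)
end

-- Runs the three assignments from the cases above and counts handler calls.
local function count_calls(mode)
  local calls = 0
  local t = new(mode, function(tbl, k, v)
    calls = calls + 1
    raw_set(tbl, k, v)
  end)
  set(t, 1, true)  -- slot missing: handler called
  set(t, 1, nil)   -- slot holds true: handler not called
  set(t, 1, true)  -- slot missing or nil: handler called
  return calls
end

assert(count_calls("normal") == 2)
assert(count_calls("keep") == 2)
```

With this rule the handler cannot observe which mode the table is in, which is the property the rule is meant to preserve.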

And I still prefer not to provide any way to directly check empty vs
nil-content, not even for keep-mode.

Pocomane

[*] Actually, I think that this last option is the second-best solution.


Re: Nil in table proposal

pocomane
In reply to this post by Hisham
On Thu, Mar 22, 2018 at 1:07 PM, Hisham <[hidden email]> wrote:
> Under pocomane's proposal, I would expect the behavior of metamethods
> to follow from the table mode: for normal tables, the 5.3 behavior; in
> keep-mode, a "stored" nil would be treated as an existing value, so
> t[a] set to nil wouldn't trigger metamethods on access or update.

I wrote about this in the other mail; just to sum up: I think it is
better to reduce as much as possible the differences in behaviour between
the two modes. So I propose to call the metamethod whether the field
is missing or nil.

On Thu, Mar 22, 2018 at 1:07 PM, Hisham <[hidden email]> wrote:
> The only downside is again the "JavaScript equality effect" [1]:
> `keep{...}` would become the "right way" to collect arguments and the
> obvious one would remain the "wrong way". Still an improvement over
> select/table.pack, perhaps?

I find this a bit ugly too. In fact:

On Wed, Mar 21, 2018 at 2:00 PM, pocomane <[hidden email]> wrote:
> [1] This is for compatibility. If it was a completely new language,
> probably, I would use the keep-mode as the default.

Maybe in Lua 6.0? :P

Being serious, Roberto's observation about garbage collection makes me
think that using keep-mode as the default could raise the following issue.
All the tables used as keys of another table will never be collected
unless one of the following happens:
1) the container is deleted
2) the container is switched to normal-mode and the key/value is set to nil
3) the key/value is set to nil, then the table is switched to
normal-mode, then an iteration hits the key

The terrible thing is that option 3) is the only one available if every
other reference to the key is lost (and the container needs to be
kept). This also affects normal-mode-as-default: a table can be
switched to keep-mode and then switched back to
normal-mode.
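
The retention problem can be observed in stock Lua by emulating a stored nil with a sentinel and watching the key through a weak table. This is a sketch under assumptions: `NIL`, `fill` and `sweep` are hypothetical names, the sentinel only approximates what a keep-mode table would do internally, and the assertions rely on `collectgarbage()` running a full cycle as it does in the reference implementation.

```lua
local NIL = {}  -- hypothetical sentinel standing in for a stored nil

local container = {}                              -- ordinary strong table
local probe = setmetatable({}, { __mode = "v" })  -- weak values: only observes

-- Store a "nil" under a key that has no other strong reference.
local function fill()
  local key = {}
  probe[1] = key
  container[key] = NIL
end

-- Normal-mode style iteration: delete stored nils as they are hit.
local function sweep(t)
  for k, v in pairs(t) do
    if v == NIL then t[k] = nil end
  end
end

fill()
collectgarbage()
assert(probe[1] ~= nil)   -- the container alone keeps the key alive

sweep(container)
collectgarbage()
assert(probe[1] == nil)   -- once the entry is deleted the key is collected
```

This corresponds to option 3): nothing but an iteration over the container can reach the key, so nothing else can release it.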

I am still thinking about this...