[BUG] table will be garbage collected multiple times


Gregor Burghard
Hello everyone,

here is a bug for Lua >= 5.3.0:

When the finalizer method of a table resets the metatable of the same
table, it will not be deleted after finalization. That means the table
still exists and will be garbage collected again.

The following code demonstrates the bug.

https://pastebin.com/LUbnueut



Re: [BUG] table will be garbage collected multiple times

Francisco Olarte
Gregor:

On Sun, Nov 18, 2018 at 9:51 AM Gregor Burghard <[hidden email]> wrote:
> here is a bug for Lua >= 5.3.0:
> When the finalizer method of a table resets the metatable of the same
> table, it will not be deleted after finalization. That means the table
> still exists and will be garbage collected again.

A more knowledgeable person may be able to give better details, but given that
https://www.lua.org/manual/5.3/manual.html#2.5.1 says

<<You mark an object for finalization when you set its metatable and
the metatable has a field indexed by the string "__gc". >>

and a bit later..

<<Moreover, if the finalizer marks a finalizing object for
finalization again, its finalizer will be called again in the next
cycle where the object is unreachable>>

Seems like documented / expected behaviour to me ( a bit weird, but I
assume there is a reason, I can think of a couple of them ).

Francisco Olarte.


Re: [BUG] table will be garbage collected multiple times

Philippe Verdy
I agree. The example marks the same object for finalization in the next cycle by setting its metatable again to the MT table, which has a __gc key mapped to the function. When the finalizer is called (the object is being finalized), it unconditionally marks the object to be finalized in a later cycle, so the finalizer will be called indefinitely, once per cycle.

  function MT:__gc()
    self.cnt = self.cnt + 1
    if self.cnt == 1 then
      print("finalizing...")
    else
      print("and again...")
    end
    setmetatable(self, MT) -- here the error occurs
  end

This should be:

  function MT:__gc()
    self.cnt = self.cnt + 1
    if self.cnt == 1 then
      print("finalizing...")
      setmetatable(self, MT) -- mark for later finalization
    else
      print("and again...")
    end
  end

This way only the first finalization (self.cnt == 1) will display "finalizing..." and mark the object to be finalized again later (by restoring its metatable). In the next cycle the finalizer is called again, but then self.cnt will be 2 and you'll get the message "and again...". The object is no longer marked to be finalized again, so it will effectively be collected (you should not see the "and again..." message more than once after the single "finalizing..." message).
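
A self-contained sketch of the corrected finalizer, assuming Lua 5.3 or later (where tables can have `__gc`) and the default collector settings. The `log` table and the `make` helper are hypothetical additions so that the result can be inspected; the original pastebin code is not reproduced here.

```lua
-- Sketch: a finalizer that re-marks its object exactly once.
local MT = {}
local log = {}  -- hypothetical: collects the messages instead of printing directly

function MT:__gc()
  self.cnt = self.cnt + 1
  if self.cnt == 1 then
    log[#log + 1] = "finalizing..."
    setmetatable(self, MT)  -- re-mark: the finalizer runs once more
  else
    log[#log + 1] = "and again..."
    -- no setmetatable() here, so the table can be freed in a later cycle
  end
end

-- Create the object inside a function so that no local keeps it alive.
local function make()
  setmetatable({ cnt = 0 }, MT)
end

make()
for _ = 1, 4 do collectgarbage() end  -- force several full cycles

for _, msg in ipairs(log) do print(msg) end
```

With several forced cycles, the log should contain each message exactly once; extra cycles add nothing because the object is not marked again.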

It's not very well documented, but when a finalizer gets called on an object, just before calling it, the GC first clears the associated metatable if the object being finalized is a table. In the finalizer for an object whose type is 'table' or 'userdata', if you use getmetatable(self), it's not clearly documented whether you'll get nil or the same metatable whose "__gc" entry is now nil. The latter would be better, since it would allow you to store the "cnt" variable inside the metatable itself, along with the "__gc" entry, instead of in the object being finalized.



Re: [BUG] table will be garbage collected multiple times

nobody
On 18/11/2018 15.34, Philippe Verdy wrote:
> It's not very well documented, but when a finalizer gets called on an
> object, just before calling it, the GC first clears the associated
> metatable if the object being finalized is a table: in the finalizer
> for an object whose type is 'table' or 'userdata', if you use
> getmetatable(self), it's not documented clearly if either you'll get
> nil, or you'll get the same metatable whose "__gc" entry is now nil,
> something that should be better, allowing you to store the "cnt"
> variable inside the metatable itself along with the "__gc" variable,
> instead of the object being finalized).

That's complete nonsense.  Any modification of the metatable would be
unsafe as these are commonly used on several objects (though not in this
example), so the collection / finalization of the first such object
would break the finalization of all other objects with the same shared
metatable.

See §2 of https://www.lua.org/manual/5.3/manual.html#2.5.1 which says:

> For an object (table or userdata) to be finalized when collected, you
> must mark it for finalization. You mark an object for finalization
> when you set its metatable and the metatable has a field indexed by
> the string `"__gc"`. Note that if you set a metatable without a
> `__gc` field and later create that field in the metatable, the object
> will not be marked for finalization.

And §3

> When a marked object becomes garbage, it is not collected immediately
> by the garbage collector. Instead, Lua puts it in a list. After the
> collection, Lua goes through that list. For each object in the list,
> it checks the object's __gc metamethod: If it is a function, Lua
> calls it with the object as its single argument; if the metamethod is
> not a function, Lua simply ignores it.

And further §5

> Because the object being collected must still be used by the
> finalizer, that object (and other objects accessible only through
> it) must be resurrected by Lua. Usually, this resurrection is
> transient, and the object memory is freed in the next
> garbage-collection cycle. However, if the finalizer stores the object
> in some global place (e.g., a global variable), then the resurrection
> is permanent. Moreover, if the finalizer marks a finalizing object
> for finalization again, its finalizer will be called again in the
> next cycle where the object is unreachable. In any case, the object
> memory is freed only in a GC cycle where the object is unreachable
> and not marked for finalization.

(I wouldn't call that "not very well documented"…)

Rehashed in other (simpler?) words:

If, when you setmetatable(), there's _anything_ non-nil at `__gc` in the
metatable, the thing gets flagged for finalization.  (This is a property
of the table/userdata, not the metatable.)

When the thing is later collected and it has the "to be finalized" bit
set, this bit is cleared and, if _at this point_ the value at `__gc` in
the metatable is a function, that function gets run.

(And no matter what it'll do, the object survives until the next
collection.  Now _usually_, the "to be finalized" bit isn't re-enabled
by the `__gc` method and so the thing will be collected normally by the
next cycle… but you can re-flag it (by again calling setmetatable()
using a metatable with a `__gc` field), and even keep it around
indefinitely in an "undead" state – it's "dead" / fully unreachable from
the rest of the Lua state (hooks don't run during `__gc`), but it can
still do arbitrary stuff with the state.)
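
The flagging rule described above can be checked with a small sketch, assuming Lua 5.3 or later. The `calls` table and the three helper functions are hypothetical names used only for this illustration.

```lua
-- Sketch: the "to be finalized" flag is decided when setmetatable() is called.
local calls = {}

-- Case 1: __gc is present at setmetatable() time, so the object is flagged.
local function marked()
  setmetatable({}, { __gc = function() calls.marked = true end })
end

-- Case 2: __gc is added to the metatable afterwards, so the object is not flagged.
local function unmarked()
  local mt = {}
  setmetatable({}, mt)
  mt.__gc = function() calls.unmarked = true end
end

-- Case 3: a non-nil placeholder flags the object; the value found at __gc
-- when the object is collected is the one that gets called.
local function placeholder()
  local mt = { __gc = true }
  setmetatable({}, mt)
  mt.__gc = function() calls.placeholder = true end
end

marked(); unmarked(); placeholder()
collectgarbage()

print(calls.marked, calls.unmarked, calls.placeholder)  -- true  nil  true
```

Case 3 is the pattern the manual alludes to when it says the `__gc` field can be changed freely once the object has been marked.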


A fun / silly use of that is to make the computer beep on every
collection cycle:

setmetatable( {}, { __gc = function(t)
   io.stderr:write("\7") ; setmetatable(t,getmetatable(t)) end }
)

(This is easy to pre-load via the `-e` / `-l` options, and might be
useful for debugging… in fact, the Lua tests do something similar, just
writing a '.' for every collection instead of making it beep.)


You might also (ab)use this to trigger bookkeeping tasks (once per GC
cycle), if you have no better way to do that.  (A fixed "every $n
invocations of a function" scheme might not work (it could fire _both_
too rarely and too often, at different times), and in certain restricted
situations (games etc.), this might be as good as it gets… but note that
this is slightly racy – _any_ allocation can trigger a GC cycle, so
protect your data structures / make sure you're not reading inconsistent
state when triggered in the middle of some change.)


And of course there's LOTS of other stuff that you can do…

-- nobody


Re: [BUG] table will be garbage collected multiple times

Muh Muhten
On 11/18/18, nobody <[hidden email]> wrote:
> You might also (ab)use this to trigger bookkeeping tasks (once per GC
> cycle), if you have no better way to do that.  (A fixed "every $n
> invocations of a function" scheme might not work (it could fire _both_
> too rarely and too often, at different times), and in certain restricted
> situations (games etc.), this might be as good as it gets… but note that
> this is slightly racy – _any_ allocation can trigger a GC cycle, so
> protect your data structures / make sure you're not reading inconsistent
> state when triggered in the middle of some change.)

A workaround for this can be found in classical signal-handling
techniques: the GC action can set a flag to be checked elsewhere to
determine whether or not the bookkeeping needs to be done. Of course,
whether GC cycles are the right thing to count is a different
matter...
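
A minimal sketch of that flag technique, assuming Lua 5.3 or later. The names `gc_happened`, `arm`, `safe_point` and `runs` are hypothetical; `runs = runs + 1` stands in for whatever bookkeeping the program really needs.

```lua
-- Sketch: the finalizer only sets a flag; the real work happens at a safe point.
local gc_happened = false

-- Sentinel object that re-marks itself, so its finalizer fires once per GC cycle.
local function arm()
  setmetatable({}, { __gc = function(t)
    gc_happened = true                -- do no real work inside the finalizer
    setmetatable(t, getmetatable(t))  -- re-mark for the next cycle
  end })
end
arm()

local runs = 0

-- Called by the program at points where its data structures are consistent.
local function safe_point()
  if gc_happened then
    gc_happened = false
    runs = runs + 1  -- stand-in for the actual bookkeeping task
  end
end

collectgarbage(); safe_point()
collectgarbage(); safe_point()
safe_point()  -- no cycle finished since the last check, so nothing to do

print(runs)
```

Because the finalizer touches nothing but a boolean, it does not matter that it may run in the middle of some other update.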


Re: [BUG] table will be garbage collected multiple times

Andrew Gierth
In reply to this post by Philippe Verdy
>>>>> "Philippe" == Philippe Verdy <[hidden email]> writes:

 Philippe> It's not very well documented, but when a finalizer gets
 Philippe> called on an object, just before calling it, the GC first
 Philippe> clears the associated metatable if the object being finalized
 Philippe> is a table: in the finalizer for an object whose type is
 Philippe> 'table' or 'userdata', if you use getmetatable(self), it's
 Philippe> not documented clearly if either you'll get nil, or you'll
 Philippe> get the same metatable whose "__gc" entry is now nil,
 Philippe> something that should be better, allowing you to store the
 Philippe> "cnt" variable inside the metatable itself along with the
 Philippe> "__gc" variable, instead of the object being finalized).

I really don't know where you're getting this stuff, but it's all wrong.

The metatable of an object isn't changed in any way by garbage
collection, unless the __gc method chooses to do that itself (which is
sometimes a good idea, especially for userdatas where allowing method
calls on a post-finalization object(*) may be unwise). Inside the __gc
function for an object (whether table or userdata), getmetatable(self)
has the same value it had just before the object was collected, and
getmetatable(self).__gc is the __gc method being executed. Nor is the
metatable changed after the __gc method completes.

(*) - references to post-finalization objects can be easily obtained via
      the keys of ephemeron tables
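
This is easy to confirm with a short sketch, assuming Lua 5.3 or later. The `seen` table and the names `fin` and `make` are hypothetical and exist only to record what the finalizer observes.

```lua
-- Sketch: inspect the metatable from inside the running finalizer.
local seen = {}
local mt = {}

local function fin(self)
  local m = getmetatable(self)
  seen.same_mt = (m == mt)        -- metatable is still attached and unchanged
  seen.same_gc = (m.__gc == fin)  -- its __gc entry is the function now running
end
mt.__gc = fin

local function make()
  setmetatable({}, mt)  -- unreachable as soon as make() returns
end

make()
collectgarbage()

print(seen.same_mt, seen.same_gc)  -- true  true
```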

--
Andrew.


Re: [BUG] table will be garbage collected multiple times

Philippe Verdy
In reply to this post by nobody
So if I understand correctly, the metatable or table is not changed; what changes is only the presence of the object in the "list" of objects to be finalized, which is filled at the start of the mark phase with all known objects, which are then removed from the list when they are reached from the stack and marked as reachable.

At the end of the mark phase there remains a list of unreachable objects that will need to be finalized. Then the finalization step starts: it takes each object from the list and removes it, then calls the finalizer if there is one. But is there any action in the finalizer that determines whether the object will then be swept?

The ONLY action I see is the fact that it calls setmetatable(); you are saying that this does NOT change the metatable, strange!

But it must also do something else that marks the object as not to be swept. However, the call to setmetatable is not the end of the finalizer, which has still not returned to the GC sweeper; the finalizer may still change the state of the metatable *after* calling setmetatable(), so it could still set or remove its "__gc" entry. And nothing else will happen before the finalizer returns, so there will be nothing that can actually set the required bit/flag property in the object itself properly. Let's suppose that the GC then inspects the metatable at the end to see if there's a __gc entry mapped to a function: how can it determine that the function called setmetatable() or changed the entry in its metatable, and differentiate that from a finalizer that did nothing at all? There must be an action taken by the finalizer to effectively indicate to the GC that the object must not be swept and must be marked for later finalization.

The finalizer may also resurrect that object by linking it to another "live" object (i.e. a reachable object that has already been marked) without calling setmetatable at all, but it can also still set or reset the __gc entry of its existing metatable.

All we know is that an object has a "state", which is one of: active but still not marked (possible only at the start of or during the marking phase, impossible during the sweep phase), active and marked, dead to finalize, finalized to sweep, or resurrected (to be made active but still not marked again at the end of the sweep phase). This state is not enough to determine whether what a finalizer does (or does not do) will cause the object to be swept or to be finalized again later.

The only reliable info is that, just before calling the finalizer, the GC will clear the link of the object to its metatable: it is then up to the finalizer to reattach the metatable by calling setmetatable with a suitable __gc entry attached to a finalizer function (not necessarily the same function as the current finalizer itself). If there's no such call to setmetatable, or if the finalizer clears the __gc entry or sets it to a non-function, and if the object has not been resurrected by the finalizer by linking it to an object with a "marked" status or an object with a "dead to finalize" status (processed later in the same sweep cycle), then the object will be swept by the GC just after the finalizer has returned.

That's what is not clearly documented: what is the effective status of an object being finalized that indicates to the GC that it must not be swept after calling the finalizer? There must be an action taken by the finalizer itself; by default, if the finalizer does not take this action, the finalization will be immediately followed by sweeping.

And the only such action I see is the finalizer calling setmetatable() to set or restore the metatable, which the GC detached from the object just before calling the finalizer simply by clearing the internal pointer to the object's metatable. So when the finalizer calls setmetatable() to set it to a non-nil value, this has the desired effect of indicating to the GC that the object must not be finalized.

E.g.:
- a TCP network session socket that has been closed but is still kept for about one minute in FIN_WAIT state, during which that socket may still be resurrected in order to reuse its allocated port number and allow a fast restart with its existing reception/transmission windows and MTU: this can be useful for security against DoS attacks, to prevent a server from exhausting its port numbers, but also for privacy reasons, to secure all sessions
- another usage is to allow closed files to have some delay before they get physically flushed, or because the flush itself may be long and may need to be tested and retried several times before giving up and logging severe errors to inform the user or the program itself that something bad happened asynchronously, without forcing close() to block until flushing is fully completed.
- another usage may be to delay the power-down of a previously used device (e.g. turning off a screen display after several minutes when there is no longer any new message to display), because turning the device back on may be very lengthy if it was turned off immediately after a close.
- another usage may be to release other OS or external resources (e.g. returning local memory used by Lua to the OS by forcing all "weak" objects to be deallocated, including for example caches, or deleting caches stored in the filesystem whose "grace delay", during which they can still be reused, has expired)
- another usage would be to start a reorganization/optimization/defragmentation of the storage, or to physically clear storage entries that are no longer in use: this could be I/O intensive on large volumes, and such clearing will be done after a grace period, when it can be performed more easily and with lower impact by doing it sequentially instead of in random order on disk.
Basically finalizers are there to delay operations that can be postponed, without blocking a program that no longer immediately needs an object. This still allows a program to reconstruct the object much faster (notably "weak" objects for caches) if the underlying structures were not cleared and their finalization was delayed for a grace period.

What you quote just explains that there are lists of objects from which candidates are extracted, but it still does not clearly indicate which action a finalizer takes to effectively change the state of the object so that the GC will not sweep it when the finalizer returns. The GC must then have already modified the state of the object (to indicate that it MUST be swept) just before calling the finalizer, and the finalizer optionally decides to change that state again to indicate that it now MUST NOT be swept by the GC. The finalizer itself cannot change the various lists of objects maintained only by the GC itself, and it cannot change its "generation" models, if generations are used in Lua 5.4 to subdivide the lists of objects into smaller subsets (where GC and finalization will be faster on live objects than on objects in older generations that have survived more than one cycle and are less likely to need to be swept rapidly).



Re: [BUG] table will be garbage collected multiple times

Philippe Verdy
In reply to this post by Andrew Gierth


On Mon, Nov 19, 2018 at 02:24, Andrew Gierth <[hidden email]> wrote:
> Inside the __gc
> function for an object (whether table or userdata), getmetatable(self)
> has the same value it had just before the object was collected, and
> getmetatable(self).__gc is the __gc method being executed. Nor is the
> metatable changed after the __gc method completes.

Nothing forbids getmetatable(self) from returning the metatable the object had before it was collected: self references the closure in which the finalizer was created and is independent of the object itself (a finalizer function may be created and set in a (meta)table long before the object is created and the (meta)table is associated with the object by using setmetatable(object, (meta)table)).

Self is not the same as the object (o) passed in the 1st parameter of the finalizer, whose metatable may still have been cleared.

Likewise, nothing forbids getmetatable(o) from returning the effective metatable of the object (o) as it was before the GC modified it: the GC prepares an environment for calling the finalizer, in which a getmetatable function will be found that is still able to return the value the GC has kept.

But in my opinion all this is unnecessarily complicated. It would be much simpler if the finalizer indicated to the GC that the object must not be swept by just returning a non-nil value. If the finalizer does nothing, reaches the end of its code without using a return statement, or uses "return nil", the effect is the same: the first value returned will be nil and the object must then be swept.
If the finalizer just does "return true", it clearly indicates that the object must be kept; the GC does not have to create a specific environment, the object state does not have to be inspected on return, and the finalizer's use of setmetatable() does not have to be tracked.
The GC will decide to sweep or keep the object only by looking at the first value returned by the finalizer. This is much clearer! And the GC does not even have to change any internal state of the object before calling the finalizer, so this is also more efficient.
Finalizers could also return other interesting status for the object to keep, indicating not just that it must be kept but also, for example, whether it should be kept in the active generation or placed in an older generation (to be finalized later, but much less urgently). E.g. if it returns false the object is kept in the current generation, and if it returns true it is kept in the older generation, for Lua implementations that support generations in their GC. The non-nil return value is then a hint given to the GC about what to do with the preserved object, because that object will be finalized again and again, at every GC cycle, if the finalizer constantly returns a non-nil value when it is called! The hint can be used to reduce the frequency of calls to this finalizer, which acts like a coroutine running most often at unpredictable times, but indefinitely, without ever really terminating as long as it returns a non-nil value or does any other action indicating to the GC that the dead object must be kept.



Re: [BUG] table will be garbage collected multiple times

Gé Weijers
You'd think that Lua source code is unavailable if you read this thread. The finalizer implementation is fairly simple to understand:
  • if you call 'setmetatable' on an object, and the metatable has a __gc field the object is moved from the 'allgc' list to the 'finobj' list. The FINALIZEDBIT is set in the object's GC state.
  • when the object becomes unreachable it is moved back to the 'allgc' list, and the FINALIZEDBIT is reset. The finalizer is then called.
  • when the object becomes unreachable again (i.e. when it's on the 'allgc' list and there are no references to it) it's freed.
If you call 'setmetatable' in the finalizer, and that metatable has a '__gc' field the object is moved back to the 'finobj' list, so the process starts again and the object is never freed.

Observations:
  • an object is only finalized once, unless you explicitly call 'setmetatable' again to set the finalizer.
  • it's not a bug, the original poster's program explicitly requests that the object be finalized again using 'setmetatable'. The behavior is consistent with the documentation.
  • there are other things you can do in the __gc function, like storing a reference to the object somewhere so it stays reachable. The finalizer has run, but the object will have to be kept around anyway because it just became reachable again. Having the finalizer 'decide' what to do is therefore unsafe. Freeing an object whose finalizer has not run yet requires two GC cycles: at the end of the first one the finalizer runs, at the end of the second one the object is freed.
  • It's not clear to me what the use case is for changing the metatable in a finalizer to begin with. In the normal case you'd expect the object to be unreachable after the finalizer returns, so it will be freed in the next pass.
"It's not a bug, it's a feature!"
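
The observation that a finalizer runs only once, even if it makes the object reachable again, can be sketched as follows, assuming Lua 5.3 or later. The names `kept`, `runs`, `make` and `name_after` are hypothetical.

```lua
-- Sketch: a finalizer that resurrects its object by storing a reference to it.
local kept      -- strong reference created by the finalizer
local runs = 0  -- how many times the finalizer has been called

local function make()
  setmetatable({ name = "res" }, { __gc = function(o)
    runs = runs + 1
    kept = o  -- the object is reachable again after this assignment
  end })
end

make()
collectgarbage()  -- object is unreachable: the finalizer runs and resurrects it
collectgarbage()  -- object is reachable through 'kept': nothing happens
local name_after = kept and kept.name
print(runs, name_after)

kept = nil        -- drop the reference; the object was not re-marked,
collectgarbage()  -- so it is freed without another finalizer call
collectgarbage()
print(runs)
```

The counter stays at 1 throughout, because nothing in the finalizer calls `setmetatable` again.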


Re: [BUG] table will be garbage collected multiple times

Andrew Gierth
>>>>> "Gé" == Gé Weijers <[hidden email]> writes:

 Gé> - It's not clear to me what the use case is for changing the
 Gé> metatable in a finalizer to begin with. In the normal case you'd
 Gé> expect the object to be unreachable after the finalizer returns, so
 Gé> it will be freed in the next pass.

When dealing with userdatas in particular, it can be useful to reset the
metatable in the finalizer to nil or to a dummy metatable, because there
is a specific case where the object is _NOT_ unreachable after the
finalizer returns: if it's a key in an ephemeron table. (The ephemeron
table entry is not removed until the object is collected on the next
pass; so it's important that either the __gc method for a userdata
leaves the object data in a state that won't crash in the event of
further access to the object, or that it removes or replaces the
metatable.)
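
A sketch of the ephemeron case, using a table instead of a userdata so that it runs in plain Lua 5.3 or later. The names `registry`, `probe`, `make` and the `closed` field are hypothetical; `closed` stands for whatever "safe state" a real userdata finalizer would leave behind.

```lua
-- Sketch: a finalized object is still reachable as a key of a weak-keyed table.
local registry = setmetatable({}, { __mode = "k" })
local in_finalizer  -- value found in 'registry' while the finalizer runs

-- Count the keys in 't' and check whether all of them are flagged as closed.
local function probe(t)
  local n, all_closed = 0, true
  for k in pairs(t) do
    n = n + 1
    all_closed = all_closed and k.closed
  end
  return n, all_closed
end

local function make()
  local obj = setmetatable({ closed = false }, { __gc = function(o)
    in_finalizer = registry[o]  -- the entry has not been removed yet
    o.closed = true             -- leave the object in a safe state
  end })
  registry[obj] = "payload"
end

make()
collectgarbage()                   -- the finalizer runs in this cycle
local n1, closed1 = probe(registry)  -- finalized object is still a key
collectgarbage()                   -- the object is actually freed here
local n2 = probe(registry)           -- now the entry is gone

print(in_finalizer, n1, closed1, n2)
```

Between the two cycles any code that walks `registry` gets hold of an already finalized object, which is why the finalizer should either leave it usable or replace its metatable.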

--
Andrew.


Re: [BUG] table will be garbage collected multiple times

Gé Weijers


On Mon, Nov 19, 2018 at 12:19 PM Andrew Gierth <[hidden email]> wrote:
> When dealing with userdatas in particular, it can be useful to reset the
> metatable in the finalizer to nil or to a dummy metatable, because there
> is a specific case where the object is _NOT_ unreachable after the
> finalizer returns [...]

I just figured that out, so it does make sense to remove the metatable if your finalizer leaves your object in an unsafe state otherwise. I typically have an explicit method that releases the resources used by a userdata object (object:close() or similar), so I don't particularly care about accesses via an ephemeron table.

Thanks,


Re: [BUG] table will be garbage collected multiple times

Philippe Verdy
In reply to this post by Gé Weijers
So in summary, it is effectively the fact that we call "setmetatable(o, mt)" on the object (o) passed as a parameter to the finalizer that changes its state (a bit in the object).
In my opinion, this is not the best way to handle it:
- this requires a specific behavior of setmetatable: it inspects the metatable to see if there's a __gc function in it, sets the FINALIZEDBIT if so, then returns to the finalizer, which may still change the metatable after this.
- the only safe behavior would be for the finalizer to **return** an effective status. For now finalizers are functions that return nothing (or nil); they could return a non-nil value (probably non-false as well, e.g. true) to indicate that the FINALIZEDBIT must be CLEARED on the object. A finalizer that does nothing (an empty function) would not return anything, so the GC can still safely look at the metatable at this time, see if it has a __gc function, and if not it will set the FINALIZEDBIT (allowing the object to be swept and freed).


On Mon, Nov 19, 2018 at 7:54 PM Gé Weijers <[hidden email]> wrote:
You'd think that Lua source code is unavailable if you read this thread. The finalizer implementation is fairly simple to understand:
  • if you call 'setmetatable' on an object, and the metatable has a __gc field the object is moved from the 'allgc' list to the 'finobj' list. The FINALIZEDBIT is set in the object's GC state.
  • when the object becomes unreachable it is moved back to the 'allgc' list, and the FINALIZEDBIT is reset. The finalizer is then called.
  • when the object becomes unreachable again (i.e. when it's on the 'allgc' list and there are no references to it) it's freed.
If you call 'setmetatable' in the finalizer, and that metatable has a '__gc' field the object is moved back to the 'finobj' list, so the process starts again and the object is never freed.

Observations:
  • an object is only finalized once, unless you explicitly call 'setmetatable' again to set the finalizer.
  • it's not a bug, the original poster's program explicitly requests that the object be finalized again using 'setmetatable'. The behavior is consistent with the documentation.
  • there are other things you can do in the __gc function, like storing a reference to the object somewhere so it stays reachable. The finalizer has run, but the object will have to be kept around anyway because it just became reachable again. Having the finalizer 'decide' what to do is therefore unsafe. Freeing an object whose finalizer has not run yet requires two GC cycles: at the end of the first one the finalizer runs, at the end of the second one the object is freed.
  • It's not clear to me what the use case is for changing the metatable in a finalizer to begin with. In the normal case you'd expect the object to be unreachable after the finalizer returns, so it will be freed in the next pass.
"It's not a bug, it's a feature!"
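The behavior described above can be observed with a small script. This is a sketch under the assumption of Lua 5.3 or later and is not the original poster's pastebin code; the counter and metatable names are made up for the example.

```lua
local plain_calls, rearm_calls = 0, 0

-- Ordinary finalizer: does its work and leaves the metatable alone.
local plain_mt = { __gc = function() plain_calls = plain_calls + 1 end }

-- Finalizer that calls setmetatable on its own object with a metatable
-- that has __gc, which marks the object for finalization again.
local rearm_mt = {}
rearm_mt.__gc = function(o)
  rearm_calls = rearm_calls + 1
  setmetatable(o, rearm_mt)
end

-- Create both objects inside a function so nothing keeps them reachable.
local function make()
  setmetatable({}, plain_mt)
  setmetatable({}, rearm_mt)
end

make()
for _ = 1, 3 do collectgarbage() end  -- three full collection cycles

print(plain_calls, rearm_calls)
```

The first counter should stay at 1, while the second should grow by one with every full cycle, since that object is moved back to the 'finobj' list each time and is never freed.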

--

Re: [BUG] table will be garbage collected multiple times

Philippe Verdy
This proposed behavior would have no impact on finalizers:
- they can still use getmetatable(o) and see the unmodified metatable of the object (no need then for the GC to temporarily clear it or set it to a dummy (different) metatable).
- they can use setmetatable(o,mt) as they want: no quirk needed in the implementation of setmetatable (which then does not need to care about the fact that a finalization of the object is pending).
- all is determined ONLY when the finalizer returns.

A simple finalizer like:
  setmetatable(o, {__gc = function() return true end})
or even just:
  setmetatable(o, {__gc = true})
(if we also honor a boolean value of __gc as equivalent to a function returning that boolean) would be enough to say that the object "o" must NEVER be finalized. The boolean alternative would avoid costly calls in repeated attempts to finalize the object: such an object would not even be in a finalization list, the GC would automatically consider it "marked" and reachable, and it would never be freed, i.e. it would remain permanently in memory.
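For comparison, the return-status idea can be approximated in current Lua by a wrapper that re-arms the finalizer itself. A rough sketch, assuming Lua 5.3 or later; `set_status_finalizer` is a hypothetical helper and not part of any Lua API.

```lua
-- Hypothetical helper: `fin` returns true to ask for another finalization
-- in a later cycle; any other result lets the object be freed normally.
local function set_status_finalizer(obj, fin)
  local mt = {}
  mt.__gc = function(o)
    if fin(o) == true then
      setmetatable(o, mt)  -- explicitly mark the object for finalization again
    end
  end
  return setmetatable(obj, mt)
end

local calls = 0

-- Create the object inside a function so nothing keeps it reachable.
local function make()
  set_status_finalizer({}, function()
    calls = calls + 1
    return calls < 3  -- ask to be finalized again after the first two calls only
  end)
end

make()
for _ = 1, 5 do collectgarbage() end  -- five full collection cycles

print(calls)
```

The wrapped finalizer should run three times and then stop: once it returns false the object is no longer re-marked and the following cycle frees it.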



Re: [BUG] table will be garbage collected multiple times

Philippe Verdy
As well:
  setmetatable(o, {__gc =false})
would mean that the object can be finalized immediately: it would make the object explicitly weak.
It would still need to go through marking: if it is still reachable, notably in the context where the previous statement is used and where it is still reachable via (o), it cannot be finalized before (o) goes out of scope and no other references to (o) remain. So if (o) is marked by the mark phase, it cannot be put into the finalization list.

So:
  setmetatable(o, {__gc =false})
would be mostly equivalent to:
  setmetatable(o, {__gc =nil})
or:
  setmetatable(o, {})
or:
  setmetatable(o, nil)

Maybe we can imagine another use for __gc=false, notably with a generation-based GC: it would mean "don't finalize in the current generation, but keep the object in the older generation", in which case __gc would be reset automatically to nil, and the GC running on the older generation would then allow finalizing it once the object is old.

We could also tweak the value given to __gc (or the value returned by the function) to mean that we want the object to be part of specific generations, identifiable as any object.

This would be useful for example to create different pools, notably for caches. Lua applications would be able to manage their "cache eviction policy" efficiently, which is very important to avoid DOS attacks that attempt to clear caches used by concurrent threads, and to avoid timing attacks similar to Meltdown that measure the time to honor requests: it is shorter when an object is still in cache than when it has to be reconstructed, so a third party can know whether an object was recently used by another thread.

To avoid Meltdown-like attacks, we must be able to restrict cache eviction by "segregating pools in caches": different pools are allocated for different security contexts or different threads (Meltdown does not affect just CPUs; it concerns all computing systems that manage caches, notably those using the very common LRU eviction policy). The GC in Lua can easily become a target of Meltdown-like and DOS attacks if the Lua-written software is used to service many users on the internet.



Re: [BUG] table will be garbage collected multiple times

Tim Hill


> On Nov 20, 2018, at 10:17 AM, Philippe Verdy <[hidden email]> wrote:
>
> generations identifiable as any object;
>
> This would be useful for example to create different pools, notably for caches (Lua applications would be able to manage efficiently their "cache eviction policy", which is something very important to avoid DOS attacks that attempt to clear caches used by concurrent threads, and to avoid time attacks similar to Meltdown, measuring the time to honor requests, which is shorter if an object is still in cache than when it is not because the object has to be reconstructed, meaning that a third party can know if an object was recently used by another thread).
>

In all this long thread, what specific problem are you trying to solve? It’s been established that the current GC behavior is by design and not a bug. And the various proposals around changes to __gc don’t seem to offer any new functionality.

—Tim




Re: [BUG] table will be garbage collected multiple times

Sean Conner
It was thus said that the Great Tim Hill once stated:

>
> > On Nov 20, 2018, at 10:17 AM, Philippe Verdy <[hidden email]> wrote:
> >
> > generations identifiable as any object;
> >
> > This would be useful for example to create different pools, notably for
> > caches (Lua applications would be able to manage efficiently their "cache
> > eviction policy", which is something very important to avoid DOS attacks
> > that attempt to clear caches used by concurrent threads, and to avoid
> > time attacks similar to Meltdown, measuring the time to honor requests,
> > which is shorter if an object is still in cache than when it is not
> > because the object has to be reconstructed, meaning that a third party
> > can know if an object was recently used by another thread).
> >
>
> In all this long thread, what specific problem are you trying to solve?
> It’s been established that the current GC behavior is by design and not a
> bug. And the various proposals around changes to __gc don’t seem to offer
> any new functionality.

  As Dirk mentioned, Philippe may be a disciple of Bourbaki and is attempting
to bring more formalized rigor and abstraction to Lua.

  -spc (Or maybe not ... I do not wish to contradict the Great Philippe
        Verdy with false inferences)

Re: [BUG] table will be garbage collected multiple times

Philippe Verdy


On Wed, Nov 21, 2018 at 1:30 AM Sean Conner <[hidden email]> wrote:
As Dirk mentioned, Philippe may be a disciple of Bourbaki and is attempting
to bring more formalized rigor and abstraction to Lua.
  -spc (Or maybe not ... I do not wish to contradict the Great Philippe
        Verdy with false inferences)

You made a false inference, because I don't know, and don't remember having any contact with, this "Bourbaki".

Anyway, formalism in programming languages is extremely useful: it allows finding design bugs and ambiguities, and it helps make the programming language better, more predictable, more portable, and more secure.

Even if this was not an initial goal, the success of Lua will cause the language to be scrupulously analyzed to find and solve its weaknesses or inconsistencies. Lua is not finished at its 5.3 version (or the new alpha version 5.4 currently tested...).

We can expect a major version 6.0 coming next (that will need to break some compatibility with earlier implementations outside the limits that will be documented and that are still not documented at all, or just implied informally by some existing known implementations), and certainly other versions to fix newly discovered inconsistencies or portability problems.

All programmers need precise definitions of the semantics and limits of their favorite programming language, in order to know before trying to use it whether it will solve their problem, or whether their own development will fail and finally be abandoned, or will need to be rewritten nearly from scratch, taking the unsuspected limits into account with workarounds or some complex additional library/layer.



Re: [BUG] table will be garbage collected multiple times

Sean Conner
It was thus said that the Great Philippe Verdy once stated:
> On Wed, Nov 21, 2018 at 1:30 AM Sean Conner <[hidden email]> wrote:
>
> > As Dirk mentioned, Philippe may be a disciple of Bourbaki and is attempting
> > to bring more formalized rigor and abstraction to Lua.
> >   -spc (Or maybe not ... I do not wish to contradict the Great Philippe
> >         Verdy with false inferences)
>
> You made a false inference, because I don't know and don't have any contact
> I can remember of with this "Bourbaki".

  That wasn't me, that was Dirk who made the inference, or did you miss the
"As Dirk mentioned" part?

  -spc

Re: [BUG] table will be garbage collected multiple times

Gé Weijers
In reply to this post by Philippe Verdy


On Wed, Nov 21, 2018 at 12:18 PM Philippe Verdy <[hidden email]> wrote:
 
You made a false inference, because I don't know and don't have any contact I can remember of with this "Bourbaki".

Know your French mathematicians (even if they're 'virtual'):

 


--
--

Re: [BUG] table will be garbage collected multiple times

Philippe Verdy
I know this one, but there was nothing in your last messages that would have told me that you referred specifically to him. I know a lot of homonyms named "Bourbaki" (including in France, where this name is not uncommon).

There are others in French Wikipedia: https://fr.wikipedia.org/wiki/Bourbaki
and many other people who don't have "their" article in French Wikipedia.

My feeling when reading your messages was that you could have received messages on this list from some subscriber nicknamed "Bourbaki", and that's why I said I don't know and don't have any contact with "Bourbaki" (with the explicit quotes: I could not figure out who you were speaking about).

So now I can conclude that you don't like this mathematician or his theses, or the fact that he works on formalism (with very successful results which have very practical applications in many well-known programming languages and their implementations). I can just see that he has a lot of famous supporters, so I can reasonably trust him for his work.

