forcing userdata to be gc'd?

John Dunn-2
If I have a userdata that wraps a limited system resource (i.e. a file handle, socket, large block of C memory, etc.), is there a way to make the userdata more likely to be garbage collected? For example, if I have a userdata-wrapped socket with the following code:

sock = nil
for i = 1,1000 do
  sock = socket.new() -- create new userdata wrapping a socket
end

1000 sockets will likely be created before any of the non-referenced userdatas are collected. I can solve this by requiring careful script coding (having a sock:close() call that has to be called when you are done with the socket, calling collectgarbage(), etc.), but I'd like a solution that is more robust than that. It seems like a way to tag something to always be garbage collected as soon as there are no longer any references might work.

John

Re: forcing userdata to be gc'd?

nobody
On 2017-03-29 22:51, John Dunn wrote:
> If I have a userdata that wraps a limited system resource (i.e. a file
> handle, socket, large block of C memory, etc.), is there a way to
> make the userdata more likely to be garbage collected?

I don't think there is built-in machinery for this.  In particular,

> It seems like a way to tag something to always be garbage collected
> as soon as there are no longer any references might work.

is essentially impossible, as you'd need to scan everything to be sure
that there are no references left, which amounts to a full collection
cycle and doesn't gain you anything.  I also cannot think of a way to
build some refcounting contraption on top with no manual cues as that
would run into the same problem.

So you cannot collect only userdata.

But clever use of `collectgarbage( "step", size )` might be good enough.
Two strategies to try:

1) Naive approach:

Every time you allocate a limited resource, do a large-ish step
(potentially based on collectgarbage "count" for percentages - but then
also take a look at the pause and step multiplier).  It's not a full
collection cycle (so it's faster), but gives a good chance that things
will be collected in a timely manner.

(For the special case "large block of C memory" this is the correct
approach:  Step size is the size of that block in kilobytes.)
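For illustration, here is a minimal sketch of strategy 1 in Lua. It assumes Lua 5.3 or later, where plain tables honour __gc (so a table can stand in for a userdata) and where collectgarbage("step", n) behaves as if n kilobytes had been allocated. `new_resource`, `DEFAULT_STEP_KB` and the `alive` counter are hypothetical names, and the 64 KB default is an untuned guess. The optional byte count covers the large-C-block case by converting the block size into a step size in kilobytes.

```lua
-- Strategy 1 sketch: pay for a GC step whenever a scarce resource is
-- allocated. A plain table with __gc stands in for the userdata.
local DEFAULT_STEP_KB = 64  -- hypothetical, untuned step size (KB)
local alive = 0             -- wrappers not yet finalized

local resource_mt = {
  __gc = function(r)
    -- A real binding would close the socket or free the C block here.
    alive = alive - 1
  end,
}

-- c_bytes: size of a C-side allocation the collector cannot see, if any.
local function new_resource(c_bytes)
  local step_kb = DEFAULT_STEP_KB
  if c_bytes then
    step_kb = math.max(1, math.ceil(c_bytes / 1024))
  end
  -- Act as if step_kb KB had been allocated (Lua 5.3/5.4 semantics).
  collectgarbage("step", step_kb)
  alive = alive + 1
  return setmetatable({}, resource_mt)
end

-- John's loop: each iteration drops the previous wrapper.
local sock
for i = 1, 1000 do
  sock = new_resource()
end
print("wrappers still alive:", alive)
```

With the step in place, the number of wrappers still alive after the loop should stay far below 1000, although the exact figure depends on heap size and the collector parameters.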

2) Counting:

In the constructor increment a count, and decrement in the
destructor/__gc.  Also in the constructor, check how many you have and
do some collection based on that.  (So if you have few then don't
collect at all, whereas if you're at the hard limit you'd do a full
collection(*) and hope to get some slots back.  In between, do steps of
varying size.)  Keep in mind that __gc metamethods run at the very end
of a collection phase, so expect to see bursts of releases with long
gaps in between.

(*) Or several - if a thing with __gc holds a reference to something
that holds a reference to your limited resource (nest arbitrarily
deeply), this may take longer to resolve.  While Lua will happily
traverse them all in one collection cycle, some of the __gc methods
might accidentally (or intentionally) resurrect one of the nodes in the
chain, and then it will take another full cycle to progress further.
(But don't loop until at least one was freed - if all are live that's an
endless loop.)
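A rough sketch of the counting strategy, under the same assumptions (Lua 5.3+, a table with __gc standing in for the userdata). `LIMIT`, the 50% and 90% thresholds, the step sizes and `acquire` are all hypothetical choices. The loop of full collections is capped at three, following the footnote: nested or resurrected references may need more than one cycle, but looping until something is freed would never terminate if every resource is genuinely live.

```lua
-- Strategy 2 sketch: count live wrappers and scale GC effort with pressure.
local LIMIT = 256  -- hypothetical hard cap on open resources
local live = 0     -- incremented on creation, decremented by __gc

local mt = { __gc = function() live = live - 1 end }

local function make_room()
  local fill = live / LIMIT
  if fill < 0.5 then
    return  -- plenty of slots left: no extra GC work
  elseif fill < 0.9 then
    -- Bigger steps (50..89 KB) as we approach the limit.
    collectgarbage("step", math.floor(fill * 100))
  else
    -- Near the limit: up to three full cycles, then give up rather
    -- than spin forever when every resource is genuinely in use.
    for _ = 1, 3 do
      collectgarbage("collect")
      if live < LIMIT then break end
    end
  end
end

local function acquire()
  make_room()
  if live >= LIMIT then
    return nil, "resource limit reached"
  end
  live = live + 1
  return setmetatable({}, mt)
end
```

Because __gc runs at the end of a cycle, `live` drops in bursts rather than one at a time, which is why the count is re-checked after each full collection.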

-- nobody

Re: forcing userdata to be gc'd?

Francisco Olarte
In reply to this post by John Dunn-2
John:

On Wed, Mar 29, 2017 at 10:51 PM, John Dunn <[hidden email]> wrote:
> It seems like a way to tag something to always be garbage collected as soon as there are no longer any references might work.

This idea needs two parts: one to detect the "as soon as there
are no longer any references" and another to "tag something to..."
(and implement the special collections for this). If you have the
first one, why bother with the second? Just _gc it instead of tagging.

Reference counting is something which must either be built deeply into
the language (like perl or C-python) or be implemented in a language
which gives you more control over your data (like C++, where you can
mix manually managed, reference-counted and garbage-collected data in
the same program). I, in particular, would love to have a way to
choose, as my objects typically split into two categories: things
like sockets, db handles and files, which do not participate in cycles
(or can easily be made not to) and which I would like to have managed
in a release-fast way, and memory-chunk-like data where I do not care
when it is freed. That's why I favour C++ ways, where I manage my heavy
resources with reference counting and use garbage collectors or memory
pools (where you free the whole pool) for things like graphs.

Francisco Olarte.

Re: forcing userdata to be gc'd?

Florian Weimer
In reply to this post by John Dunn-2
* John Dunn:

> 1000 sockets will likely be created before any of the non-referenced
> userdatas are collected. I can solve this by requiring careful script
> coding (having a sock:close() call that has to be called when you are
> done with the socket, calling collectgarbage(), etc.), but I'd like a
> solution that is more robust than that. It seems like a way to tag
> something to always be garbage collected as soon as there are no
> longer any references might work.

Unfortunately, there doesn't seem to be an efficient way to spot when
references go away immediately and still deal with cycles.  Even
ignoring the cycle problem, reference counting tends to have a higher
overhead than tracing collectors (at least for heaps of smaller
sizes).

Another option would be to trigger a garbage collection when a
resource limit is exceeded.  But this only works for EMFILE (process
file descriptor limit exceeded), not for ENFILE (system limit
exceeded).  The latter might require triggering garbage collection in
*another* process, which isn't feasible.
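A sketch of the "collect and retry on EMFILE" idea for plain Lua files. It assumes that `io.open` returns the C `errno` as its third result on failure (stock Lua 5.1 through 5.4 do) and that EMFILE is 24, which holds on Linux and macOS but is not guaranteed elsewhere; `open_with_gc_retry` is a hypothetical helper. As noted above, it does nothing for ENFILE, where the descriptors that could be reclaimed may belong to another process.

```lua
-- Hypothetical helper: on "too many open files", run a full collection
-- so abandoned file handles get finalized, then retry once.
local EMFILE = 24  -- assumption: Linux/macOS value, not portable

local function open_with_gc_retry(path, mode)
  local f, err, code = io.open(path, mode)
  if not f and code == EMFILE then
    -- Unreferenced file objects keep their descriptors until their
    -- finalizers run; a full cycle gives them the chance to close.
    collectgarbage("collect")
    f, err, code = io.open(path, mode)
  end
  return f, err, code
end
```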

So explicitly freeing non-memory resources is currently the way to go.
If you control your entire application environment, you can at least
avoid protected calls if you use finalizers and perform a full garbage
collection at the start of each exception handler.
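A minimal sketch of that combination in Lua 5.3+, with hypothetical names throughout (`Resource`, `release_handle` and `run_protected` stand in for a real binding and error-handling layer): an idempotent close() is the primary mechanism, __gc is only a safety net for forgotten closes, and the protected-call wrapper runs a full collection as soon as a call fails, so resources abandoned by the failed code are finalized before the error is handled.

```lua
local released = 0

local function release_handle(h)
  released = released + 1  -- stand-in for close(fd), free(), etc.
end

local Resource = {}
Resource.__index = Resource

function Resource:close()
  if self.handle ~= nil then  -- idempotent: a second close is a no-op
    release_handle(self.handle)
    self.handle = nil
  end
end

-- Safety net only; must be set before setmetatable is called.
Resource.__gc = Resource.close

function Resource.new(h)
  return setmetatable({ handle = h }, Resource)
end

-- On failure, collect before handling the error so that anything the
-- failed code abandoned is finalized promptly.
local function run_protected(fn, ...)
  local ok, err = pcall(fn, ...)
  if not ok then
    collectgarbage("collect")
  end
  return ok, err
end

local r = Resource.new("fd-1")
r:close()
r:close()  -- no-op

run_protected(function()
  local leaked = Resource.new("fd-2")  -- never closed explicitly
  error("boom")
end)
```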

Re: forcing userdata to be gc'd?

Francisco Olarte
On Thu, Mar 30, 2017 at 9:01 PM, Florian Weimer <[hidden email]> wrote:
> Unfortunately, there doesn't seem to be an efficient way to spot when
> references go away immediately and still deal with cycles.  Even
> ignoring the cycle problem, reference counting tends to have a higher
> overhead than tracing collectors (at least for heaps of smaller
> sizes).

Reference counting typically has the size overhead of storing the
reference count plus the time overhead of maintaining it, vs the cost
of the gc. But it has the advantage of keeping tight control of
memory consumption, which can be beneficial in many cases (as far as
I know, GCs work best when you let them use more memory than
needed, about 1.5 or 2 times, but I'm not current on state-of-the-art
GC).

Also, for me, the possibility of using RAII offsets many
inconveniences; I divide languages by whether they NEED
try/finally or not (not whether they HAVE it). And ref counting
also helps in situations more difficult to solve than exhausted
file handles, like mutex locking.


I've never had a problem with pure ref counting systems, and, until I
had to program in Java for somewhat big systems, I normally never
generated cycles (programs in perl with runtimes measured in months or
years, processing nearly a million transactions a day, prove it).
The only ones where I did were data analysis programs which either
needed to free everything at the end (so just exit) or allocated data
for each run from a 'memory arena' freed as a whole (that was using
C/C++, where this is easy to do).

OTOH gc can coexist with refcounts: you manage the refcount, free when
it hits 0, and if memory usage grows you do a gc pass and free as
usual in gc. There are some runtimes around with this approach,
C-python and duktape at least IIRC, and I think it's great for my kind
of work (I've never needed any kind of data which needed cycles AND
allocated anything other than memory). This may be, as you point out,
less efficient, but it does not double the cost (having ref counting
typically means few live objects, cycled or not, and smaller memory
areas in many programs, which leads to very fast collections).

Francisco Olarte.
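A sketch of the hybrid scheme in Lua 5.3+ terms, with hypothetical names (`Shared`, `retain`, `release`). Here the counting is manual rather than built into the language as in C-python. The resource is closed the moment the count reaches zero, and __gc is kept only as a backstop for forgotten releases or cycles.

```lua
local closed = {}  -- handles that have actually been released

local Shared = {}
Shared.__index = Shared

local function really_close(self)
  if self.handle ~= nil then           -- idempotent
    closed[#closed + 1] = self.handle  -- stand-in for close(fd) etc.
    self.handle = nil
  end
end

-- Backstop only: forgotten releases, cycles.
Shared.__gc = really_close

function Shared.new(handle)
  return setmetatable({ handle = handle, refs = 1 }, Shared)
end

function Shared:retain()
  self.refs = self.refs + 1
  return self
end

function Shared:release()
  assert(self.refs > 0, "released more times than retained")
  self.refs = self.refs - 1
  if self.refs == 0 then
    really_close(self)  -- deterministic: no collector involved
  end
end

local db = Shared.new("db-1")
local other_owner = db:retain()  -- second owner of the same handle
db:release()                     -- one owner left: still open
other_owner:release()            -- count hits zero: closed immediately
```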

Re: forcing userdata to be gc'd?

William Ahern
In reply to this post by Francisco Olarte
On Thu, Mar 30, 2017 at 09:34:03AM +0200, Francisco Olarte wrote:

> John:
>
> On Wed, Mar 29, 2017 at 10:51 PM, John Dunn <[hidden email]> wrote:
> > It seems like a way to tag something to always be garbage collected as soon as there are no longer any references might work.
>
> This idea needs two parts: one to detect the "as soon as there
> are no longer any references" and another to "tag something to..."
> (and implement the special collections for this). If you have the
> first one, why bother with the second? Just _gc it instead of tagging.
>
> Reference counting is something which must either be built deeply into
> the language (like perl or C-python) or be implemented in a language
> which gives you more control over your data (like C++, where you can
> mix manually managed, reference-counted and garbage-collected data in
> the same program). I, in particular, would love to have a way to
> choose, as my objects typically split into two categories: things
> like sockets, db handles and files, which do not participate in cycles
> (or can easily be made not to) and which I would like to have managed
> in a release-fast way, and memory-chunk-like data where I do not care
> when it is freed. That's why I favour C++ ways, where I manage my heavy
> resources with reference counting and use garbage collectors or memory
> pools (where you free the whole pool) for things like graphs.

Either Roberto or Luiz once proposed a construct that automatically invoked
a __close metamethod on an object upon exiting a block. IIRC, it would be
invoked even if scope exit occurred because an error was thrown.

I thought that was a cool idea. It's similar to the using, with, and defer
statements of other languages.

I suppose all it would really require is a linked-list, stack-like data
structure adjacent to the jmp_buf structure used for exception handling.
Plus some new opcodes and changes to the code generator. Some of those
changes might be tricky.
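The semantics described here (run a close hook whenever a block is left, including by an error) can be roughly approximated in plain Lua 5.3 with a helper that takes the block as a function. This is only a sketch: `with_resource` is a hypothetical name, the hook is looked up as a `__close` field in the resource's metatable, and unlike a language-level construct it cannot help if the body yields from a coroutine that is never resumed. (Lua 5.4 did later add a built-in form, `local x <close> = ...`, which calls `__close` when the variable goes out of scope.)

```lua
-- Hypothetical helper: run body(res), then always call the resource's
-- __close hook, whether body returned normally or raised an error.
local function with_resource(res, body)
  local results = table.pack(pcall(body, res))
  local ok = results[1]
  local mt = getmetatable(res)
  if mt and mt.__close then
    -- Pass the error object on failure, nil on success.
    mt.__close(res, (not ok) and results[2] or nil)
  end
  if not ok then
    error(results[2], 0)  -- re-raise the original error unchanged
  end
  return table.unpack(results, 2, results.n)
end

-- Usage with a toy resource that just logs when it is closed.
local log = {}
local file_mt = {
  __close = function(r, err)
    log[#log + 1] = "closed " .. r.name .. (err and " after error" or "")
  end,
}
local f = setmetatable({ name = "f" }, file_mt)

local v = with_resource(f, function(r) return r.name .. "!" end)
local ok, err = pcall(with_resource, f, function() error("oops", 0) end)
```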


Re: forcing userdata to be gc'd?

Gé Weijers
On Fri, Mar 31, 2017 at 12:24 PM, William Ahern <[hidden email]> wrote:

> Either Roberto or Luiz once proposed a construct that automatically invoked
> a __close metamethod on an object upon exiting a block. IIRC, it would be
> invoked even if scope exit occurred because an error was thrown.
>
> I thought that was a cool idea. It's similar to the using, with, and defer
> statements of other languages.

It sometimes looks like this list is trying to reinvent Scheme one feature at a time. None of the above languages
have coroutines (or its Scheme cousin, call-with-current-continuation), and 'exit means exit'.

Not so with coroutines: yield exits a block, but it may or may not return. If you're holding a lock using an RAII-style
userdata you may be in trouble, either because after a resume you run your code without owning the lock, or the coroutine
never gets resumed, and the lock is held way too long. 

Coming up with sensible semantics may be harder than actually implementing this feature.

(The 'solution' in Scheme is to have the programmer figure it out, and provide a feature called 'dynamic-wind', not pretty)
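A small illustration of the second failure mode, using a hypothetical table-based lock rather than a real mutex: the coroutine yields while holding the lock and is then abandoned, and collecting it does not run the rest of its body, so the lock stays held.

```lua
local lock = { held = false }

local function lock_acquire(l)
  assert(not l.held, "lock busy")
  l.held = true
end

local function lock_release(l)
  l.held = false
end

local worker = coroutine.create(function()
  lock_acquire(lock)
  coroutine.yield("waiting")  -- leaves the block; may never come back
  lock_release(lock)          -- only runs if someone resumes us
end)

coroutine.resume(worker)      -- runs up to the yield: lock is now held
worker = nil                  -- abandon the coroutine without resuming it
collectgarbage("collect")     -- reclaiming it does not run lock_release

print(lock.held)              -- true
```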


Re: forcing userdata to be gc'd?

Patrick Donnelly
On Fri, Mar 31, 2017 at 6:14 PM, Gé Weijers <[hidden email]> wrote:

> On Fri, Mar 31, 2017 at 12:24 PM, William Ahern <[hidden email]>
> wrote:
>>
>>
>> Either Roberto or Luiz once proposed a construct that automatically
>> invoked
>> a __close metamethod on an object upon exiting a block. IIRC, it would be
>> invoked even if scope exit occurred because an error was thrown.
>>
>> I thought that was a cool idea. It's similar to the using, with, and defer
>> statements of other languages.
>
>
> It sometimes looks like this list is trying to reinvent Scheme one feature
> at a time. None of the above languages
> have coroutines (or its Scheme cousin, call-with-current-continuation), and
> 'exit means exit'.
>
> Not so with coroutines: yield exits a block, but it may or may not return.
> If you're holding a lock using an RAII-style
> userdata you may be in trouble, either because after a resume you run your
> code without owning the lock, or the coroutine
> never gets resumed, and the lock is held way too long.

Uh, you'd be in trouble with coroutine yields regardless of the lock
management mechanism (RAII or just manual lock/unlock).

BTW, I find it unnecessarily passive aggressive to assert Lua et al.
are "reinventing" Scheme. It is the very nature of all (spoken and
artificial) languages to borrow (or steal) from each other. I don't go
out of my way to cite French whenever I pass out my resume (or spell
it to their standards for that matter).

--
Patrick Donnelly

Re: forcing userdata to be gc'd?

Patrick Donnelly
In reply to this post by William Ahern
On Fri, Mar 31, 2017 at 3:24 PM, William Ahern
<[hidden email]> wrote:

> On Thu, Mar 30, 2017 at 09:34:03AM +0200, Francisco Olarte wrote:
>> John:
>>
>> On Wed, Mar 29, 2017 at 10:51 PM, John Dunn <[hidden email]> wrote:
>> > It seems like a way to tag something to always be garbage collected as soon as there are no longer any references might work.
>>
>> This idea needs two parts: one to detect the "as soon as there
>> are no longer any references" and another to "tag something to..."
>> (and implement the special collections for this). If you have the
>> first one, why bother with the second? Just _gc it instead of tagging.
>>
>> Reference counting is something which must either be built deeply into
>> the language (like perl or C-python) or be implemented in a language
>> which gives you more control over your data (like C++, where you can
>> mix manually managed, reference-counted and garbage-collected data in
>> the same program). I, in particular, would love to have a way to
>> choose, as my objects typically split into two categories: things
>> like sockets, db handles and files, which do not participate in cycles
>> (or can easily be made not to) and which I would like to have managed
>> in a release-fast way, and memory-chunk-like data where I do not care
>> when it is freed. That's why I favour C++ ways, where I manage my heavy
>> resources with reference counting and use garbage collectors or memory
>> pools (where you free the whole pool) for things like graphs.
>
> Either Roberto or Luiz once proposed a construct that automatically invoked
> a __close metamethod on an object upon exiting a block. IIRC, it would be
> invoked even if scope exit occurred because an error was thrown.
>
> I thought that was a cool idea. It's similar to the using, with, and defer
> statements of other languages.

Out of all the possible features of the next release, this is the one
I'm most looking forward to!

> I suppose all it would really require is a linked-list, stack-like data
> structure adjacent to the jmp_buf structure used for exception handling.
> Plus some new opcodes and changes to the code generator. Some of those
> changes might be tricky.

As mentioned in the thread bringing the feature up a year or so ago,
the mechanism would piggyback on the same code that closes upvalues
(as closing upvalues must also deal with errors and block scope
ending). It shouldn't require any new data structures, I believe.

--
Patrick Donnelly

Re: forcing userdata to be gc'd?

Gé Weijers
In reply to this post by Patrick Donnelly

On Sun, Apr 2, 2017 at 2:47 PM, Patrick Donnelly <[hidden email]> wrote:
> BTW, I find it unnecessarily passive aggressive to assert Lua et al.
> are "reinventing" Scheme. It is the very nature of all (spoken and
> artificial) languages to borrow (or steal) from each other. I don't go
> out of my way to cite French whenever I pass out my resume (or spell
> it to their standards for that matter).


I'm not in any way trying to suggest that the authors of Lua are trying to reinvent Scheme. I'm actually quite impressed with the balance between the power of the language, its simplicity, and the ease of interfacing it with external code. All of that in a bit over 24000 lines of code (about 16600 if you strip comments and blank lines).

In my opinion (and it's just an opinion) there's a bit much of "if we add this feature it fixes today's problem" going on on this list. It's a good thing the Lua authors do not implement 99% of the ideas, because we'd be looking at yet another language that tries to be everything to everybody.

I'm not against adding features to improve the early releasing of resources, but adding a C# style 'using' clause or similar only fixes a part of the problem. This one leaks too:

for line in io.lines("myfile")
do
  if pattern:match(line) then break end
end

io.lines only closes the file if it hits the end of file, or when the garbage collector catches up with the abandoned file. Is there a way to catch cases like this as well? I don't know, but it would be nice to consider any options before copying an existing solution from C#/Go/Scheme or whatever other language there is.
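One portable way to avoid that particular leak in Lua 5.3 without new syntax is to open the file explicitly and close it however the loop ends. The sketch uses plain string patterns (`line:match(pattern)`) rather than whatever `pattern` object the snippet above assumes, and `first_match` is a hypothetical helper.

```lua
-- Return the first line matching a Lua string pattern, closing the
-- file on a match, at end of file, or if an error is raised.
local function first_match(path, pattern)
  local f = assert(io.open(path, "r"))
  local ok, result = pcall(function()
    for line in f:lines() do
      if line:match(pattern) then return line end  -- early exit
    end
  end)
  f:close()  -- reached on every path out of the loop
  if not ok then error(result, 0) end
  return result
end
```

For what it's worth, Lua 5.4 later changed `io.lines` to also return the file as the closing value of the generic for, so in 5.4 the original loop closes the file on `break` or on an error.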




Re: forcing userdata to be gc'd?

Dirk Laurie-2
2017-04-03 20:34 GMT+02:00 Gé Weijers <[hidden email]>:

> In my opinion (and it's just an opinion) there's a bit much of
> "if we add this feature it fixes today's problem" going on on this list.
> It's a good thing the Lua authors do not implement 99% of the ideas,

+1. (Although much of the discussion usually is talk for talk's sake,
which is not necessarily a bad thing, especially when fortified by
some potable fluid).

Re: forcing userdata to be gc'd?

steve donovan
In reply to this post by Gé Weijers
On Mon, Apr 3, 2017 at 8:34 PM, Gé Weijers <[hidden email]> wrote:
> garbage collector catches up with the abandoned file. Is there a way to
> catch cases like this as well? I don't know, but it would be nice to
> consider any options before copying an existing solution from C#/Go/Scheme
> or whatever other language there is.

I've always been a fan of deterministic finalization, but it's
problematic. This comment says it all:

http://lua-users.org/lists/lua-l/2015-11/msg00270.html

Summary: can already be done in Lua.

Re: forcing userdata to be gc'd?

Patrick Donnelly
In reply to this post by Gé Weijers
On Mon, Apr 3, 2017 at 2:34 PM, Gé Weijers <[hidden email]> wrote:

> I'm not against adding features to improve the early releasing of resources,
> but adding a C# style 'using' clause or similar only fixes a part of the
> problem. This one leaks too:
>
> for line in io.lines("myfile")
> do
>   if pattern:match(line) then break end
> end
>
> io.lines only closes the file if it hits the end of file, or when the
> garbage collector catches up with the abandoned file. Is there a way to
> catch cases like this as well? I don't know, but it would be nice to
> consider any options before copying an existing solution from C#/Go/Scheme
> or whatever other language there is.

Well, with the introduction of block-scope finalization, Lua would
simultaneously adapt generic-for to have block-scope finalization of
the state var (and possibly even the other hidden locals in the
generic-for).

--
Patrick Donnelly