Reply | Threaded
Open this post in threaded view
|

RE: garbage collection thoughts... again?!

Nick Trout

> From: Thatcher Ulrich [[hidden email]]
> On Thu, 29 May 2003, Peter Hill wrote:
> >
> > For example: should the GC be completely transparent to the coder (a
> >
> > Despite the recent non-shrinking-table oddity (the programmer having
>
> Yes, I think this is probably the best argument against ref-counting
> for Lua.  One of the big advantages of ref-counting is the
> predictability, which means programmers may tend to rely on its
> behavior, making it impossible to change the GC implementation without
> breaking programs.  Likewise ref-counting does sometimes require some
> explicit management, which further ties programs to the implementation.

This was mentioned on the list a couple of weeks ago. Programmers should
not rely on the way the GC works to manage their data. If the way things
are collected is important, then the application should be constructed in
such a way that it is independent of the GC.

Regarding Python: I don't think Python shipped with a standard cyclic
garbage collector until version 2.0; before that point there were several
libraries for finding cycles. I never paid any attention to cycles, as I
never needed to - so I might have leaked a lot of islands, but I had a
lovely VM. Without having studied it in detail, it looks like you have to
tell Python which objects are grouped together so that it can work out
where the cycles are, e.g.
http://www.python.org/dev/doc/devel/api/supporting-cycle-detection.html

I'm not sure what Python does with these containers; perhaps they have
write barriers and use mark and sweep?! This doesn't look very
transparent, could be error-prone, and just complicates the binding
process. One of the reasons I moved away from Python towards Lua was
that you don't have to deal with reference counting in the Lua API -
it caused several annoying bugs when I first started using Python. Lua
is much simpler, and it looks like Python is more complicated now than
when I left it.

I haven't read as much GC material as you; the following is my
buzzwordtastic understanding of Lua's direction and what I remember of
past GC discussions. Incremental GC on its own could be quite
inefficient, i.e. cycles will be wasted passing over the same data when
it drifts back into the grey area. But incremental combined with
generational collection should counter this problem. I agree that this
solution is non-trivial, but the Lua authors have come up with the goods
before, making good optimisations and simplifications along the way
(e.g. the global namespace as a generic table, environments), and the
API is pretty easy. GC is transparent to the user and does exactly what
it should: collect unreachable objects. Ref counting on its own is
undoubtedly simpler, but how do you deal with islands? Are you not just
back to the problem of how you distribute the processing of a
mark-and-sweep-like algorithm? Superficially it looks like Python
divides objects into groups and uses write barriers and some
higher-level collection algorithm to collect the containers.

Your solution for C++ objects relies on the skill of programmers to
optimise dependencies. This should be transparent to script users, which
is indeed one of the reasons people use such languages. Commonly, the
users of embedded Lua may be junior or "lesser" programmers than the
application programmers, unaware of the cycles they may create, and
unlikely to understand weak links.

Perhaps there is a different solution besides IGC or RC. This discussion
has been had a few times and IGC seems to be the frequent winner - in
fact I think I remember mails where you were adamant it was the correct
solution. I'm sure Thatcher will report any findings from the GC list.
:-) (P.S. I like "see-through".)

Regards,
Nick




RE: garbage collection thoughts... again?!

Luiz Henrique de Figueiredo
In reply to this post by Thatcher Ulrich
Nick Trout wrote
>Incremental GC on its own could be quite inefficient [...] But,
>incremental with generational should counter this problem.

That's the direction we're heading right now for GC in 5.1. But no code yet.
--lhf


RE: garbage collection thoughts... again?!

Nick Trout
In reply to this post by Thatcher Ulrich

> From: Luiz Henrique de Figueiredo
> 
> Nick Trout wrote
> >Incremental GC on its own could be quite inefficient [...] But, 
> >incremental with generational should counter this problem.
> 
> That's the direction we're heading right now for GC in 5.1. 
> But no code yet. --lhf

Stuck in a cycle? :-)



Re: garbage collection thoughts... again?!

Mark Hamburg-4
In reply to this post by Thatcher Ulrich
on 5/29/03 10:07 AM, Thatcher Ulrich at [hidden email] wrote:

> void SomeFunction(const SPtr<SomeObject>& ptr)
> {
>     // ptr is a reference to an already-existing SPtr<>, so there's
>     // no ref inc/dec overhead in the function call binding.
> }
> 
> This is safe and easy, in my experience.  In any case, most
> incremental collectors will need a write-barrier, which in practice is
> probably worse than inc/dec of a ref count.

That's what we did on the last project I worked on as well. It's safe from
the standpoint that you won't get stuck with a stale pointer, but it doesn't
have quite the same semantics since the pointer can be changed and that
change will be visible in the called code. "const" just means that
SomeFunction can't change the pointer.

Mark



Re: garbage collection thoughts... again?!

Thatcher Ulrich
On Thu, 29 May 2003, Mark Hamburg wrote:

> on 5/29/03 10:07 AM, Thatcher Ulrich at [hidden email] wrote:
>
> > void SomeFunction(const SPtr<SomeObject>& ptr)
> > {
> >     // ptr is a reference to an already-existing SPtr<>, so there's
> >     // no ref inc/dec overhead in the function call binding.
> > }
> >
> > This is safe and easy, in my experience.  In any case, most
> > incremental collectors will need a write-barrier, which in practice is
> > probably worse than inc/dec of a ref count.
>
> That's what we did on the last project I worked on as well. It's safe from
> the standpoint that you won't get stuck with a stale pointer, but it doesn't
> have quite the same semantics since the pointer can be changed and that
> change will be visible in the called code. "const" just means that
> SomeFunction can't change the pointer.

True; the above example is the smart-pointered equivalent of
"SomeFunction(SomeObject* ptr)".  At work we use a "SPtrC<>" for const
pointers, if you want to emulate "SomeFunction(const SomeObject*
ptr)".

One of the nice things about C++ is that this ref-counted SPtr<> junk
can be made pretty automatic.
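[Editor's note: the SPtr<> class itself isn't shown in the thread. A minimal intrusive ref-counted pointer along the lines being discussed might look like the sketch below - the names and details are hypothetical, not Thatcher's actual code. It also shows Mark's point: passing by const reference avoids the inc/dec, but the pointee itself stays mutable.]

```cpp
#include <cassert>

// Hypothetical minimal intrusive ref-counted pointer, sketching the
// kind of SPtr<> discussed above.
struct ref_counted {
    int ref_count = 0;
    virtual ~ref_counted() {}
};

template <class T>
class SPtr {
public:
    SPtr(T* p = nullptr) : m_ptr(p) { add_ref(); }
    SPtr(const SPtr& o) : m_ptr(o.m_ptr) { add_ref(); }
    ~SPtr() { release(); }
    SPtr& operator=(const SPtr& o) {
        if (m_ptr != o.m_ptr) { release(); m_ptr = o.m_ptr; add_ref(); }
        return *this;
    }
    T* operator->() const { return m_ptr; }
    T& operator*() const { return *m_ptr; }
    T* get() const { return m_ptr; }
private:
    void add_ref() { if (m_ptr) ++m_ptr->ref_count; }
    void release() { if (m_ptr && --m_ptr->ref_count == 0) delete m_ptr; }
    T* m_ptr;
};

struct SomeObject : ref_counted { int value = 0; };

// Passing by const reference: no inc/dec in the call binding, but note
// Mark's point -- "const" only protects the SPtr, not the pointee.
void SomeFunction(const SPtr<SomeObject>& ptr) {
    ptr->value = 42;  // allowed: the object itself is still mutable
}
```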

-Thatcher



RE: garbage collection thoughts... again?!

Thatcher Ulrich
In reply to this post by Nick Trout
On Thu, 29 May 2003, Nick Trout wrote:
>
> I'm not sure what Python does with these containers; perhaps they
> have write barriers and they use mark and sweep?!

Looks like they have an optional incremental generational
mark-sweep-like thing, that cooperates with ref-counting.  I found
this explanation which is pretty clear:

http://arctrix.com/nas/python/gc/

> One of the reasons I moved away from Python towards
> Lua was that you don't have to deal with reference counting in the
> Lua API - which caused several annoying bugs when I first started
> using it. Lua is much simpler and it looks like Python is more
> complicated than when I left it.

Yeah, agreed about the ref-counting garbage in the Python API.  That
looks very error-prone.  I think the Lua API takes a safer approach by
requiring you to work through the Lua stack & registry/globals.  Lua's
API is encapsulated enough that C programmers don't need to know how
Lua does its GC.

-Thatcher



Re: garbage collection thoughts... again?!

Steve Dekorte-4
In reply to this post by James McCartney

On Thursday, May 29, 2003, at 12:39 PM, James McCartney wrote:

> I've often wondered if it were possible to design a memory such that
> pointers are not a number but an open circuit. Close the circuit and
> the power to the objects' "live" bits turns off and they cease to
> exist. More of a metaphor than something that could be implemented
> directly. But some way of using hardware circuits to do your pointer
> tracing at the speed of light.

When logic and memory inevitably become merged in future computer
architectures, I suspect something like this will be done. The machine
could send a "mark" message to its root node periodically. Each node
would forward the mark message to those it referenced. A node that
hasn't received a mark message in a given time period could safely
assume itself garbage, power down, and wait for an allocation message
for its type.

Cheers,
Steve
OSX freeware and shareware: http://www.dekorte.com/downloads.html



Re: garbage collection thoughts... again?!

Tobias Käs
In reply to this post by Thatcher Ulrich
As the question required the list to be kept in existence by a pointer
to _any_ element in the list, the solution you gave is insufficient. If
the pointer does not point to the first entry, the weak part will get
lost. If you make the weak part strong, you have cycles ;-)

Besides, I think a distinction should be made between garbage collection
in scripting languages and in "native" languages. Scripting languages
(usually) know everything about their environment, such as the stack,
since they specify and implement it; therefore they should use this
information in their GC. In "native" languages, assumptions about the
environment usually make the program dependent on the machine or
operating system.

----- Original Message ----- 
From: "Thatcher Ulrich" <[hidden email]>
To: "Lua list" <[hidden email]>
Sent: Thursday, May 29, 2003 10:39 PM
Subject: Re: garbage collection thoughts... again?!


> On Thu, 29 May 2003, Enrico Colombini wrote:
>
> > On Thursday 29 May 2003 19:07, Thatcher Ulrich wrote:
> > > * use a weak pointer when you don't "own" the other object.  E.g. a
> > >   child node would have a weak pointer to its parent.  This requires
> > >   some awareness on the part of the programmer of when to use a weak
> > >   pointer, but at least is safe.
> >
> > How do you apply this principle to a doubly-linked list, that could
> > be kept in existance by a single reference to an unpredictable item
> > (e.g. a reference either to the head or to the tail)?
>
> I guess it might look like this:
>
> struct element {
>     // some data
>     smart_ptr<element> next;
>     weak_ptr<element> prev;
> };
>
> And, the list itself must be terminated by NULL, or be explicitly
> broken when you want to collect it.
> [...]



Re: garbage collection thoughts... again?!

Thatcher Ulrich
On Fri, 30 May 2003, [iso-8859-1] Tobias Käs wrote:

> As the question required the list to be kept in existence by a pointer
> to _any_ element in the list, the solution you gave is insufficient.
> If the pointer does not point to the first entry, the weak part will
> get lost. If you make the weak part strong, you have cycles ;-)

Hm, true.  What we typically do in practice: use raw pointers for the
internal links, and keep the objects alive through other means
(e.g. have strong references in an array somewhere).
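[Editor's note: a compilable sketch of that scheme, with hypothetical names and std::shared_ptr standing in for the strong references: the list links are raw pointers, lifetime is owned in one place, and dropping the owner releases everything with no cycle for a ref-counter to trip over.]

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Element {
    int data = 0;
    Element* next = nullptr;  // raw: not an owning reference
    Element* prev = nullptr;  // raw: not an owning reference
};

struct List {
    std::vector<std::shared_ptr<Element>> owner;  // strong refs live here
    Element* head = nullptr;

    Element* push_back(int v) {
        auto e = std::make_shared<Element>();
        e->data = v;
        if (!head) {
            head = e.get();
        } else {
            Element* tail = head;
            while (tail->next) tail = tail->next;
            tail->next = e.get();
            e->prev = tail;
        }
        owner.push_back(e);  // ownership recorded once, outside the links
        return e.get();
    }
};
```

Because the internal links never own anything, a reference to any element (or none at all) has no effect on collection: the list lives exactly as long as `owner` does.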

-Thatcher



RE: garbage collection thoughts... again?!

Nick Trout
In reply to this post by Thatcher Ulrich
> From: Thatcher Ulrich
> 
> On Fri, 30 May 2003, [iso-8859-1] Tobias Käs wrote:
> 
> > As the question required the list to be kept in existence by a
> > pointer to _any_ element in the list, the solution you gave is
> > insufficient. If the pointer does not point to the first entry, the
> > weak part will get lost. If you make the weak part strong, you have
> > cycles ;-)
> 
> Hm, true.  What we typically do in practice: use raw pointers 
> for the internal links, and keep the objects alive through 
> other means (e.g. have strong references in an array somewhere).

And you're suggesting we forget IGC in favour of simpler RC? :-) I think the most complicated and potentially error-prone aspect of IGC is the write barrier. Once that is in place and bug-free, I think the GC algorithm would be pretty well encapsulated, albeit a little complicated.

Interestingly, the Python IGC collects garbage! I.e. it looks for unreachable objects, not reachable ones; apparently the root set is difficult to determine because of the module implementation. Only "container" objects, which can hold references, are checked for references. This is somewhat simpler in Lua, since tables play the role of all of Python's container objects. So Lua's GC may be simpler to implement than Python's, and hidden from the user, unlike Python's.

--Nick



Re: garbage collection thoughts... again?!

Taj Khattra-2
In reply to this post by Thatcher Ulrich
On Thu, May 29, 2003 at 01:47:58AM -0400, Thatcher Ulrich wrote:
> In case anyone is interested, I've written down my current thoughts
> about garbage collection (not necessarily specific to Lua), including
> a nice succinct quote by Ken Thompson.

the inferno gc that ken refers to is described in more detail in

    http://cm.bell-labs.com/who/lorenz/papers/ismm98.pdf

-taj


Re: garbage collection thoughts... again?!

Roberto Ierusalimschy
In reply to this post by Thatcher Ulrich
My personal thoughts...


* I disagree with Thatcher (and Linus).  Mark-and-sweep is a trivial
algorithm, and (as far as I know) the only GC technique that is
completely modular (it does not need any modifications in other parts of
the interpreter). It has quite acceptable overall performance. Its
main (only?) drawback is the long pauses. Reference counting (RC) is
a nightmare to implement right, can have a big performance impact
without careful optimizations (which make the implementation even more
difficult), and does not collect cycles. For those who think that RC
is conceptually simple, it is worth remembering that it took several
years after its creation for people to realize that it does not collect
cycles.


* The fact that Perl and Python use reference counting (RC) and are
very successful is not a very good argument.  Perl programs are not
exactly famous for not leaking memory, and Python does not use "pure"
RC. Moreover, Lua has very different requirements from Perl and Python,
and also very different characteristics.

Particularly, I believe that cycles are much more common in Lua than in
those other languages. As an example, something as simple as the
next line creates a cycle in Lua:

  function f() return 1 end

(Notice that getfenv(f).f == f ;-)

If you delete "f" you break the cycle. But if you delete the environment,
RC will not collect it.


* "Make simple things simple, complex things possible". Currently it is
not possible to run a Lua program without eventual long pauses. Ending
these pauses is a complex requirement, for specialized uses. It should
be possible to do that, but we should not sacrifice simple things
(programming without worrying about cycles) to achieve it. I consider
any GC algorithm that does not collect cycles unacceptable for Lua.


* I think that any implementation that imposes an overhead on stack
manipulation in Lua would ruin its performance. (It is not by chance
that Lua is much faster than Perl and Python regarding function calls,
parameter passing, and other basic language facilities.)


* After all those "requirements", I must say that reference counting
is still an option ;) Regarding the stack, we could use a deferred
reference counting algorithm, which runs a "mark-and-sweep" only over
the stack(s) to complete a collection. (Of course, that spoils the
immediacy of reference counting, but such short stack marks can
be run much more frequently than a full mark-and-sweep.) Regarding
cycles, Rafael Lins recently published a paper called "Lazy Cyclic
Reference Counting", which seems to be a much improved version of his
previous algorithms.  We still have to study its impact on a "typical"
Lua program (whatever that means...).
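[Editor's note: a toy illustration of the deferred reference counting idea described above - not Lua's implementation, all names hypothetical. Heap-to-heap references are counted eagerly; references from the VM stack are not. An object whose count reaches zero goes on a "zero count table" instead of being freed; a short periodic scan of the stack then frees only the zero-count objects the stack does not actually reference.]

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Obj { int heap_refs = 0; };

struct VM {
    std::vector<Obj*> stack;  // uncounted references
    std::vector<Obj*> zct;    // zero-count candidates for collection
    int freed = 0;

    void dec_heap_ref(Obj* o) {
        if (--o->heap_refs == 0) zct.push_back(o);  // deferred, not freed yet
    }
    // The "mark-and-sweep only over the stack(s)" step: much cheaper than
    // a full heap traversal, so it can run frequently.
    void collect() {
        for (Obj* o : zct) {
            bool on_stack =
                std::find(stack.begin(), stack.end(), o) != stack.end();
            if (!on_stack && o->heap_refs == 0) { delete o; ++freed; }
        }
        zct.clear();
    }
};
```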


* Whatever the final algorithm, it should be clear that it will
need some cooperation from the programmer to achieve soft real-time
performance. There will be short pauses to traverse the stack(s), to
clean weak tables, etc. (Cooperation here means things like avoiding
the use of too many weak tables, or other similar rules.) There is no
chance of achieving hard real-time performance. I doubt this is possible
with any scripting language. This is difficult even for hard languages
doing dynamic memory allocation. (First you need a "malloc" that never
returns NULL...)

-- Roberto


Re: garbage collection thoughts... again?!

David Olofson
(I think it's time to clear up some misconceptions here...)

On Saturday 31 May 2003 15.56, Roberto Ierusalimschy wrote:
[...]
> * Whatever the final algorithm, it should be clear that it will
> need some cooperation from a programmer to achieve soft real-time
> performance.

By the definitions I know of (and that matter in the real world, where 
I live and work), Lua already seems to do *soft* real time just fine. 
Deadlines can be met most of the time, although some will be missed 
every now and then. That's *soft* real time.


> There will be short pauses to traverse the stack(s),
> to clean weak tables, etc.

How short? "Short enough" is fine by me. You need to use a CPU that's 
fast enough for the job, no matter what.

Can the worst case delay be bounded? Yes, if the algorithm is right!

If there's a guaranteed (*) worst case latency for any particular 
hardware (i.e. you can predict how much work the GC will do in any 
given situation), it *is* hard real time.

Hard real time is *not* about speed. It's about deterministic 
behavior. If I need some system to react within one minute (or we 
have a reactor meltdown, or whatever), I don't care if it *can* 
respond in 50 µs. I just need to know that it will never ever fail to 
respond within one minute.

(*) There are no TRUE hard real time systems in the real world.
    They are not possible to implement without indestructible
    and completely stable hardware. What most "RT people"
    (control engineers like me) call "hard real time" actually
    means "the risk of missing a deadline isn't much bigger
    than the risk of the hardware spontaneously frying."
    Hardware *does* fry. Some bug or miscalculation *might*
    cause a response to be slightly late every few thousands
    of hours or so. That's why really critical systems have
    watchdogs, fallback systems, multiple redundancy, or
    combinations thereof.


> (Cooperation here means things like to
> avoid the use of too many weak tables or other similar rules.)

Of course. No system is stronger than its weakest link. If an RTAI or 
RTLinux application has non-deterministic code in the hard real time 
path, it will not, and cannot ever be, a hard real time application.


> There is no chance to achieve hard real-time performance.

Sure there is. Hard real time != microseconds. Hard real time == known 
absolute worst case latency.


> I doubt
> this is possible with any scripting language.

No personal experience, but I think SuperCollider proves the opposite.

Either way, what is so special about scripting languages that they 
cannot be used in hard real time applications? What's so special 
about C that it *can* be used in hard real time systems?

I'm 99.99% sure the answer to that is "nothing". We're just talking 
about code here. If we rely only on deterministic algorithms - no 
matter what they're doing - we can have a hard real time system.


> This is difficult
> even for hard languages doing dynamic memory allocation.

Yes, it's a different set of rules from what "normal" application 
developers are used to.


> (First you
> need a "malloc" that never returns NULL...)

Well, that's the first thing everyone gets wrong. DON'T use dynamic 
allocation in the first place. Simple as that.

BTW, "never running out of memory" is not a useful definition in this 
context. (It would be more useful to normal applications.) What are 
you going to do with all that memory anyway?

You can use pre-allocated buffers, pools of pre-allocated objects, and 
you can even have something that looks like a normal memory manager, 
but that is hard RT safe. That's still some form of "pool of 
pre-allocated objects", though.
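[Editor's note: a minimal fixed-size pool of the kind described above, with hypothetical names. All storage is pre-allocated up front, and alloc/release are O(1) free-list operations - the bounded worst case that hard real time needs, with no call to malloc on the hot path.]

```cpp
#include <cassert>
#include <cstddef>
#include <new>

template <class T, std::size_t N>
class Pool {
public:
    Pool() {
        // Thread the free list through the unused slots.
        for (std::size_t i = 0; i + 1 < N; ++i)
            slot[i].next = &slot[i + 1];
        slot[N - 1].next = nullptr;
        free_head = &slot[0];
    }
    T* alloc() {                        // O(1), never calls malloc
        if (!free_head) return nullptr; // pool exhausted: caller must cope
        Slot* s = free_head;
        free_head = s->next;
        return new (s->storage) T();    // construct in place
    }
    void release(T* p) {                // O(1)
        p->~T();
        Slot* s = reinterpret_cast<Slot*>(p);
        s->next = free_head;
        free_head = s;
    }
private:
    // Each slot is either a free-list link or object storage, never both.
    union Slot { Slot* next; alignas(T) unsigned char storage[sizeof(T)]; };
    Slot slot[N];
    Slot* free_head;
};
```

Note the design consequence David mentions: when the pool runs dry, `alloc` returns nullptr and the *application* must have a policy for that; no allocator can rescue a design that can exhaust its memory.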

Either way, you have to realize that if you design your system in a 
way that makes it possible for any combination of events to make it 
run out of memory, it is not a hard real time system. No hardware, no 
operating system and no language can save you if your design is 
broken.


//David Olofson - Programmer, Composer, Open Source Advocate

.- The Return of Audiality! --------------------------------.
| Free/Open Source Audio Engine for use in Games or Studio. |
| RT and off-line synth. Scripting. Sample accurate timing. |
`-----------------------------------> http://audiality.org -'
   --- http://olofson.net --- http://www.reologica.se ---



Re: garbage collection thoughts... again?!

Mark Hamburg-4
In reply to this post by Roberto Ierusalimschy
If you or the implementation frequently make cycles involving
short-lived objects, reference counting is essentially not an option
without going to much more complicated algorithms.

If this is not the case, then deferred reference counting -- i.e., don't
count references from the stack but don't free things without checking the
stack -- may be able to reduce the frequency with which one has to run a
mark-and-sweep algorithm. Combined with some form of interruptible
mark-and-sweep -- i.e., so that you can start a garbage collection while
idle but abort it if you've got real work to do -- this might suppress
enough of the issues with GC pauses to be an acceptable solution for many
(though obviously not all) applications.

There are then two overhead issues.

One is the space for the reference counts. If this is just on strings and
tables, it's probably not too bad.

The other is speed. The extra time needed to maintain the reference
counts has to be paid for by the reduced time spent running the
mark-and-sweep algorithm.

Despite the extra space for the reference counts, it could also result in
reduced memory footprint because more memory would be recycled faster. This
could result in improved performance.

And then there's the issue of making this work with threads...

Mark



Re: garbage collection thoughts... again?!

Thatcher Ulrich
In reply to this post by David Olofson
On May 31, 2003 at 04:39 +0200, David Olofson wrote:
> 
> > There will be short pauses to traverse the stack(s),
> > to clean weak tables, etc.
> 
> How short? "Short enough" is fine by me. You need to use a CPU that's 
> fast enough for the job, no matter what.
> 
> Can the worst case delay be bounded? Yes, if the algorithm is right!

The problem with the current collector is that during a GC cycle, the
time is bounded, sure -- but it's bounded by the number of objects on
the heap.  IIRC, we had one report on the list of a group that was
using Lua, who observed an occasional half-second delay for GC (in a
robot application, of all things!).

Aha, here's the message from the archives:

http://lua-users.org/lists/lua-l/2002-08/msg00283.html

The point of the incremental work is to make it possible to convert
the occasional very long delay into a large number of relatively short
delays.

Dunno what that means in terms of hard/soft real-time.

-- 
Thatcher Ulrich
http://tulrich.com


Re: garbage collection thoughts... again?!

Roberto Ierusalimschy
In reply to this post by David Olofson
> Can the worst case delay be bounded? Yes, if the algorithm is right!

If you traverse the stack, you can only bound the worst case delay if
you bound your stack size. This is the kind of programming discipline
I am talking about.


> Either way, what is so special about scripting languages that they 
> cannot be used in hard real time applications?

Pervasive use of dynamic memory allocation + garbage collection.


> Well, that's the fist thing everyone gets wrong. DON'T use dynamic 
> allocation in the first place. Simple as that.

That is what I said.  ("This is difficult even for hard languages doing
dynamic memory allocation.")

-- Roberto


Re: garbage collection thoughts... again?!

David Olofson
In reply to this post by Thatcher Ulrich
On Saturday 31 May 2003 21.43, Thatcher Ulrich wrote:
[...]
> The point of the incremental work is to make it possible to convert
> the occasional very long delay into a large number of relatively
> short delays.
>
> Dunno what that means in terms of hard/soft real-time.

Provided there's a way to reliably control the worst case latency 
without risking running out of memory, this should make hard real 
time possible.

Indeed, doing too little work at a time will result in more cache 
thrashing (you get new memory instead of recently reclaimed memory), 
but such a performance impact is insignificant as long as the system 
can meet its deadlines.

Doesn't hurt if average performance can be maximised, but as far as 
I'm concerned, I'd rather put slightly faster CPUs in our instruments 
than not use Lua at all. (They're in the $50,000 range. :-) So, if 
one GC can't do both, there have to be at least two options, to satisfy 
both soft/non and hard real time applications.


//David Olofson - Programmer, Composer, Open Source Advocate

.- The Return of Audiality! --------------------------------.
| Free/Open Source Audio Engine for use in Games or Studio. |
| RT and off-line synth. Scripting. Sample accurate timing. |
`-----------------------------------> http://audiality.org -'
   --- http://olofson.net --- http://www.reologica.se ---



Re: garbage collection thoughts... again?!

David Olofson
In reply to this post by Roberto Ierusalimschy
On Saturday 31 May 2003 23.51, Roberto Ierusalimschy wrote:
> > Can the worst case delay be bounded? Yes, if the algorithm is
> > right!
>
> If you traverse the stack, you can only bound the worst case delay
> if you bound your stack size. This is the kind of programming
> discipline I am talking about.
>
> > Either way, what is so special about scripting languages that
> > they cannot be used in hard real time applications?
>
> Pervasive use of dynamic memory allocation + garbage collection.

Many languages that are not referred to as scripting languages do that 
as well, but well, it's just a matter of definition. To me, anything 
that has the compiler built in (or interprets directly) is a 
scripting language. (I think... I don't really care all that much, 
because all that matters is whether it does the job or not.)


> > Well, that's the fist thing everyone gets wrong. DON'T use
> > dynamic allocation in the first place. Simple as that.
>
> That is what I said.  ("This is difficult even for hard languages
> doing dynamic memory allocation.")

Ah, sorry. So we're basically saying the same thing: It's not VM vs 
native code that makes the difference; it's memory management.

Anyway, RT "dynamic" memory allocation is hairy, but as I said, you 
can always use pre-allocated and locked memory along with suitable 
algorithms. Dynamic memory management is not soft real time by 
definition. It *may* however increase system complexity to a level 
where it might be unrealistic to base reliability calculations on 
anything other than results from extensive testing. "Ok, we've stress 
tested it for 100 hours straight, and it still hasn't missed a 
deadline. That will do."


//David Olofson - Programmer, Composer, Open Source Advocate

.- The Return of Audiality! --------------------------------.
| Free/Open Source Audio Engine for use in Games or Studio. |
| RT and off-line synth. Scripting. Sample accurate timing. |
`-----------------------------------> http://audiality.org -'
   --- http://olofson.net --- http://www.reologica.se ---



Re: garbage collection thoughts... again?!

Wim Couwenberg-2
In reply to this post by Roberto Ierusalimschy
I'm not sure if anything like the following was already mentioned in the
long and ongoing GC discussion.  (Or is it covered by some of the
"buzzwords" or does it just make no sense...?  ;-))

It should be possible to halt an M&S algorithm at (almost) any stage if
you divide collectible objects into "even" and "odd" generations.  Each
GC cycle only considers objects of a single (current) generation.  Then
switch the parity that will be used for newly created objects at the
beginning of each GC cycle.  Marking must be an atomic operation, and it
flips the object's generation.  Only unmarked objects of the generation
under consideration will be collected in the sweep stage.  Then, after
completing a GC cycle, all remaining objects (both live and unreachable)
belong to the same generation.

Collectible objects are not relocatable (only the key/value data of
tables is), so the GC can simply pick up its work where it left off
after each break (if it is implemented as a cooperative task).  GC
activity is under the full control of the Lua VM, so it should be
possible to apply a sensible scheduler - somewhere in the main byte-code
loop, possibly checking an optional flag/gc_hook (to yield) every now
and then?
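[Editor's note: a toy, run-to-completion rendering of the parity idea above, with hypothetical names. The point it illustrates: marking just flips a bit into the other generation, and newborns are created in the other generation, so mark and sweep could in principle be interrupted and resumed at any object boundary without touching objects created mid-cycle.]

```cpp
#include <cassert>
#include <vector>

struct GObj {
    int parity;
    std::vector<GObj*> refs;
};

struct Heap {
    std::vector<GObj*> all;
    int current = 0;  // parity the cycle in progress collects
    int birth = 0;    // parity given to newly created objects

    GObj* alloc() {
        GObj* o = new GObj{birth, {}};
        all.push_back(o);
        return o;
    }
    void begin_cycle() { birth = 1 - current; }  // newborns skip this cycle
    void mark(GObj* root) {          // flipping parity *is* marking
        if (root->parity != current) return;  // already marked, or newborn
        root->parity = 1 - current;
        for (GObj* r : root->refs) mark(r);
    }
    int sweep() {                    // collect unmarked objects of `current`
        int freed = 0;
        std::vector<GObj*> live;
        for (GObj* o : all) {
            if (o->parity == current) { delete o; ++freed; }
            else live.push_back(o);
        }
        all.swap(live);
        current = 1 - current;  // survivors now all share one generation
        birth = current;
        return freed;
    }
};
```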

Also, I'd definitely like to see "weakly referenced cycles" collected as
well (as discussed in some earlier threads about GC).  So M&S should be
extended by recording an object's dependants in the mark stage.  (An
object A depends on object B if object A is the strongly referenced
counterpart of the weakly referenced object B in a weak table's
key/value pair.)  Unlike Rici and Peter Hill (if I remember correctly),
I would suggest not linking the key/value pairs themselves, partly
because of the storage overhead, but mostly because key/value pairs
_are_ relocatable.  On the other hand, this means that extra GC
bookkeeping (i.e. outside of the common header) must be used for
tracking sets of dependants.  But it's definitely worth the extra effort
in my view.

Anyway, just my 2 (euro) cents...

Bye,
Wim




Re: garbage collection thoughts... again?!

James McCartney

On Sunday, June 1, 2003, at 05:29 AM, Wim Couwenberg wrote:

> It should be possible to halt an M&S algorithm at (almost) any stage
> if you divide collectible objects into "even" and "odd" generations.
> Each GC cycle only considers objects of a single (current) generation.
> Then switch the parity that will be used for newly created objects at
> the beginning of each GC cycle.  Marking must be an atomic operation,
> and it flips the object's generation.


You can get a lot more fine-grained than that.
If you want to read up on incremental GC:

Paul Wilson's GC survey - extremely informative:
ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps

Wilson and Johnstone's real-time GC, which I implemented in my language:
ftp://ftp.cs.utexas.edu/pub/garbage/GC93/wilson.ps


--
--- james mccartney   [hidden email]   <http://www.audiosynth.com>
SuperCollider - a real time audio synthesis programming language for MacOS X.

