Hybrid GC issues in Lua 5.4


云风 Cloud Wu
We switched to Lua 5.4 in our new online game project and tried the
generational GC.

1. A major cycle is unacceptably slow (more than 1 second of stop-the-world).
2. A minor cycle is good: it is fast, and it reduces both the rate of memory
growth and the peak memory usage.

We tried a hybrid mode between the incremental GC and the generational GC.

1. Stop the automatic collector. An allocation-driven collector is
not suitable in our case.
2. We use a timer to call the GC step manually. Every time we call lua_gc
with LUA_GCSTEP, we measure how long the collector takes, and we do not
let it run for too long. That is to say, we use a time-driven collector
rather than an allocation-driven one.
3. We start in generational mode and run some minor/young cycles. When
the total memory grows to a threshold value, we switch to
incremental mode, and switch back after a full cycle (which takes
a number of steps).
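
A minimal sketch of the time-driven stepping described in item 2, written without the Lua headers so it stands alone. The names `gc_step_fn`, `run_gc_budget` and `fake_step` are hypothetical. In a real Lua 5.4 embedding the callback would presumably wrap a call such as `lua_gc(L, LUA_GCSTEP, 0)`, which reports whether the step finished a cycle; that wiring is an assumption and is not shown here.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callback: performs one collector step and returns nonzero
   when that step completed a collection cycle. */
typedef int (*gc_step_fn)(void *ud);

/* Current CPU time in milliseconds, using only standard C. */
double now_ms(void) {
  return (double)clock() * 1000.0 / (double)CLOCKS_PER_SEC;
}

/* Run collector steps until the time budget is spent, max_steps is reached,
   or a cycle completes. Returns 1 if a cycle completed, otherwise 0.
   *steps_done receives the number of steps actually executed. */
int run_gc_budget(gc_step_fn step, void *ud, double budget_ms,
                  int max_steps, int *steps_done) {
  double start = now_ms();
  int n = 0, finished = 0;
  assert(step != NULL && steps_done != NULL);
  while (n < max_steps && !finished) {
    finished = step(ud);
    n++;
    if (now_ms() - start >= budget_ms) break;  /* budget exhausted */
  }
  *steps_done = n;
  return finished;
}

/* Toy stand-in for a collector: the cycle finishes once the counter
   pointed to by ud reaches zero. */
int fake_step(void *ud) {
  int *remaining = (int *)ud;
  if (*remaining > 0) (*remaining)--;
  return *remaining == 0;
}
```

A game loop could call `run_gc_budget` once per timer tick with a budget of a few milliseconds. The right budget depends on the frame time and is not something the post specifies.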

This works fine except for enterinc(). When switching from generational to
incremental mode, it stops the world for about 100 ms to 400 ms per 1 GB of
memory.

I guess the reason is `whitelist(g, g->allgc)`, which traverses all the
objects. Is it possible to improve this in the future? I think it's not
very difficult: we could add a stage, something like GCSenterinc, before
GCSpause.
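
To illustrate the idea of whitening the list in bounded steps, here is a simplified model. It is not Lua's actual lgc.c code: `Obj`, `WHITE`, `BLACK` and `whiten_step` are hypothetical stand-ins. A real implementation would also have to deal with objects created while the pass is in progress (Lua links new objects at the head of `allgc`) and with the write barrier, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

enum { WHITE = 0, BLACK = 1 };

/* Simplified stand-in for a collectable object; not Lua's GCObject. */
typedef struct Obj {
  struct Obj *next;
  unsigned char color;
} Obj;

/* Whiten at most `limit` objects starting at *cursor, advancing the cursor
   as it goes. Returns the number of objects visited. When the whole list
   has been processed, *cursor is NULL. */
size_t whiten_step(Obj **cursor, size_t limit) {
  size_t n = 0;
  assert(cursor != NULL);
  while (*cursor != NULL && n < limit) {
    (*cursor)->color = WHITE;
    *cursor = (*cursor)->next;
    n++;
  }
  return n;
}
```

Each call does a bounded amount of work, so the pause per step is proportional to `limit` rather than to the total number of objects.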

Another issue is the memory layout of TString.

I think using a pointer to hold the string data would be better than
putting the data into a contiguous TString structure.

Although an additional pointer would cost 4 or 8 bytes of memory, the
TString object could then be a small fixed-size object. That is more
cache-friendly for the GC, and many general-purpose allocators manage
small objects more efficiently.
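
The two layouts being compared can be sketched as follows. Both structs are simplified and hypothetical (`InlineString`, `SplitString`); the real TString has more fields. The byte counts are only what is requested from the allocator, so any per-block bookkeeping the allocator adds is not included.

```c
#include <assert.h>
#include <stddef.h>

/* Current style: header and data share one variable-size block. */
typedef struct InlineString {
  unsigned int hash;
  size_t len;
  char contents[];   /* string data follows the header */
} InlineString;

/* Proposed style: fixed-size header plus a separately allocated block. */
typedef struct SplitString {
  unsigned int hash;
  size_t len;
  char *contents;    /* string data lives elsewhere */
} SplitString;

/* Bytes requested for one string of length len (plus the final '\0'). */
size_t inline_bytes(size_t len) {
  return offsetof(InlineString, contents) + len + 1;
}

/* Same string in the split layout: one header block and one data block. */
size_t split_bytes(size_t len) {
  return sizeof(SplitString) + (len + 1);
}
```

The difference between the two functions is constant for every length: it is the space taken by the extra pointer, including any padding.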




--
http://blog.codingnow.com

Re: Hybrid GC issues in Lua 5.4

Roberto Ierusalimschy
> 3. We use generational mode at start, run some minor/young cycles, and
> when the total memory increases to a threshold value, we switch it to
> incremental mode, and switch back after a full cycle (it goes through
> a number of steps).
>
> This works fine except for enterinc(). When switching from generational to
> incremental mode, it stops the world for about 100ms~400ms per 1GB of memory.
>
> I guess the reason is `whitelist(g, g->allgc)`. It traverses all the
> objects. Is it possible to improve this in the future? I think it's not
> very difficult; we can add a stage, something like GCSenterinc, before
> GCSpause.

This seems doable; Lua could whiten this list incrementally, too.
Certainly these transitions between modes could be improved.


> Another issue is the memory layout of TString.
>
> I think using a pointer to hold the string data would be better than
> putting the data into a contiguous TString structure.
>
> Although an additional pointer would waste 4/8 bytes of memory, the
> TString object can be a small fixed-size object. It's more
> cache-friendly to the GC, and many general-purpose allocators could manage
> small objects more efficiently.

Besides these fixed-size headers, the allocator would still have
to deal with the variable-size contents. And the waste would
probably be larger than 4/8 bytes, because each extra memory block
also wastes some bytes in its header. Or not?

-- Roberto

Re: Hybrid GC issues in Lua 5.4

Hugo Musso Gualandi



> I think using a pointer to hold the string data would be better than
> putting the data into a contiguous TString structure.

One advantage of putting the string data in a contiguous structure is that it avoids having to go through an additional layer of pointer indirection to read the string contents.

But to be honest, I don't know how much that matters in practice. That's an interesting question.

Re: Hybrid GC issues in Lua 5.4

云风 Cloud Wu
In reply to this post by Roberto Ierusalimschy
Roberto Ierusalimschy <[hidden email]> wrote on Wed, Dec 23, 2020 at 3:52 AM:
> This seems doable; Lua could whiten this list incrementally, too.
> Certainly these transitions between modes could be improved.

You may also consider entergen(); I think it can be improved as well to
avoid the stop-the-world problem.

```
luaC_runtilstate(L, bitmask(GCSpause));  /* prepare to start a new cycle */
luaC_runtilstate(L, bitmask(GCSpropagate));  /* start new cycle */
numobjs = atomic(L);  /* propagates all and then do the atomic stuff */
```

These can be done incrementally, too.

>
> Besides these fixed-size headers, the allocator would still have
> to deal with the variable-size contents. And the waste would
> probably be larger than 4/8 bytes, because each extra memory block
> also wastes some bytes in its header. Or not?

Yes.

But I think it can be optimized in a user-defined allocator (maybe),
and separate string contents may be easier to move during GC to
avoid memory fragmentation.

We may add a flexible mechanism for long strings, too.
For example, a user-defined free function for long strings, to avoid a
memcpy when pushing long strings into the VM.

--
http://blog.codingnow.com

Re: Hybrid GC issues in Lua 5.4

云风 Cloud Wu
In reply to this post by Hugo Musso Gualandi
Hugo Musso Gualandi <[hidden email]> wrote on Wed, Dec 23, 2020 at 4:21 AM:
>
> > I think using a pointer to hold the string data would be better than
> > putting the data into a contiguous TString structure.
>
> One advantage of putting the string data in a contiguous structure is that it avoids having to go through an additional layer of pointer indirection to read the string contents.
>
> But to be honest, I don't know how much that matters in practice. That's an interesting question.

In my opinion, the string object in the Lua VM core is only a reference.
We seldom use a long string as a table key,
and for a short-string key, Lua doesn't need to read the string
contents; it only compares the TString pointers.

A smaller GCObject is more cache-friendly to the collector, because
the collector always traverses the objects.
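
The point about comparing only pointers relies on short strings being interned: equal contents always map to the same object. A minimal sketch of that technique, using a hypothetical `IStr` type and a toy hash function rather than Lua's own string table:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64

/* Simplified interned string; not Lua's TString. */
typedef struct IStr {
  struct IStr *hnext;   /* next string in the same bucket */
  size_t len;
  char contents[];
} IStr;

static IStr *buckets[NBUCKETS];

/* Simple multiplicative hash, chosen only for illustration. */
static unsigned int hash_bytes(const char *s, size_t len) {
  unsigned int h = 5381u;
  for (size_t i = 0; i < len; i++)
    h = h * 33u + (unsigned char)s[i];
  return h;
}

/* Return the unique IStr for these bytes, creating it on first use.
   Returns NULL if allocation fails. */
IStr *intern(const char *s, size_t len) {
  unsigned int b = hash_bytes(s, len) % NBUCKETS;
  IStr *p;
  assert(s != NULL);
  for (p = buckets[b]; p != NULL; p = p->hnext)
    if (p->len == len && memcmp(p->contents, s, len) == 0)
      return p;                          /* already interned */
  p = malloc(sizeof(IStr) + len + 1);
  if (p == NULL) return NULL;
  p->len = len;
  memcpy(p->contents, s, len);
  p->contents[len] = '\0';
  p->hnext = buckets[b];
  buckets[b] = p;
  return p;
}

/* Equality of interned strings never touches the contents. */
int istr_equal(const IStr *a, const IStr *b) { return a == b; }
```

The contents are read once, when the string is interned; later key comparisons only look at the pointer.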

--
http://blog.codingnow.com

Re: Hybrid GC issues in Lua 5.4

Ranier Vilela-2
On Wed, Dec 23, 2020 at 00:09, 云风 Cloud Wu <[hidden email]> wrote:
> Hugo Musso Gualandi <[hidden email]> wrote on Wed, Dec 23, 2020 at 4:21 AM:
> >
> > > I think using a pointer to hold the string data would be better than
> > > putting the data into a contiguous TString structure.
> >
> > One advantage of putting the string data in a contiguous structure is that it avoids having to go through an additional layer of pointer indirection to read the string contents.
> >
> > But to be honest, I don't know how much that matters in practice. That's an interesting question.
>
> In my opinion, the string object in the Lua VM core is only a reference.
> We seldom use a long string as a table key,
> and for a short-string key, Lua doesn't need to read the string
> contents; it only compares the TString pointers.
>
> A smaller GCObject is more cache-friendly to the collector, because
> the collector always traverses the objects.

With C99 it is possible to declare a struct whose last field is a flexible array member (an array with no declared size), like this:

typedef struct TString {
  CommonHeader;
  lu_byte extra;  /* reserved words for short strings; "has hash" for longs */
  lu_byte shrlen;  /* length for short strings */
  unsigned int hash;
  union {
    size_t lnglen;  /* length for long strings */
    struct TString *hnext;  /* linked list for hash table */
  } u;
  char contents[];
} TString;

This effectively leaves the "contents" field out of the size of the structure itself and places the string data immediately after the header in the same memory block.
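
For illustration, a minimal sketch of allocating such a structure in one block. `Str` and `str_new` are hypothetical names, and the header is reduced to a single length field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct Str {
  size_t len;
  char contents[];   /* flexible array member: must be the last field */
} Str;

/* Allocate header and data in a single block. Returns NULL on failure.
   The caller releases the string with free(). */
Str *str_new(const char *s, size_t len) {
  Str *p;
  assert(s != NULL);
  p = malloc(sizeof(Str) + len + 1);   /* header + bytes + final '\0' */
  if (p == NULL) return NULL;
  p->len = len;
  memcpy(p->contents, s, len);
  p->contents[len] = '\0';
  return p;
}
```

Reading `p->contents` needs no extra pointer load, because the data sits at a fixed offset from `p`.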

regards,
Ranier Vilela

Re: Hybrid GC issues in Lua 5.4

Roberto Ierusalimschy
In reply to this post by 云风 Cloud Wu
> But I think it can be optimized in a user-defined allocator (maybe),
> and separate string contents may be easier to move during GC to
> avoid memory fragmentation.

There is a lot that can be done once we abandon malloc/free and start
using a user-defined allocator. We have no intention of following this
route.


> We may add a flexible mechanism for long strings, too.
> For example, a user-defined free function for long strings, to avoid a
> memcpy when pushing long strings into the VM.

This has been on our list of improvements for some time, but the
details still need some ironing out. For instance, what to do if one
tries to create a string with a user-defined free function that is too
short? Raise an error? Create a short string and call the free function
immediately? Would that free function need some kind of user value?

-- Roberto

Re: Hybrid GC issues in Lua 5.4

云风 Cloud Wu
Roberto Ierusalimschy <[hidden email]> wrote on Wed, Dec 23, 2020 at 10:36 PM:
> This has been on our list of improvements for some time, but the
> details still need some ironing out. For instance, what to do if one

Yes. I agree.

> tries to create a string with a user-defined free function that is too
> short? Raise an error? Create a short string and call the free function
> immediately? Would that free function need some kind of user value?

My opinion is:

1. Add a new sub-type of string; it can be of any size, even short.
2. When we use a short user-string as a table key (a parameter of
luaH_get/luaH_set), we create or find an equivalent short string instead.

To support user-defined strings, we don't need to specify a free
function for each user-string.

We may need to add two APIs (proposed here; they are not part of Lua):

   typedef void (*lua_Free)(void *ud, void *obj);
   void lua_setfreef(lua_State *L, lua_Free f, void *ud);

   void lua_pushustring(lua_State *L, const char *s, size_t l, void *obj);

If obj == NULL, we use lua_Alloc to free the user-string object, so that
we can allocate the string object with lua_Alloc ourselves before
pushing it into the VM.

If obj is specified, lua_Free frees the user-string object through the
`void *obj` rather than the `const char *s`, which is more flexible.
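
A standalone model of the release rule just described, to make the two cases concrete. Nothing here is Lua API: `VM`, `UserString`, `vm_setfreef`, `vm_release_ustring` and `count_free` are hypothetical names, and the `obj == NULL` case only reports that the default allocator path would be taken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counterpart of the proposed lua_Free type. */
typedef void (*free_fn)(void *ud, void *obj);

typedef struct UserString {
  const char *s;   /* string bytes, owned by the host */
  size_t len;
  void *obj;       /* owner handle given to the free function, or NULL */
} UserString;

typedef struct VM {
  free_fn freef;   /* one free function per VM, as in the proposal */
  void *ud;
} VM;

void vm_setfreef(VM *vm, free_fn f, void *ud) {
  assert(vm != NULL);
  vm->freef = f;
  vm->ud = ud;
}

/* Called when the collector releases a user string. Returns 1 if the
   host's free function was called with obj, or 0 if the default
   allocator path (lua_Alloc in the proposal) would apply. */
int vm_release_ustring(VM *vm, UserString *us) {
  assert(vm != NULL && us != NULL);
  if (us->obj != NULL && vm->freef != NULL) {
    vm->freef(vm->ud, us->obj);
    return 1;
  }
  return 0;
}

/* Example host callback: counts how many owners were released. */
void count_free(void *ud, void *obj) {
  (void)obj;
  (*(int *)ud)++;
}
```

Passing the owner handle instead of the character pointer lets the host release a larger object, such as a network buffer, of which the string is only a part.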

--
http://blog.codingnow.com