Garbage collection causes stack overflow

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Garbage collection causes stack overflow

Fotis Panagiotopoulos
Hi!

I am running Lua 5.2.2 in microcontroller environment, using an RTOS. The available resources are very limited (only a few kb of RAM), and I am trying to make sure that Lua will work correctly in this environment.

To stress test the whole firmware, and ensure that memory is handled correctly, I loaded a script that essentially does the following:

    index = 1
    data = {}

    -- This function is called repeatedly, till memory exhaustion.
    function test()
        data[index] = 5
        data[index + 1] = 6
        data[index + 2] = 7
        index = index + 3
    end

What I expect from this function is to make Lua run out of memory (heap memory, due to the continuous allocations).

To my surprise the program failed in another way. As the memory usage is increased, Lua calls the garbage collector to free up some memory. Each time the garbage collector is running, it is using more and more stack, leading to stack overflow long before the heap is exhausted. It seems that no matter how much I increase the stack size, the garbage collector will eat it all very soon.

Note that I am talking here about the C runtime stack, the actual memory that the CPU accesses on the hardware, not the Lua API stack.

Is this the expected behavior? I would expect that as the garbage collector has specific work to do, it should consume the same memory for every execution.

Is there any way to handle garbage collections to improve this situation?
Reply | Threaded
Open this post in threaded view
|

Re: Garbage collection causes stack overflow

Roberto Ierusalimschy
> To my surprise the program failed in another way. As the memory usage is
> increased, Lua calls the garbage collector to free up some memory. Each
> time the garbage collector is running, it is using more and more stack,
> leading to stack overflow long before the heap is exhausted. It seems that
> no matter how much I increase the stack size, the garbage collector will
> eat it all very soon.

Don't you have a stack trace?

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Garbage collection causes stack overflow

Fotis Panagiotopoulos
Unfortunatelly not in my platform.

The stack overflow is detected by filling the stack with a known pattern, and then checking whether there are any data written to it. Thus the overflow is detected at a later time, not the moment that it is happening.

Maybe if I had more RAM (like a normal desktop computer), the problem wouldn't appear. But is this the way the GC is working? Does it use any kind of recursion? Or any large stack buffers?

Στις Πέμ, 11 Απρ 2019 στις 3:17 μ.μ., ο/η Roberto Ierusalimschy <[hidden email]> έγραψε:
> To my surprise the program failed in another way. As the memory usage is
> increased, Lua calls the garbage collector to free up some memory. Each
> time the garbage collector is running, it is using more and more stack,
> leading to stack overflow long before the heap is exhausted. It seems that
> no matter how much I increase the stack size, the garbage collector will
> eat it all very soon.

Don't you have a stack trace?

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Garbage collection causes stack overflow

Roberto Ierusalimschy
> Maybe if I had more RAM (like a normal desktop computer), the problem
> wouldn't appear. But is this the way the GC is working? Does it use any
> kind of recursion? Or any large stack buffers?

The GC doen't use any kind of explicit recursion nor large stack
buffers. It is written assuming a restricted stack.  But some obscure
behavior may create recursive calls. Is it possible for your allocator
to fail when shrinking a memory block?

Any memory-allocation fail may trigger an "emergency collection".
During a GC cycle (finalizers are out of this story), Lua never
allocates new memory, but it may shrink some structures. If
shrinking fails (what Lua assumes should not happen), the emergency
collection will start a new collection, which will fail again,
creating a recursive loop.

But there may be other scenarios that I am not seeing.

Is it possible to have at least some estimate for the number of function
calls in the stack?

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Garbage collection causes stack overflow

Fotis Panagiotopoulos
> The GC doen't use any kind of explicit recursion nor large stack
> buffers. It is written assuming a restricted stack.  But some obscure
> behavior may create recursive calls. Is it possible for your allocator
>to fail when shrinking a memory block?

I just run some tests on this, and it seems that there is no problem with reallocating smaller blocks.

> Any memory-allocation fail may trigger an "emergency collection".
> During a GC cycle (finalizers are out of this story), Lua never
> allocates new memory, but it may shrink some structures. If
> shrinking fails (what Lua assumes should not happen), the emergency
> collection will start a new collection, which will fail again,
> creating a recursive loop.

Could this condition be triggered due to heap fragmentation?
For example:
1. Lua runs out of memory.
2. Emergency GC is executed, some memory is actually freed. Heap is severely fragmented.
3. Lua tries to store some data, which normally should fit into the heap (due to previous GC). Due to high fragmentation allocation fails.
4. Lua calls emergency GC.
5. Repeat...

> Is it possible to have at least some estimate for the number of function
> calls in the stack?

I have no idea what is the stack trace at the moment of the error, but here is the sequence of functions I call.
- There are four nested C functions, stack usage is minimal.
- The fifth level of nesting is lua_pcall(). (No idea what happens next in terms of function calls)
- The called Lua function is the provided example above, no other function calls in Lua.

I am pretty confident that the issue is caused by the GC. I measure the stack exactly before and after lua_pcall(),
and indeed this is where the stack is used. I also placed some prints and I see that stack usage increases only when
GC is running.

Note that stack is used progressively. On every GC call, more and more is requested till the exhaustion. On the first
calls the usage is pretty minimal.


Στις Πέμ, 11 Απρ 2019 στις 3:49 μ.μ., ο/η Roberto Ierusalimschy <[hidden email]> έγραψε:
> Maybe if I had more RAM (like a normal desktop computer), the problem
> wouldn't appear. But is this the way the GC is working? Does it use any
> kind of recursion? Or any large stack buffers?

The GC doen't use any kind of explicit recursion nor large stack
buffers. It is written assuming a restricted stack.  But some obscure
behavior may create recursive calls. Is it possible for your allocator
to fail when shrinking a memory block?

Any memory-allocation fail may trigger an "emergency collection".
During a GC cycle (finalizers are out of this story), Lua never
allocates new memory, but it may shrink some structures. If
shrinking fails (what Lua assumes should not happen), the emergency
collection will start a new collection, which will fail again,
creating a recursive loop.

But there may be other scenarios that I am not seeing.

Is it possible to have at least some estimate for the number of function
calls in the stack?

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Garbage collection causes stack overflow

Roberto Ierusalimschy
> Could this condition be triggered due to heap fragmentation?
> For example:
> 1. Lua runs out of memory.
> 2. Emergency GC is executed, some memory is actually freed. Heap is
> severely fragmented.
> 3. Lua tries to store some data, which normally should fit into the heap
> (due to previous GC). Due to high fragmentation allocation fails.
> 4. Lua calls emergency GC.
> 5. Repeat...

If memory allocation fails after an emergency GC, then there is a
real failure and pcall should return, finishing your program.


> I am pretty confident that the issue is caused by the GC. I measure the
> stack exactly before and after lua_pcall(),
> and indeed this is where the stack is used. I also placed some prints and I
> see that stack usage increases only when
> GC is running.
>
> Note that stack is used progressively. On every GC call, more and more is
> requested till the exhaustion. On the first
> calls the usage is pretty minimal.

Have you checked that after each GC call the stack is back where it
started (basic correct C behavior)?

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Garbage collection causes stack overflow

Enrico Colombini
In reply to this post by Fotis Panagiotopoulos
On 11-Apr-19 19:21, Fotis Panagiotopoulos wrote:
> I am pretty confident that the issue is caused by the GC. I measure the stack exactly before
> and after lua_pcall(), and indeed this is where the stack is used. I also placed some prints
> and I see that stack usage increases only when GC is running.
>
> Note that stack is used progressively. On every GC call, more and more
> is requested till the exhaustion. On the first
> calls the usage is pretty minimal.

Sparse (and probably not useful) thoughts:

- What allocator are you using? Does it fully support realloc?

- You could try tracing allocations by inserting a tracer between Lua
allocation calls and the actual allocator, also doing a stack check
(and possibly other checks) before and after each call, to get a few
more clues about what is happening.

--
   Enrico

Reply | Threaded
Open this post in threaded view
|

Re: Garbage collection causes stack overflow

Ką Mykolas
I would add to this:
- How does Your memory layout looks like? (linker script). Maybe stack space and heap space overlaps?

On Fri, Apr 12, 2019 at 10:06 AM Enrico Colombini <[hidden email]> wrote:
On 11-Apr-19 19:21, Fotis Panagiotopoulos wrote:
> I am pretty confident that the issue is caused by the GC. I measure the stack exactly before
> and after lua_pcall(), and indeed this is where the stack is used. I also placed some prints
> and I see that stack usage increases only when GC is running.
>
> Note that stack is used progressively. On every GC call, more and more
> is requested till the exhaustion. On the first
> calls the usage is pretty minimal.

Sparse (and probably not useful) thoughts:

- What allocator are you using? Does it fully support realloc?

- You could try tracing allocations by inserting a tracer between Lua
allocation calls and the actual allocator, also doing a stack check
(and possibly other checks) before and after each call, to get a few
more clues about what is happening.

--
   Enrico

Reply | Threaded
Open this post in threaded view
|

Re: Garbage collection causes stack overflow

Fotis Panagiotopoulos
I am sorry guys, but for some reason it is impossible to reproduce the issue any more.
I am struggling over the last 3 days to make it happen again, I 've been through almost
all the latest commits, but nothing.

Possibly something supposedly unrelated? Some combination of other issues? Maybe I changed
something in the script, and now the issue is now triggered any more... I don't know.

I will have to stop searching for this problem for the time, and check if it appears again.
Either way, thanks for the help!


As per your recommendations / questions:

>Have you checked that after each GC call the stack is back where it
>started (basic correct C behavior)?
No, but I will if I have the opportunity again.

>- What allocator are you using? Does it fully support realloc?
It is a custom implementation. It does support realloc, and as per
my latest tests, it works OK.

>-You could try tracing allocations by inserting a tracer between Lua
>allocation calls and the actual allocator, also doing a stack check
>(and possibly other checks) before and after each call, to get a few
>more clues about what is happening.
I tried to, but to be honest I have trouble understanding the internals
of Lua. So even if there was anything wrong, I may not have noticed it...

>- How does Your memory layout looks like? (linker script). Maybe stack space and heap space overlaps?
The stacks and the heap are located to different non-contiguous RAM banks. I seen no
way the heap to corrupt the stack (a hardware exception will be raised).
Apart from this, there are also lots of other checks, so I believe that it is very
improbable that something unrelated to Lua (and thus running at another thread
with an isolated stack), will cause any such problem:
1. Stacks are prefilled with a known pattern. Their usage level is monitored.
2. The stack pointer is checked during context switches.
3. The heap blocks are padded with a known pattern. Any overflow will be caught.
4. Heap blocks are chained in a linked list fashion. During every malloc, realloc
or clear, this list is validated. Any corruption of the block headers will break the
chain and will be caught.
5. Lua runs on a dedicated thread. Thus it has a dedicated stack.

Στις Παρ, 12 Απρ 2019 στις 10:57 π.μ., ο/η Ką Mykolas <[hidden email]> έγραψε:
I would add to this:
- How does Your memory layout looks like? (linker script). Maybe stack space and heap space overlaps?

On Fri, Apr 12, 2019 at 10:06 AM Enrico Colombini <[hidden email]> wrote:
On 11-Apr-19 19:21, Fotis Panagiotopoulos wrote:
> I am pretty confident that the issue is caused by the GC. I measure the stack exactly before
> and after lua_pcall(), and indeed this is where the stack is used. I also placed some prints
> and I see that stack usage increases only when GC is running.
>
> Note that stack is used progressively. On every GC call, more and more
> is requested till the exhaustion. On the first
> calls the usage is pretty minimal.

Sparse (and probably not useful) thoughts:

- What allocator are you using? Does it fully support realloc?

- You could try tracing allocations by inserting a tracer between Lua
allocation calls and the actual allocator, also doing a stack check
(and possibly other checks) before and after each call, to get a few
more clues about what is happening.

--
   Enrico