[ 5.4 ] Bug - L->nCcalls unsigned short overflow

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[ 5.4 ] Bug - L->nCcalls unsigned short overflow

Павел Отчерцов
Hello.

Sometimes when calling collectgarbage() inside a coroutine, my application crashes with 'C Stack Overflow'. I managed to track it down to a negative number being assigned to nCcalls in lstate.c and ldo.c, so nCcalls becomes ~65534.

The temporary patch I use to fix this problem:

diff --git a/src/ldo.c b/src/ldo.c
index f2f9062..969a088 100644
--- a/src/ldo.c
+++ b/src/ldo.c
@@ -135,7 +135,8 @@ l_noret luaD_throw (lua_State *L, int errcode) {
 
 
 int luaD_rawrunprotected (lua_State *L, Pfunc f, void *ud) {
-  unsigned short oldnCcalls = L->nCcalls - L->nci;
+  unsigned short oldnCcalls = L->nCcalls;
+  unsigned short oldnci = L->nci;
   struct lua_longjmp lj;
   lj.status = LUA_OK;
   lj.previous = L->errorJmp;  /* chain new error handler */
@@ -144,7 +145,9 @@ int luaD_rawrunprotected (lua_State *L, Pfunc f, void *ud) {
     (*f)(L, ud);
   );
   L->errorJmp = lj.previous;  /* restore old error handler */
-  L->nCcalls = oldnCcalls + L->nci;
+  L->nCcalls = oldnCcalls;
+  if (L->nCcalls - (oldnci - L->nci) >= 0)
+    L->nCcalls -= oldnci - L->nci;
   return lj.status;
 }
 
diff --git a/src/lstate.c b/src/lstate.c
index a2cc4d9..3342635 100644
--- a/src/lstate.c
+++ b/src/lstate.c
@@ -135,13 +135,13 @@ void luaE_freeCI (lua_State *L) {
   CallInfo *ci = L->ci;
   CallInfo *next = ci->next;
   ci->next = NULL;
-  L->nCcalls -= L->nci;  /* to subtract removed elements from 'nCcalls' */
   while ((ci = next) != NULL) {
     next = ci->next;
     luaM_free(L, ci);
     L->nci--;
+    if (L->nCcalls > 0)
+      L->nCcalls--;  /* to subtract removed elements from 'nCcalls' */
   }
-  L->nCcalls += L->nci;  /* to subtract removed elements from 'nCcalls' */
 }
 
 
@@ -151,16 +150,16 @@ void luaE_freeCI (lua_State *L) {
 void luaE_shrinkCI (lua_State *L) {
   CallInfo *ci = L->ci;
   CallInfo *next2;  /* next's next */
-  L->nCcalls -= L->nci;  /* to subtract removed elements from 'nCcalls' */
   /* while there are two nexts */
   while (ci->next != NULL && (next2 = ci->next->next) != NULL) {
     luaM_free(L, ci->next);  /* free next */
     L->nci--;
+    if (L->nCcalls > 0)
+      L->nCcalls--;  /* to subtract removed elements from 'nCcalls' */
     ci->next = next2;  /* remove 'next' from the list */
     next2->previous = ci;
     ci = next2;  /* keep next's next */
   }
-  L->nCcalls += L->nci;  /* to subtract removed elements from 'nCcalls' */
 }
 
 
Reply | Threaded
Open this post in threaded view
|

Re: [ 5.4 ] Bug - L->nCcalls unsigned short overflow

Roberto Ierusalimschy
> Sometimes when calling collectgarbage() inside a coroutine, my application
> crashes with 'C Stack Overflow'. I managed to track it down to a negative
> number being assigned to nCcalls in lstate.c and ldo.c, so nCcalls becomes
> ~65534.

Many thanks for the report.

Are you sure about the cause? In particular, can you check whether the
problem is that (L->nci > L->nCcalls) when luaD_rawrunprotected is
called?  That should never happen.

Can you produce a "small" self-sufficient program that reproduces the bug?

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [ 5.4 ] Bug - L->nCcalls unsigned short overflow

Roberto Ierusalimschy
> > Sometimes when calling collectgarbage() inside a coroutine, my application
> > crashes with 'C Stack Overflow'. I managed to track it down to a negative
> > number being assigned to nCcalls in lstate.c and ldo.c, so nCcalls becomes
> > ~65534.
>
> Many thanks for the report.
>
> Are you sure about the cause? In particular, can you check whether the
> problem is that (L->nci > L->nCcalls) when luaD_rawrunprotected is
> called?  That should never happen.
>
> Can you produce a "small" self-sufficient program that reproduces the bug?

One more question: does your application call 'lua_resume' directly,
from C?

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [ 5.4 ] Bug - L->nCcalls unsigned short overflow

Павел Отчерцов
In reply to this post by Roberto Ierusalimschy
This problem occurs only in our corporate application at my work. I'm sorry I couldn't reproduce it in a simple application.

But with the help of conditional breakpoints I think I managed to find the cause of L->nCcalls being less then L->nci, which led to it becoming negative number (>65500).

The reason is the original L->nCcalls was always overwritten with a new value in lua_resume. This simple patch fixes it for me:

diff --git a/src/ldo.c b/src/ldo.c
index f2f9062..202516d 100644
--- a/src/ldo.c
+++ b/src/ldo.c
@@ -646,6 +646,7 @@ LUA_API int lua_resume (lua_State *L, lua_State *from, int nargs,
int *nresults) {
int status;
unsigned short oldnny = L->nny; /* save "number of non-yieldable" calls */
+ unsigned short incnCcalls = (from) ? from->nCcalls + 1 : 1;
lua_lock(L);
if (L->status == LUA_OK) { /* may be starting a coroutine */
if (L->ci != &L->base_ci) /* not in base level? */
@@ -653,7 +654,7 @@ LUA_API int lua_resume (lua_State *L, lua_State *from, int nargs,
}
else if (L->status != LUA_YIELD)
return resume_error(L, "cannot resume dead coroutine", nargs);
- L->nCcalls = (from) ? from->nCcalls + 1 : 1;
+ L->nCcalls += incnCcalls;
if (L->nCcalls >= LUAI_MAXCCALLS)
return resume_error(L, "C stack overflow", nargs);
luai_userstateresume(L, nargs);
@@ -677,7 +678,7 @@ LUA_API int lua_resume (lua_State *L, lua_State *from, int nargs,
*nresults = (status == LUA_YIELD) ? L->ci->u2.nyield
: cast_int(L->top - (L->ci->func + 1));
L->nny = oldnny; /* restore 'nny' */
- L->nCcalls--;
+ L->nCcalls -= incnCcalls;
lua_unlock(L);
return status;
}


On Wed, Jun 27, 2018 at 5:11 PM Roberto Ierusalimschy <[hidden email]> wrote:
> Sometimes when calling collectgarbage() inside a coroutine, my application
> crashes with 'C Stack Overflow'. I managed to track it down to a negative
> number being assigned to nCcalls in lstate.c and ldo.c, so nCcalls becomes
> ~65534.

Many thanks for the report.

Are you sure about the cause? In particular, can you check whether the
problem is that (L->nci > L->nCcalls) when luaD_rawrunprotected is
called?  That should never happen.

Can you produce a "small" self-sufficient program that reproduces the bug?

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [ 5.4 ] Bug - L->nCcalls unsigned short overflow

Roberto Ierusalimschy
> This problem occurs only in our corporate application at my work. I'm sorry
> I couldn't reproduce it in a simple application.
>
> But with the help of conditional breakpoints I think I managed to find the
> cause of L->nCcalls being less then L->nci, which led to it becoming
> negative number (>65500).
>
> The reason is the original L->nCcalls was always overwritten with a new
> value in lua_resume. [...]

Many thanks again. We will have a look there.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [ 5.4 ] Bug - L->nCcalls unsigned short overflow

Павел Отчерцов
I managed to write a small program which reproduces a problem.


On Fri, Jun 29, 2018 at 4:42 PM Roberto Ierusalimschy <[hidden email]> wrote:
> This problem occurs only in our corporate application at my work. I'm sorry
> I couldn't reproduce it in a simple application.
>
> But with the help of conditional breakpoints I think I managed to find the
> cause of L->nCcalls being less then L->nci, which led to it becoming
> negative number (>65500).
>
> The reason is the original L->nCcalls was always overwritten with a new
> value in lua_resume. [...]

Many thanks again. We will have a look there.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [ 5.4 ] Bug - L->nCcalls unsigned short overflow

Павел Отчерцов

On Sun, Jul 1, 2018 at 1:59 PM Павел Отчерцов <[hidden email]> wrote:
I managed to write a small program which reproduces a problem.


On Fri, Jun 29, 2018 at 4:42 PM Roberto Ierusalimschy <[hidden email]> wrote:
> This problem occurs only in our corporate application at my work. I'm sorry
> I couldn't reproduce it in a simple application.
>
> But with the help of conditional breakpoints I think I managed to find the
> cause of L->nCcalls being less then L->nci, which led to it becoming
> negative number (>65500).
>
> The reason is the original L->nCcalls was always overwritten with a new
> value in lua_resume. [...]

Many thanks again. We will have a look there.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [ 5.4 ] Bug - L->nCcalls unsigned short overflow

Roberto Ierusalimschy
> https://github.com/pavelxdd/luabug_nccalls
>
> Feel free to check it out.
>
> On Sun, Jul 1, 2018 at 1:59 PM Павел Отчерцов <[hidden email]>
> wrote:
>
> > I managed to write a small program which reproduces a problem.

Many thanks (again).

-- Roberto