Coroutines & C boundaries


David Given
I'm trying to write some code in Lua using coroutines, and am having trouble.

My Lua code is being used as a plugin for an external program. The program is
calling into Lua to tell it something's happening. As my algorithm is very
stateful, doing everything from call-ins is a nightmare, which makes it a
classic example for control inversion using coroutines.

Example: when input arrives on a socket, I'm called to tell me that it's
happening.

The core of my Lua code is a scheduler. Whenever input arrives, my C code kicks
the scheduler and it runs the first task on the list for one iteration. It
does this by calling the following function with lua_pcall():

function schedule_now()
        if (table.getn(taskqueue) == 0) then
                idle()
                return
        end
       
        local task = taskqueue[1]
        if (type(task) == "function") then
                task = coroutine.create(task)
                taskqueue[1] = task
        end
       
        coroutine.resume(task)
       
        if (coroutine.status(task) == "dead") then
                table.remove(taskqueue, 1)
        end
end

This ought to be straightforward enough. I schedule tasks by adding functions
to the taskqueue table. They get run in stages until termination and then
removed. Here's a sample task:

function a_task()
        print("Hello,")
        coroutine.yield()
        print("world!")
end

However, whenever I try to actually use this, the call to coroutine.yield()
fails with an 'attempt to yield across metamethod/C-call boundary' error.

Why?

--
+- David Given --McQ-+ "I must have spent at least ten minutes out of my
|  [hidden email]    | life talking to this joker like he was a sane
| ([hidden email]) | person. I want a refund." --- Louann Miller, on
+- www.cowlark.com --+ rasfw



Re: Coroutines & C boundaries

David Given
On Monday 13 February 2006 22:51, David Given wrote:
[...]
> Why?

It occurs to me that my previous message might be seen as being a little
brusque. It's just I've been banging my head against this for a good while
now, and have no idea what's going on. Any suggestions or advice will be
gratefully appreciated.

--
+- David Given --McQ-+ "Working with Unix is like wrestling a worthy
|  [hidden email]    | opponent. Working with Windows is like attacking a
| ([hidden email]) | small whining child who is carrying a .38." ---
+- www.cowlark.com --+ Nancy Lebovitz



Re: Coroutines & C boundaries

D Burgess-4
Well, the message says it all. And if you are reading the 5.1
manual then you won't find much help.

This subject has been done to death by many. Supermike has
produced Coco (see the Wiki), which solves the issue you are
dealing with.

David B.

On 2/14/06, David Given <[hidden email]> wrote:

> On Monday 13 February 2006 22:51, David Given wrote:
> [...]
> > Why?
>
> It occurs to me that my previous message might be seen as being a little
> brusque. It's just I've been banging my head against this for a good while
> now, and have no idea what's going on. Any suggestions or advice will be
> gratefully appreciated.

Re: Coroutines & C boundaries

David Given
On Monday 13 February 2006 23:18, D Burgess wrote:
> Well, the message says it all. And if you are reading the 5.1
> manual then you won't find much help.
>
> This subject has been done to death by many. Supermike has
> produced Coco (see the Wiki), which solves the issue you are
> dealing with.

Actually, this is on 5.0. And can you give me any pointers to *where* in the
wiki this page is? The only thing I can find is the ejcoro patch. (It's not
the best-organised wiki in the world.) And I'd rather like to avoid having to
patch Lua; I'm linking against a system shared library.

The thing is, *why* am I getting this error? I am not doing what it says I'm
doing. I have a C function that's calling into Lua, which is shuffling pure
Lua coroutines and occasionally calls non-reentrant C functions, and then
exits back to the calling C function. That's it. I am *not* trying to yield
across a C boundary that I am aware of.

--
+- David Given --McQ-+ "In America, family has become a code word for
|  [hidden email]    | something that you can put a five-year-old in front
| ([hidden email]) | of and come back secure in the knowledge that your
+- www.cowlark.com --+ child will not have been exposed to any ideas."



Re: Coroutines & C boundaries

D Burgess-4
> Actually, this is on 5.0. And can you give me any pointers to *where* in the
> wiki this page is?

My apologies, it is on luaforge.

http://luajit.luaforge.net/coco.html

Re: Coroutines & C boundaries

Mark Hamburg-4
In reply to this post by David Given
on 2/13/06 3:50 PM, David Given at [hidden email] wrote:

> The thing is, *why* am I getting this error? I am not doing what it says I'm
> doing. I have a C function that's calling into Lua, which is shuffling pure
> Lua coroutines and occasionally calls non-reentrant C functions, and then
> exits back to the calling C function. That's it. I am *not* trying to yield
> across a C boundary that I am aware of.

Your C code may need to use lua_resume rather than lua_call. Of course, then
it needs to work in terms of coroutines rather than functions.

You may also be getting bitten somewhere by pcall being a C function. I
suspect that this is the most common place where C code slips onto the stack
unnoticed and if Lua 5.2 doesn't address the issue wholesale, I would hope
that we could find a solution specifically for pcall.

Mark


Re: Coroutines & C boundaries

David Given
[...]
> Your C code may need to use lua_resume rather than lua_call. Of course,
> then it needs to work in terms of coroutines rather than functions.

I did try making the scheduler itself a coroutine that the C code resumed; it
failed oddly, complaining that lua_resume() was trying to call a function
value. Looking at the code, it would appear that coroutine.create() does a
lot of stuff as well as calling lua_newthread(), and that the two might not
be strictly compatible.

I did try having a wrapper around the scheduler as follows:

local scheduler
function schedule_wrapper()
        scheduler = coroutine.create(schedule_now)
        coroutine.resume(scheduler)
end

This allowed me to replace my call to coroutine.yield() with a call to
coroutine.resume(scheduler). This avoided the C boundary error, but caused
all kinds of other weird stuff to happen.

How does yield() know where to transfer execution to? Does it go to the most
recent resume() that caused the current coroutine to be run?

> You may also be getting bitten somewhere by pcall being a C function. I
> suspect that this is the most common place where C code slips onto the
> stack unnoticed and if Lua 5.2 doesn't address the issue wholesale, I would
> hope that we could find a solution specifically for pcall.

Surely *all* Lua code gets run through a pcall, though? I really don't
understand what's going on here.

--
+- David Given --McQ-+ "The time has come," Mithrandir said, "To talk of
|  [hidden email]    | many things: Of Moria, and bridges, and deep
| ([hidden email]) | fissures --- of Balrogs, and their wings." ---
+- www.cowlark.com --+ Meneldil on rasfw


Re: Coroutines & C boundaries

Adrian Sietsma
David Given wrote:
> [...]
> How does yield() know where to transfer execution to? Does it go to the most
> recent resume() that caused the current coroutine to be run?
Yes.
yield() returns to the caller of the coroutine, i.e. coroutine.resume() will
return when the resumed coroutine yields, returns, or dies.

Adrian
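
As a small illustration of the resume/yield hand-off described above, here is a minimal sketch in plain Lua. The function body and the values passed around are made up for the example; only coroutine.create, coroutine.resume, coroutine.yield and coroutine.status are standard library calls.

```lua
-- A coroutine that yields once and then returns.
local co = coroutine.create(function(a)
  -- yield() suspends here and hands a + 1 back to whoever called resume()
  local b = coroutine.yield(a + 1)
  -- the second resume() makes yield() return its extra argument
  return a + b
end)

local ok1, v1 = coroutine.resume(co, 10)  -- runs until the yield
local s1 = coroutine.status(co)           -- coroutine is now suspended
local ok2, v2 = coroutine.resume(co, 5)   -- runs until the function returns
local s2 = coroutine.status(co)           -- coroutine is now dead
local ok3, v3 = coroutine.resume(co)      -- resuming a dead coroutine fails

print(ok1, v1, s1)
print(ok2, v2, s2)
print(ok3, v3)
```

Each resume() returns to its caller at the next yield, at the final return, or with false plus a message if the coroutine can no longer run.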


Re: Coroutines & C boundaries

David Given
In reply to this post by Mark Hamburg-4
On Tuesday 14 February 2006 07:14, you wrote:
[...]
> Your C code may need to use lua_resume rather than lua_call. Of course,
> then it needs to work in terms of coroutines rather than functions.

The good news is that switching to use a local copy of Lua 5.1w6 with Mike
Pall's Resumable VM patch now works fine. The bad news is that Lua 5.1w6 is
rather elderly and doesn't contain a number of features that are now
confirmed for 5.1 (such as string.gmatch). Also, it's crashing oddly with
valgrind complaining about uninitialised memory in index2adr, but that may
not be its fault.

Does anyone know if the resumable VM patch is available for either 5.1rc or
5.0?

I still don't know why this isn't working, either (it doesn't work on vanilla
5.1rc either).

--
+- David Given --McQ-+ "This is the captain. We have a little problem
|  [hidden email]    | with our reentry sequence, so we may experience
| ([hidden email]) | some slight turbulence and then explode." --- Mal
+- www.cowlark.com --+ Reynolds, _Serenity_



Re: Coroutines & C boundaries

Mike Pall-2-2
Hi,

David Given wrote:
> Does anyone know if the resumable VM patch is available for either 5.1rc or
> 5.0?

Nope. But there's Coco which is functionally the same for your
purposes. I'll release an update to Coco whenever Lua 5.1 final
is out. Mail me if you need a pre-release against 5.1-rc4.

> I still don't know why this isn't working, either (it doesn't work on vanilla
> 5.1rc either).

IMHO it's better to find out the reason for this. It may point to
a mistake in your program or a misunderstanding. The example you
gave runs fine with a plain Lua script. My guess is that you are
using pcall() or a callback from C code somewhere in a coroutine.

Bye,
     Mike
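
For reference, the scheduler and sample task from the first message can be exercised as a plain Lua script roughly as follows. This is only a sketch under a few assumptions: it uses the # length operator (Lua 5.1 or later) instead of table.getn, idle() is an empty stand-in, output is collected in a table named output (a name invented here) rather than printed, and each pcall() stands in for one lua_pcall() made by the host program.

```lua
local taskqueue = {}
local output = {}            -- hypothetical: collects what the task would print

local function idle() end    -- stand-in for the original idle()

local function schedule_now()
  if #taskqueue == 0 then
    idle()
    return
  end

  local task = taskqueue[1]
  if type(task) == "function" then
    task = coroutine.create(task)
    taskqueue[1] = task
  end

  -- resume() reports errors as return values instead of raising them
  local ok, err = coroutine.resume(task)
  if not ok then
    output[#output + 1] = "task failed: " .. tostring(err)
  end

  if coroutine.status(task) == "dead" then
    table.remove(taskqueue, 1)
  end
end

local function a_task()
  output[#output + 1] = "Hello,"
  coroutine.yield()
  output[#output + 1] = "world!"
end

taskqueue[1] = a_task
assert(pcall(schedule_now))  -- first "call-in": runs the task up to the yield
assert(pcall(schedule_now))  -- second "call-in": runs the task to completion
print(table.concat(output, " "))
```

Here the pcall() sits outside the coroutine, so the yield inside a_task never has to cross it.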

Re: Coroutines & C boundaries

David Given
On Wednesday 15 February 2006 15:48, Mike Pall wrote:
[...]
> Nope. But there's Coco which is functionally the same for your
> purposes. I'll release an update to Coco whenever Lua 5.1 final
> is out. Mail me if you need a pre-release against 5.1-rc4.

The problem with Coco, though, is that because it's doing C stack swapping
it's an order of magnitude more difficult to make portable. Currently all my
code is generic and doesn't care what platform it's running on. With Coco, my
build system will have to know details about the platform configuration, etc.
The Resumable VM patch doesn't have any of this stuff, which I don't need
anyway.

> > I still don't know why this isn't working, either (it doesn't work on
> > vanilla 5.1rc either).
>
> IMHO it's better to find out the reason for this. It may point to
> a mistake in your program or a misunderstanding. The example you
> gave runs fine with a plain Lua script. My guess is that you are
> using pcall() or a callback from C code somewhere in a coroutine.

The only thing I can think of is that Gaim is somehow calling back into my
plugin while I'm running code from within my plugin. Which is evil.

Looking at the Lua code, I can't actually figure out how anything works in the
first place. In (Lua 5.0) lua_yield(), I get the boundary error if nCcalls>0.
But luaD_call() always increments nCcalls while doing the call, and as far as
I can tell, all cases where C is calling Lua pass through there; so how can
it ever *be* zero?

--
+- David Given --McQ-+ "It's the dawn of the day before the time of the
|  [hidden email]    | land that the lost dinosaurs forgot to remember!"
| ([hidden email]) | --- Ookla the Mok
+- www.cowlark.com --+


Re: Coroutines & C boundaries

David Given
On Wednesday 15 February 2006 16:35, David Given wrote:
[...]
> Looking at the Lua code, I can't actually figure out how anything works in
> the first place. In (Lua 5.0) lua_yield(), I get the boundary error if
> nCcalls>0. But luaD_call() always increments nCcalls while doing the call,
> and as far as I can tell, all cases where C is calling Lua pass through
> there; so how can it ever *be* zero?

I've done some more investigation using Lua 5.0 without the patch. (To try and
figure out what's happening.)

My code uses tolua, which BTW is incredibly nice. I have exactly two places
where I'm calling lua_pcall, and they're both in the same function. The code
looks like this:

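/* Sketch only; L is assumed to be the plugin's file-scope lua_State. */
static int count = 0;

void callback(void)
{
        count++;
        lua_getglobal(L, "queuetask");
        lua_pcall(L, 0, 0, 0);
        count--;

        if (count == 0)
        {
                count++;
                lua_getglobal(L, "schedule");
                lua_pcall(L, 0, 0, 0);
                count--;
        }
}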

That is, callback() calls the queuetask() function in Lua. This adds a new
function onto my task list. On return, if nobody's using Lua, then I call my
scheduler. This pulls the first thing off the task list, and if it's a
function turns it into a coroutine. Then it resumes it. When the coroutine
yields, the scheduler returns back to the function above.

The *first* time the scheduler is called, this happens:

* Scheduler pulls a function off the task list.
* Scheduler turns it into a coroutine with coroutine.create() and replaces it
on the task list.
* Scheduler resumes it.
* Coroutine does some work.
* Coroutine calls coroutine.yield.
* 'Attempt to yield across yada yada'.
* Coroutine dies.
* Scheduler resumes, and sees dead coroutine.

Creating the coroutine, resuming it, and then trying to yield from it all
occur in the *same* invocation of Lua. Why is this failing?

Looking at ldo.c, I see this:

void luaD_call (lua_State *L, StkId func, int nResults) {
...
  if (++L->nCcalls >= LUA_MAXCCALLS) {
...
  }
  firstResult = luaD_precall(L, func);
  if (firstResult == NULL)  /* is a Lua function? */
    firstResult = luaV_execute(L);  /* call it */
...
  L->nCcalls--;
...
}

This function appears, from what I can tell, to be the bottleneck through
which all invocations of Lua pass. It is not possible to execute Lua, from
outside Lua, without passing through the above code. nCcalls is initialised to
zero, therefore barring overflow, there should be no way that it can be zero
while luaV_execute() is being called.

Since lots of people (including me) have had this all working, I'm obviously
wrong. Can anybody point out what I'm missing? Please?

--
+- David Given --McQ-+ "I must have spent at least ten minutes out of my
|  [hidden email]    | life talking to this joker like he was a sane
| ([hidden email]) | person. I want a refund." --- Louann Miller, on
+- www.cowlark.com --+ rasfw


Re: Coroutines & C boundaries

Mike Pall-2-2
Hi,

David Given wrote:
> * Coroutine calls coroutine.yield.
> * 'Attempt to yield across yada yada'.

Try print(debug.traceback()) immediately before the yield and
ye shall see.
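
A minimal sketch of that diagnostic, assuming a coroutine whose body wraps its work in pcall(): taking a traceback at the point where the yield would go shows any C frames sitting between the coroutine's entry point and the yield. The exact wording of the traceback differs between Lua versions, so no output is quoted here.

```lua
local trace

local co = coroutine.create(function()
  pcall(function()
    -- taken at the spot where coroutine.yield() would be called
    trace = debug.traceback()
  end)
end)

coroutine.resume(co)
print(trace)   -- frames reported with the source name [C] are C functions
```

Any frame labelled [C] between the top of the coroutine and the yield point is a candidate for the boundary the error message complains about.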

> void luaD_call (lua_State *L, StkId func, int nResults) {
>
> This function appears, from what I can tell, to be the bottleneck through
> which all invocations of Lua pass. It is not possible to execute Lua, from
> outside Lua, without passing through the above code.

No. luaV_execute() is called from two places.
You want the other one. It's called resume(). :-)

Ok path:

-> lua_resume
--> resume
---> luaV_execute
----> luaD_precall
-----> luaB_yield
------> lua_yield
<------

Your problem:

-> lua_resume
--> resume
---> luaV_execute
----> luaD_precall
-----> luaB_pcall or a C function which calls back into Lua <-- !!
------> lua_pcall or lua_call
-------> luaD_call       nCcalls++
--------> luaV_execute
---------> luaD_precall
----------> luaB_yield
-----------> lua_yield   nCcalls > 0 -> throw error

Bye,
     Mike
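
The failing path can be reproduced from plain Lua with any C function that calls back into Lua. The sketch below uses string.gsub with a function replacement as the C function in the middle, rather than pcall, because later Lua releases (5.2 onwards) allow yielding across pcall while the versions discussed in this thread do not; the exact error text also varies slightly between versions.

```lua
local co = coroutine.create(function()
  -- string.gsub is a C function; it invokes the callback from C,
  -- so the yield below would have to cross a C frame.
  string.gsub("abc", "%a", function(ch)
    coroutine.yield(ch)
  end)
end)

local ok, err = coroutine.resume(co)
print(ok, err)   -- resume() reports the failure instead of raising it
```

The coroutine dies at the first yield attempt, which matches the "Coroutine dies" step in the sequence described earlier.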

Re: Coroutines & C boundaries

David Given
On Thursday 16 February 2006 00:53, Mike Pall wrote:
[...]
> Try print(debug.traceback()) immediately before the yield and
> ye shall see.

I'm slightly embarrassed to admit I didn't think about that.

[...]
> No. luaV_execute() is called from two places.
> You want the other one. It's called resume(). :-)

I found that, but I'm never calling it, and it doesn't change nCcalls unless
cleaning up after an error, so it shouldn't have been relevant.

[...]
> Your problem:
>
> -> lua_resume
> --> resume
> ---> luaV_execute
> ----> luaD_precall
> -----> luaB_pcall or a C function which calls back into Lua <-- !!

The problem was that my coroutine top-level function runs everything inside a
pcall, so it can trap errors. And pcall, despite being a pure Lua function,
apparently counts as a C call and you can't yield across it. Ack.

It might be worth emphasising this in the manual; I would never have thought
of this --- pcall looks like pure Lua, and therefore I assumed that you could
yield across it.

Now it's all working, and the R is back into my RAD. Thanks for all the help,
everyone!

--
+- David Given --McQ-+ "There is // One art // No more // No less // To
|  [hidden email]    | do // All things // With art // Lessness." --- Piet
| ([hidden email]) | Hein
+- www.cowlark.com --+


Re: Coroutines & C boundaries

Luiz Henrique de Figueiredo
> pcall looks like pure Lua, and therefore I assumed that you could
> yield across it.

Does pcall really look like pure Lua? Lua is like C in this respect: it
has NO builtin functions. So, all functions provided by the standard Lua
library are C functions. --lhf
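
One way to check this from a script is debug.getinfo, whose "what" field distinguishes C functions from Lua functions. A small sketch (the local function a_lua_function is just an example name):

```lua
local function a_lua_function() end

local what_pcall = debug.getinfo(pcall, "S").what
local what_yield = debug.getinfo(coroutine.yield, "S").what
local what_lua   = debug.getinfo(a_lua_function, "S").what

print("pcall:", what_pcall)
print("coroutine.yield:", what_yield)
print("a_lua_function:", what_lua)
```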

Re: Coroutines & C boundaries

Mark Hamburg-4
I think the problem is that pcall is relatively special compared to the rest
of the library in that it provides flow control and in fact more or less has
to be used for certain pieces of flow control. In other words, its
functionality is essential enough to the use of Lua that it feels more like
part of the language than part of a C library.

The workaround is to use something like coxpcall, which uses coroutines to
simulate pcall semantics.

http://luaforge.net/frs/?group_id=6
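
The general idea can be sketched in a few lines of Lua. This is not coxpcall's implementation or API; copcall is a hypothetical name, and the sketch assumes Lua 5.1 or later and ignores details such as error handlers. The protected function runs in its own coroutine, and any yield it makes is passed on by a yield from the caller, which is ordinary Lua code with no C frame in between.

```lua
-- Hypothetical helper: pcall-like semantics built on coroutines.
local function copcall(f, ...)
  local co = coroutine.create(f)

  local function handle(ok, ...)
    if not ok then
      return false, ...          -- error inside f: report it like pcall
    end
    if coroutine.status(co) == "dead" then
      return true, ...           -- f returned normally
    end
    -- f yielded: yield the same values from the calling coroutine, then
    -- feed whatever we are resumed with back into f.
    return handle(coroutine.resume(co, coroutine.yield(...)))
  end

  return handle(coroutine.resume(co, ...))
end

local log = {}
local task = coroutine.create(function()
  local ok, err = copcall(function()
    log[#log + 1] = "Hello,"
    coroutine.yield()            -- passes through copcall to the scheduler
    log[#log + 1] = "world!"
    error("boom")                -- trapped by copcall, not by the scheduler
  end)
  log[#log + 1] = tostring(ok)
end)

coroutine.resume(task)           -- runs up to the yield
coroutine.resume(task)           -- runs to completion
print(table.concat(log, " "))
```

One consequence of this design is that the protected function runs in a different coroutine from its caller, so code that depends on coroutine identity may behave differently than under a real pcall.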

Mark

on 2/16/06 3:35 AM, Luiz Henrique de Figueiredo at [hidden email]
wrote:

>> pcall looks like pure Lua, and therefore I assumed that you could
>> yield across it.
>
> Does pcall really look like pure Lua? Lua is like C in this respect: it
> has NO builtin functions. So, all functions provided by the standard Lua
> library are C functions. --lhf


Re: Coroutines & C boundaries

Adrian Sietsma
In reply to this post by David Given
David Given wrote:
> [...]
>
> The problem was that my coroutine top-level function runs everything inside a
> pcall, so it can trap errors. And pcall, despite being a pure Lua function,
> apparently counts as a C call and you can't yield across it. Ack.
>
If you are running functions by resuming coroutines, do you need pcall?
Lua errors, asserts, etc within the coro should all just abort the coroutine
and return to caller (resumer).
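
A minimal sketch of that behaviour, with a made-up task body: an error raised inside a coroutine does not propagate as an error into the resumer. Instead coroutine.resume() returns false plus the error message, and the coroutine ends up dead.

```lua
local co = coroutine.create(function()
  coroutine.yield("step 1")
  error("something went wrong")
end)

local ok1, v1 = coroutine.resume(co)   -- normal yield
local ok2, v2 = coroutine.resume(co)   -- the error surfaces here as values
local status = coroutine.status(co)

print(ok1, v1)
print(ok2, v2)
print(status)
```

So a scheduler that checks the first return value of resume() gets pcall-style error trapping for each task without placing a pcall inside the coroutine.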