Coroutines & C boundaries


David Given
I'm trying to write some code in Lua using coroutines, and am having trouble.

My Lua code is being used as a plugin for an external program. The program is 
calling into Lua to tell it something's happening. As my algorithm is very 
stateful, doing everything from call-ins is a nightmare, which makes it a 
classic example for control inversion using coroutines.

Example: when input arrives on a socket, I'm called to tell me that it's 
happening.

The core of my Lua code is a scheduler. Whenever input arrives, my C code kicks 
the scheduler and it runs the first task on the list for one iteration. It 
does this by calling the following function with lua_pcall():

function schedule_now()
	if (table.getn(taskqueue) == 0) then
		idle()
		return
	end
	
	local task = taskqueue[1]
	if (type(task) == "function") then
		task = coroutine.create(task)
		taskqueue[1] = task
	end
	
	coroutine.resume(task)
	
	if (coroutine.status(task) == "dead") then
		table.remove(taskqueue, 1)
	end
end

This ought to be straightforward enough. I schedule tasks by adding functions 
to the taskqueue table. They get run in stages until termination and then 
removed. Here's a sample task:

function a_task()
	print("Hello,")
	coroutine.yield()
	print("world!")
end

However, whenever I try to actually use this, the call to coroutine.yield() 
fails with an 'attempt to yield across metamethod/C-call boundary' error.

Why?

-- 
+- David Given --McQ-+ "I must have spent at least ten minutes out of my
|  [hidden email]    | life talking to this joker like he was a sane
| ([hidden email]) | person. I want a refund." --- Louann Miller, on
+- www.cowlark.com --+ rasfw


Re: Coroutines & C boundaries

David Given
On Monday 13 February 2006 22:51, David Given wrote:
[...]
> Why?

It occurs to me that my previous message might be seen as being a little 
brusque. It's just I've been banging my head against this for a good while 
now, and have no idea what's going on. Any suggestions or advice will be 
gratefully appreciated.

-- 
+- David Given --McQ-+ "Working with Unix is like wrestling a worthy
|  [hidden email]    | opponent. Working with Windows is like attacking a
| ([hidden email]) | small whining child who is carrying a .38." ---
+- www.cowlark.com --+ Nancy Lebovitz


Re: Coroutines & C boundaries

D Burgess-4
Well, the message says it all. And if you are reading the 5.1
manual then you won't find much help.

This subject has been done to death by many. Supermike has
produced Coco (see the Wiki), which solves the issue you are
dealing with.

David B.

On 2/14/06, David Given <[hidden email]> wrote:
> On Monday 13 February 2006 22:51, David Given wrote:
> [...]
> > Why?
>
> It occurs to me that my previous message might be seen as being a little
> brusque. It's just I've been banging my head against this for a good while
> now, and have no idea what's going on. Any suggestions or advice will be
> gratefully appreciated.
>



Re: Coroutines & C boundaries

David Given
On Monday 13 February 2006 23:18, D Burgess wrote:
> Well, the message says it all. And if you are reading the 5.1
> manual then you won't find much help.
>
> This subject has been done to death by many. Supermike has
> produced Coco (see the Wiki), which solves the issue you are
> dealing with.

Actually, this is on 5.0. And can you give me any pointers to *where* in the 
wiki this page is? The only thing I can find is the ejcoro patch. (It's not 
the best-organised wiki in the world.) And I'd rather like to avoid having to 
patch Lua; I'm linking against a system shared library.

The thing is, *why* am I getting this error? I am not doing what it says I'm 
doing. I have a C function that's calling into Lua, which is shuffling pure 
Lua coroutines and occasionally calls non-reentrant C functions, and then 
exits back to the calling C function. That's it. I am *not* trying to yield 
across a C boundary that I am aware of.

-- 
+- David Given --McQ-+ "In America, family has become a code word for
|  [hidden email]    | something that you can put a five-year-old in front
| ([hidden email]) | of and come back secure in the knowledge that your
+- www.cowlark.com --+ child will not have been exposed to any ideas."


Re: Coroutines & C boundaries

D Burgess-4
> Actually, this is on 5.0. And can you give me any pointers to *where* in the
> wiki this page is?

My apologies, it is on luaforge.

http://luajit.luaforge.net/coco.html



Re: Coroutines & C boundaries

Mark Hamburg-4
In reply to this post by David Given
on 2/13/06 3:50 PM, David Given at [hidden email] wrote:

> The thing is, *why* am I getting this error? I am not doing what it says I'm
> doing. I have a C function that's calling into Lua, which is shuffling pure
> Lua coroutines and occasionally calls non-reentrant C functions, and then
> exits back to the calling C function. That's it. I am *not* trying to yield
> across a C boundary that I am aware of.

Your C code may need to use lua_resume rather than lua_call. Of course, then
it needs to work in terms of coroutines rather than functions.

You may also be getting bitten somewhere by pcall being a C function. I
suspect that this is the most common place where C code slips onto the stack
unnoticed and if Lua 5.2 doesn't address the issue wholesale, I would hope
that we could find a solution specifically for pcall.

Mark



Re: Coroutines & C boundaries

David Given
[...]
> Your C code may need to use lua_resume rather than lua_call. Of course,
> then it needs to work in terms of coroutines rather than functions.

I did try making the scheduler itself a coroutine that the C code resumed; it 
failed oddly, complaining that lua_resume() was trying to call a function 
value. Looking at the code, it would appear that coroutine.create() does a 
lot of stuff as well as calling lua_newthread(), and that the two might not 
be strictly compatible.

I did try having a wrapper around the scheduler as follows:

local scheduler
function schedule_wrapper()
	scheduler = coroutine.create(schedule_now)
	coroutine.resume(scheduler)
end

This allowed me to replace my call to coroutine.yield() with a call to 
coroutine.resume(scheduler). This avoided the C boundary error, but caused 
all kinds of other weird stuff to happen.

How does yield() know where to transfer execution to? Does it go to the most 
recent resume() that caused the current coroutine to be run?

> You may also be getting bitten somewhere by pcall being a C function. I
> suspect that this is the most common place where C code slips onto the
> stack unnoticed and if Lua 5.2 doesn't address the issue wholesale, I would
> hope that we could find a solution specifically for pcall.

Surely *all* Lua code gets run through a pcall, though? I really don't 
understand what's going on here.

-- 
+- David Given --McQ-+ "The time has come," Mithrandir said, "To talk of
|  [hidden email]    | many things: Of Moria, and bridges, and deep
| ([hidden email]) | fissures --- of Balrogs, and their wings." ---
+- www.cowlark.com --+ Meneldil on rasfw


Re: Coroutines & C boundaries

Adrian Sietsma
David Given wrote:
[...]
> How does yield() know where to transfer execution to? Does it go to the most
> recent resume() that caused the current coroutine to be run?

Yes. yield() returns to the caller of the coroutine, i.e. coroutine.resume()
will return when the resumed coroutine yields, returns, or dies.

Adrian
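
To make that concrete, here is a minimal sketch (not from the thread; the names co, first and second are illustrative) of how control and values travel between coroutine.resume() and coroutine.yield(). It should behave the same on any Lua version that ships the standard coroutine library:

```lua
local co = coroutine.create(function(a)
  -- yield() hands a + 1 back to whoever called resume()
  local b = coroutine.yield(a + 1)
  -- the next resume() supplies b; returning ends the coroutine
  return b * 2
end)

local ok1, first = coroutine.resume(co, 1)    -- runs until the yield
local ok2, second = coroutine.resume(co, 10)  -- runs until the return
print(ok1, first, ok2, second, coroutine.status(co))
```

Each resume() returns as soon as the coroutine yields or finishes, so a yield always goes back to the most recent resume() of that coroutine.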



Re: Coroutines & C boundaries

David Given
In reply to this post by Mark Hamburg-4
On Tuesday 14 February 2006 07:14, you wrote:
[...]
> Your C code may need to use lua_resume rather than lua_call. Of course,
> then it needs to work in terms of coroutines rather than functions.

The good news is that switching to use a local copy of Lua 5.1w6 with Mike 
Pall's Resumable VM patch now works fine. The bad news is that Lua 5.1w6 is 
rather elderly and doesn't contain a number of features that are now 
confirmed for 5.1 (such as string.gmatch). Also, it's crashing oddly with 
valgrind complaining about uninitialised memory in index2adr, but that may 
not be its fault.

Does anyone know if the resumable VM patch is available for either 5.1rc or 
5.0?

I still don't know why this isn't working, either (it doesn't work on vanilla 
5.1rc either).

-- 
+- David Given --McQ-+ "This is the captain. We have a little problem
|  [hidden email]    | with our reentry sequence, so we may experience
| ([hidden email]) | some slight turbulence and then explode." --- Mal
+- www.cowlark.com --+ Reynolds, _Serenity_


Re: Coroutines & C boundaries

Mike Pall-2-2
Hi,

David Given wrote:
> Does anyone know if the resumable VM patch is available for either 5.1rc or 
> 5.0?

Nope. But there's Coco which is functionally the same for your
purposes. I'll release an update to Coco whenever Lua 5.1 final
is out. Mail me if you need a pre-release against 5.1-rc4.

> I still don't know why this isn't working, either (it doesn't work on vanilla 
> 5.1rc either).

IMHO it's better to find out the reason for this. It may point to
a mistake in your program or a misunderstanding. The example you
gave runs fine with a plain Lua script. My guess is that you are
using pcall() or a callback from C code somewhere in a coroutine.

Bye,
     Mike
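
For reference, a self-contained version of the scheduler and sample task from the first post, as one might run it from a plain script with no host program involved. This is only a sketch for Lua 5.1 or later (it uses # where the original used the 5.0-only table.getn); the output table, the driver loop and the error reporting are additions for illustration and are not part of the original code:

```lua
local taskqueue = {}
local output = {}   -- collects what the task "prints" (illustrative addition)

local function schedule_now()
  if #taskqueue == 0 then
    return false                       -- nothing left to run
  end

  local task = taskqueue[1]
  if type(task) == "function" then
    task = coroutine.create(task)      -- first run: wrap the function
    taskqueue[1] = task
  end

  local ok, err = coroutine.resume(task)
  if not ok then
    output[#output + 1] = "error: " .. tostring(err)
  end

  if coroutine.status(task) == "dead" then
    table.remove(taskqueue, 1)
  end
  return true
end

local function a_task()
  output[#output + 1] = "Hello,"
  coroutine.yield()
  output[#output + 1] = "world!"
end

taskqueue[1] = a_task
while schedule_now() do end            -- stand-in for the host's call-ins
print(table.concat(output, " "))
```

Run this way, the yield goes straight back to the resume() in schedule_now(), with no C function in between.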


Re: Coroutines & C boundaries

David Given
On Wednesday 15 February 2006 15:48, Mike Pall wrote:
[...]
> Nope. But there's Coco which is functionally the same for your
> purposes. I'll release an update to Coco whenever Lua 5.1 final
> is out. Mail me if you need a pre-release against 5.1-rc4.

The problem with Coco, though, is that because it's doing C stack swapping 
it's an order of magnitude more difficult to make portable. Currently all my 
code is generic and doesn't care what platform it's running on. With Coco, my 
build system will have to know details about the platform configuration, etc. 
The Resumable VM patch doesn't have any of this stuff, which I don't need 
anyway.

> > I still don't know why this isn't working, either (it doesn't work on
> > vanilla 5.1rc either).
>
> IMHO it's better to find out the reason for this. It may point to
> a mistake in your program or a misunderstanding. The example you
> gave runs fine with a plain Lua script. My guess is that you are
> using pcall() or a callback from C code somewhere in a coroutine.

The only thing I can think of is that Gaim is somehow calling back into my 
plugin while I'm running code from within my plugin. Which is evil.

Looking at the Lua code, I can't actually figure out how anything works in the 
first place. In (Lua 5.0) lua_yield(), I get the boundary error if nCcalls>0. 
But luaD_call() always increments nCcalls while doing the call, and as far as 
I can tell, all cases where C is calling Lua pass through there; so how can 
it ever *be* zero?

-- 
+- David Given --McQ-+ "It's the dawn of the day before the time of the
|  [hidden email]    | land that the lost dinosaurs forgot to remember!"
| ([hidden email]) | --- Ookla the Mok
+- www.cowlark.com --+ 


Re: Coroutines & C boundaries

David Given
On Wednesday 15 February 2006 16:35, David Given wrote:
[...]
> Looking at the Lua code, I can't actually figure out how anything works in
> the first place. In (Lua 5.0) lua_yield(), I get the boundary error if
> nCcalls>0. But luaD_call() always increments nCcalls while doing the call,
> and as far as I can tell, all cases where C is calling Lua pass through
> there; so how can it ever *be* zero?

I've done some more investigation using Lua 5.0 without the patch. (To try and 
figure out what's happening.)

My code uses tolua, which BTW is incredibly nice. I have exactly two places 
where I'm calling lua_pcall, and they're both in the same function. The code 
looks like this:

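/* L is the plugin's lua_State; count guards against re-entrant calls.
   The lua_pcall argument counts are assumed to be zero for this sketch. */
void callback(void)
{
	count++;
	lua_getglobal(L, "queuetask");
	lua_pcall(L, 0, 0, 0);	/* nargs, nresults, no error handler */
	count--;

	if (count == 0)
	{
		count++;
		lua_getglobal(L, "schedule");
		lua_pcall(L, 0, 0, 0);
		count--;
	}
}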

That is, callback() calls the queuetask() function in Lua. This adds a new 
function onto my task list. On return, if nobody's using Lua, then I call my 
scheduler. This pulls the first thing off the task list, and if it's a 
function turns it into a coroutine. Then it resumes it. When the coroutine 
yields, the scheduler returns back to the function above.

The *first* time the scheduler is called, this happens:

* Scheduler pulls a function off the task list.
* Scheduler turns it into a coroutine with coroutine.create() and replaces it 
on the task list.
* Scheduler resumes it.
* Coroutine does some work.
* Coroutine calls coroutine.yield.
* 'Attempt to yield across yada yada'.
* Coroutine dies.
* Scheduler resumes, and sees dead coroutine.

Creating the coroutine, resuming it, and then trying to yield from it all 
occur in the *same* invocation of Lua. Why is this failing?

Looking at ldo.c, I see this:

void luaD_call (lua_State *L, StkId func, int nResults) {
...
  if (++L->nCcalls >= LUA_MAXCCALLS) {
...
  }
  firstResult = luaD_precall(L, func);
  if (firstResult == NULL)  /* is a Lua function? */
    firstResult = luaV_execute(L);  /* call it */
...
  L->nCcalls--;
...
}

This function appears, from what I can tell, to be the bottleneck through 
which all invocations of Lua pass. It is not possible to execute Lua, from 
outside Lua, without passing through the above code. nCcalls is initialised to 
zero, therefore barring overflow, there should be no way that it can be zero 
while luaV_execute() is being called.

Since lots of people (including me) have had this all working, I'm obviously 
wrong. Can anybody point out what I'm missing? Please?

-- 
+- David Given --McQ-+ "I must have spent at least ten minutes out of my
|  [hidden email]    | life talking to this joker like he was a sane
| ([hidden email]) | person. I want a refund." --- Louann Miller, on
+- www.cowlark.com --+ rasfw


Re: Coroutines & C boundaries

Mike Pall-2-2
Hi,

David Given wrote:
> * Coroutine calls coroutine.yield.
> * 'Attempt to yield across yada yada'.

Try print(debug.traceback()) immediately before the yield and
ye shall see.

> void luaD_call (lua_State *L, StkId func, int nResults) {
> 
> This function appears, from what I can tell, to be the bottleneck through 
> which all invocations of Lua pass. It is not possible to execute Lua, from 
> outside Lua, without passing through the above code.

No. luaV_execute() is called from two places.
You want the other one. It's called resume(). :-)

Ok path:

-> lua_resume
--> resume
---> luaV_execute
----> luaD_precall
-----> luaB_yield
------> lua_yield
<------

Your problem:

-> lua_resume
--> resume
---> luaV_execute
----> luaD_precall
-----> luaB_pcall or a C function which calls back into Lua <-- !!
------> lua_pcall or lua_call
-------> luaD_call       nCcalls++
--------> luaV_execute
---------> luaD_precall
----------> luaB_yield
-----------> lua_yield   nCcalls > 0 -> throw error

Bye,
     Mike
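
The failing path can be reproduced without any host program. The sketch below (illustrative, not from the thread) yields from inside pcall() within a coroutine and records a traceback just before the yield, as suggested above. On Lua 5.0 and 5.1 the yield is expected to fail with the boundary error, which pcall then catches; later versions (5.2 onwards, and LuaJIT) allow yielding across pcall, so the script only reports which behaviour it observes:

```lua
local trace

local co = coroutine.create(function()
  -- pcall is a C function, so everything inside it runs above a C frame
  local ok, err = pcall(function()
    trace = debug.traceback()    -- the stack as seen just before the yield
    coroutine.yield("yielded")
  end)
  return ok, err
end)

local resumed, a, b = coroutine.resume(co)
print(trace)

if coroutine.status(co) == "suspended" then
  print("yield across pcall worked:", a)
else
  print("yield across pcall failed:", b)
end
```

The traceback should list a [C] frame for pcall between the coroutine's main function and the function that yields, which is the frame Mike's second diagram marks with "!!".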


Re: Coroutines & C boundaries

David Given
On Thursday 16 February 2006 00:53, Mike Pall wrote:
[...]
> Try print(debug.traceback()) immediately before the yield and
> ye shall see.

I'm slightly embarrassed to admit I didn't think about that.

[...]
> No. luaV_execute() is called from two places.
> You want the other one. It's called resume(). :-)

I found that, but I'm never calling it, and it doesn't change nCcalls unless 
cleaning up after an error, so it shouldn't have been relevant.

[...]
> Your problem:
>
> -> lua_resume
> --> resume
> ---> luaV_execute
> ----> luaD_precall
> -----> luaB_pcall or a C function which calls back into Lua <-- !!

The problem was that my coroutine top-level function runs everything inside a 
pcall, so it can trap errors. And pcall, despite being a pure Lua function, 
apparently counts as a C call and you can't yield across it. Ack.

It might be worth emphasising this in the manual; I would never have thought 
of this --- pcall looks like pure Lua, and therefore I assumed that you could 
yield across it.

Now it's all working, and the R is back into my RAD. Thanks for all the help, 
everyone!

-- 
+- David Given --McQ-+ "There is // One art // No more // No less // To
|  [hidden email]    | do // All things // With art // Lessness." --- Piet
| ([hidden email]) | Hein
+- www.cowlark.com --+ 


Re: Coroutines & C boundaries

Luiz Henrique de Figueiredo
> pcall looks like pure Lua, and therefore I assumed that you could
> yield across it.

Does pcall really look like pure Lua? Lua is like C in this respect: it
has NO builtin functions. So, all functions provided by the standard Lua
library are C functions. --lhf
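
One way to check this from a script is to ask the debug library what kind of function a value is. A small sketch (is_c_function is a hypothetical helper name, not a library function):

```lua
-- debug.getinfo(f, "S").what is "C" for C functions and "Lua" for Lua functions
local function is_c_function(f)
  return debug.getinfo(f, "S").what == "C"
end

print("pcall", is_c_function(pcall))
print("coroutine.yield", is_c_function(coroutine.yield))
print("anonymous Lua function", is_c_function(function() end))
```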


Re: Coroutines & C boundaries

Mark Hamburg-4
I think the problem is that pcall is relatively special compared to the rest
of the library in that it provides flow control and in fact more or less has
to be used for certain pieces of flow control. In other words, its
functionality is essential enough to the use of Lua that it feels more like
part of the language than part of a C library.

The workaround is to use something like coxpcall, which uses coroutines to
simulate pcall semantics.

http://luaforge.net/frs/?group_id=6

Mark
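
The idea behind that workaround can be sketched in a few lines. This is not the coxpcall source, only an illustration of the technique under the name copcall (hypothetical); it assumes Lua 5.1 or later because it uses the ... vararg expression, and unlike the real thing it keeps no traceback and has no error-handler argument:

```lua
-- copcall: a hypothetical pcall replacement built on coroutines.
local function copcall(f, ...)
  local co = coroutine.create(f)

  local function handle(ok, ...)
    if not ok then
      return false, ...            -- f raised an error: report it like pcall
    end
    if coroutine.status(co) == "suspended" then
      -- f yielded: pass the yield on to our own resumer, then continue f
      -- with whatever values we are resumed with
      return handle(coroutine.resume(co, coroutine.yield(...)))
    end
    return true, ...               -- f returned normally
  end

  return handle(coroutine.resume(co, ...))
end

-- Usage: a task that yields from inside the protected call.
local task = coroutine.create(function()
  return copcall(function()
    local reply = coroutine.yield("first")
    return reply .. "!"
  end)
end)

local r1, yielded = coroutine.resume(task)
local r2, ok, result = coroutine.resume(task, "hi")
print(r1, yielded, r2, ok, result)
```

The yield inside the protected function never crosses a C frame: it returns to the resume() inside copcall, and copcall then yields again from plain Lua code. Note that in this sketch a yield only works when copcall itself is running inside a coroutine.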

on 2/16/06 3:35 AM, Luiz Henrique de Figueiredo at [hidden email]
wrote:

>> pcall looks like pure Lua, and therefore I assumed that you could
>> yield across it.
> 
> Does pcall really look like pure Lua? Lua is like C in this respect: it
> has NO builtin functions. So, all functions provided by the standard Lua
> library are C functions. --lhf



Re: Coroutines & C boundaries

Adrian Sietsma
In reply to this post by David Given
David Given wrote:
[...]

> The problem was that my coroutine top-level function runs everything inside
> a pcall, so it can trap errors. And pcall, despite being a pure Lua function,
> apparently counts as a C call and you can't yield across it. Ack.

If you are running functions by resuming coroutines, do you need pcall?
Lua errors, asserts, etc. within the coroutine should all just abort the coroutine and return to the caller (the resumer).
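
A minimal sketch of that behaviour (illustrative, not from the thread): an error raised inside a coroutine is not propagated as an error to the resumer; coroutine.resume() returns false plus the error message instead, and the coroutine ends up dead.

```lua
local co = coroutine.create(function()
  coroutine.yield("working")
  error("something broke")
end)

local ok1, value = coroutine.resume(co)  -- runs up to the yield
local ok2, err = coroutine.resume(co)    -- runs into the error
print(ok1, value)
print(ok2, err, coroutine.status(co))
```

A scheduler can therefore check the first result of coroutine.resume() to log or handle a failed task, without wrapping the task body in pcall.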