lua_newstate

lua_newstate

Gregg Reynolds-2
"Creates a new thread running in a new, independent state...."

What does "thread" mean here? A pthread, for example?

It returns a lua_State, which is "An opaque structure that points to a thread..."

Same question.

Thanks,
Gregg

Re: lua_newstate

Luiz Henrique de Figueiredo
> "Creates a new thread running in a new, independent state...."
>
> What does "thread" mean here? A pthread, for example?

"The type thread represents independent threads of execution and
it is used to implement coroutines (see §2.6). Lua threads are
not related to operating-system threads. Lua supports coroutines
on all systems, even those that do not support threads natively."
https://www.lua.org/manual/5.3/manual.html#2.1


Re: lua_newstate

Gregg Reynolds-2


On Dec 22, 2017 12:52 PM, "Luiz Henrique de Figueiredo" <[hidden email]> wrote:
> > "Creates a new thread running in a new, independent state...."
> >
> > What does "thread" mean here? A pthread, for example?
>
> "The type thread represents independent threads of execution ...

Thanks, but I hope you can see why I find this language a little opaque. "Type thread represents thread" is circular.

I take it that the Lua engine implements its own lthread infrastructure, which is non-pre-emptive. Yeah? So a Lua thread is basically a set of state data - an environment in which code executes. AFAIK exactly parallel to any generic threading lib like pthreads (where a "thread" is actually a data structure). No? (I confess I find the language of threads confusing in general.)

Re: lua_newstate

Sean Conner
It was thus said that the Great Gregg Reynolds once stated:

> On Dec 22, 2017 12:52 PM, "Luiz Henrique de Figueiredo" <
> [hidden email]> wrote:
>
> > "Creates a new thread running in a new, independent state...."
> >
> > What does "thread" mean here? A pthread, for example?
>
> "The type thread represents independent threads of execution ...
>
>
> Thanks, but I hope you can see why I find this language a little opaque.
> "Type thread represents thread" is circular.
>
> I take it that the Lua engine implements its own lthread infrastructure,
> which is non-pre-emptive. Yeah? So a Lua thread is basically a set of state
> data - an environment in which code executes. Afaik exactly parallel to any
> generic threading lib like pthreads (where a "thread" is actually a data
> structure). No? (I confess I find the language of threads confusing in
> general.)

  Lua uses coroutines, not threads (even though the Lua documentation calls
coroutines "threads").  I'll attempt to give a quick run down of the
differences, using the Intel x86-64 architecture as a concrete example
(given that it's so prevalent) and I'll start with a term I learned in
college---a "unit of execution."

  A "unit of execution" is the set of CPU registers.  The x86-64 contains a
large number of them (RAX, R15, XMM0, etc.) that are used during the
execution of a program, but the two most important ones are RIP (the
instruction pointer) and RSP (the stack pointer).  Swap out RIP and RSP (and
possibly others) and you have another "unit of execution" (as long as RIP
points to actual code, and RSP points to usable memory).

  Another term for "unit of execution" here is "coroutine."  You can see an
implementation for the x86-64 here [1].  It takes an explicit call to switch
the RIP and RSP.  These may also be called "fibers" (from Microsoft) or
"green threads" (other systems I'm blanking on). All three require explicit
action from the program to switch "units of execution."
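
  A minimal sketch of such an explicit switch in C, assuming a POSIX system
that still ships <ucontext.h> (glibc does).  This is not the code from [1];
the names co_body, run_coroutine_demo and trace are made up for the
illustration.  swapcontext() saves the current register set (instruction
pointer and stack pointer included) and loads another one, and nothing
switches unless the program calls it.

```c
#include <assert.h>
#include <ucontext.h>

/* Hypothetical names for this sketch; none of them come from Lua or [1]. */
ucontext_t main_ctx;               /* saved registers of the caller        */
ucontext_t co_ctx;                 /* saved registers of the coroutine     */
char       co_stack[64 * 1024];    /* memory for the coroutine's own stack */
int        trace[8];               /* order in which the steps ran         */
int        trace_len;

void record(int step)
{
    assert(trace_len < 8);
    trace[trace_len++] = step;
}

/* Body of the coroutine: it runs until it explicitly yields. */
void co_body(void)
{
    record(2);
    swapcontext(&co_ctx, &main_ctx);   /* yield: save ours, restore main's */
    record(4);
    /* returning from here resumes uc_link, i.e. main_ctx */
}

/* Runs the demo and returns the number of recorded steps. */
int run_coroutine_demo(void)
{
    trace_len = 0;

    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp   = co_stack;        /* the new stack pointer ... */
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link          = &main_ctx;       /* where to go when it ends  */
    makecontext(&co_ctx, co_body, 0);          /* ... and the entry point   */

    record(1);
    swapcontext(&main_ctx, &co_ctx);   /* resume the coroutine             */
    record(3);
    swapcontext(&main_ctx, &co_ctx);   /* resume it again                  */
    record(5);
    return trace_len;
}
```

  Calling run_coroutine_demo() leaves 1, 2, 3, 4, 5 in trace: control moves
back and forth only at the swapcontext() calls.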

  A "thread" is a "unit of execution" that doesn't need an explicit call to
switch execution.  Well, it does, but the "units of execution" don't need to
call the switching function.  Generally, you will have some periodic
interruption (such as a timer chip) that interrupts the current "unit of
execution."  Then this "interrupt handler" will possibly switch from the
current "unit of execution" that was running with a different one and resume
the newer one.  This "interrupt handler" is usually in the kernel and thus,
most times "threads" are managed by the kernel [2].

  A "process" is just a thread but with its own address space [3] and
definitely managed by the kernel.  But a "process" is more than just a "unit
of execution" as it is the abstraction by which system resources are
handed out---to a "process" as a whole (memory, files, etc.).  So a
"process" contains resources plus at least one "unit of execution."  There
can be many "units of execution" within a "process" but they all share the
resources of the overall "process."

  Getting back to Lua, Lua is based upon a VM, so it has its own concept of
RIP and RSP.  A call to lua_newstate() (or luaL_newstate()) returns a new
Lua VM context, which can be likened to a "process" in that it manages the
resources and contains at least one "unit of execution." The function
lua_newthread() creates a new "unit of execution" within the current Lua VM,
which the Lua documentation calls a "thread" but is more like a "coroutine"
although in any case, it's a "unit of execution" that needs to be
explicitly switched to.  And this "unit of execution" only has meaning to
the Lua VM---it is totally unrelated to the system notion of "units of
execution" (like threads or processes).
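
  The split between the shared VM context and the per-"thread" part can be
pictured with a toy model in C.  Everything here (ToyGlobal, ToyThread, the
toy_* functions) is hypothetical and far simpler than Lua's real lua_State
and global_State; it only shows which data is shared and which belongs to
one "unit of execution."

```c
#include <assert.h>

enum { STACK_MAX = 8 };

/* Plays the role of the "process": resources shared by all threads. */
typedef struct ToyGlobal {
    int shared_counter;
} ToyGlobal;

/* Plays the role of one "unit of execution" inside that state. */
typedef struct ToyThread {
    ToyGlobal *g;             /* every thread points back at its state */
    int pc;                   /* the VM's own "instruction pointer"    */
    int stack[STACK_MAX];     /* the VM's own stack ...                */
    int top;                  /* ... and its own "stack pointer"       */
} ToyThread;

/* Loosely like lua_newstate(): the shared part plus the first thread. */
void toy_newstate(ToyGlobal *g, ToyThread *main_thread)
{
    g->shared_counter = 0;
    main_thread->g   = g;
    main_thread->pc  = 0;
    main_thread->top = 0;
}

/* Loosely like lua_newthread(): another thread in the same state. */
void toy_newthread(ToyThread *from, ToyThread *t)
{
    t->g   = from->g;
    t->pc  = 0;
    t->top = 0;
}

/* Run one "instruction": push the current pc on this thread's stack,
   bump the shared counter, advance the pc.  Nothing runs unless the
   host program calls this explicitly. */
void toy_step(ToyThread *t)
{
    assert(t->top < STACK_MAX);
    t->stack[t->top++] = t->pc;
    t->g->shared_counter++;
    t->pc++;
}
```

  Stepping two ToyThreads of the same ToyGlobal advances each pc and stack
separately while both update the one shared_counter.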

  -spc (Hope this helps some ... )

[1] https://github.com/spc476/C-Coroutines

[2] Not always.  On Unix, one could set up a signal handler for SIGALRM
        and do the switching, but it's more involved than simple coroutines
        [1].

[3] On systems with virtual memory, which is most systems today.  On
        systems without virtual memory, you can still have a "process" but
        it's a polite fiction with some wavy hand action.


Re: lua_newstate

Enrico Colombini
On 23-Dec-17 00:41, Sean Conner wrote:

>    Getting back to Lua, Lua is based upon a VM, so it has its own concept of
> RIP and RSP.  A call to lua_newstate() (or luaL_newstate()) returns a new
> Lua VM context, which can be likened to a "process" in that it manages the
> resources and contains at least one "unit of execution." The function
> lua_newthread() creates a new "unit of execution" within the current Lua VM,
> which the Lua documentation calls a "thread" but is more like a "coroutine"
> although in any case, it's a "unit of execution" that needs to be
> explicitly switched to.  And this "unit of execution" only has meaning to
> the Lua VM---it is totally unrelated to the system notion of "units of
> execution" (like threads or processes).

Thanks for this clear comparison between Lua 'threads' (coroutines) and
native threads. I have always been too lazy to look up how Lua
coroutines actually work and I did not visualize the Lua VM as a sort of
CPU (though in retrospect it should have been obvious).

--
   Enrico


Re: lua_newstate

Gregg Reynolds-2
In reply to this post by Sean Conner


On Dec 22, 2017 5:42 PM, "Sean Conner" <[hidden email]> wrote:
>   Getting back to Lua, Lua is based upon a VM, so it has its own concept of
> RIP and RSP.  A call to lua_newstate() (or luaL_newstate()) returns a new
> Lua VM context, which can be likened to a "process" in that it manages the
> resources and contains at least one "unit of execution." The function
> lua_newthread() creates a new "unit of execution" within the current Lua VM,
> which the Lua documentation calls a "thread" but is more like a "coroutine"
> although in any case, it's a "unit of execution" that needs to be
> explicitly switched to.  And this "unit of execution" only has meaning to
> the Lua VM---it is totally unrelated to the system notion of "units of
> execution" (like threads or processes).
>
>   -spc (Hope this helps some ... )

Definitely - well said! Something along these lines in PIL and/or the refman would be useful.

But now: what's the relationship between luathreads and platform threads? There is always only one lua vm, right? In my use case, lua_newstate could be called simultaneously on multiple distinct platform threads. That would work so long as the Lua engine/vm is reentrant, yes?

So my (wrapped, c) lib may call lua_newstate and then pass control to a user-defined Lua callback on multiple platform threads, in parallel. So long as the callback does not use (Lua) globals, everything should be copacetic, yes?

What if the cb needs to read/write shared data? Do it by calling a C function that handles synchronization? Use coroutines in some way?

Thanks, 
G

Re: lua_newstate

Sean Conner
It was thus said that the Great Gregg Reynolds once stated:

> On Dec 22, 2017 5:42 PM, "Sean Conner" <[hidden email]> wrote:
> >
> >   Getting back to Lua, Lua is based upon a VM, so it has its own concept of
> > RIP and RSP.  A call to lua_newstate() (or luaL_newstate()) returns a new
> > Lua VM context, which can be likened to a "process" in that it manages the
> > resources and contains at least one "unit of execution." The function
> > lua_newthread() creates a new "unit of execution" within the current Lua VM,
> > which the Lua documentation calls a "thread" but is more like a "coroutine"
> > although in any case, it's a "unit of execution" that needs to be
> > explicitly switched to.  And this "unit of execution" only has meaning to
> > the Lua VM---it is totally unrelated to the system notion of "units of
> > execution" (like threads or processes).
> >
> >   -spc (Hope this helps some ... )
>
> Definitely - well said! Something along these lines in PIL and/or the
> refman would be useful.
>
> But now: what's the relationship between luathreads and platform threads?

  There is no relationship between the two.

> There is always only one lua vm, right?

  There doesn't have to be:

        lua_State *L1 = luaL_newstate();
        lua_State *L2 = luaL_newstate();

  What happens in L1 does not affect L2 and vice versa.  They are
independent of each other.  

> In my use case, lua_newstate could
> be called simultaneously on multiple distinct platform threads. That would
> work so long as the Lua engine/vm is reentrant, yes?

  Lua is reentrant, but (by default) not thread safe.  That is, if a system
thread (pthread) calls into Lua with state L1, and another system thread
calls into Lua with state L1 at the same time, things blow up.  But system
thread 1 calling into Lua with state L1 and system thread 2 calling into Lua
with state L2 is fine.

  There is a way to compile Lua to make it thread safe, but 1) that means
you need to include your custom Lua library with your application and 2)
will slow Lua down to Python speeds.

> So my (wrapped, c) lib may call lua_newstate and then pass control to a
> user-defined Lua callback on multiple platform threads, in parallel. So
> long as the callback does not use (Lua) globals, everything should be
> copacetic, yes?

  Not if all the system threads use the same Lua state.

> What if the cb needs to read/write shared data? Do it by calling a C
> function that handles synchronization? Use coroutines in some way?

  Yes.
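
  For the C-function route, one possible shape is to keep the shared data in
C and let each Lua state reach it only through functions that take a lock.
A minimal sketch with a pthread mutex follows; SharedCounter and the
shared_* functions are hypothetical names, and the glue that would expose
them to Lua (a lua_CFunction registered in each state) is left out.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared record, owned by the C side of the application. */
typedef struct SharedCounter {
    pthread_mutex_t lock;
    long            value;
} SharedCounter;

/* Returns 0 on success, otherwise a pthread error number. */
int shared_init(SharedCounter *s)
{
    s->value = 0;
    return pthread_mutex_init(&s->lock, NULL);
}

/* Adds delta under the lock and returns the new value. */
long shared_add(SharedCounter *s, long delta)
{
    long result;
    int rc = pthread_mutex_lock(&s->lock);
    assert(rc == 0);
    s->value += delta;
    result = s->value;
    rc = pthread_mutex_unlock(&s->lock);
    assert(rc == 0);
    (void)rc;                 /* keeps NDEBUG builds free of warnings */
    return result;
}

/* Reads the value under the same lock. */
long shared_get(SharedCounter *s)
{
    return shared_add(s, 0);
}
```

  Each system thread keeps its own Lua state and only the SharedCounter is
shared, so the Lua states themselves never need locking.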

  -spc



Re: lua_newstate

Viacheslav Usov
On Sat, Dec 23, 2017 at 11:35 PM, Sean Conner <[hidden email]> wrote:

>  will slow Lua down to Python speeds.

A reference to such a comparison would be good here.

Cheers,
V.

Re: lua_newstate

Javier Guerra Giraldez
On 24 December 2017 at 08:41, Viacheslav Usov <[hidden email]> wrote:
> On Sat, Dec 23, 2017 at 11:35 PM, Sean Conner <[hidden email]> wrote:
>
>>  will slow Lua down to Python speeds.
>
> A reference to such a comparison would be good here.

if that was a reference to defining those macros that essentially add
lock operations to every mutation, it does feel about right since in
essence it adds a Global Interpreter Lock, the dreaded "GIL problem"
that made Python folks essentially abandon threading in favor of
communicating processes.

--
Javier


Re: lua_newstate

Viacheslav Usov
On Wed, Dec 27, 2017 at 10:20 AM, Javier Guerra Giraldez <[hidden email]> wrote:

>  if that was a reference to defining those macros that essentially add lock operations to every mutation, it does feel about right since in essence it adds a Global Interpreter Lock, the dreaded "GIL problem" that made Python folks essentially abandon threading in favor of communicating processes.

I mean a reference to a test, with a full disclosure of the methods, whose results would indicate that something "slowed Lua down to Python speeds". Note that even the formulation of the statement is dubious, because apparently a non-MT-safe version of Lua is compared with an MT-safe version of Lua, which can only reasonably be done with a single-threaded test, so what is being compared to what and why is that relevant? Secondly, the formulation suggests that "stock" Lua is significantly "faster" than Python, while there are some results, such as [1], that show that the situation is more complicated than that.

Cheers,
V.

[1] https://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=lua&lang2=python3


Re: lua_newstate

Sean Conner
It was thus said that the Great Viacheslav Usov once stated:

> On Wed, Dec 27, 2017 at 10:20 AM, Javier Guerra Giraldez <[hidden email]
> > wrote:
>
> >  if that was a reference to defining those macros that essentially add
> lock operations to every mutation, it does feel about right since in
> essence it adds a Global Interpreter Lock, the dreaded "GIL problem" that
> made Python folks essentially abandon threading in favor of communicating
> processes.
>
> I mean a reference to a test, with a full disclosure of the methods, whose
> results would indicate that something "slowed Lua down to Python speeds".
> Note that even the formulation of the statement is dubious, because
> apparently a non-MT-safe version of Lua is compared with an MT-safe version
> of Lua, which can only reasonably be done with a single-threaded test, so
> what is being compared to what and why is that relevant? Secondly, the
> formulation suggests that "stock" Lua is significantly "faster" than
> Python, while there are some results, such as [1], that show that the
> situation is more complicated than that.

  I seriously question the methods used for the Python programs.  In each
case where Python won (with the exception of mandelbrot), it included a
multiprocessing module, whereas Lua was just the stock Lua, so it's not a
fair "apples-to-apples" comparison (in this case, Python's "batteries
included" gives it a boost with respect to Lua's "batteries not included").

  As for what is being measured, it's the overhead of the "global
interpreter lock" (a concept in Python, not in Lua) to make Lua thread safe,
and such locks do take time.  

  Be that as it may, I was only able to run one test before the machine I
was testing on shut down due to thermal issues [2], but it is illustrative of
the issue.  Anyway, the machine I tested on was a quad-core, 64-bit Ubuntu
Linux system.  The Lua version is 5.3 with all bugs patched [3].  The first
test is with the stock definition of lua_lock() and lua_unlock() (basically,
no implementation):

[spc]saltmine-2:/tmp/bm>time lua-53 fasta.lua-2.lua 25000000 >output-fasta

real    0m32.409s
user    0m31.970s
sys     0m0.304s

  This is faster than the results on the benchmark site:

source  secs    mem     gz      cpu     cpu load
Lua     50.08   2,920   1061    50.06   0% 7% 94% 0%

but in reading about the website tests [6], they sample the program for
memory usage every 0.2 seconds, which I'm not doing, so that *may* explain
the difference.  I don't know.  The version of Ubuntu wasn't specified, nor
the actual x86-64 quadcore chip, so that may also have an effect on the
results.

  I digress.  For the next test, I modified the Lua-5.3 source code to
include locking:

diff --git a/src/llimits.h b/src/llimits.h
index f21377f..0044e12 100644
--- a/src/llimits.h
+++ b/src/llimits.h
@@ -211,8 +211,8 @@ typedef unsigned long Instruction;
 ** ('lua_lock') and leaves the core ('lua_unlock')
 */
 #if !defined(lua_lock)
-#define lua_lock(L)    ((void) 0)
-#define lua_unlock(L)  ((void) 0)
+#define lua_lock(L)    pthread_mutex_lock(&(L)->l_G->lock)
+#define lua_unlock(L)  pthread_mutex_unlock(&(L)->l_G->lock)
 #endif
 
 /*
diff --git a/src/lstate.c b/src/lstate.c
index 9194ac3..dfbe88f 100644
--- a/src/lstate.c
+++ b/src/lstate.c
@@ -12,7 +12,7 @@
 
 #include <stddef.h>
 #include <string.h>
-
+#include <pthread.h>
 #include "lua.h"
 
 #include "lapi.h"
@@ -328,6 +328,7 @@ LUA_API lua_State *lua_newstate (lua_Alloc f, void *ud) {
   g->gcfinnum = 0;
   g->gcpause = LUAI_GCPAUSE;
   g->gcstepmul = LUAI_GCMUL;
+  pthread_mutex_init(&g->lock,NULL);
   for (i=0; i < LUA_NUMTAGS; i++) g->mt[i] = NULL;
   if (luaD_rawrunprotected(L, f_luaopen, NULL) != LUA_OK) {
     /* memory allocation error: free partial state */
diff --git a/src/lstate.h b/src/lstate.h
index a469466..5677e6e 100644
--- a/src/lstate.h
+++ b/src/lstate.h
@@ -112,6 +112,7 @@ typedef struct CallInfo {
 #define setoah(st,v)   ((st) = ((st) & ~CIST_OAH) | (v))
 #define getoah(st)     ((st) & CIST_OAH)
 
+#include <pthread.h>
 
 /*
 ** 'global state', shared by all threads of this state
@@ -151,6 +152,7 @@ typedef struct global_State {
   TString *tmname[TM_N];  /* array with tag-method names */
   struct Table *mt[LUA_NUMTAGS];  /* metatables for basic types */
   TString *strcache[STRCACHE_N][STRCACHE_M];  /* cache for strings in API */
+  pthread_mutex_t lock;
 } global_State;
 
I figure this was the minimum required for the job.  With that done, I
compiled the custom version of Lua and reran the test:

[spc]saltmine-2:/tmp/bm>time ./lua-5.3/src/lua fasta.lua-2.lua 25000000 >output-fasta

real    0m36.767s
user    0m36.010s
sys     0m0.320s

  Definitely slower.  And the ratio between these results (32.4 / 36.8 or
.88) is close enough to the ratio in the benchmark results (50.0 / 59.5 or
.84) that it seems to indicate that yes, the modified version may be as slow
as Python [4].

  I did run the n-body test [5] for a baseline and got a result of 283.5
seconds.  I was running the modified Lua version when the CPU overheated and
shut down (and I lost connection to the box).  I do not know how long it ran
before shutting down.

  -spc (Cheers!)

> [1] https://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=lua&lang2=python3

[2] A Linux laptop at work---it matches the specs given on the website
        [1] so that's why I was using it.  I won't be able to finish this
        until I get back to the office after the New Year.

        Yes, that laptop is a bit temperamental.  Sigh.

[3] https://www.lua.org/bugs.html#5.3.4

[4] I did not install Python3 as I was trying to get results for Lua.
        Also, I'm on vacation.

[5] https://benchmarksgame.alioth.debian.org/u64q/nbody-description.html#nbody

[6] https://benchmarksgame.alioth.debian.org/how-programs-are-measured.html


Re: lua_newstate

Viacheslav Usov
On Thu, Dec 28, 2017 at 1:40 AM, Sean Conner <[hidden email]> wrote:

>  I seriously question the methods used for the Python programs.  In each case where Python won (with the exception of mandelbrot), it included a multiprocessing module, whereas Lua was just the stock Lua, so it's not a fair "apples-to-apples" comparison (in this case, Python's "batteries included" gives it a boost with respect to Lua's "batteries not included."

In fact, even in the mandelbrot case the Python program used multiprocessing, it is just a little less obvious. And the fasta test, too, uses multiprocessing. 

That is interesting indeed, but that also means that the test suite I referenced does not say much about strictly serial Lua vs strictly serial Python performance, there is only one relevant comparison. Worse yet, the fasta test, which you used below as a benchmark of Python's "slowness" vs Lua, and then stock Lua vs patched Lua, was originally parallel Python vs serial Lua.

> As for what is being measured, it's the overhead of the "global interpreter lock" (a concept in Python, not in Lua) to make Lua thread safe, and such locks do take time.

I understand embedding Python much less than Lua, but somehow I thought that "global" means really global, so that multiple instances of Python's equivalents of the Lua state (if that is possible at all) would share the same lock. Your patch introduces one lock per state. This is probably the right thing to do, given the context of this thread.

Secondly, your patch uses a pthread mutex initialised with default attributes. A different mechanism or different attributes might yield different results.

>  And the ratio between these results (32.4 / 36.8 or .88) is close enough to the benchmark results (50.0 / 59.5 or .84) seems to indicate that yes, the modified version may be a slow as Python [4].

As I said above, "(50.0 / 59.5 or .84)" is serial Lua vs parallel Python.

I cannot say with certainty, but I would guess that the original Python fasta implementation uses multiprocessing to achieve better performance, which means the same test without multiprocessing would take longer. The only test where strictly serial Lua was compared with strictly serial Python was n-body, where Python was over 100% slower. In your test, the slowdown was less than 15%, which does not look like "as slow as Python", but, in fairness, this is still comparing results of two different tests.
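
The percentages can be checked from the figures quoted in this thread. A small C helper (slowdown_percent is a hypothetical name) turns two timings into "percent slower":

```c
#include <assert.h>

/* Percentage by which `modified` is slower than `baseline`.
   Both arguments are elapsed times in the same unit. */
double slowdown_percent(double baseline, double modified)
{
    assert(baseline > 0.0);
    return (modified / baseline - 1.0) * 100.0;
}
```

With the wall-clock times of the two runs, slowdown_percent(32.409, 36.767) gives about 13.4, while the pair of benchmark-site figures, slowdown_percent(50.0, 59.5), gives about 19.0.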
 
Cheers,
V.

Re: lua_newstate

Viacheslav Usov
On Thu, Dec 28, 2017 at 10:25 AM, Viacheslav Usov <[hidden email]> wrote:

>  In fact, even in the mandelbrot case the Python program used multiprocessing, it is just a little less obvious.

In fact, even the Lua program used multiprocessing in the mandelbrot case. Looking at the CPU load indications, that seems to be the only such case at https://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=lua&lang2=python3

Cheers,
V.

Re: lua_newstate

Luiz Henrique de Figueiredo
In reply to this post by Sean Conner
> +  pthread_mutex_t lock;
>  } global_State;

Note that you don't need to change this struct to support OS threads.
Lua states contain an array of size LUA_EXTRASPACE that can be used
to store additional data. See also lua_getextraspace. The macros
luai_userstate* can be defined to manage OS threads.

Search the archives for examples. Note that LUA_EXTRASPACE was spelled
LUAI_EXTRASPACE before 5.3.


Re: lua_newstate

Thijs Schreijer
In reply to this post by Gregg Reynolds-2
See http://www.thijsschreijer.nl/blog/?p=693


On 22 Dec 2017, at 19:46, Gregg Reynolds <[hidden email]> wrote:

"Creates a new thread running in a new, independent state...."

What does "thread" mean here? A pthread, for example?

It returns a lua_State, which is "An opaque structure that points to a thread..."

Same question.

Thanks,
Gregg


Re: lua_newstate

Sean Conner
In reply to this post by Luiz Henrique de Figueiredo
It was thus said that the Great Luiz Henrique de Figueiredo once stated:

> > +  pthread_mutex_t lock;
> >  } global_State;
>
> Note that you don't need to change this struct to support OS threads.
> Lua states contain an array of size LUA_EXTRASPACE that can be used
> to store additional data. See also lua_getextraspace. The macros
> luai_userstate* can be defined to manage OS threads.
>
> Search the archives for examples. Note that LUA_EXTRASPACE was spelled
> LUAI_EXTRASPACE before 5.3.

  I saw LUA_EXTRASPACE, but I decided against it for a few reasons:

        1) A `void *` may not be big enough to store a pthread_mutex_t.

        2) A `void *` *is* big enough to store a pointer to a
           pthread_mutex_t, but then I would need to add code to allocate a
           pthread_mutex_t.

        3) I need to recompile Lua *anyway* with a new definition for
           lua_lock() and lua_unlock().

        4) Since this is just for benchmarking, I wanted to do the simplest
           thing that could work.

  -spc