Squeezing more coroutines ?

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Squeezing more coroutines ?

Bogdan Harjoc-2
Hi,

While writing a performance testing tool, I chose Lua to describe a set of tests that
run in parallel.

So far things are looking great, as far as both available cpu and memory usage are concerned: after increasing LUAI_MAXCSTACK, I've been testing with 300.000 concurrent and mostly idle threads, created with lua_newthread(). Memory usage
for these is about 350M, i.e. a little over 1KB per created thread.

After a glance at the lua_State definition, it looks like there a couple of members that could be left out but nothing that would noticeably reduce sizeof(lua_State).

Has anyone managed to create something on the order of 1,000,000 coroutines ? If so, any tips on what to leave out from lua_State (or other places) would be
appreciated.


Cheers!
Bogdan


Reply | Threaded
Open this post in threaded view
|

Re: Squeezing more coroutines ?

Stefan Sandberg
I don't want to be rude or anything, but whatever you're doing, there's a better way!

Of course, if you're just killing time figuring out the limit, awesome.. :)

Bogdan Harjoc wrote:
Hi,

While writing a performance testing tool, I chose Lua to describe a set of tests that
run in parallel.

So far things are looking great, as far as both available cpu and memory usage are concerned: after increasing LUAI_MAXCSTACK, I've been testing with 300.000 concurrent and mostly idle threads, created with lua_newthread(). Memory usage
for these is about 350M, i.e. a little over 1KB per created thread.

After a glance at the lua_State definition, it looks like there a couple of members that could be left out but nothing that would noticeably reduce sizeof(lua_State).

Has anyone managed to create something on the order of 1,000,000 coroutines ? If so, any tips on what to leave out from lua_State (or other places) would be
appreciated.


Cheers!
Bogdan




Reply | Threaded
Open this post in threaded view
|

Re: Squeezing more coroutines ?

Thomas Lauer-3
Stefan Sandberg <[hidden email]> wrote:
> I don't want to be rude or anything, but whatever you're doing, there's 
> a better way!

I don't want to be rude or anything, but how about judging something
only if and when you have the full picture? Perhaps you're right and
there's no need for such a high number of threads. Then again, with
memory chips a dozen a dime these days and programmer's time being the
scarce resource it has always been, who knows where Bogdan's trade-off
is?

One not-half-bad definition of a hacker is that he knows how to make a
system do things it wasn't designed to do. Most people who design
something have no complete picture of what their creation can or can't
do. Or put differently, the imagination of one man, even if he is the
cleverest guy ever born, has no chance to compete long-term with that of
a dedicated team of people, even if they're all of mediocre talent.

(One of these days, I really have to get rid of this philosophical hat
of mine.)

-- 
cheers  thomasl

web : http://thomaslauer.com/start


Reply | Threaded
Open this post in threaded view
|

RE: Squeezing more coroutines ?

Markus Heukelom
> (One of these days, I really have to get rid of this 
> philosophical hat of mine.)

Well - as some say "A philosopher has a problem to every solution" ;-)
(Btw, wearing my philosophical head: is "whatever you're doing, there is a
better way" not true in general?)

However, I do agree with Stefan, that a million threads is probably not the
best known solution, whatever the problem... But ofcourse I am happy to be
shown wrong here!

-Markus Heukelom



Reply | Threaded
Open this post in threaded view
|

OT: RE: Squeezing more coroutines ?

Robert Raschke-2
Markus wrote:
> However, I do agree with Stefan, that a million threads is probably not the
> best known solution, whatever the problem... But ofcourse I am happy to be
> shown wrong here!

Telco land has needs for enormous amounts of concurrency.  Have a look
at Erlang (http://www.erlang.org/) for one way of dealing with that.

Robby

--
r dot raschke at tombob dot com



Reply | Threaded
Open this post in threaded view
|

Re: Squeezing more coroutines ?

Thomas Lauer-3
In reply to this post by Markus Heukelom
"Markus Heukelom" <[hidden email]> wrote:
> > (One of these days, I really have to get rid of this 
> > philosophical hat of mine.)
> 
> Well - as some say "A philosopher has a problem to every solution" ;-)
> (Btw, wearing my philosophical head: is "whatever you're doing, there is a
> better way" not true in general?)

I was half-expecting this, hehe. Of course, there's always a better way.
It all depends on how you define "better" and how much resources you are
willing to throw at a problem.

Shift "better" and you have a whole new world. Get loads of money (or
more GHz, DRAM, whisky, ...) and you have a whole new world.

I just couldn't suppress my relativist urges when I read Stefan's
"absolute" remark. As we all know, absolute statements are always
wrong;-/

-- 
cheers  thomasl

web : http://thomaslauer.com/start


Reply | Threaded
Open this post in threaded view
|

Re: OT: RE: Squeezing more coroutines ?

Bogdan Harjoc-2
In reply to this post by Robert Raschke-2
Robert Raschke wrote:
Markus wrote:
However, I do agree with Stefan, that a million threads is probably not the
best known solution, whatever the problem... But ofcourse I am happy to be
shown wrong here!

Telco land has needs for enormous amounts of concurrency.  Have a look
at Erlang (http://www.erlang.org/) for one way of dealing with that.

Robby

Yours was the first on-topic reply, although I suppose I should have given more details if I was expecting on-topic repies. Yes, I am (aiming at) simulating 1 million voip clients, each placing a phonecall from time to time (maybe you 'guessed' by
looking at the email address ?).

And the reason I (assumed I) needed one thread per client was because I needed to be able to express client behaviour in a natural way (Lua coroutines + an event
loop being a good candidate for doing this).


Regards,
Bogdan

Reply | Threaded
Open this post in threaded view
|

RE: OT: RE: Squeezing more coroutines ?

Jerome Vuarand-2
Bogdan Harjoc wrote:
> Robert Raschke wrote:
>> Markus wrote:
>>> However, I do agree with Stefan, that a million threads is probably
>>> not the best known solution, whatever the problem... But ofcourse I
>>> am happy to be shown wrong here!
>> 
>> Telco land has needs for enormous amounts of concurrency.  Have a
>> look at Erlang (http://www.erlang.org/) for one way of dealing with
>> that. 
> 
> Yours was the first on-topic reply, although I suppose I should have
> given more details if I was expecting on-topic repies. Yes, I am
> (aiming at) simulating 1 million voip clients, each placing a
> phonecall from time to time (maybe you 'guessed' by looking at the
> email address ?).    
> 
> And the reason I (assumed I) needed one thread per client was because
> I needed to be able to express client behaviour in a natural way (Lua
> coroutines + an event 
> loop being a good candidate for doing this).

What you're trying to do is called a Multi-Agent System (MAS) [1]. In
many cases cooperative multi-threading is the best solution to MAS
problems, and coroutine are an example of cooperative multi-threading
systems. On the other hand Lua coroutines were designed as a general
programming tool, whereas getting millions of concurrent threads require
some specialized models (among other things to solve the memory problem
you encountered). Maybe you can have a look at Erlang, which was
designed for the kind of problems you have. And if it doesn't fit you're
probably better writing your own high level domain specific language,
with a kind of virtual machine to run it (the virtual machine can be
implemented in Lua and use coroutines, but maybe not one per agent).

[1] http://en.wikipedia.org/wiki/Multi-agent_systems


Reply | Threaded
Open this post in threaded view
|

Re: OT: RE: Squeezing more coroutines ?

Tony Finch
In reply to this post by Bogdan Harjoc-2
Regarding Erlang processes vs. Lua coroutines, Erlang processes take a
minimum of 318 words, i.e. 1272 bytes on a 32 bit machine, which is about
the same as a Lua coroutine (based on the numbers given earlier in this
thread). I doubt you'll do much better than this without some low-level
domain-specific hacking.

Tony.
-- 
f.a.n.finch  <[hidden email]>  http://dotat.at/
PLYMOUTH BISCAY FITZROY: NORTHWEST, BACKING SOUTHEAST FOR A TIME, 5 TO 7,
OCCASIONALLY GALE 8 AT FIRST IN BISCAY. MODERATE OR ROUGH. SHOWERS, RAIN
LATER. MODERATE OR GOOD.

Reply | Threaded
Open this post in threaded view
|

Re: Squeezing more coroutines ?

Stefan Sandberg
In reply to this post by Thomas Lauer-3
I didn't mean to offend, apologies if I did..
I've do a lot of GI in lua, and when I noticed "performance", "parallell", "coroutines" and "1,000,000", I had a similar
reaction most women have to spiders (or computers) .. :)
You wrote "to describe a set of tests that run in parallel", so I guess that means what it means, that whatever you're performing isn't
being done by the coroutines themselves... or maybe they do, I dunno..

Also, to further feed the OT theme of this thread:
Or put differently, the imagination of one man, even if he is the
cleverest guy ever born, has no chance to compete long-term with that of
a dedicated team of people, even if they're all of mediocre talent.
... that should end that silly Linux vs. Windows debate :)

mpb
Reply | Threaded
Open this post in threaded view
|

Re: Squeezing more coroutines ?

mpb
In reply to this post by Bogdan Harjoc-2
On 5/30/07, Bogdan Harjoc <[hidden email]> wrote:

While writing a performance testing tool, I chose Lua to describe a set
of tests that
run in parallel.

So far things are looking great, as far as both available cpu and memory
usage
are concerned: after increasing LUAI_MAXCSTACK, I've been testing with
300.000
concurrent and mostly idle threads, created with lua_newthread(). Memory
usage
for these is about 350M, i.e. a little over 1KB per created thread.

After a glance at the lua_State definition, it looks like there a couple
of members
that could be left out but nothing that would noticeably reduce
sizeof(lua_State).

Has anyone managed to create something on the order of 1,000,000
coroutines ?
If so, any tips on what to leave out from lua_State (or other places)
would be
appreciated.

Hi Bogdan,

I'm curious - have you considered using Stackless Python?

http://www.stackless.com/
http://en.wikipedia.org/wiki/Stackless_Python

I've never used Stackless, but I've heard that Stackless threads
(which seem to be called Tasklets) are "just a few bytes each".

Thanks!

-mpb

Reply | Threaded
Open this post in threaded view
|

Re: Squeezing more coroutines ?

Roberto Ierusalimschy
In reply to this post by Bogdan Harjoc-2
> So far things are looking great, as far as both available cpu and
> memory usage are concerned: after increasing LUAI_MAXCSTACK, I've been
> testing with 300.000 concurrent and mostly idle threads, created with
> lua_newthread(). Memory usage for these is about 350M, i.e. a little
> over 1KB per created thread.
>
> [...]
>
> Has anyone managed to create something on the order of 1,000,000
> coroutines ? If so, any tips on what to leave out from lua_State (or
> other places) would be appreciated.

Have you tried to reduce the stack size? If each coroutine does very
simple things, maybe each could run with a smaller initial stack.
For instance, you could try BASIC_CI_SIZE=5 (or even smaller; this is
the number of nested calls in a coroutine), and LUA_MINSTACK=10
(this is a little tricky, but should work; I don't think any C function
uses more than 10 temporaries without checking the stack...).

Beware that whenever a stack is too small Lua doubles its size. So,
if you start with too small stacks, you may end with them larger than
if you start with initially larger stacks. (E.g., you set BASIC_CI_SIZE=3,
but your program needs 7. Lua will double it twice, leaving the final
size equal to 12.)

-- Roberto