Re: LuaJIT strange memory limit

Re: LuaJIT strange memory limit

Arseny Vakhrushev
Ok, I looked through lj_alloc.c and realized that there is probably no way for LuaJIT to use more than 1Gb in 64-bit Linux... Man, that is so sad. :-( The dream shattered.:-)

On 10.11.2010, at 17:56, Arseny Vakhrushev wrote:

> Hello, Mike and everyone!
>
> I have run into some kind of memory limit issue while using LuaJIT. I am sorry if it has already been discussed; I haven't been following the list for a while. However, I didn't find anything related except for some FreeBSD x64 memory issues.
>
> So, my setup is 64-bit Linux, latest git LuaJIT build out of the box, latest Lua.
>
> test.lua
> ---------------------
> local s = {}
>
> for i = 1, 200 do
>    local j = math.random(1, 1000000)
>    local t = s[j]
>    if not t then
>        t = {}
>        s[j] = t
>    end
>    for k = 1, 1000000 do
>        table.insert(t, [[45623452345234523452345234523452345396804958604986570954867098456097804958670
>        563470658430956830948609348507690586-0386094867-23980785-0923095860945634-059609586-34
>        9456093486-09865-094365-0956-08u6980-485798563782354671238947215639876t573450972398457
>        9456093486-09865-094365-0956-08u6980-485798563782354671238947215639876t573450972398457
>        9456093486-09865-094365-0956-08u6980-485798563782354671238947215639876t573450972398457
>        9456093486-09865-094365-0956-08u6980-485798563782354671238947215639876t573450972398457
>        9456093486-09865-094365-0956-08u6980-485798563782354671238947215639876t573450972398457
>        9456093486-09865-094365-0956-08u6980-485798563782354671238947215639876t573450972398457
>        9456093486-09865-094365-0956-08u6980-485798563782354671238947215639876t573450972398457
>        9456093486-09865-094365-0956-08u6980-485798563782354671238947215639876t573450972398457
>        9456093486-09865-094365-0956-08u6980-485798563782354671238947215639876t573450972398457]])
>    end
>    print(collectgarbage 'count')
> end
> ---------------------
>
>
> LuaJIT
> ---------------------
>> luajit-2.0.0-beta5 test.lua
> 8222.0166015625
> 16414.055664062
> 24606.141601562
> 32798.180664062
> 40990.313476562
> ...
> 983078.68066406
> 991270.71972656
> 999462.75878906
> 1007654.7978516
> 1015846.8369141
> 1024038.8759766
> 1032230.9150391
> PANIC: unprotected error in call to Lua API (not enough memory)
> ---------------------
>
>
> Lua 5.1.4
> ---------------------
>> lua test.lua
> 16414.825195312
> 32797.431640625
> 49181.416015625
> 65565.517578125
> 81949.775390625
> ...
> 3113014.8691406
> 3129398.9707031
> 3145783.0722656
> 3162167.1738281
> 3178551.2753906
> 3194935.3769531
> 3211319.4785156
> 3227703.5800781
> 3244087.6816406
> 3260471.7832031
> 3276855.8847656
> ---------------------
>
>
> Furthermore, I thought about turning off the built-in memory allocator in LuaJIT, but it's mandatory for 64-bit. I ran a similar test on my production platform, which is in general a multithreaded and multiprocess LuaJIT environment, and it looks like multiple threads (LuaJIT states) share one global limit (which might be connected with the built-in allocator), i.e. the error comes proportionally sooner, although I expected the states to be totally independent of each other.
>
> Thanks for any help in advance!
>
> // Seny
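
A side note on the step sizes in the two listings above, as a minimal sketch rather than a statement about the implementation. It assumes that the long string literal is interned once, so each table.insert only adds one array slot; that the array part of a table grows in powers of two; and that a slot takes 8 bytes in LuaJIT and 16 bytes in Lua 5.1 on a 64-bit build. The helper names are made up for illustration.

```lua
-- Smallest power of two that can hold n array slots (assumed growth policy).
local function array_slots(n)
  local slots = 1
  while slots < n do slots = slots * 2 end
  return slots
end

-- Expected heap growth in KB for one inner table of n elements.
local function growth_kb(n, slot_bytes)
  return array_slots(n) * slot_bytes / 1024
end

local luajit_step = growth_kb(1000000, 8)   -- 8-byte slots (assumed for LuaJIT)
local lua51_step  = growth_kb(1000000, 16)  -- 16-byte slots (assumed for Lua 5.1, 64-bit)
print(string.format("LuaJIT step:  %d KB", luajit_step))
print(string.format("Lua 5.1 step: %d KB", lua51_step))

-- Last value printed by LuaJIT before the error, taken from the listing above.
local last_kb = 1032230.9150391
print(string.format("tables filled before failure: %d", math.floor(last_kb / luajit_step)))
```

Under these assumptions the steps come out at 8192 KB and 16384 KB, close to the differences between consecutive lines in the two listings.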



Re: LuaJIT strange memory limit

Alexander Gladysh
On Thu, Nov 11, 2010 at 12:50, Arseny Vakhrushev
<[hidden email]> wrote:
> Ok, I looked through lj_alloc.c and realized that there is probably no way for LuaJIT to use more than 1Gb in 64-bit Linux... Man, that is so sad. :-( The dream shattered.:-)

I always thought that the limit was 2 GB, not 1 GB.

Anyway, in practice this is enough in most of the cases. You can
always move some data to lightuserdata pointers.

If you share more information, we may be able to help.

Alexander.


Re: LuaJIT strange memory limit

Tony Finch
On Thu, 11 Nov 2010, Alexander Gladysh wrote:
>
> I always thought that the limit was 2 GB, not 1 GB.

The 1GB limit is because the Linux kernel's memory layout only leaves a
1GB slot for use by mmap in the bottom 4GB of address space.

Tony.
--
f.anthony.n.finch  <[hidden email]>  http://dotat.at/
HUMBER THAMES DOVER WIGHT PORTLAND: NORTH BACKING WEST OR NORTHWEST, 5 TO 7,
DECREASING 4 OR 5, OCCASIONALLY 6 LATER IN HUMBER AND THAMES. MODERATE OR
ROUGH. RAIN THEN FAIR. GOOD.


Re: LuaJIT strange memory limit

Arseny Vakhrushev
In reply to this post by Alexander Gladysh
>
> Anyway, in practice this is enough in most of the cases. You can
> always move some data to lightuserdata pointers.

1Gb for a server environment is not enough.

> If you share more information, we may be able to help.
>
> Alexander.
>

Well, I have a clustered server system where LuaJIT handles high-level logic scripting. All objects stored and managed in the system are high-level as well. I can easily avoid the above problem by scaling the system - adding more daemons and connecting them through the loopback interface. However, that leads to unnecessary overhead and throws me out of "one machine -> one daemon" scheme. Or, I could switch back to vanilla Lua which always works as a replacement engine.

// Seny



Re: LuaJIT strange memory limit

Mike Pall-21
In reply to this post by Arseny Vakhrushev
Arseny Vakhrushev wrote:
> Ok, I looked through lj_alloc.c and realized that there is
> probably no way for LuaJIT to use more than 1Gb in 64-bit
> Linux...

Well, even if that sounds strange, but your best option is to
compile LuaJIT as 32 bit. 32 bit processes can use close to 4GB of
memory under a Linux/x64 kernel. And as I've previously explained,
there's little performance difference with LuaJIT on x86 vs. x64.

> Well, I have a clustered server system where LuaJIT handles
> high-level logic scripting. All objects stored and managed in
> the system are high-level as well. I can easily avoid the above
> problem by scaling the system - adding more daemons and
> connecting them through the loopback interface. However, that
> leads to unnecessary overhead and throws me out of "one machine
> -> one daemon" scheme.

A single process doesn't make good use of a multi-core CPU. And
unless you're very careful and use non-blocking I/O everywhere,
you'll hit I/O bottlenecks. Running 10-20 worker processes on a
quad-core is a common setup.

Also, it's not a good idea to store millions of objects occupying
several gigabytes in a single Lua state. The Lua garbage collector
is simply not up to the task (LuaJIT currently uses the same GC).
It's very, very inefficient for huge out-of-cache workloads. The
GC causes serious cache thrashing and this kills performance.

I've attached a simple test which allocates just enough objects to
stay below 1GB for LuaJIT. Here's the output on my (fast) machine:

[The numbers are much higher with plain Lua, whether x86 or x64.]

0.97 seconds allocation time with stopped GC
1.53 seconds for a full GC
0.93 seconds for a cleanup GC

1.92 seconds allocation time with enabled GC
1.52 seconds for a full GC
0.99 seconds for a cleanup GC

1.96 seconds allocation time with enabled GC
2.95 seconds for a full GC with randomized links
1.01 seconds for a cleanup GC with randomized links

Explanation:
- A full GC takes 50% more time than the allocations themselves.
- If the GC is enabled, it doubles the allocation time.
- To simulate a real application, the links between objects are
  randomized in the third run. This doubles the GC time!

And that was just for 1GB! Now imagine using 8GB -- a full GC
cycle would keep the CPU busy for a whopping 24 seconds!
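
The ratios and the 8GB estimate can be re-derived from the timings listed above. This is only a sketch: the numbers are copied from the three runs, and the extrapolation assumes that full-GC time scales linearly with heap size, which the thread does not measure.

```lua
-- Timings in seconds, copied from the three runs above.
local alloc_stopped  = 0.97  -- allocation with stopped GC
local alloc_enabled  = 1.92  -- allocation with enabled GC
local full_gc        = 1.53  -- full GC, first run
local full_gc_linked = 1.52  -- full GC, second run
local full_gc_random = 2.95  -- full GC with randomized links

print(string.format("full GC vs. allocation:   %.2fx", full_gc / alloc_stopped))
print(string.format("enabled vs. stopped GC:   %.2fx", alloc_enabled / alloc_stopped))
print(string.format("randomized vs. plain GC:  %.2fx", full_gc_random / full_gc_linked))

-- Assumption: full-GC time grows linearly with the heap size in GB.
local function full_gc_seconds(heap_gb)
  return full_gc_random * heap_gb
end

print(string.format("estimated full GC at 8 GB: %.1f s", full_gc_seconds(8)))
```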

Ok, so the normal mode is to use the incremental GC. But this just
means the overhead is ~30% higher, it's mixed in between the
allocations and it will evict the CPU cache every time. Basically
your application will be dominated by the GC overhead and you'll
begin to wonder why it's slow ....

tl;dr version: Don't try this at home. And the GC needs a rewrite
(postponed to LuaJIT 2.1).

--Mike

[Attachment: gc_speed.lua (821 bytes)]
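
The attachment itself is not reproduced in this archive. The following is a hypothetical, scaled-down sketch of the same kind of measurement, not the original gc_speed.lua: the object count, the object shape and the function names are assumptions, and the randomized-links run is left out.

```lua
-- Time a function with the CPU clock.
local function timed(f)
  local t0 = os.clock()
  f()
  return os.clock() - t0
end

-- Allocate n small tables, then time a full GC with the data alive
-- and a second one after dropping all references.
local function run(n, gc_enabled)
  local objs
  collectgarbage("collect")
  if not gc_enabled then collectgarbage("stop") end
  local alloc = timed(function()
    objs = {}
    for i = 1, n do objs[i] = { i } end
  end)
  collectgarbage("restart")
  local full = timed(function() collectgarbage("collect") end)
  objs = nil  -- everything becomes garbage
  local cleanup = timed(function() collectgarbage("collect") end)
  return alloc, full, cleanup
end

local alloc, full, cleanup = run(200000, false)
print(string.format("%.2f seconds allocation time with stopped GC", alloc))
print(string.format("%.2f seconds for a full GC", full))
print(string.format("%.2f seconds for a cleanup GC", cleanup))
```

Calling run(n, true) gives the "enabled GC" variant; n has to be raised a lot before the heap approaches the sizes discussed in this thread.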

Re: LuaJIT strange memory limit

Arseny Vakhrushev
> Well, even if that sounds strange, but your best option is to
> compile LuaJIT as 32 bit. 32 bit processes can use close to 4GB of
> memory under a Linux/x64 kernel. And as I've previously explained,
> there's little performance difference with LuaJIT on x86 vs. x64.

Should the hosting process be compiled as 32bit as well?

>
> A single process doesn't make good use of a multi-core CPU. And
> unless you're very careful and use non-blocking I/O everywhere,
> you'll hit I/O bottlenecks. Running 10-20 worker processes on a
> quad-core is a common setup.

My setup is not single threaded. Well, it really is a single process which spawns several tens (usually 32 or 64) of LuaJIT threads along with some other worker threads for performing non-blocking network I/O, etc. But, as I said before, the 1Gb memory is shared between all of the LuaJIT threads within the process. For instance, if I spawn just one LuaJIT thread and run a test script which tries to consume as much memory as it can, it raises an error after consuming around 1Gb of memory. If I spawn four such threads, each of them gets proportionally less memory, i.e. around 256Mb or even less.
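
A test script of the kind described here could look roughly like the sketch below. It is not the script used in the thread. The function name and both parameters are hypothetical, and the cap exists only so that the probe cannot exhaust a machine where no allocator limit applies.

```lua
-- Allocate tables until either cap_kb is reached or an allocation fails.
-- pcall turns an out-of-memory error into a return value instead of
-- letting it propagate to the host.
local function probe(cap_kb, chunk_slots)
  local hold = {}  -- keeps every chunk alive
  local ok, err = pcall(function()
    while collectgarbage("count") < cap_kb do
      local t = {}
      for i = 1, chunk_slots do t[i] = i end
      hold[#hold + 1] = t
    end
  end)
  return ok, collectgarbage("count"), err
end

local ok, used_kb, err = probe(64 * 1024, 100000)
if ok then
  print(string.format("reached the cap with %.0f KB in use", used_kb))
else
  print(string.format("stopped at %.0f KB: %s", used_kb, tostring(err)))
end
```

Raising the cap above the expected limit and running one copy per state is one way to see how the ceiling is divided between states.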

>
> Also, it's not a good idea to store millions of objects occupying
> several gigabytes in a single Lua state. The Lua garbage collector
> is simply not up to the task (LuaJIT currently uses the same GC).
> It's very, very inefficient for huge out-of-cache workloads. The
> GC causes serious cache thrashing and this kills performance.

I know that. That is why I spawn many LuaJIT threads, so that none of them gets overloaded with GC cycles. Data and request load are balanced among them equally.

Let's say one LuaJIT state is fine with handling at most 200Mb of objects. So, if I had 30 such states working independently within one process, they would consume 200*30 = 6000Mb, i.e. about 6Gb of memory, and I would be totally happy with that ceiling. However, right now that is not feasible no matter how many LuaJIT threads I create.

// Seny
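
The arithmetic in this message, written out as a small sketch. The 1024Mb figure stands for the roughly 1Gb shared ceiling reported above; the function names are made up for illustration.

```lua
local LIMIT_MB = 1024  -- shared ceiling per process, as observed in the thread

-- Even share of the ceiling when n states fill up at the same rate.
local function share_mb(n)
  return LIMIT_MB / n
end

-- How many states fit under the ceiling if each may use per_state_mb.
local function max_states(per_state_mb)
  return math.floor(LIMIT_MB / per_state_mb)
end

print(string.format("4 states: %d Mb each", share_mb(4)))
print(string.format("states that fit at 200 Mb each: %d", max_states(200)))
print(string.format("wanted: %d Mb for 30 states", 200 * 30))
```

With these numbers only 5 states of 200Mb fit in one process, against the 30 that the design calls for.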



Re: LuaJIT strange memory limit

Mike Pall-21
Arseny Vakhrushev wrote:
> > Well, even if that sounds strange, but your best option is to
> > compile LuaJIT as 32 bit. 32 bit processes can use close to 4GB of
> > memory under a Linux/x64 kernel. And as I've previously explained,
> > there's little performance difference with LuaJIT on x86 vs. x64.
>
> Should the hosting process be compiled as 32bit as well?

Yes. Linking mixed x86 and x64 object files doesn't work.

To get the most out of x86, compile all your other C files with
-march=native -fomit-frame-pointer, enable -DLUAJIT_CPU_SSE2 and
link LuaJIT statically on x86 (less important for x64).

> But, as I said before, the 1Gb memory is shared between all of
> the LuaJIT threads within the process.

Yes, sadly that's true on Linux/x64 right now. With some effort I
could bump that up to around 2GB (avoiding MAP_32BIT and resorting
to address probing). But that won't suffice for your use case.

--Mike


Re: LuaJIT strange memory limit

Arseny Vakhrushev

>> But, as I said before, the 1Gb memory is shared between all of
>> the LuaJIT threads within the process.
>
> Yes, sadly that's true on Linux/x64 right now. With some effort I
> could bump that up to around 2GB (avoiding MAP_32BIT and resorting
> to address probing). But that won't suffice for your use case.

I suppose that this 1Gb limit is present in LuaJIT only because of the built-in allocator since vanilla Lua doesn't suffer from that.
Is it somehow possible to make 64-bit LuaJIT use the standard allocator to temporarily avoid the problem? Is it worth it?

// Seny

Re: LuaJIT strange memory limit

Mike Pall-21
Arseny Vakhrushev wrote:
> > Yes, sadly that's true on Linux/x64 right now. With some effort I
> > could bump that up to around 2GB (avoiding MAP_32BIT and resorting
> > to address probing). But that won't suffice for your use case.
>
> I suppose that this 1Gb limit is present in LuaJIT only
> because of the built-in allocator since vanilla Lua doesn't
> suffer from that. Is it somehow possible to make 64-bit LuaJIT
> use the standard allocator to temporarily avoid the problem? Is
> it worth it?

Nope, that won't work. LuaJIT uses 32 bit pointers everywhere (the
standard allocators won't guarantee that). And due to other
limitations in the x64 architecture, it may only use the lowest 2GB.

Usually this is not a problem, because most Lua-based apps which
need lots of memory keep the majority of it in C structures (e.g.
huge images or database caches). You're the first one who hit that
limit in a real-world use case ...

--Mike


Re: LuaJIT strange memory limit

Arseny Vakhrushev
>
> Usually this is not a problem, because most Lua-based apps which
> need lots of memory keep the majority of it in C structures (e.g.
> huge images or database caches). You're the first one who hit that
> limit in a real-world use case ...

I refuse to think that LuaJIT is not suitable for server-side applications! :-) It is not Mike Pall telling me that!

Seriously, it seems I have three options:
- switch to 32-bit and use 4Gb with LuaJIT
- switch to vanilla Lua
- find another scripting language (<--- that is so creepy)

Ok, anyway thanks a lot, Mike for your help!

// Seny



Re: LuaJIT strange memory limit

Javier Guerra Giraldez
On Thu, Nov 11, 2010 at 12:05 PM, Arseny Vakhrushev
<[hidden email]> wrote:
> I refuse to think that LuaJIT is not suitable for server-side applications! :-) It is not Mike Pall telling me that!
>
> Seriously, it seems I have three options:

#4: split LuaJIT tasks to separate processes


--
Javier
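
Once the states are split across processes, each request has to be routed to the process that owns the data. A minimal sketch of one way to do that, assuming objects are identified by string keys; the hash, the function names and the key format are hypothetical and not taken from the system discussed here.

```lua
-- Simple multiplicative string hash, kept below 2^32 so it stays exact
-- in Lua 5.1 and LuaJIT, where all numbers are doubles.
local function hash(key)
  local h = 0
  for i = 1, #key do
    h = (h * 31 + key:byte(i)) % 4294967296
  end
  return h
end

-- Map a key to one of n worker processes, numbered 1..n.
local function worker_for(key, n)
  return hash(key) % n + 1
end

-- Show how 1000 hypothetical object keys spread over 8 workers.
local counts = {}
for i = 1, 1000 do
  local w = worker_for("object:" .. i, 8)
  counts[w] = (counts[w] or 0) + 1
end
for w = 1, 8 do
  print(w, counts[w] or 0)
end
```

The same key always maps to the same worker, so each process keeps its own share of the objects under its own memory ceiling.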


Re: LuaJIT strange memory limit

Romulo
On Thu, Nov 11, 2010 at 3:09 PM, Javier Guerra Giraldez
<[hidden email]> wrote:
> On Thu, Nov 11, 2010 at 12:05 PM, Arseny Vakhrushev
> <[hidden email]> wrote:
>> I refuse to think that LuaJIT is not suitable for server-side applications! :-) It is not Mike Pall telling me that!
>>
>> Seriously, it seems I have three options:
>
> #4: split LuaJIT tasks to separate processes

What about the suggestion to move some data to lightuserdata? Wouldn't
that improve memory allocation on LuaJIT side?


--rb


Re: LuaJIT strange memory limit

Florian Weimer
In reply to this post by Mike Pall-21
* Mike Pall:

> Nope, that won't work. LuaJIT uses 32 bit pointers everywhere (the
> standard allocators won't guarantee that). And due to other
> limitations in the x64 architecture, it may only use the lowest 2GB.

Are these pointers tagged?  If not, you could compress them further,
getting to 32 GB addressable range.  Or you could keep a base address
and use relative addressing (but it seems that for Hotspot, this was
slower than the scaled-only variant).

On the other hand, I guess the effort required for process separation
is generally worth it.
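
The two schemes mentioned here can be illustrated with plain arithmetic. This is only a sketch of the idea, assuming objects aligned to 8 bytes; it says nothing about how LuaJIT stores its references, and all names are made up.

```lua
local ALIGN = 8  -- assumed object alignment in bytes

-- Scaled variant: store address / ALIGN in 32 bits.
local function compress(addr)
  assert(addr % ALIGN == 0, "address must be aligned")
  return addr / ALIGN
end

local function decompress(ref)
  return ref * ALIGN
end

-- Base-relative variant: store the offset from a fixed base address.
local function compress_rel(addr, base)
  return addr - base
end

local function decompress_rel(ref, base)
  return base + ref
end

local GB = 2^30
print(string.format("scaled 32-bit references cover %d GB", 2^32 * ALIGN / GB))

-- An address well above 4 GB still fits into 32 bits once scaled.
local addr = 20 * GB + 4096
assert(compress(addr) < 2^32)
assert(decompress(compress(addr)) == addr)
assert(decompress_rel(compress_rel(addr, 16 * GB), 16 * GB) == addr)
```

The scaled variant costs a multiply or shift on every dereference, the relative variant an add.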


Re: LuaJIT strange memory limit

Javier Guerra Giraldez
On Thu, Nov 11, 2010 at 1:32 PM, Florian Weimer <[hidden email]> wrote:
> On the other hand, I guess the effort required for process separation
> is generally worth it.

the sad thing is that multiple Lua states and multithreading buy you
most of the benefits of separate processes.  bigger addresspace isn't
one of them obviously.

--
Javier


Re: LuaJIT strange memory limit

Cosmin Apreutesei
> the sad thing is that multiple Lua states and multithreading buy you
> most of the benefits of separate processes.  bigger addresspace isn't
> one of them obviously.

...which makes me wonder if luaproc could be extended to support
process workers too...


Re: LuaJIT strange memory limit

Arseny Vakhrushev
In reply to this post by Javier Guerra Giraldez

>> I refuse to think that LuaJIT is not suitable for server-side applications! :-) It is not Mike Pall telling me that!
>>
>> Seriously, it seems I have three options:
>
> #4: split LuaJIT tasks to separate processes
>

Definitely! However, I didn't put that on the option list because no extra effort is needed: the system I'm developing is ready for that. The only concern I have is that, according to my tests, internal communication between LuaJIT threads within a process is slightly faster than communication between processes. Anyway, that seems far less critical than having to make sure that a process hosting LuaJIT states never exceeds 1Gb of memory consumption.

Re: LuaJIT strange memory limit

Arseny Vakhrushev
In reply to this post by Romulo
>> #4: split LuaJIT tasks to separate processes
>
> What about the suggestion to move some data to lightuserdata? Wouldn't
> that improve memory allocation on LuaJIT side?

Well, that is fair enough when the hosting process is aware of what is going on inside Lua states. It would be beneficial if we had objects living in lightuserdata buffers and provided Lua with "methods" to work with them.

In my case, LuaJIT is a scripting engine which is wrapped in low-level code that deals with networking and disk I/O, data (de)serializing, client and control requests, RPCs, etc. To Lua, all that finally appears as native Lua calls with arguments containing native Lua objects. Generally, Lua knows nothing about where these arguments came from and where the return values go to. On the other hand, the low-level code knows nothing about what these arguments are or what happens to them as a result of the call.



Re: LuaJIT strange memory limit

Peter Sommerfeld-3
In reply to this post by Arseny Vakhrushev
Arseny Vakhrushev wrote:
> The only concern I have is that, according to my tests, internal  
> communication between LuaJIT threads within a process is slightly faster  
> than communication between processes.

Can you provide some rough numbers about the differences ?
No details needed, just the range.

Peter


Re: LuaJIT strange memory limit

Alexander Gladysh
In reply to this post by Arseny Vakhrushev
On Thu, Nov 11, 2010 at 20:05, Arseny Vakhrushev
<[hidden email]> wrote:

>> Usually this is not a problem, because most Lua-based apps which
>> need lots of memory keep the majority of it in C structures (e.g.
>> huge images or database caches). You're the first one who hit that
>> limit in a real-world use case ...

> I refuse to think that LuaJIT is not suitable for server-side applications! :-) It is not Mike Pall telling me that!

We use LJ2 on server side. And we both know another guy, who used it. ;-)

LJ2 is quite suitable for server side. But, alas, not for every
conceivable server program architecture.

Alexander.


Re: LuaJIT strange memory limit

Linker
On Fri, Nov 12, 2010 at 08:11, Alexander Gladysh <[hidden email]> wrote:
> We use LJ2 on server side. And we both know another guy, who used it. ;-)
>
> LJ2 is quite suitable for server side. But, alas, not for every
> conceivable server program architecture.

I cannot agree with you. The memory limit is not a problem if you choose a multi-process architecture for your servers. It has been used in my servers for many years.

> Alexander.

--
Regards,
Linker Lin

[hidden email]
