Memory usage stats for 5.2 vs 5.3

Memory usage stats for 5.2 vs 5.3

Tom Sutcliffe
Hi list,

Thought I'd share some memory usage numbers from my tests of 5.3.0 (rc4) compared to 5.2.3. This is running on an ARMv7-M CPU (THUMB2 instruction set with 96KB RAM) in a highly custom environment. Lua is configured with 32-bit integers and no floating point (on 5.2, lua_Number is 32-bit 'long', on 5.3 both types of number are 32-bit 'long'), and all other code and config is identical except for the different Lua version being used. I wasn't using any legacy 5.1 stuff so the upgrade procedure was pretty smooth. Mainly a case of cooking up a new luaconf.h with the nonstandard number sizes and some patches to fudge floating point API calls.

The first figure on each line of the stats below is the memory used according to the allocator, and the second the figure returned from lua_gc. The allocator is one which doesn't have any allocation overhead (except for 8-byte alignment padding), and in this runtime there are no allocations that aren't via Lua, so the two figures should in theory match pretty closely. lua_gc seems to underestimate quite heavily, however. I assume that either it is not counting some of its own state in the figures it returns, or maybe the underestimation is simply the sum of all the strings whose lengths aren't a multiple of 8 (and thus Lua thinks they're smaller than the allocator does).
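For illustration only, here is a minimal sketch of how the two columns can drift apart, assuming an allocator that rounds every block up to 8 bytes while Lua's own accounting is based on the sizes it requests. The function has the same shape as a lua_Alloc callback, but MemStats, round_up8 and counting_alloc are hypothetical names, and this is not the allocator used in the tests above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping: both totals are updated on every call. */
typedef struct {
    size_t requested;  /* bytes as asked for by the caller */
    size_t padded;     /* same blocks, each rounded up to 8 bytes */
} MemStats;

static size_t round_up8(size_t n) { return (n + 7u) & ~(size_t)7u; }

/* Same signature as a lua_Alloc function; ud points at a MemStats. */
void *counting_alloc(void *ud, void *ptr, size_t osize, size_t nsize) {
    MemStats *st = (MemStats *)ud;
    /* In Lua 5.2 and later, osize is an object-kind tag when ptr is NULL,
       not a size, so it must not be counted. */
    if (ptr == NULL) osize = 0;
    if (nsize == 0) {                 /* free request */
        free(ptr);
        st->requested -= osize;
        st->padded -= round_up8(osize);
        return NULL;
    }
    void *np = realloc(ptr, nsize);   /* malloc when ptr is NULL */
    if (np == NULL) return NULL;      /* totals unchanged on failure */
    st->requested = st->requested - osize + nsize;
    st->padded = st->padded - round_up8(osize) + round_up8(nsize);
    return np;
}
```

With many small blocks whose sizes are not multiples of 8, `padded` runs ahead of `requested` by up to 7 bytes per block, which is one plausible source of a gap like the one in the figures.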

The test is written in C so I can guarantee no memory is allocated by the function that's printing the results. I assume lua_gc(LUA_GCCOUNT[B]) won't allocate either.

The modules mentioned are a selection from my codebase, and the numbers listed are the memory used to 'require' them, minus the baseline figure (which is the memory usage of setting up a new runtime and requiring an empty module). The runtime is reset and all memory freed between each module require. All modules were precompiled and stripped with luac, and loaded from XIP ROM. These numbers won't really be comparable to any other system - as mentioned these are custom modules, and the Lua runtime itself is pretty stripped down, with no io or math modules, for instance. lua_gc(L, LUA_GCCOLLECT, 0) is run before taking each of the module stats.
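As a small sketch of the bookkeeping described here: lua_gc reports memory in two parts, LUA_GCCOUNT in kilobytes and LUA_GCCOUNTB as the remainder in bytes, and each module figure is the usage after 'require' minus the baseline. The helper names gc_bytes and module_cost are hypothetical, and the functions take plain integers so that they do not depend on the Lua headers.

```c
#include <stddef.h>

/* kbytes:    value returned by lua_gc(L, LUA_GCCOUNT, 0)
   remainder: value returned by lua_gc(L, LUA_GCCOUNTB, 0) */
size_t gc_bytes(int kbytes, int remainder) {
    return (size_t)kbytes * 1024u + (size_t)remainder;
}

/* Cost attributed to one module: usage after 'require' minus the baseline. */
size_t module_cost(size_t after_require, size_t baseline) {
    return after_require - baseline;
}
```

For example, a total of 6016 bytes corresponds to 5 KB plus a remainder of 896 bytes.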

So, on to the figures:

Lua 5.2.3
=========
lua_newstate: 2032 (1908)
luaL_openlibs: 6400 (6016)
Baseline mem usage: 10120
Module membuf: 1744 (1217)
Module membuf.types: 7880 (7168)
Module membuf.print: 4256 (3649)
Module int64: 1176 (688)
Module oo: 2360 (1842)
Module misc: 1928 (1434)
Module runloop: 5736 (5102)
Module interpreter: 13736 (12808)
Module input.input: 11904 (11012)
Module bitmap.bitmap: 4064 (3461)
Module bitmap.transform: 4096 (3524)
Module tetris.tetris: 41144 (39387)

Lua 5.3.0
=========
lua_newstate: 2616 (2483)
luaL_openlibs: 8224 (7793)
Baseline mem usage: 12968
Module membuf: 1872 (1301)
Module membuf.types: 8096 (7357)
Module membuf.print: 4352 (3705)
Module int64: 1240 (708)
Module oo: 2384 (1830)
Module misc: 1960 (1426)
Module runloop: 5904 (5226)
(Test crashed here, not Lua's fault)

Lua 5.3.0 no utf8, bit32, package.path
======================================
lua_newstate: 2616 (2483)
luaL_openlibs: 6776 (6385)
Baseline mem usage: 11392
Module membuf: 1872 (1334)
Module membuf.types: 8096 (7396)
Module membuf.print: 4352 (3738)
Module int64: 1240 (741)
Module oo: 2384 (1863)
Module misc: 2024 (1518)
Module runloop: 5968 (5318)
Module interpreter: 14312 (13439)
Module input.input: 13632 (12752)
Module bitmap.bitmap: 4368 (3758)
Module bitmap.transform: 4272 (3689)
Module tetris.tetris: 42696 (41051)

The last stats above were after trimming out a few things I didn't need from 5.3 - e.g. I rewrote all the bit32 operations using the new operators and disabled LUA_COMPAT_BITLIB, and stopped loading utf8.

Based on the allocator stats (because that is how much memory actually gets used), and comparing the cut-down 5.3 build against 5.2.3, lua_newstate uses 584 bytes more (+29%), luaL_openlibs uses an extra 376 bytes (+5.9%), and modules increase by between 1% and 15%. Loading the biggest module in my test (tetris is 461 LOC as measured by the 'cloc' tool) used 2824 bytes more overall if you also include the baseline usage increase, i.e. (42696 + 11392) - (41144 + 10120).
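The arithmetic behind those deltas can be checked with a couple of one-line helpers. This is only a sketch; delta_bytes and percent_increase are hypothetical names, and the inputs are the allocator figures from the tables above.

```c
/* Absolute growth in bytes between two measurements. */
long delta_bytes(long before, long after) {
    return after - before;
}

/* Relative growth as a percentage of the earlier measurement. */
double percent_increase(long before, long after) {
    return 100.0 * (double)(after - before) / (double)before;
}
```

Called with the lua_newstate figures (2032 and 2616) this gives 584 bytes and roughly 28.7%; with the luaL_openlibs figures (6400 and 6776) it gives 376 bytes and roughly 5.9%; and for tetris including baseline, delta_bytes(41144 + 10120, 42696 + 11392) gives 2824.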

I've no idea why the input module took 15% more RAM; the size and shape of the code isn't much different to, say, the interpreter module.

Not huge increases, but on this board 2824 bytes is unfortunately over 3% of the available memory, so every byte makes a big difference, and I was already on the limit of what would work with 5.2. Hope these figures are of interest (to someone!). If anyone has any suggestions for possible ways of saving RAM I'd love to hear them. I'm not using coroutine or most of the debug table, so I can stop loading them fairly easily for a quick win, and I'm already stripping all the modules.

Cheers,

Tom


Re: Memory usage stats for 5.2 vs 5.3

BogdanM
On Mon, Jan 12, 2015 at 11:05 PM, Tom Sutcliffe <[hidden email]> wrote:
> Thought I'd share some memory usage numbers from my tests of 5.3.0
> (rc4) compared to 5.2.3. [...]

Hi,

Interesting, thanks for sharing. Incidentally, I'm working on something quite similar at the moment (early phases) and I'm also seeing an increase in RAM consumption in 5.3.0 compared to 5.2.3 (this is on desktop though, 'make generic' in 32 bit mode). Will investigate further.

Best,
Bogdan


Re: Memory usage stats for 5.2 vs 5.3

Sean Conner
In reply to this post by Tom Sutcliffe
It was thus said that the Great Tom Sutcliffe once stated:

> The first figure on each line of the stats below is the memory used
> according to the allocator, and the second the figure returned from lua_gc.
> The allocator is one which doesn't have any allocation overhead (except for
> 8-byte alignment padding), and in this runtime there are no allocations
> that aren't via Lua, so the two figures should in theory match pretty
> closely. lua_gc seems to underestimate quite heavily, however. I assume
> that either it is not counting some of its own state in the figures it
> returns, or maybe the underestimation is simply the sum of all the strings
> whose lengths aren't a multiple of 8 (and thus Lua thinks they're smaller
> than the allocator does).

  One question---does your allocator just assign the next available free
spot?  And is free() implemented?  To me, the difference can be explained by
Lua freeing some garbage that isn't reflected in your custom allocator.

  -spc



Re: Memory usage stats for 5.2 vs 5.3

Hisham
In reply to this post by Tom Sutcliffe
On 12 January 2015 at 21:05, Tom Sutcliffe <[hidden email]> wrote:
> Hi list,
>
> Thought I'd share some memory usage numbers from my tests of 5.3.0 (rc4)
> compared to 5.2.3. This is running on an ARMv7-M CPU (THUMB2 instruction set
> with 96KB RAM) in a highly custom environment. Lua is configured with 32-bit
> integers and no floating point (on 5.2, lua_Number is 32-bit 'long', on 5.3
> both types of number are 32-bit 'long'),

> Module int64: 1176 (688)
> Module int64: 1240 (708)

Random question, I might be asking something completely silly, but I
got curious so here it goes:

I notice a module called int64, so you seem to need both 32 and 64 bit
ints. Is it possible to configure Lua 5.3 so that "integer" is 32-bit
ints and "number" is 64-bit int? If so, could this be a saving
compared to using an int64 module? (I suppose your int64 module does
only the set of 64-bit operations you really need.)

-- Hisham


Re: Memory usage stats for 5.2 vs 5.3

Tom Sutcliffe
In reply to this post by Tom Sutcliffe
On 13 Jan, 2015, at 12:54 AM, Sean Conner <[hidden email]> wrote:

> One question---does your allocator just assign the next available free
> spot? And is free() implemented? To me, the difference can be explained by
> Lua freeing some garbage that isn't reflected in your custom allocator.

The placement algorithm is a little smarter than simply 'next free spot', in an effort to reduce fragmentation. It maintains a free list of deallocated cells, so it does implement free() properly. Trust me, when it doesn't, everything blows up very quickly with this little RAM! I'm pretty sure of the allocator's correctness, even though the fragmentation behaviour could probably be improved.

Cheers,

Tom

Re: Memory usage stats for 5.2 vs 5.3

Tom Sutcliffe
In reply to this post by Hisham
On 13 Jan 2015, at 01:40, Hisham <[hidden email]> wrote:

>
>> On 12 January 2015 at 21:05, Tom Sutcliffe <[hidden email]> wrote:
>> Hi list,
>>
>> Thought I'd share some memory usage numbers from my tests of 5.3.0 (rc4)
>> compared to 5.2.3. This is running on an ARMv7-M CPU (THUMB2 instruction set
>> with 96KB RAM) in a highly custom environment. Lua is configured with 32-bit
>> integers and no floating point (on 5.2, lua_Number is 32-bit 'long', on 5.3
>> both types of number are 32-bit 'long'),
>
>> Module int64: 1176 (688)
>> Module int64: 1240 (708)
>
> Random question, I might be asking something completely silly, but I
> got curious so here it goes:
>
> I notice a module called int64, so you seem to need both 32 and 64 bit
> ints. Is it possible to configure Lua 5.3 so that "integer" is 32-bit
> ints and "number" is 64-bit int? If so, could this be a saving
> compared to using an int64 module? (I suppose your int64 module does
> only the set of 64-bit operations you really need.)

Interesting possibility. I'm sure it would be doable. I only need int64s in a very few places though, so the current design isn't all that much of a problem. At some point I should try making everything 64-bit and see if it has any tangible impact - the chip is only 32-bit but ARM's handling of double-wide math operations isn't too terrible. But having number be 64 and integer 32 might be a nice compromise.

Tom


Re: Memory usage stats for 5.2 vs 5.3

Roberto Ierusalimschy
In reply to this post by Tom Sutcliffe
> Thought I'd share some memory usage numbers from my tests of 5.3.0
> (rc4) compared to 5.2.3. [...]

Can you do a little experiment?

- In 5.2.3, do the following change in file lobject.h:

--- lobject.h 2013/04/12 18:48:47 2.71.1.1
+++ lobject.h 2015/01/13 11:55:58
@@ -414,6 +414,7 @@
     lu_byte extra;  /* reserved words for short strings; "has hash" for longs */
     unsigned int hash;
     size_t len;  /* number of characters in string */
+    void *dummy;
   } tsv;
 } TString;
 
Then, recompile Lua 5.2.3 and redo your measurements.

Many thanks,

-- Roberto


Re: Memory usage stats for 5.2 vs 5.3

Tom Sutcliffe
> Can you do a little experiment?
> - In 5.2.3, do the following change in file lobject.h:
> Then, recompile Lua 5.2.3 and redo your measurements.

(looks at 5.3 lobject.h source) So this is to simulate the cost of adding "struct TString *hnext" in TString?

Lua 5.2.3 unmodified (same figures as before)
====================
lua_newstate: 2032 (1908)
luaL_openlibs: 6400 (6016)
Baseline mem usage: 10120
Module membuf: 1744 (1217)
Module membuf.types: 7880 (7168)
Module membuf.print: 4256 (3649)
Module int64: 1176 (688)
Module oo: 2360 (1842)
Module misc: 1928 (1434)
Module runloop: 5736 (5102)
Module interpreter: 13736 (12808)
Module input.input: 11904 (11012)
Module bitmap.bitmap: 4064 (3461)
Module bitmap.transform: 4096 (3524)
Module tetris.tetris: 41144 (39387)

Lua 5.2.3 with "void *dummy" in TString ("5.2-dummy-TString")
=======================================
lua_newstate: 2352 (2228)
luaL_openlibs: 7184 (6800)
Baseline mem usage: 11376
Module membuf: 1880 (1353)
Module membuf.types: 8368 (7656)
Module membuf.print: 4464 (3857)
Module int64: 1248 (760)
Module oo: 2408 (1890)
Module misc: 1976 (1482)
Module runloop: 6008 (5374)
Module interpreter: 14288 (13360)
Module input.input: 12616 (11724)
Module bitmap.bitmap: 4352 (3749)
Module bitmap.transform: 4280 (3708)
Module tetris.tetris: 42968 (41211)

I also redid the figures for 5.3 without utf8 library, but otherwise the same as 5.2 (ie including bit32, and not removing package.path) so that they are more directly comparable to the 5.2 figures.

Lua 5.3.0 no utf8 ("5.3-no-utf8")
=================
lua_newstate: 2616 (2483)
luaL_openlibs: 7352 (6956)
Baseline mem usage: 11968
Module membuf: 1872 (1334)
Module membuf.types: 8128 (7421)
Module membuf.print: 4352 (3738)
Module int64: 1240 (741)
Module oo: 2384 (1863)
Module misc: 1960 (1459)
Module runloop: 5904 (5259)
Module interpreter: 14280 (13405)
Module input.input: 13568 (12693)
Module bitmap.bitmap: 4400 (3789)
Module bitmap.transform: 4272 (3689)
Module tetris.tetris: 42664 (41023)

With the dummy field, lua_newstate usage on 5.2.3 is now larger, although not quite as large as in 5.3. luaL_openlibs usage becomes larger than on the cut-down 5.3 but not quite to the level of 5.3-no-utf8. Likewise the baseline usage increases, but not quite to the 5.3-no-utf8 level. Module memory usages are generally higher on 5.2-dummy-TString than on 5.3-no-utf8, and always higher than on 5.2.3-unmodified (as you'd expect).

So it looks like this TString member is the major contributor to the increased memory usage per-module in 5.3, but 5.3 still uses more memory in setting up the initial environment above and beyond the contribution from the TString. There are a few new members in the standard tables that might contribute to that; every extra variable takes up memory. I can live with that though; it's not reasonable to expect new features that take up no space :) I can offset that with the savings of removing bit32, package.path, coroutine etc.

It's interesting how much of an effect the TString member had on the usage of lua_newstate in 5.2.3-dummy-TString, I wouldn't have expected that many strings are constructed before luaL_openlibs or similar has been called.

Cheers,

Tom

Re: Memory usage stats for 5.2 vs 5.3

Roberto Ierusalimschy
> (looks at 5.3 lobject.h source) So this is to simulate the cost of
> adding "struct TString *hnext" in TString?

Exactly.


> It's interesting how much of an effect the TString member had on the
> usage of lua_newstate in 5.2.3-dummy-TString, I wouldn't have expected
> that many strings are constructed before luaL_openlibs or similar has
> been called.

'luaL_openlibs' creates lots of new strings, with the names of the
functions being registered.

This new field 'hnext' greatly simplifies the GC of strings (as they can
live in the 'allgc' list together with all other objects), but we can
revert to the old design if it proves too expensive.

-- Roberto


Re: Memory usage stats for 5.2 vs 5.3

Tom Sutcliffe
>>
>> It's interesting how much of an effect the TString member had on the
>> usage of lua_newstate in 5.2.3-dummy-TString, I wouldn't have expected
>> that many strings are constructed before luaL_openlibs or similar has
>> been called.
>
> 'luaL_openlibs' creates lots of new strings, with the names of the
> functions being registered.

I meant that lua_newstate seemed to be creating a lot of strings, before luaL_openlibs. Thinking about it some more, that'll be for the reserved words in the lexer, and now of course there are a few more of those with the new bitops.

> This new field 'hnext' greatly simplifies the GC of strings (as they can
> live in the 'allgc' list together with all other objects), but we can
> revert to the old design if it proves too expensive.

Taking into account the alignment of a conventional 32-bit CPU (like the ARM), it increases the overhead of TStrings from 16 bytes to 24, which is a quite considerable increase if you have a lot of short strings like variable names. Generally I'm all in favour of simplification (setfenv being replaced by _ENV was supremely elegant and is my go-to example of good language evolution) but the cost in memory here might be a bit too high. In the test case I mentioned where it requires the tetris module, that one change uses over 2% of available RAM before the module has even set up most of its data structures.
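The 16-to-24 jump can be reproduced with a small model of the header layout. This is a sketch under stated assumptions: pointers and size_t are written as uint32_t to imitate a 32-bit target on any host, the field order follows the diff posted earlier in the thread, the 8-byte alignment is forced with _Alignas (C11) as a stand-in for Lua's maximum-alignment union member, and Header52 and Header53 are hypothetical names rather than Lua's own types.

```c
#include <stdint.h>

/* Model of the 5.2-style string header on a 32-bit target. */
typedef union {
    _Alignas(8) unsigned char align;  /* forces 8-byte alignment of the union */
    struct {
        uint32_t next;    /* CommonHeader: GCObject *next */
        uint8_t  tt;      /* CommonHeader: type tag */
        uint8_t  marked;  /* CommonHeader: GC mark bits */
        uint8_t  extra;
        uint32_t hash;    /* preceded by one byte of padding */
        uint32_t len;     /* size_t len */
    } tsv;
} Header52;

/* Same header with one extra pointer-sized field, as in the experiment. */
typedef union {
    _Alignas(8) unsigned char align;
    struct {
        uint32_t next;
        uint8_t  tt;
        uint8_t  marked;
        uint8_t  extra;
        uint32_t hash;
        uint32_t len;
        uint32_t hnext;   /* struct TString *hnext (or void *dummy) */
    } tsv;
} Header53;

/* 16 bytes of fields fit exactly; 20 bytes get padded up to 24. */
_Static_assert(sizeof(Header52) == 16, "5.2-style header is 16 bytes");
_Static_assert(sizeof(Header53) == 24, "extra pointer pushes it to 24 bytes");
```

The extra field itself is only 4 bytes; the other 4 are padding needed to keep the union a multiple of its 8-byte alignment.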

Out of interest why does string data need to be strongly aligned? I can't think of a situation offhand where it would ever be needed?

I was wondering if there was an optimisation that could be applied for when Lua strings are constructed from string literals? Since Lua strings are immutable anyway, and C string literals cannot ever mutate or go out of scope (in any implementation I can think of) copying them into RAM seems a bit wasteful. There's even a lua_pushliteral API already isn't there, although it is a simple macro wrapper at present?

Back of an envelope calculation suggests there are about 100 reserved words and variable names in the standard libraries, and if we conservatively assume they are all less than 8 characters long, on 5.2 they use ~2400 bytes and on 5.3 currently ~3200 bytes (assuming no allocator overhead and 8 byte alignment). If we further assume that it wouldn't be that hard to make short strings use a different memory layout, and that we go back to 5.2 string GC, and assume the string data needn't be copied because they're all C literals, you could fit the whole lot in ~1600 bytes. And the more long string literals there are, the better the saving gets.
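Spelled out, the estimate above is a count multiplied by a per-string cost, rounded to the 8-byte allocation granularity. The sketch below uses the assumptions stated in the paragraph (100 strings, 8 bytes of character data each, headers of 16 or 24 bytes); strings_footprint and round_up8 are hypothetical helper names.

```c
#include <stddef.h>

static size_t round_up8(size_t n) { return (n + 7u) & ~(size_t)7u; }

/* Total bytes for `count` strings, each with `header` bytes of bookkeeping
   and `payload` bytes of character data stored in RAM. */
size_t strings_footprint(size_t count, size_t header, size_t payload) {
    return count * round_up8(header + payload);
}
```

With these inputs, strings_footprint(100, 16, 8) is 2400 (the 5.2 figure), strings_footprint(100, 24, 8) is 3200 (the 5.3 figure), and strings_footprint(100, 16, 0) is 1600 (a 16-byte header with the characters left in ROM).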

It might sound a bit excessive just to save a few KB of RAM but when your RAM budget is also measured in KB these are the sort of hoops you start considering :-)

(My even crazier question would be whether it'd be possible to execute precompiled modules in place rather than copying them to RAM in lua_load, but I realise that's a much more complex problem.)

Cheers,

Tom

Re: Memory usage stats for 5.2 vs 5.3

Luiz Henrique de Figueiredo
> (My even crazier question would be whether it'd be possible to execute precompiled modules in place rather than copying them to RAM in lua_load, but I realise that's a much more complex problem.)

It'd probably be easy to run bytecode from ROM but using the table
of constants would be harder. There are Lua functions with lots of
constants, for instance, graphics metafiles with lots of real numbers,
or databases with lots of strings.


Re: Memory usage stats for 5.2 vs 5.3

Roberto Ierusalimschy
In reply to this post by Tom Sutcliffe
> I meant that lua_newstate seemed to be creating a lot of strings,
> before luaL_openlibs. Thinking about it some more, that'll be for the
> reserved words in the lexer, and now of course there are a few more of
> those with the new bitops.

Besides reserved words, Lua also creates the metamethod names (__add,
etc.). In all, there would be ~50 strings (22 reserved words + 24
metamethods + _ENV + "not enough memory") in bare Lua.


> Taking into account the alignment of a conventional 32-bit CPU (like
> the ARM), it increases the overhead of TStrings from 16 bytes to
> 24, which is a quite considerable increase if you have a lot of
> short strings like variable names. Generally I'm all in favour of
> simplification (setfenv being replaced by _ENV was supremely elegant
> and is my go-to example of good language evolution) but the cost in
> memory here might be a bit too high. In the test case I mentioned
> where it requires the tetris module, that one change uses over 2%
> of available RAM before the module has even set up most of its data
> structures.
>
> Out of interest why does string data need to be strongly aligned? I
> can't think of a situation offhand where it would ever be needed?

Because you can store binary data in strings. E.g., you could move
an entire struct to a string and then type-cast the string back
to the struct. But of course the idea would be not to waste space
with that.
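To make the alignment point concrete, here is a minimal sketch in plain C (no Lua API involved). Record is a hypothetical struct, and view_record and copy_record are hypothetical helpers: the cast is only usable when the bytes happen to be suitably aligned, whereas copying with memcpy works at any offset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical payload that a program might stash inside a string. */
typedef struct {
    uint16_t id;
    uint32_t value;
} Record;

/* True when p can be used directly as a Record pointer. */
int is_aligned_for_record(const void *p) {
    return ((uintptr_t)p % _Alignof(Record)) == 0;
}

/* Zero-copy view: returns NULL instead of a misaligned pointer. */
const Record *view_record(const void *bytes) {
    return is_aligned_for_record(bytes) ? (const Record *)bytes : NULL;
}

/* Always-safe alternative: copy the bytes into an aligned local. */
Record copy_record(const void *bytes) {
    Record r;
    memcpy(&r, bytes, sizeof r);
    return r;
}
```

If string data were packed without alignment, only the copy_record style would remain valid for arbitrary strings.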


> I was wondering if there was an optimisation that could be applied
> for when Lua strings are constructed from string literals? Since Lua
> strings are immutable anyway, and C string literals cannot ever mutate
> or go out of scope (in any implementation I can think of) copying them
> into RAM seems a bit wasteful. There's even a lua_pushliteral API
> already isn't there, although it is a simple macro wrapper at present?

For small strings, we want to internalize them so that we can use
simple pointer equality when comparing strings (good for a fast
hashing). Moreover, the internal string would need 4 bytes anyway (for a
pointer to the external string), so the gains would not be too big.  For
large strings, however, we are considering a good API to allow the use
of external strings.
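As a rough illustration of why internalized strings allow pointer comparison, here is a toy interning table. It is a sketch, not Lua's string table: Interned, intern and TABLE_SIZE are hypothetical, the hash is FNV-1a rather than Lua's own, and the table never grows or frees entries. The per-entry `next` link plays a role similar to the chaining pointer discussed in this thread.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 64u   /* bucket count; arbitrary for this sketch */

typedef struct Interned {
    struct Interned *next;  /* next entry in the same bucket */
    unsigned int hash;
    size_t len;
    char data[];            /* characters plus trailing '\0' */
} Interned;

static Interned *buckets[TABLE_SIZE];

/* FNV-1a over the bytes of the string (an assumption, not Lua's hash). */
static unsigned int hash_bytes(const char *s, size_t len) {
    unsigned int h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns the unique entry for s, creating it on first use. */
const Interned *intern(const char *s) {
    size_t len = strlen(s);
    unsigned int h = hash_bytes(s, len);
    Interned **head = &buckets[h % TABLE_SIZE];
    for (Interned *p = *head; p != NULL; p = p->next) {
        if (p->hash == h && p->len == len && memcmp(p->data, s, len) == 0)
            return p;       /* same contents: hand back the existing entry */
    }
    Interned *n = malloc(sizeof *n + len + 1);
    if (n == NULL) return NULL;
    n->next = *head;
    n->hash = h;
    n->len = len;
    memcpy(n->data, s, len + 1);
    *head = n;
    return n;
}
```

Two calls with equal contents return the same pointer, so later equality checks (for example during table lookups) reduce to comparing addresses.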


> Back of an envelope calculation suggests there are about 100 reserved
> words and variable names in the standard libraries,

More like 200, only for variable names.

-- Roberto



Re: Memory usage stats for 5.2 vs 5.3

Tom Sutcliffe
> Besides reserved words, Lua also creates the metamethod names (__add,
> etc.). In all, there would be ~50 strings (22 reserved words + 24
> metamethods + _ENV + "not enough memory") in bare Lua.

Ah yes forgot about those!

>> Out of interest why does string data need to be strongly aligned? I
>> can't think of a situation offhand where it would ever be needed?
>
> Because you can store binary data in strings. E.g., you could move
> an entire struct to a string and then type-cast the string back
> to the struct. But of course the idea would be not to waste space
> with that.

Hmm I suppose you could do that with the Lua C API, it wouldn't have occurred to me that anyone would want to though, given that's what userdata is for! I guess that's one of those situations where I'd happily restrict it if it were just for me, but in base Lua you can't risk making such assumptions.

> For small strings, we want to internalize them so that we can use
> simple pointer equality when comparing strings (good for a fast
> hashing). Moreover, the internal string would need 4 bytes anyway (for a
> pointer to the external string), so the gains would not be too big.

I was thinking you'd still internalise them to a struct similar to TString and keep the pointer comparison hashing benefits, but that this TShortString would shrink the len field to a single byte (and there's one spare between extra and hash if I've understood the struct layout right) meaning you could fit everything including the 4 byte pointer to the actual data into 16 bytes. Short strings would be limited to 255 chars or whatever, but I don't suppose too many people change the current max len?
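The layout being proposed can be modelled as follows. This is a sketch of the idea only: TShortStringModel is a hypothetical name, pointers are written as uint32_t to imitate a 32-bit target, and nothing here is taken from the Lua sources beyond the field names already mentioned in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical short-string header for a 32-bit target. */
typedef struct {
    uint32_t next;    /* CommonHeader: GCObject *next */
    uint8_t  tt;      /* CommonHeader: type tag */
    uint8_t  marked;  /* CommonHeader: GC mark bits */
    uint8_t  extra;
    uint8_t  len;     /* length moved into the spare byte before hash */
    uint32_t hash;
    uint32_t data;    /* pointer to the characters, e.g. a literal in ROM */
} TShortStringModel;

/* Four 4-byte groups, no padding needed even with 8-byte alignment. */
_Static_assert(sizeof(TShortStringModel) == 16, "fits in 16 bytes");

/* A one-byte length limits short strings to 255 characters. */
int fits_short_string(size_t n) {
    return n <= UINT8_MAX;
}
```

The trade-off is the one named in the paragraph: the header stays at 16 bytes, at the price of capping short strings at 255 characters.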

It's not a huge gain it's true, but 8 bytes multiplied by 100 or 200 is pretty good - it's maybe 1 or 2 percent of the entire memory footprint of a barebones Lua runtime, depending how barebones you make it.

> For
> large strings, however, we are considering a good API to allow the use
> of external strings.

Interesting. Not quite the same problem but I'd be interested to know what you decide. In my situation though it's very much about the potential for optimising literals large or small, because code is executed in place from ROM and thus constdata literals don't have any impact on RAM usage.

Cheers,

Tom



Re: Memory usage stats for 5.2 vs 5.3

Soni "They/Them" L.

On 13/01/15 05:13 PM, Tom Sutcliffe wrote:

>> Besides reserved words, Lua also creates the metamethod names (__add,
>> etc.). In all, there would be ~50 strings (22 reserved words + 24
>> metamethods + _ENV + "not enough memory") in bare Lua.
> Ah yes forgot about those!
>
>>> Out of interest why does string data need to be strongly aligned? I
>>> can't think of a situation offhand where it would ever be needed?
>> Because you can store binary data in strings. E.g., you could move
>> an entire struct to a string and then type-cast the string back
>> to the struct. But of course the idea would be not to waste space
>> with that.
> Hmm I suppose you could do that with the Lua C API, it wouldn't have occurred to me that anyone would want to though, given that's what userdata is for! I guess that's one of those situations where I'd happily restrict it if it were just for me, but in base Lua you can't risk making such assumptions.

Strings can be persisted, userdata cannot.

--
Disclaimer: these emails are public and can be accessed from <TODO: get a non-DHCP IP and put it here>. If you do not agree with this, DO NOT REPLY.



Re: Memory usage stats for 5.2 vs 5.3

Sean Conner
In reply to this post by Tom Sutcliffe
It was thus said that the Great Tom Sutcliffe once stated:

> >
> > Because you can store binary data in strings. E.g., you could move
> > an entire struct to a string and then type-cast the string back
> > to the struct. But of course the idea would be not to waste space
> > with that.
>
> Hmm I suppose you could do that with the Lua C API, it wouldn't have
> occurred to me that anyone would want to though, given that's what
> userdata is for! I guess that's one of those situations where I'd happily
> restrict it if it were just for me, but in base Lua you can't risk making
> such assumptions.

  It's useful when you receive data from a network connection.  If the data
is sent aligned, then it's easy to pull the fields out without much
overhead (no reading byte-by-byte and reconstructing 16 and 32 bit values).

  -spc



Re: Memory usage stats for 5.2 vs 5.3

BogdanM
In reply to this post by Tom Sutcliffe


On Tue, Jan 13, 2015 at 7:13 PM, Tom Sutcliffe <[hidden email]> wrote:
>> Besides reserved words, Lua also creates the metamethod names (__add,
>> etc.). At all, there would be ~50 strings (22 reserved words + 24
>> metamethods + _ENV + "not enough memory") in bare Lua.
>
> Ah yes forgot about those!
>
>>> Out of interest why does string data need to be strongly aligned? I
>>> can't think of a situation offhand where it would ever be needed?
>>
>> Because you can store binary data in strings. E.g., you could move
>> an entire struct to a string and then type-cast the string back
>> to the struct. But of course the idea would be not to waste space
>> with that.
>
> Hmm I suppose you could do that with the Lua C API, it wouldn't have occurred to me that anyone would want to though, given that's what userdata is for! I guess that's one of those situations where I'd happily restrict it if it were just for me, but in base Lua you can't risk making such assumptions.
>
>> For small strings, we want to internalize them so that we can use
>> simple pointer equality when comparing strings (good for a fast
>> hashing). Moreover, the internal string would need 4 bytes anyway (for a
>> pointer to the external string), so the gains would not be too big.
>
> I was thinking you'd still internalise them to a struct similar to TString and keep the pointer comparison hashing benefits, but that this TShortString would shrink the len field to a single byte (and there's one spare between extra and hash if I've understood the struct layout right) meaning you could fit everything including the 4 byte pointer to the actual data into 16 bytes. Short strings would be limited to 255 chars or whatever, but I don't suppose too many people change the current max len?

There was an attempt to do something like that a while ago:


As you can see, it's quite complex and even the author is unsure of its benefits (http://lua-users.org/wiki/MikePall)

> It's not a huge gain it's true, but 8 bytes multiplied by 100 or 200 is pretty good - it's maybe 1 or 2 percent of the entire memory footprint of a barebones Lua runtime, depending how barebones you make it.
>
>> For
>> large strings, however, we are considering a good API to allow the use
>> of external strings.
>
> Interesting. Not quite the same problem but I'd be interested to know what you decide. In my situation though it's very much about the potential for optimising literals large or small, because code is executed in place from ROM and thus constdata literals don't have any impact on RAM usage.

I did this in eLua (note that that's not the final code, there were some fixes, but it illustrates the main idea quite well):


I also have unpublished code that saves the whole TString structure in the compiled bytecode file, so if this file is in flash, the interpreter can access the TString structure directly. It's ugly and non-portable, but works at least partially in 5.1. I don't see this (and other similar optimizations) being part of mainline Lua, but it can exist at least as a "power patch" targeted at small systems. I intend to rework all my memory optimization patches on top of 5.3; from the looks of things, it should be easier to implement them on top of 5.3 than it was in 5.1.
Note, though, that while these "memory optimizations" are nice (and interesting exercises in programming sometimes), one should not expect miracles from using them; Lua is a dynamic language, needs RAM to do the things we love it for, end of story (they can be quite useful in some specific situations though). That said, a big +1 from me for removing those additional 4 bytes (from TString.hnext) in Lua 5.3 :)

Thanks,
Bogdan


> Cheers,
>
> Tom




Re: Memory usage stats for 5.2 vs 5.3

Roberto Ierusalimschy
> [...] That said, a big +1 from me for removing those
> additional 4 bytes (from TString.hnext) in Lua 5.3 :)

I think we can do that following Tom's idea of storing the length of
short strings in a byte. Long strings are not hashed, so they do not
need the 'hnext' field. The struct could look like this:

typedef struct TString {
  CommonHeader;
  lu_byte extra;  /* reserved words for short strings; "has hash" for longs */
  lu_byte shtlen;  /* length for short strings */
  unsigned int hash;
  union {
    size_t len;  /* length for long string */
    struct TString *hnext;  /* linked list for hash (only for short strings) */
  } u;
} TString;
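
To get a feel for the saving, here is a small standalone sketch that compares
the two layouts with sizeof. It does not use Lua's headers: MOCK_COMMON_HEADER
is only a stand-in (assumed to be a pointer plus two bytes), and the struct and
function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char lu_byte;

/* Stand-in for Lua's CommonHeader (assumption: one pointer plus two bytes). */
#define MOCK_COMMON_HEADER void *next; lu_byte tt; lu_byte marked

/* Layout with separate 'len' and 'hnext' fields. */
typedef struct SeparateFields {
  MOCK_COMMON_HEADER;
  lu_byte extra;
  unsigned int hash;
  size_t len;
  struct SeparateFields *hnext;
} SeparateFields;

/* Layout as proposed above: short length in a spare byte, and 'len'
   sharing storage with 'hnext' in a union. */
typedef struct SharedField {
  MOCK_COMMON_HEADER;
  lu_byte extra;
  lu_byte shtlen;
  unsigned int hash;
  union {
    size_t len;
    struct SharedField *hnext;
  } u;
} SharedField;

/* Header bytes saved per string on the platform this is compiled for. */
static size_t bytes_saved_per_string(void) {
  assert(sizeof(SharedField) <= sizeof(SeparateFields));
  return sizeof(SeparateFields) - sizeof(SharedField);
}
```

On a typical 32-bit target with 4-byte pointers and size_t this should come
out as 20 versus 16 bytes; other ABIs will give different numbers.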


-- Roberto


Re: Memory usage stats for 5.2 vs 5.3

BogdanM

On Wed, Jan 14, 2015 at 12:31 PM, Roberto Ierusalimschy <[hidden email]> wrote:
>> [...] That said, a big +1 from me for removing those
>> additional 4 bytes (from TString.hnext) in Lua 5.3 :)
>
> I think we can do that following Tom's idea of storing the length of
> short strings in a byte. Long strings are not hashed, so they do not
> need the 'hnext' field. The struct could look like this:
>
> typedef struct TString {
>   CommonHeader;
>   lu_byte extra;  /* reserved words for short strings; "has hash" for longs */
>   lu_byte shtlen;  /* length for short strings */
>   unsigned int hash;
>   union {
>     size_t len;  /* length for long string */
>     struct TString *hnext;  /* linked list for hash (only for short strings) */
>   } u;
> } TString;
 

That looks like a very good solution to me.

Thanks,
Bogdan 
 
 
> -- Roberto



Re: Memory usage stats for 5.2 vs 5.3

Tom Sutcliffe
In reply to this post by Roberto Ierusalimschy
On 14 Jan, 2015, at 12:31 PM, Roberto Ierusalimschy <[hidden email]> wrote:

>> [...] That said, a big +1 from me for removing those
>> additional 4 bytes (from TString.hnext) in Lua 5.3 :)
>
> I think we can do that following Tom's idea of storing the length of
> short strings in a byte. Long strings are not hashed, so they do not
> need the 'hnext' field. The struct could look like this:
>
> typedef struct TString {
>   CommonHeader;
>   lu_byte extra;  /* reserved words for short strings; "has hash" for longs */
>   lu_byte shtlen;  /* length for short strings */
>   unsigned int hash;
>   union {
>     size_t len;  /* length for long string */
>     struct TString *hnext;  /* linked list for hash (only for short strings) */
>   } u;
> } TString;

That sounds like the best of both worlds. Gets my vote :)

I've just finished prototyping the external string support that I mentioned. Even though it won't work with 5.3, I used a very similar approach (involving the spare byte and making len a union) so I figure I might as well share the results anyway.

The idea was that for strings you know to be string literals or otherwise in constdata (i.e. declared with "static const") you don't copy the string data to the TString but just keep a pointer to it. In this way you save at least 8 bytes per string (due to most allocators rounding up to an 8-byte boundary). I tried this on 5.2 with the pointer in a union with TString.len, and using the spare byte to indicate the length if it was an external string, so I didn't increase the size of TString or introduce a new type. I should probably have called them constdata strings or literals or something, never mind. Where I say "external", this is what I mean.
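
A minimal standalone sketch of that idea, for anyone who wants to see the shape of it. This is not the actual patch and not Lua's TString: MiniString and the ministring_* functions are invented for illustration, and it uses plain malloc rather than Lua's allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mini string object; not Lua's TString. */
typedef struct MiniString {
  unsigned char is_external;  /* 1 when 'u.ext' points at constdata */
  unsigned char extlen;       /* length when external (at most 255) */
  union {
    size_t len;               /* length when the bytes follow the header */
    const char *ext;          /* pointer to the literal when external */
  } u;
} MiniString;

/* Normal string: header plus a private copy of the bytes, one allocation. */
static MiniString *ministring_copy(const char *s) {
  size_t len = strlen(s);
  MiniString *ms = malloc(sizeof(MiniString) + len + 1);
  if (ms == NULL) return NULL;
  ms->is_external = 0;
  ms->extlen = 0;
  ms->u.len = len;
  memcpy(ms + 1, s, len + 1);  /* bytes live right after the header */
  return ms;
}

/* External string: header only, the bytes stay where they already are.
   The caller must guarantee 's' outlives the object (e.g. static const). */
static MiniString *ministring_external(const char *s) {
  size_t len = strlen(s);
  MiniString *ms;
  assert(len <= 255);
  ms = malloc(sizeof(MiniString));
  if (ms == NULL) return NULL;
  ms->is_external = 1;
  ms->extlen = (unsigned char)len;
  ms->u.ext = s;
  return ms;
}

static const char *ministring_data(const MiniString *ms) {
  return ms->is_external ? ms->u.ext : (const char *)(ms + 1);
}

static size_t ministring_len(const MiniString *ms) {
  return ms->is_external ? ms->extlen : ms->u.len;
}
```

The header is the same size either way; the external variant just never allocates room for the characters, which is where the per-string saving comes from.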

I changed the reserved words, metaevents and anything passed to lua_pushliteral to be external and the results were:

Lua 5.2.3 unmodified (same numbers as before)
====================
lua_newstate: 2032 (1908)
luaL_openlibs: 6400 (6016)
Baseline mem usage: 10120

Lua 5.2.3 with reserved strings external
========================================
lua_newstate: 1672 (1668)
luaL_openlibs: 6376 (6117)
Baseline mem usage: 9736

Which saves a few hundred bytes from the baseline figure. (384 bytes is 3.8% which sounds rather better).

Next I made sure that all the built-in variable names registered with luaL_setfuncs were made external too (but without assuming all other callers of luaL_setfuncs would be, which would probably be an assumption too far).

Lua 5.2.3 with most built-in strings external
=============================================
lua_newstate: 1672 (1668)
luaL_openlibs: 5656 (5577)
Baseline mem usage: 9016

Which is a saving of nearly 11% (1104 bytes).

Finally, I exported the new luaL_setfuncsexternal function in the Lua API and used it in all my code (because like probably most people I always declare my luaL_Reg arrays as static const) which saved a bit more memory due to the additional runtime-specific tables included in the environment for my "Baseline mem usage" figure.

Lua 5.2.3 with all luaL_setfuncs external
=========================================
lua_newstate: 1672 (1668)
luaL_openlibs: 5624 (5550)
Baseline mem usage: 8760

Which brings the memory savings up to 13%, which is pretty nice. Unfortunately I realise that this 5.2-based prototype can't be applied on 5.3 due to the 'hnext' field being needed for short strings, which is a bit of a shame. The 5.2 diffs aren't that big and would be quite similar to Roberto's proposal, but if there isn't a spare 4 bytes in the TString then the savings would be negligible unless you had a lot of large literals (and you made the external stuff work for long strings, which I didn't bother doing). Ah well, back to the drawing board for ways to eke out my RAM budget!
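
For anyone wanting to check the percentages against the "Baseline mem usage" figures above, a small integer-only sketch follows (integer-only because this setup has no floating point). The function names are made up for the example.

```c
#include <assert.h>

/* Bytes saved relative to the unmodified baseline. */
static unsigned long bytes_saved(unsigned long baseline, unsigned long now) {
  assert(now <= baseline);
  return baseline - now;
}

/* Saving in tenths of a percent, rounded down, using integer maths only. */
static unsigned long saving_permille(unsigned long baseline, unsigned long now) {
  assert(baseline > 0);
  return bytes_saved(baseline, now) * 1000UL / baseline;
}
```

Called with a baseline of 10120 and the three modified figures (9736, 9016, 8760), bytes_saved gives 384, 1104 and 1360 bytes respectively.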

Cheers,

Tom