Size optimizations for Lua 5.3 VM on ARM Cortex M4


Abhijit Nandy
Hi,

I am trying to get a minimal Lua VM to run on Cortex M4. I do not need the parser, only byte code execution.

I have done all the changes in https://www.lua.org/notes/ltn002.html

Also I removed all libraries except base:
/*
** these libs are loaded by lua.c and are readily available to any Lua
** program
*/
static const luaL_Reg loadedlibs[] = {
  {"_G", luaopen_base},
//  {LUA_LOADLIBNAME, luaopen_package},
//  {LUA_COLIBNAME, luaopen_coroutine},
//  {LUA_TABLIBNAME, luaopen_table},
//  {LUA_IOLIBNAME, luaopen_io},
//  {LUA_OSLIBNAME, luaopen_os},
//  {LUA_STRLIBNAME, luaopen_string},
//  {LUA_MATHLIBNAME, luaopen_math},
//  {LUA_UTF8LIBNAME, luaopen_utf8},
//  {LUA_DBLIBNAME, luaopen_debug},
#if defined(LUA_COMPAT_BITLIB)
  {LUA_BITLIBNAME, luaopen_bit32},
#endif
  {NULL, NULL}
};

This reduced the Lua executable to about 63k with maximum release optimization on x86 (MSVC 2015), and to about 59k on Cortex M4.

I generally need only table indexing and function calls, so I compiled one of the common scripts I intend to run on the Cortex M4 and saw that it does not use most opcodes (I used ChunkSpy to decompile it and see the opcodes).

I may need Lua's other language features later, so I decided to try optimizing its VM first.

So now I am removing some of the opcodes, which may be a bit risky (I am commenting out the relevant vmcase in lvm.c).
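
A minimal sketch of how the "which opcodes does my script use" check could be automated, assuming the Lua 5.3 instruction layout in which the opcode sits in the low 6 bits of each 32-bit instruction word. The function names are hypothetical, and pulling the instruction array out of a compiled chunk is not shown here:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: Lua 5.3 instruction layout, where the opcode occupies the
   low 6 bits of each 32-bit instruction word. */
#define OPCODE_BITS 6u
#define MAX_OPCODES (1u << OPCODE_BITS)   /* 64 possible opcode values */

/* Extract the opcode field from one instruction word. */
static unsigned opcode_of(uint32_t insn) {
    return (unsigned)(insn & (MAX_OPCODES - 1u));
}

/* Count how often each opcode value occurs in `code`.
   `hist` must have room for MAX_OPCODES counters. */
static void opcode_histogram(const uint32_t *code, size_t n,
                             unsigned hist[MAX_OPCODES]) {
    memset(hist, 0, MAX_OPCODES * sizeof hist[0]);
    for (size_t i = 0; i < n; i++)
        hist[opcode_of(code[i])]++;
}

/* Number of distinct opcode values that appear at least once. */
static size_t distinct_opcodes(const unsigned hist[MAX_OPCODES]) {
    size_t used = 0;
    for (unsigned op = 0; op < MAX_OPCODES; op++)
        if (hist[op] != 0)
            used++;
    return used;
}
```

Any opcode whose counter stays at zero across all scripts that will ever run on the device is a candidate for removal; an opcode that appears even once is not.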

What are my other options to reduce the size? I would like to bring down the Lua executable's size to about 10k like eLua!

I am not using eLua directly, as it seems to include a number of things besides Lua that may not be easy to remove (I haven't checked in detail, though).

I am also trying out Lua 4.0 to see if I can get a smaller size.

Thanks for your help.
Abhijit

Re: Size optimizations for Lua 5.3 VM on ARM Cortex M4

BogdanM
Hi,

On Fri, Jun 8, 2018 at 1:25 PM Abhijit Nandy <[hidden email]> wrote:

> What are my other options to reduce the size? I would like to bring down the Lua executable's size to about 10k like eLua!


I am one of the authors of eLua, and I've never heard of anyone compiling it into such a small space (10k). Where did you find that information? I'd be very interested in those findings.

As for reducing the executable size, the VM is a fairly sensitive area; personally, I wouldn't start there. Use a size profiler (such as https://github.com/google/bloaty) to figure out what's taking up space in your executable, and work from that. You can try to reduce the contribution of external libs (mainly libc) to the executable size. For example, if you don't need floating point, compile in integer-only mode, then use GCC ARM Embedded with Newlib-nano (a smaller libc that is included in the GCC ARM Embedded toolchain). You can also try to remove all references to scanf, but that means giving up the ability to read numbers from streams with "io.read". Obviously, compile everything with "-Os". If you can use the ARM compiler (or IAR) instead of GCC, try that too; it might make your executable smaller.

Best,
Bogdan

Re: Size optimizations for Lua 5.3 VM on ARM Cortex M4

Luiz Henrique de Figueiredo
In reply to this post by Abhijit Nandy
> I am trying to get a minimal Lua VM to run on Cortex M4. I do not need the
> parser, only byte code execution.
>
> I have done all the changes in https://www.lua.org/notes/ltn002.html

I hope you have seen and used http://www.lua.org/extras/5.3/noparser.c.
You may also want to remove luaP_opnames in lopcodes.c, but that is
data, not code.


Re: Size optimizations for Lua 5.3 VM on ARM Cortex M4

云风 Cloud Wu
In reply to this post by Abhijit Nandy

> On Jun 8, 2018, at 6:24 PM, Abhijit Nandy <[hidden email]> wrote:
>
>
> What are my other options to reduce the size? I would like to bring down the Lua executable's size to about 10k like eLua!
>

You could get rid of metatables; that would remove a lot of code.

Re: Size optimizations for Lua 5.3 VM on ARM Cortex M4

Phil Leblanc
>> What are my other options to reduce the size? (...)

> You may get rid off the metatable, it can reduce many codes.

I am curious about this. Has anybody built a Lua derivative without metatables?

Is it feasible? Or are metatables too deeply intertwined with the Lua VM?

What would be the gain? (executable footprint, performance?)


Re: Size optimizations for Lua 5.3 VM on ARM Cortex M4

Abhijit Nandy
Hi,

I guess the 10k figure was for eLua RAM usage, my bad: http://www.eluaproject.net/doc/v0.9/en_arch_ltr.html

Thanks for the pointers, I have removed external libs already, and using integers gives me a 2k reduction in the executable size.
I will try removing scanf.

What is the smallest size Lua has ever been compiled to while keeping all the language features of, say, Lua 4.0? Has anyone tried going back further, to Lua 3.0 for example?

I have applied http://www.lua.org/extras/5.3/noparser.c as well on Lua 5.3, and the size, as I mentioned, was about 63k on x86. Our last figure with -O2 was 59k on ARM. I will check whether we can drop in Newlib-nano and try higher optimizations.

I decided to give Lua 4.0.1 a shot, and got about 27k on ARM with external libs removed and an integer number type. I also tried removing all opcodes, just to see how much of a difference opcode removal can potentially make, and got a reduction of about 5k (I will of course keep only the opcodes that I see in the bytecode disassembly of my Lua code). I assume each opcode is totally independent.

I also realized that I could remove some of the functions from lapi.c that I do not need for binding C functions; that might save another 1 or 2K.
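
A toy sketch of the opcode-removal idea, not taken from the Lua sources: in a switch-based interpreter, a case that is compiled out can be caught by a `default` branch so that stray bytecode fails loudly instead of being silently skipped. The opcodes, the `TOY_NO_DOUBLE` build switch, and `toy_run` are all hypothetical names invented for this illustration:

```c
#include <stddef.h>

/* Toy opcodes for illustration only; these are not Lua's opcodes. */
enum { T_ONE, T_INC, T_DOUBLE, T_HALT };

/* Hypothetical build switch: define TOY_NO_DOUBLE to compile out T_DOUBLE. */

/* Runs `code` and stores the accumulator in *out.
   Returns 0 on success, -1 if an opcode is unknown or was compiled out. */
static int toy_run(const unsigned char *code, size_t n, int *out) {
    int acc = 0;
    for (size_t pc = 0; pc < n; pc++) {
        switch (code[pc]) {
        case T_ONE:    acc = 1;   break;   /* load the constant 1 */
        case T_INC:    acc += 1;  break;   /* increment */
#ifndef TOY_NO_DOUBLE
        case T_DOUBLE: acc *= 2;  break;   /* removable opcode */
#endif
        case T_HALT:   *out = acc; return 0;
        default:       return -1;  /* fail loudly on a missing opcode */
        }
    }
    *out = acc;
    return 0;
}
```

With this shape, each removed handler costs nothing in code size, and running a script that still needs it produces an error rather than wrong results.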

Someone mentioned removing metatables from the Lua code base. Is that easily removable by removing a function or two?

Is there a regression suite I can run after removing opcodes to check if expected language features still work? Like say just function calling and table access.


Thanks
Abhijit




Re: Size optimizations for Lua 5.3 VM on ARM Cortex M4

BogdanM


On Mon, Jun 11, 2018 at 7:27 PM Abhijit Nandy <[hidden email]> wrote:
> I have applied http://www.lua.org/extras/5.3/noparser.c as well on Lua 5.3 and the size as I mentioned was about 63k on x86. Our last figure with -O2 was 59k on ARM. I will check if we can drop in the Newlib-nano and try higher optimizations.

If you're compiling with -O2 now, try switching to -Os. That'll likely result in additional size savings.

> Someone mentioned removing metatables from the Lua code base. Is that easily removable by removing a function or two?

I never tried this myself, but I don't think it'd be easy to do, and I also think that the result would be a severely crippled interpreter that couldn't even be considered Lua anymore. Then again, this might be exactly what you need. But still, I'd only keep this as a last-resort solution.

> Is there a regression suite I can run after removing opcodes to check if expected language features still work? Like say just function calling and table access.

There are Lua test suites for various versions available at https://www.lua.org/tests/.
 



Re: Size optimizations for Lua 5.3 VM on ARM Cortex M4

云风 Cloud Wu
In reply to this post by Phil Leblanc

On Mon, Jun 11, 2018 at 3:21 AM, Phil Leblanc <[hidden email]> wrote:
>>> What are my other options to reduce the size? (...)
>
>> You may get rid off the metatable, it can reduce many codes.
>
> I am curious about this. Has anybody built a Lua derivative without metatables?
>
> Is it feasible? or are metatables too deeply intricated with the Lua VM?
>
> What would be the gain? (executable footprint, performance?)

I did some quick and dirty work to remove metatables from the Lua VM (not tested yet). It reduces the executable footprint by about 5K or more. I haven't removed the metatable pointer from struct Table and Udata yet; doing that might also save a little memory at runtime.
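
A rough sketch of the runtime saving mentioned above, using simplified, hypothetical stand-in structs rather than Lua's real Table layout. It only shows that dropping one pointer field saves one pointer's worth of memory per object (4 bytes per table on a 32-bit Cortex M4):

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins; not Lua's real Table layout. */
struct TableWithMt {
    void *array;      /* array part */
    void *node;       /* hash part */
    void *metatable;  /* the field that would be removed */
};

struct TableNoMt {
    void *array;
    void *node;
};

/* Bytes saved per table object by dropping the metatable pointer. */
static size_t bytes_saved_per_table(void) {
    return sizeof(struct TableWithMt) - sizeof(struct TableNoMt);
}

/* Total bytes saved for `count` live tables. */
static size_t bytes_saved_total(size_t count) {
    return count * bytes_saved_per_table();
}
```

The real saving depends on the actual struct layout and on the allocator's rounding, so this is an upper-level estimate only.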



Attachment: removemt.diff (28K)

Re: Size optimizations for Lua 5.3 VM on ARM Cortex M4

Phil Leblanc
On Tue, Jun 12, 2018 at 7:35 AM, 云风 Cloud Wu <[hidden email]> wrote:
> I try to do some quick and dirty works to remove metatable from lua VM (not
> test yet) . It can reduce about 5K executable footprint or more . I haven't
> remove metatable pointer from struct  Table and Udata yet, maybe it can
> reduce a little memory at runtime.

Thanks. Will look into it.


Re: Size optimizations for Lua 5.3 VM on ARM Cortex M4

Luiz Henrique de Figueiredo
In reply to this post by Abhijit Nandy
You might also want to try removing parts of ldebug.c.


Re: Size optimizations for Lua 5.3 VM on ARM Cortex M4

Abhijit Nandy
Thanks Luiz. I will try that. 

I am working with the Lua 4.0.1 code now. We have gotten the size down to about 22k with just the VM code, with the standard libs removed and the -O2 and -Os options in place. For some reason, -O3 seems to increase the size on ARM GCC!

I am hopeful of reducing it to about 18k with ldebug and opcode removals.

Does anyone know the smallest size to which a VM-only Lua has ever been compiled? :)

Thanks
Abhijit



Re: Size optimizations for Lua 5.3 VM on ARM Cortex M4

Coda Highland
On Sat, Jun 16, 2018 at 12:18 PM, Abhijit Nandy <[hidden email]> wrote:

> Thanks Luiz. I will try that.
>
> I am working with Lua 4.0.1 code now. We have gotten the size to about 22k
> now with just the VM code and with the std libs removed and the -O2 and -Os
> options in place. -O3 for some reason seems to increase the size on arm gcc!
>
> I am hopeful of reducing it to about 18k with ldebug and opcode removals.
>
> Anyone knows the smallest ever size to which Lua with VM only, has ever been
> compiled :) !
>
> Thanks
> Abhijit

-O3 should be expected to increase binary size, as it performs
optimizations like unrolling and inlining. -Os is probably the flag
you want to use, because that enables performance optimizations that
don't increase the binary size.

/s/ Adam