Lua Needs

classic Classic list List threaded Threaded
43 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: Lua Needs

Tim Hill


> On Dec 1, 2018, at 7:51 AM, Dirk Laurie <[hidden email]> wrote:
>
> Op Sa. 1 Des. 2018 om 17:08 het Philippe Verdy <[hidden email]> geskryf:
>
>> If there's something to do to the Lua engine is to rework the way upvalues are allocated: they should be on the call stack as much as possible (e.g. for up to ~250 upvalues) and not on the heap.
>
> The way I understand it, upvalues are not on the heap. They are on the
> main execution stack below the bottom of the current function's stack
> frame.
>

My understanding was they start that way, but if the owning function exits they are migrated to a distinct structure to lift them “out” of the stack frame that is being discarded. Something like that Roberto probably winced since its been a while since I looked at his paper in this subject.

—Tim


Reply | Threaded
Open this post in threaded view
|

Re: Lua Needs

Soni "They/Them" L.
In reply to this post by Philippe Verdy
Sorry, accidentally replied directly instead of reply to list >.<

I still stand by "Closures = overcomplicating it" with regards to "foo:bar()" and "foo:bar.baz()".

In any case, I'm not sure what your goal is here... ¯\_(?)_/¯

On 2018-12-01 7:25 p.m., Philippe Verdy wrote:
Closures are naturally complicate to understand it you still don't see that they are themselves objects (even if you cannot create any closure-type object in Lua directly because they are created by the compiler and handled by the underlying VM).
Closures are instrinsic internal constructions of the language (which refers to them by speaking about "lexical scopes", but at runtime the "lexical" scope is non-sense, it is backed by a effective structure. Look at Lua sources and search for "Kproto" and "LUA_TCLOSURE" in these sources: you'll find them only in the sources of the VM engine itself, and created by the compiler, from the abstract syntaxic tree AFTER parsing the language source, to determien when and where to create these objects.
Look at the LuaC interface and the description of the "Lua_Allocator" type (in C only, not in Lua itself where it is always inaccessible).
Look at benchmarks that measures the amount of work made by the GC: closures have a very significant impact and cause lot of work to be done in the GC, even in the smallest Lua programs that just run a single function around a large loop and don't allocate any object, don't modify or add any table entry. Look at how the bytecodes are builtup, distinshing upvalues and registers with distinct opcodes even if the value in them are the same type (e.g. number or string).
What is overcomplicating things is the Lua engine itself which is not optimized as much as it could to avoid most costs on heap usage and should maximum reuse of objects instead of constantly allocating and discarding them in the same thread!

Le sam. 1 déc. 2018 à 20:37, Soni "They/Them" L. <[hidden email]> a écrit :
Closures = overcomplicating it.

On 2018-12-01 2:51 p.m., Philippe Verdy wrote:
Ask yourself what are LUA_TCLOSURE objects (yes in the heap...) and see how they are correlated with the static "prototypes" generated by the compiler which instructs how to bind caller's variables to callee's upvalues. The actual values are on the stack but at unknown position from the callee, so a mapping  LUA_TCLOSURE object is allocated on the heap.

Ask yourself why these closure objects exist: this is because functions may be recursive (and not necessarily by trailing calls), so the variables in closures may refer not to the immediate parent caller frame but to some ancestor frame at arbitrary number of levels in the calls stack). To avoid every access to the closure varaible to pass through a chain, there's a need for a mapping which is created and initialized before each call to rebind each variable for the next inner call (according to the closure's prototype): such object is allocated only if there are external variables used by the function that are not local to the function itself, they are not directly within the stack window.

This explains also because the bytecode needs separate opcodes for accessing registers and upvalues: if they could be directly on the stack, it would be enough to reference upvalues with negative register indexes and then use the stack as a sliding window, like we do in C/C++ call conventions (except that stack indexes in C/C++ are negative for parameters, and positive for local variables, some of the former being cached in actual registers, but still supported by "shadow" variables on the stack, allocated at fixed positions in the stack fram by the compiler or just pushed before the actual parameters and popped to restore these registers after the call).

There's been performance tests that show that closures are not so fast, they can create massive amounts of garbage collected objects (with internal type LUA_TCLOSURE). I think this behavior very curious, and the current implementation that allocates the LUA_TCLOSURE objects on the heap is not the best option, the mapping could be allocated directly in the stack of local variables/registers of the caller (and all these closure objects used by the caller could be merged to a single one, i.e. as the largest closure object needed for inner calls, merged like in an union. The closure objects themselves to not hold any variable value, these are just simple mappings from a small fixed integer sets (between 1 and the number of upvalues of the called function) and variable integers (absolute indexes in the thread's stack where the actual variable is located).

The byte code is not as optimized as it could be: the register numbers are only positive, the upvalue numbers are also only positive, they could forml a single set (positive integers for local registers, negative integers for upvalues, meaning that they are used to index the entries in the closure object to get access to the actual varaible located anywhere on the stack, outside of the immediate parent frame). The generated bytecode is not as optimal as it could be because various operations can only work on registers or constants (like ADD r1,r2,r3) so temporary registers must be allocated by the compiler (let's remember that the number of registers is limited). As well Lua's default engine treat all registers the same, when most of them will work with a single r0 register (an "accumulator") which could be implicit for most instructions, and this would reduce the instruction sizes (currently 32-bit or 64 bits), which is inefficient as it uses too much the CPU L1 data cache.

I'm convinced that the current approach in the existing Lua VM engine and its internal instruction set can be largely improved for better performance, without really changing the language itself, to get better data locality (smaller instruction sizes, less wasted unused bit fields), and elimination of uses of the heap for closures (to dramatically reduce the stress on the garbage collector)



Le sam. 1 déc. 2018 à 16:51, Dirk Laurie <[hidden email]> a écrit :
Op Sa. 1 Des. 2018 om 17:08 het Philippe Verdy <[hidden email]> geskryf:

> If there's something to do to the Lua engine is to rework the way upvalues are allocated: they should be on the call stack as much as possible (e.g. for up to ~250 upvalues) and not on the heap.

The way I understand it, upvalues are not on the heap. They are on the
main execution stack below the bottom of the current function's stack
frame.


Reply | Threaded
Open this post in threaded view
|

Re: Lua Needs

Philippe Verdy
Then I'm not sure what you mean by this equality "Closures = overcomplicating it"; to whom are you targetting this visible critic given that  "foo:bar()" is well documented and supported, and  "foo:bar.baz()" is not supported and we can't guess at all what is the expected behavior (what is the effective value to given to "self" within "baz()").

So I have the same remark to address to you " I'm not sure what your goal is here..." , i.e. really to who you are addressing the issue and who could solve it or who you expect a solution (or workaround) to come from.to solve a problem (most probably related to the integration of Lua into another unspecified OOP language, with specific needs not coverabed by Lua with its implied semantics).

Clearly it seems that even the closures in Lua is an unnecessary byut costly syntaxic "toy" which could as well be supported by languages passing variables by *references* in function calls (in C, these references would turn into pointers because references and pointers in C are tightly linked to each other by unchecked typecasts, making C code difficult or impossible to enfore typesafely, many languages don't have or need pointers at all, they use references only to existing objects that have a know type, including Lua with its "nil" object which is a non-null reference to a static/atomized constant object with LUA_TNIL type and the only permitted instance of this type; instead of pointers, Lua can use accessors, i.e. methods to get/set pseudo typecasts to convert a given type to another by decomposing it into smaller underlying storage units, which can be part of an indexed array/table).

As well tables in Lua can be implemented efficiently using one or more tables without the necessary cost of the hashing function and hashtable, it dtoes that by differentiating integers from the larger set of types for table keys and then avoids the overhead of the hansing function and hashtable for integer keys, storing most values directly in a dense array if the integer keys are dense enough in a single interval and then using the hash table for all other key values including integers that are not part of the dense interval). So Lua can "emulate" pointers but more safely than what classic C compilers or assembly lanbuages do in a single untyped/unchecked array of "bytes" in a single space to store everything (data, code/functions, call stacks, objects, virtual memory mappings, threads, I/O handles...). C++ can be cleaner because we can use it without any pointer and any unsafe typecasts (but C++ still has obscure implicit casts that can become out of control, without using any accessor, and that can easily break the OOP design and type inferences).


Le sam. 1 déc. 2018 à 23:13, Soni "They/Them" L. <[hidden email]> a écrit :
> Sorry, accidentally replied directly instead of reply to list >.<
> I still stand by "Closures = overcomplicating it" with regards to "foo:bar()" and "foo:bar.baz()".
> In any case, I'm not sure what your goal is here... ¯\_(?)_/¯
123