load and upvalues

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

load and upvalues

Dennis Fischer
Greetings lua-l!

While considering how to do a few things in Lua for a future project I have in mind I got a bit distracted thinking about Lua upvalues and ended up at the following thought:

The last argument to `load` is passed as the first upvalue to the new closure that is returned. This is usually the environment (at least for what the manual refers to as "main chunks"), which is what this argument is supposedly meant for.

If, however, you dump any function as bytecode and load it again, its environment may be the second, third or fifth upvalue, or it may not have an environment at all (a function that accesses no globals).

So my question is, why make the first upvaue a special case in the first place? This seems very un-Lua to me. A more generic (and backwards-compatible) alternative would be for all additional arguments to load to be passed as upvalues to the new closure in order of appearance.

    load(chunk [, chunkname [, mode [, ...]]])

The code to set the n-th upvalue obviously already exists and gets called by the load function, so, I expect, to someone who's familiar with Luas C implementation this change should be trivial (At least one could assume so).

Of course, my reason for opsting this here is because I'd like to hear what others think, so please, *do* give your opinions :)

Reply | Threaded
Open this post in threaded view
|

Re: load and upvalues

Dirk Laurie-2
Op Wo. 10 Apr. 2019 om 11:21 het Dennis Fischer
<[hidden email]> geskryf:

> The last argument to `load` is passed as the first upvalue to the new closure that is returned. This is usually the environment (at least for what the manual refers to as "main chunks"), which is what this argument is supposedly meant for.
> If, however, you dump any function as bytecode and load it again, its environment may be the second, third or fifth upvalue, or it may not have an environment at all (a function that accesses no globals).
>
> So my question is, why make the first upvaue a special case in the first place?

If you load from Lua source code, which is the only case 99% of Lua
users ever utilize, the first upvalue is the only upvalue, and there
is no risk.

The thing is, upvalues are highly context-dependent. You can't tell
from reading the source code whether non-local variables are supposed
to be upvalues or globals. Thus you don't know, when you are setting
the second, third or fifth upvalue, which named variable that would
refer to, unless you use the debug library.

> This seems very un-Lua to me.

I don't think it is un-Lua to omit support for features that 99% per
cent of Lua users would never need. For example, the hyperbolic
functions were deprecated in Lua 5.3.

> A more generic (and backwards-compatible) alternative would be for all additional arguments to load to be passed as upvalues to the new closure in order of appearance.
>
>     load(chunk [, chunkname [, mode [, ...]]])
> The code to set the n-th upvalue obviously already exists and gets called by the load function, so, I expect, to someone who's familiar with Luas C implementation this change should be trivial (At least one could assume so).

You can (and should) check that upvalues have the names that you
expect whenever you set them.
    f = load(chunk [, chunkname [, mode [, env]]])
    assert ("myname"==debug.setupvalue(f,2,value)

Having extra arguments therefore adds no functionality, it merely
encourages users to set upvalues without checking their names.

No thanks.

Reply | Threaded
Open this post in threaded view
|

Re: load and upvalues

Egor Skriptunoff-2
In reply to this post by Dennis Fischer
On Wed, Apr 10, 2019 at 12:22 PM Dennis Fischer wrote:
So my question is, why make the first upvaue a special case in the first place? This seems very un-Lua to me. A more generic (and backwards-compatible) alternative would be for all additional arguments to load to be passed as upvalues to the new closure in order of appearance.

    load(chunk [, chunkname [, mode [, ...]]])


This makes sense.
I like this suggestion (although it would be useful to very small amount of Lua users).
Both string.dump() and load() should take into account function's upvalues.

If number of upvalues is known:

local bytecode, upvalues_num, upv1, upv2, upv3 = string.dump(some_func)
...
local restored_func = load(bytecode, "=loaded", "b", upv1, upv2, upv3)
 
If number of upvalues is unknown:
 
local dumped = {string.dump(some_func)}
...
local restored_func = load(dumped[1], "=loaded", "b", table.unpack(dumped, 3, dumped[2] + 2))
 
This approach is backward-compatible after the following modification:
If no upvalues are given to load() and bytecode represents main chunk (this could be easily determined by examining bytecode) then global environment is used as the first upvalue.

P.S.
Of course, this would not help when you need to dump and load two functions which share the same upvalue.
 
P.P.S.
Using debug library is considered as "dirty trick" and should be avoided whenever possible.
Meanwhile dumping/loading functions having upvalues is a pretty legal job, but we can't do it without debug library.

Reply | Threaded
Open this post in threaded view
|

Re: load and upvalues

Soni "They/Them" L.


On 2019-05-07 5:14 p.m., Egor Skriptunoff wrote:

> On Wed, Apr 10, 2019 at 12:22 PM Dennis Fischer wrote:
>
>     So my question is, why make the first upvaue a special case in the
>     first place? This seems very un-Lua to me. A more generic (and
>     backwards-compatible) alternative would be for all additional
>     arguments to load to be passed as upvalues to the new closure in
>     order of appearance.
>
>         load(chunk [, chunkname [, mode [, ...]]])
>
>
> This makes sense.
> I like this suggestion (although it would be useful to very small
> amount of Lua users).
> Both *string.dump()* and *load()* should take into account function's
> upvalues.
>
> If number of upvalues is known:
>
> local bytecode, upvalues_num, upv1, upv2, upv3 = string.dump(some_func)
> ...
> local restored_func = load(bytecode, "=loaded", "b", upv1, upv2, upv3)
>
> If number of upvalues is unknown:
>
> local dumped = {string.dump(some_func)}
> ...
> local restored_func = load(dumped[1], "=loaded", "b",
> table.unpack(dumped, 3, dumped[2] + 2))
>
> This approach is backward-compatible after the following modification:
> If no upvalues are given to *load()* and bytecode represents main
> chunk (this could be easily determined by examining bytecode) then
> global environment is used as the first upvalue.
>
> P.S.
> Of course, this would not help when you need to dump and load two
> functions which share the same upvalue.
> P.P.S.
> Using debug library is considered as "dirty trick" and should be
> avoided whenever possible.
> Meanwhile dumping/loading functions having upvalues is a pretty legal
> job, but we can't do it without debug library.
>

Sandbox killer.

What's next, string.dump(stack_offset)?

I don't believe string.dump should return upvalues. It's not a
persistence library. (Altho the idea of a pure-Lua persistence library
*does* seem interesting.)

On the other hand, it's perfectly safe to load arbitrary well-formed
bytecode with arbitrary upvalues.

See: https://github.com/SoniEx2/loadx/

Reply | Threaded
Open this post in threaded view
|

Re: load and upvalues

Egor Skriptunoff-2
On Tue, May 7, 2019 at 11:55 PM Soni "They/Them" L. wrote:

Sandbox killer.

I don't believe string.dump should return upvalues.


Returning upvalues would not become a "sandbox killer".

Even if string.dump() does not return upvalues,
when building a sandbox, you MUST modify string.dump() to reject dumping your own functions.
Otherwise all constants used in your functions will be known to untrusted code.
Untrusted code must be limited to dumping only its own functions.
Yes, you should keep a list of your own functions exposed to untrusted code.

For example, when untrusted code invokes
string.dump(your_function),
you should instead invoke the following
string.dump(function(...) return your_function(...) end)
This way, untrusted code could successfully dump and load your_function() without being able to extract any info about it.
Reply | Threaded
Open this post in threaded view
|

Re: load and upvalues

Egor Skriptunoff-2
On Wed, May 8, 2019 at 1:07 AM Egor Skriptunoff wrote:
This way, untrusted code could successfully dump and load your_function() without being able to extract any info about it.


I foresee the question: "Isn't it dangerous to allow untrusted code to load arbitrary bytecode?"
Yes.  Maliciously crafted bytecode could crash the interpreter.
But on the other side, if sandbox is unable to run arbitrary Lua code, it is not a good sandbox.
The solution might be to append hash(pwd+bytecode) to every bytecode produced by your string.dump()
If your load() confirmed that hash is correct then this bytecode could be safely loaded.
This way untrusted code could load bytecodes that were previously dumped, but not crafted.

Reply | Threaded
Open this post in threaded view
|

Re: load and upvalues

Soni "They/Them" L.
In reply to this post by Egor Skriptunoff-2


On 2019-05-07 7:07 p.m., Egor Skriptunoff wrote:

> On Tue, May 7, 2019 at 11:55 PM Soni "They/Them" L. wrote:
>
>
>     Sandbox killer.
>
>     I don't believe string.dump should return upvalues.
>
>
>
> Returning upvalues would not become a "sandbox killer".
>
> Even if *string.dump()* does not return upvalues,
> when building a sandbox, you MUST modify *string.dump()* to reject
> dumping your own functions.
> Otherwise all constants used in your functions will be known to
> untrusted code.
> Untrusted code must be limited to dumping only its own functions.
> Yes, you should keep a list of your own functions exposed to untrusted
> code.
>
> For example, when untrusted code invokes
> string.dump(your_function),
> you should instead invoke the following
> string.dump(function(...) return your_function(...) end)
> This way, untrusted code could successfully dump and load
> your_function() without being able to extract any info about it.

Untrusted code can also just go on github and extract everything it
wants to know about it.

The secret is not in the code (constants).

I only need to modify string.dump to sign the stuff with an
authentication key (aka "salt" but really look into HMAC), and verify
that in load().

Reply | Threaded
Open this post in threaded view
|

Re: load and upvalues

Egor Skriptunoff-2
In reply to this post by Soni "They/Them" L.


On Tue, May 7, 2019 at 11:55 PM Soni "They/Them" L. <[hidden email]> wrote:

See: https://github.com/SoniEx2/loadx/


Ok, loadx is nice.
But where is dumpx?

The problem is that when you're writing Lua code, it's hard to predict what would be the order of upvalues inside a function.
And it's a chance to accidentally swap upv1 and upv2 every time when you're modifying the Lua code of the function.
That's why your loadx needs its counterpart dumpx to easily receive all upvalues in their correct order.
Reply | Threaded
Open this post in threaded view
|

Re: load and upvalues

Soni "They/Them" L.


On 2019-05-07 7:43 p.m., Egor Skriptunoff wrote:

>
>
> On Tue, May 7, 2019 at 11:55 PM Soni "They/Them" L. <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>
>     See: https://github.com/SoniEx2/loadx/
>
>
> Ok, loadx is nice.
> But where is dumpx?
>
> The problem is that when you're writing Lua code, it's hard to predict
> what would be the order of upvalues inside a function.
> And it's a chance to accidentally swap upv1 and upv2 every time when
> you're modifying the Lua code of the function.
> That's why your loadx needs its counterpart dumpx to easily receive
> all upvalues in their correct order.

Try It And See^TM

Then Try It Again based on the pcall result.

Reply | Threaded
Open this post in threaded view
|

Re: load and upvalues

Sean Conner
In reply to this post by Egor Skriptunoff-2
It was thus said that the Great Egor Skriptunoff once stated:
> On Wed, May 8, 2019 at 1:07 AM Egor Skriptunoff wrote:
>
> > This way, untrusted code could successfully dump and load your_function()
> > without being able to extract any info about it.
> >
>
>
> I foresee the question: "Isn't it dangerous to allow untrusted code to load
> arbitrary bytecode?"

  That's why from Lua 5.2 onwards, load() has a parameter to restrict
loading of bytecode.  And a sandbox would not include load() (or a
restricted version of it).

  -spc


Reply | Threaded
Open this post in threaded view
|

Re: load and upvalues

Egor Skriptunoff-2
On Wed, May 8, 2019 at 3:08 AM Sean Conner wrote:
> I foresee the question: "Isn't it dangerous to allow untrusted code to load
> arbitrary bytecode?"

  That's why from Lua 5.2 onwards, load() has a parameter to restrict
loading of bytecode.  And a sandbox would not include load() (or a
restricted version of it).


This way untrusted code must satisfy additional restrictions to be able to run inside your sandbox.
I doubt this is practical.
Usually the author of the untrusted code knows nothing about your sandbox and the restrictions it must comply with  :-)

Ideal sandbox must run almost everything; including the possibility that the untrusted code might create its own (nested) sandbox while running inside your sandbox.
All standard Lua library functions must be emulated, but not removed.