[BUG] unsafe metatable paradigm

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

[BUG] unsafe metatable paradigm

Viacheslav Usov
The following code, in Lua 5.3.5,

local io = require 'io'
local mt = getmetatable(io.stdin)
mt.__gc = nil
io.tmpfile()
collectgarbage()
collectgarbage()
collectgarbage()
collectgarbage()


-- END

prevents the built-in IO library's finalizers from running on all the handles created subsequently (in the same Lua state), and at least some of the handles created previously (at a minimum, all the standard handles). This can be most easily seen by setting breakpoints at newprefile() and at f_gc() in liolib.c. When running the code above in a fresh Lua instance that is (gracefully) closed after the script returns, newprefile() is called four times (three times for each of the standard IO handles and one for the temp file) and f_gc() is not called at all. When the line

mt.__gc = nil

is commented out, f_gc() is called four times as expected.

I believe this (and the below) is a problem because Lua code should not be able to make C libraries leak resources.

The ultimate problem is in 

static void createmeta (lua_State *L) {
  luaL_newmetatable(L, LUA_FILEHANDLE);  /* create metatable for file handles */
  lua_pushvalue(L, -1);  /* push metatable */
  lua_setfield(L, -2, "__index");  /* metatable.__index = metatable */
  luaL_setfuncs(L, flib, 0);  /* add file methods to new metatable */
  lua_pop(L, 1);  /* pop new metatable */
}

Here, the metatable has the __gc method directly as a key (via flib), and the metatable is modifiable. The latter is what makes this a problem.

I write this up in details because I think a very similar pattern is used by many other (third party) libraries. For example, LuaSocket (latest GitHub at the time of writing; it would be nice to have accurate versioning there):

local so = require 'socket'
local mt = getmetatable(so.tcp())
mt.__gc = nil
collectgarbage()
collectgarbage()
collectgarbage()
collectgarbage()

-- END

A breakpoint at meth_close() never fires when that code runs (do a full life-cycle of a clean Lua state to be sure), yet it does when mt.__gc = nil is commented out. The code that creates the metatable is more complicated than in liolib, but ultimately it shares the same trait, being a modifiable metatable allowing the user code to remove (or replace) the __gc method.

In Programming in Lua 4, pages 333 - 336 and especially Listing 32.2 (the latter essentially identical with the code in PiL 1) explain how to use __gc and metatables when dealing with userdata. Unfortunately, that particular example, as explained in the last paragraph on page 334, is special in the sense that the userdata are never accessible to the user code, so its metatable cannot be manipulated like in my examples above. If those userdata were accessible to the user, that would have the exact same problem, making it possible for the user code to interfere with finalization. So in that sense the example is probably misleading.

The more complicated example in the next section 'An XML Parser' does not have the special circumstances of the previous example. The userdata are accessible to the user, and their metatable (see Listing 32.9) is modifiable, thus seemingly susceptible to the same pattern of abuse. The text around that example seems to consider the GC part already addressed so it does not focus on this at all, missing the chance to warn the reader of the pitfalls.

I do not think the problem is addressed in the manual (__metatable is, in my opinion, not mentioned where it should, nor is its purpose stressed at all). With the sample code in PiL and well-known Lua libraries having this problem as illustrated above, we seem to have a well-established unsafe paradigm and the question is how we can get out of this trouble.

Cheers,
V.
v
Reply | Threaded
Open this post in threaded view
|

Re: [BUG] unsafe metatable paradigm

v
On Mon, 2019-06-24 at 19:16 +0200, Viacheslav Usov wrote:
> I do not think the problem is addressed in the manual (__metatable
> is, in my opinion, not mentioned where it should, nor is its purpose
> stressed at all). With the sample code in PiL and well-known Lua
> libraries having this problem as illustrated above, we seem to have a
> well-established unsafe paradigm and the question is how we can get
> out of this trouble.
>
> Cheers,
> V.

`userdata` object is somewhat "foreign" for Lua: no default operations
can be applied to it: no addition/subtraction/indexing/etc. I
personally think that would make perfect sense to disable its
modification with `getmetatable`/`setmetatable` (keep debug. versions
functional of course), as it isn't Lua-side code business to manage its
internal state (and metatables are part of its internal state).

--
v <[hidden email]>


Reply | Threaded
Open this post in threaded view
|

Re: [BUG] unsafe metatable paradigm

Egor Skriptunoff-2
In reply to this post by Viacheslav Usov
On Mon, Jun 24, 2019 at 8:17 PM Viacheslav Usov wrote:
the question is how we can get out of this trouble.

What trouble you are talking about?
A metatable of a host object must not be accessible to untrusted guest code inside a sandbox.
(independently of whether __gc field exists in the metatable or not)
Reply | Threaded
Open this post in threaded view
|

Re: [BUG] unsafe metatable paradigm

nobody
In reply to this post by Viacheslav Usov
On 24/06/2019 19.16, Viacheslav Usov wrote:
> The following code, [simplified]
>
> getmetatable(io.stdin).__gc = nil
>
> prevents the built-in IO library's finalizers from running on all
> the handles created subsequently…
>
> I believe this […] is a problem because Lua code should not be able
> to make C libraries leak resources.

I do not think this is a problem.  If you mess with other people's
code's internals, don't be surprised if it breaks.

It boils down to this:  Do you trust the user to not intentionally break
things?

If yes, there is no problem.

If no, they shouldn't be running ANY kind of turing-complete code, or at
the very least you should mercilessly sandbox them.


[reordered]
> With the sample code in PiL and well-known Lua libraries having this
> problem as illustrated above, we seem to have a well-established
> unsafe paradigm and the question is how we can get out of this
> trouble.

For all people running the standard Lua interpreter, there is no
problem.  Even if everything was "protected" by __metatable, a simple

   debug.getmetatable(io.stdin).__gc = nil

would still produce the same behavior.  Even if you're in a somewhat
protected environment without debug.*, if you can load C modules, you
can still (re-)expose `lua_getmetatable`.

If even that is prevented, I'm gonna call that a sandbox.  Most likely a
broken / insecure one, but a sandbox – NOT a standard Lua environment.


> The ultimate problem is […that…] the metatable is modifiable.

This is a GOOD thing.  I can't remember how often I had to add __index,
__pairs, __tostring, … or added custom metamethods to make some other
library work with my code.



> I do not think the problem is addressed in the manual (__metatable
> is, in my opinion, not mentioned where it should, nor is its purpose
> stressed at all).

Basically every time someone used __metatable, that broke or seriously
interfered with my metaprogramming.

Most often, the value set as __meta"table" is NOT EVEN A TABLE.  Which
means that any custom function that uses a metamethod for customization
(like tostring / __tostring, pairs / __pairs do) cannot rely on the
interface enforced by setmetatable (the metatable MUST be nil or a
table) but has to explicitly check if someone messed with that, before
it can proceed to check if the custom metamethod is available.

Even if it is a table, you'll have a hard(er) time adding your own
metamethods (in such a way that BOTH Lua and C code can see them), which
means those "protected" values will either be second-class citizens that
need yet another dummy wrapper around them (making everything bigger and
slower) or you'll have to poke at it with debug.getmetatable (if you
can)…  Both are painful.

(
And yes, you *could* set a __metatable worthy of its name that permits
reading all non-critical fields, that permits adding safe fields
(anything that's not Lua core) and thus does not impede metaprogramming.
(A bit of __index, __pairs, __newindex and maybe one or two sets of
names as blacklist / graylist would do the job.)  But let's be honest,
no one is going to bother doing that when (it seems to them that)
__metatable = "HAHA protected!" achieves what they care about.  (Also,
that would be a metatable on the __metatable in a metatable, the former
potentially also protected by a __metatable…  I guess some people's
heads would just explode.)
)

-- nobody

Reply | Threaded
Open this post in threaded view
|

Re: [BUG] unsafe metatable paradigm

Viacheslav Usov
In reply to this post by Egor Skriptunoff-2
On Mon, Jun 24, 2019 at 9:13 PM Egor Skriptunoff <[hidden email]> wrote:

> What trouble you are talking about?

The trouble is the pervasive use of the unsafe paradigm in extant library code and the obscurity of the subject.

> A metatable of a host object must not be accessible to untrusted guest code inside a sandbox.
> (independently of whether __gc field exists in the metatable or not)

I actually disagree with this as stated.

Regardless, PiL4 says this: "Such behavior is unacceptable for any Lua library. No matter how we use a library, it should neither corrupt C data nor cause the Lua system to crash" (p. 268). This statement is  absolute without restrictions such as "untrusted guest code inside a sandbox". The examples I gave earlier all corrupt C data. So this issue has nothing to do with sandboxing and much to do with writing a C library properly, and the means available for that purpose.

Cheers,
V.
Reply | Threaded
Open this post in threaded view
|

Re: [BUG] unsafe metatable paradigm

Viacheslav Usov
In reply to this post by nobody
On Mon, Jun 24, 2019 at 10:14 PM nobody <[hidden email]> wrote:

> This is a GOOD thing.  I can't remember how often I had to add __index, __pairs, __tostring, … or added custom metamethods to make some other library work with my code.

As I said in another message, I do not mind having the metatable modifiable. I take issue with being able to suppress a C-language finalizer set by a C library to clean up its C data. Generalising from here would not be relevant in my opinion. One could argue the metatable is not how finalization of userdata ought to be controlled to begin with (the ship might have sailed, though).

> It boils down to this:  Do you trust the user to not intentionally break things?

"Intentionally" may or may not be part of this. But even if we stick with it, and even if we say "it should then be a sandbox", what is the solution?

I am not looking for an arbitrarily strict sandbox. My whole purpose is to run finalizers set up by C libs when they should run and reclaim associated resources. If the intended (still sticking with it) use of a library (i.e., without leaks) can bring my sandbox down, be it. Call it an intended mode of failure. I do not want unintended failure modes, especially not those caused by resource leaks.

So, why is the standard IO library not sandbox-friendly (in the strict sense clarified above) out of the box? Is it going to lose any (intended, since we are sticking with it) functionality by being sandbox-friendly? Is it because it is (too) hard to make it so? Perhaps we might think of a mechanism to make this straightforward? Or at least explain the tradeoffs somewhere?

As an aside, it is not accidental that I brought up this issue today. It should be viewed in the context of the general discussion of garbage collection, finalization and "closing" in Lua 5.4.

> Basically every time someone used __metatable, that broke or seriously interfered with my metaprogramming.

You are probably making a whole lot of good points in the discussion of __metatable that follows, and which is entirely absent in the manual and PiL (and Google, if my inept command of it counts for anything), which again demonstrates what I mentioned earlier: all this is arcane knowledge and one basically starts with the unsafe paradigm when writing a C library. Or a sandbox-friendly C library, if you must insist.

Cheers,
V.
Reply | Threaded
Open this post in threaded view
|

Re: [BUG] unsafe metatable paradigm

Roberto Ierusalimschy
> So, why is the standard IO library not sandbox-friendly (in the strict
> sense clarified above) out of the box? Is it going to lose any (intended,
> since we are sticking with it) functionality by being sandbox-friendly? Is
> it because it is (too) hard to make it so? Perhaps we might think of a
> mechanism to make this straightforward?

do
  local io = require 'io'
  local mt = getmetatable(io.stdin)
  mt.__metatable = "not your business"
end

Problem solved? Too hard?


> Or at least explain the tradeoffs somewhere?

Cons - we lose the flexibility to add new metamethods to files (no big deal);
Pros - we satisfy some nitpicking (as big a deal as desired).

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [BUG] unsafe metatable paradigm

Andrew Gierth
In reply to this post by Viacheslav Usov
>>>>> "Viacheslav" == Viacheslav Usov <[hidden email]> writes:

 Viacheslav> So, why is the standard IO library not sandbox-friendly (in
 Viacheslav> the strict sense clarified above) out of the box?

I suspect most sandboxes don't expose the IO library at all (mine
certainly does not), since unmediated file access is usually one of the
things that sandboxing exists to prevent.

(The string library is another matter; some kinds of sandbox do need to
add a __metatable element to the string metatable to ensure that
sandboxed code can't reach any global values.)

--
Andrew.

Reply | Threaded
Open this post in threaded view
|

Re: [BUG] unsafe metatable paradigm

Philipp Janda
In reply to this post by Roberto Ierusalimschy
Am 25.06.19 um 15:14 schröbte Roberto Ierusalimschy:
>
> do
>    local io = require 'io'
>    local mt = getmetatable(io.stdin)
>    mt.__metatable = "not your business"
> end
>
> Problem solved? Too hard?

Not good enough.

io.stdin.__index.__gc = nil


Philipp



Reply | Threaded
Open this post in threaded view
|

Re: [BUG] unsafe metatable paradigm

Viacheslav Usov
In reply to this post by Roberto Ierusalimschy
On Tue, Jun 25, 2019 at 3:15 PM Roberto Ierusalimschy <[hidden email]> wrote:

> Problem solved?

Not, I am afraid. The block of questions that you quoted and apparently intended to answer was about solving the problem within the library code.

> Too hard?

Given your response, I am tempted to say that it is. Why, indeed, are you suggesting a workaround in the user code rather than a fix in the library?

My personal take is that it is too hard, too. Conceptually, the cleanup required by the C library before a userdatum is freed is its internal implementation detail. However, the requirement to use the metatable mechanism for this leaks the implementation detail out, where it becomes a vulnerability. At this point the fact may not even be understood by the library's author, or found easier to ignore because it seems surprisingly nasty to deal with properly.

Note that the other methods that are defined in a metatable are not exactly in the same league. Unlike __gc, which is an implementation detail, they are usually a public interface, tampering with which is more likely to make the userdatum unusable (for the user) than to break the implementation.

With this in mind, the following extension might be considered: introduce C API to get/set/reset a "cleanup" callback for a userdatum, which is a C function accepting a pointer to the userdatum. The callback is called immediately before the userdatum is freed (after the (optional) __gc finalizer runs), so no complications related to marking userdata for finalziation and resurrection. This is fully compatible with the existing code (__gc still works), and it seems that many C libraries will only need this callback rather than __gc, this callback is not leaked through the public interface, it even seems simpler in use for a C programmer, and the C library's author is freed from having to think about protecting it from the user code, and from having to come up with excuses for not doing so.

Cheers,
V.
Reply | Threaded
Open this post in threaded view
|

Re: [BUG] unsafe metatable paradigm

Viacheslav Usov
In reply to this post by Andrew Gierth
On Tue, Jun 25, 2019 at 11:18 PM Andrew Gierth <[hidden email]> wrote:

> I suspect most sandboxes don't expose the IO library at all (mine certainly does not), since unmediated file access is usually one of the things that sandboxing exists to prevent.

I can certainly think of cases when the IO library might perfectly well be accessible to sandboxed Lua code. But, anyway, I used the standard IO library merely as a concrete and well-known example. The issue is general and applies to any C library whose userdata must be cleaned up before they are freed by Lua.

Cheers,
V.
Reply | Threaded
Open this post in threaded view
|

Re: [BUG] unsafe metatable paradigm

Roberto Ierusalimschy
In reply to this post by Philipp Janda
> Not good enough.
>
> io.stdin.__index.__gc = nil

Indeed; thanks. We will fix that.

-- Roberto