On implementing a functions whitelist for a sandbox

On implementing a functions whitelist for a sandbox

Kynn Jones
Hi!

I am learning Lua by working my way through Programming in Lua (4th edition).

In chapter 25 there's a discussion on implementing a sandbox, based on defining a "white list" (my wording) of allowable functions.

This seems straightforward in principle, but in practice I am having a hard time getting hold of some core functions that *need* to go into the whitelist.  The code below is a very pared-down example that shows one of these elusive functions in action.

    #!/usr/bin/env lua5.3
    local loaded_chunk = assert(load('nonexistent()', "=(load)", "t", {}))
    local function callhook ()
      local info = debug.getinfo(2, "fnS")
      if info.func == loaded_chunk then return end
      error(string.format("calling disallowed function (%s:%d): %s (%s)",
                          info.short_src,
                          info.linedefined,
                          (info.name or "?"),
                          info.func))
    end
    debug.sethook(callhook, "c")
    loaded_chunk()

If I run this script on my Debian system I get the following output:

    lua5.3: ./snippet.lua:6: calling disallowed function ([C]:-1): nonexistent (function: 0x55918f70aef0)
    stack traceback:
        [C]: in function 'error'
        ./snippet.lua:6: in function <./snippet.lua:3>
        [C]: in global 'nonexistent'
        (load):1: in local 'loaded_chunk'
        ./snippet.lua:13: in main chunk
        [C]: in ?

Now, this output is somewhat disconcerting (to me at least, since it gives the impression that the true-to-its-name function `nonexistent` actually lives at 0x55918f70aef0), but the important thing is that what has triggered the call hook is a "mystery function X" called in response to the `attempt to call a nil value` error.  It is this mystery function X that I want to include into the allowed functions whitelist.

How can I determine this function and refer to it in my code?

Many thanks in advance!

kj


Re: On implementing a functions whitelist for a sandbox

Sean Conner
It was thus said that the Great Kynn Jones once stated:

> Hi!
>
> I am learning Lua by working my way through Programming in Lua (4th
> edition).
>
> In chapter 25 there's a discussion on implementing a sandbox, based on
> defining a "white list" (my wording) of allowable functions.
>
> This seems straightforward in principle, but in practice I am having a hard
> time getting hold of some core functions that *need* to go into the
> whitelist.  The code below is a very pared-down example that shows one of
> these elusive functions in action.
>
>     #!/usr/bin/env lua5.3
>     local loaded_chunk = assert(load('nonexistent()', "=(load)", "t", {}))
>     local function callhook ()
>       local info = debug.getinfo(2, "fnS")
>       if info.func == loaded_chunk then return end
>       error(string.format("calling disallowed function (%s:%d): %s (%s)",
>                           info.short_src,
>                           info.linedefined,
>                           (info.name or "?"),
>                           info.func))
>     end
>     debug.sethook(callhook, "c")
>     loaded_chunk()
>
> If I run this script on my Debian system I get the following output:
>
>     lua5.3: ./snippet.lua:6: calling disallowed function ([C]:-1): nonexistent (function: 0x55918f70aef0)
>     stack traceback:
>         [C]: in function 'error'
>         ./snippet.lua:6: in function <./snippet.lua:3>
>         [C]: in global 'nonexistent'
>         (load):1: in local 'loaded_chunk'
>         ./snippet.lua:13: in main chunk
>         [C]: in ?
>
> Now, this output is somewhat disconcerting (to me at least, since it gives
> the impression that the true-to-its-name function `nonexistent` actually
> lives at 0x55918f70aef0), but the important thing is that what has
> triggered the call hook is a "mystery function X" called in response to the
> `attempt to call a nil value` error.  It is this mystery function X that I
> want to include into the allowed functions whitelist.
>
> How can I determine this function and refer to it in my code?

  Is the only purpose of getting this function [1] to satisfy the method you
are using to identify missing functions?

  When I sandbox code, I know the list of functions I want the sandboxed
code to run, and only include those functions.  If I want to give the
sandboxed code access to the functions print() and os.getenv(), I'll
explicitly add them to the environment:

        env = { print = print , os = { getenv = os.getenv } }
        f   = assert(loadfile(file,"t",env))
        assert(pcall(f))

  I modified your example a bit:

        [spc]lucy:/tmp>lua
        Lua 5.3.5  Copyright (C) 1994-2018 Lua.org, PUC-Rio
        > loaded_chunk = assert(load('nonexistent()',"=(load)","t",{}))
        > assert(pcall(loaded_chunk))
        stdin:1: (load):1: attempt to call a nil value (global 'nonexistent')
        stack traceback:
                [C]: in function 'assert'
                stdin:1: in main chunk
                [C]: in ?
        >

  -spc

[1] The function is an internal function written in C.  It has no name
        in Lua.


Re: On implementing a functions whitelist for a sandbox

Kynn Jones
Hi Sean,

Regarding your question, it seems to me that the whitelist should
include (a) functions that the loaded code invokes, directly or
indirectly; (b) functions that get called by the interpreter in the
process of running the loaded code (e.g. functions that intercept
errors in the loaded code).

In other words, the loaded code should not be blocked just because the
interpreter called its own functions in the process of interpreting the
loaded code.

---

In any case, I tried your idea, after first modifying the snippet so
that it looks more like a sandbox.

    #!/usr/bin/env lua5.3
    local loaded_chunk = assert(load('nonexistent()', "=(load)", "t", {}))
    local whitelist = {
                       [loaded_chunk] = true,
                       [pcall] = true,
                       [assert] = true,
                       -- ... other allowed functions
                      }
    local function callhook ()
      local info = debug.getinfo(2, "fnS")
      if not whitelist[info.func] then
        error(string.format("calling disallowed function (%s:%d): %s (%s)",
                            info.short_src,
                            info.linedefined,
                            (info.name or "?"),
                            info.func))
      end
    end
    debug.sethook(callhook, "c")
    assert(pcall(loaded_chunk))

Now, the snippet fails with

    lua5.3: ./snippet.lua:12: calling disallowed function ([C]:-1): ? (function: 0x56179333bef0)
    stack traceback:
        [C]: in function 'error'
        ./snippet.lua:12: in function <./snippet.lua:9>
        [C]: in ?
        [C]: in function 'assert'
        ./snippet.lua:20: in main chunk
        [C]: in ?

Again, the nature of the underlying error (the attempt to call
nonexistent) gets obscured.  Instead, we get a `calling disallowed
function` error, which is not entirely accurate.

Thank you once more,

kj


Re: On implementing a functions whitelist for a sandbox

Sean Conner
It was thus said that the Great Kynn Jones once stated:
> Hi Sean,
>
> Regarding your question, it seems to me that the whitelist should
> include (a) functions that the loaded code invokes, directly or
> indirectly; (b) functions that get called by the interpreter in the
> process of running the loaded code (e.g. functions that intercept
> errors in the loaded code).

  The purpose of a sandbox is to run code in a restricted environment,
whether that's because the code is untrusted, or it's supposed to do one
thing and only one thing.  Generally, such code only needs access to
functions to do its job.

  In Lua, global variables are stored in a table that's normally visible to
all code.  When you do:

        chunk = load('print("hello world!")',"=(load)","t")
        chunk()
        hello world!

  This will compile the code and return it as a callable function that, when
run, will print "hello world!" to the screen.  It can do this because the
chunk is compiled with the current global table, which includes the print()
function.

  When you give an empty table to load() as the last parameter, what you are
doing is giving the chunk of code an environment with no global variables
defined.

        chunk = load('print("hello world!")',"=(load)","t",{})
        chunk()
        (load):1: attempt to call a nil value (global 'print')
        stack traceback:
                (load):1: in function 'chunk'
                (...tail calls...)
                [C]: in ?

  To fix this, you preload the table to load() with the "global" functions
and data you want the chunk to see:

        env = { print = print } -- make print available to the chunk of code
        chunk = load('print("hello world!")',"=(load)","t",env)
        chunk()
        hello world!

  As an example, if the code you want to sandbox is only supposed to
calculate math formulas, you will probably want to make the math library,
the table library, and the print() function available to the code---it
shouldn't do anything else, so stuff like the io library, or the os library,
or the package library, shouldn't be included in the sandbox:

        env = { math = math , table = table , print = print }
        chunk = load(chunk_of_code,"=(load)","t",env)

  If the code in chunk_of_code tries to call some function other than print()
or those in math and table, the code will compile, but when you try to run it,
it will fail with an error, like:

        (load):15: attempt to call a nil value (global 'select')

  Then you can either make the given function available to the code (in this
case, it's probably okay to add select() to its "global environment") or
not.  If you do, the environment becomes:

        env =
        {
          math   = math,
          table  = table,
          print  = print,
          select = select,
        }

  The above is what is meant by "whitelisting".
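
A minimal sketch of the environment-based whitelisting described above, assuming Lua 5.3 or later (where load() takes the environment table as its fourth argument). The two chunk strings are made up for illustration; anything left out of env is simply nil inside the chunk.

```lua
-- Only math and select are visible to the sandboxed chunks.
local env = { math = math, select = select }

-- Uses only whitelisted names, so it runs normally.
local allowed = assert(load("return select('#', math.max(1, 2), 3)",
                            "=(load)", "t", env))
local ok, count = pcall(allowed)

-- os is not in env, so indexing it fails at run time, not at compile time.
local denied = assert(load("return os.getenv('HOME')", "=(load)", "t", env))
local ok2, err = pcall(denied)

print(ok, count)   -- the first chunk succeeds and returns 2
print(ok2, err)    -- the second fails with an "attempt to index a nil value" error
```

Wrapping the call in pcall() keeps the failure inside the host program, which can then decide whether the missing name should be added to env.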

> In other words, the loaded code should not be blocked just because the
> interpreter called its own functions in the process of interpreting the
> loaded code.
> ---
>
> In any case, I tried your idea, after first modifying the snippet so
> that it looks more like a sandbox.
>
>     #!/usr/bin/env lua5.3
>     local loaded_chunk = assert(load('nonexistent()', "=(load)", "t", {}))
>     local whitelist = {
>                        [loaded_chunk] = true,
>                        [pcall] = true,
>                        [assert] = true,
>                        -- ... other allowed functions
>                       }
>     local function callhook ()
>       local info = debug.getinfo(2, "fnS")
>       if not whitelist[info.func] then
>         error(string.format("calling disallowed function (%s:%d): %s (%s)",
>                             info.short_src,
>                             info.linedefined,
>                             (info.name or "?"),
>                             info.func))
>       end
>     end
>     debug.sethook(callhook, "c")
>     assert(pcall(loaded_chunk))
>
> Now, the snippet fails with
>
>     lua5.3: ./snippet.lua:12: calling disallowed function ([C]:-1): ? (function: 0x56179333bef0)
>     stack traceback:
>         [C]: in function 'error'
>         ./snippet.lua:12: in function <./snippet.lua:9>
>         [C]: in ?
>         [C]: in function 'assert'
>         ./snippet.lua:20: in main chunk
>         [C]: in ?
>
> Again, the nature of the underlying error (the attempt to call
> nonexistent) gets obscured.  Instead, we get a `calling disallowed
> function` error, which is not entirely accurate.

  Your code is trying to duplicate what the Lua interpreter will do for you.
Of course, the error won't be "calling disallowed function"---is THAT what
you are trying to get?

  -spc



Re: On implementing a functions whitelist for a sandbox

Kynn Jones
On Wed, Aug 7, 2019 at 12:22 AM Sean Conner <[hidden email]> wrote:
It was thus said that the Great Kynn Jones once stated:
> Regarding your question, it seems to me that the whitelist should
> include (a) functions that the loaded code invokes, directly or
> indirectly; (b) functions that get called by the interpreter in the
> process of running the loaded code (e.g. functions that intercept
> errors in the loaded code).

Let me elaborate on this a bit.  What I meant to say is that the
functions included in the whitelist can be classified into two groups.

The first group consists of functions that we have included in the
whitelist because we consider them integral to what we have decided to
allow the loaded code to do (for example perform mathematical
operations); these are functions that the loaded code invokes directly
or indirectly.

The second group of functions in the whitelist are other functions,
which are *not* invoked directly or indirectly by the loaded code, but
that get called nonetheless by the Lua interpreter in the process of
running the loaded code.  These include functions that get invoked
when the loaded code has a runtime error.

For example, if functions A and B are defined as

    function A (x) return B(x) end

    function B (x) return math.random(x) end

...and the loaded code is the string 'print(A(1/0))', then `print`,
`A`, and the division `1/0` are being *directly* invoked by the loaded
code, while `B` and `math.random` are being *indirectly* invoked by
the loaded code.

On the other hand, the "unnamed code X" responsible for catching the
ensuing error

    bad argument #1 to 'random' (number has no integer representation)

...and printing the useful error message for it is not being invoked,
either directly or indirectly (at least according to the way I'm
defining these terms here), by the loaded code.  Nevertheless, this
"unnamed code X" is carrying out Lua's desired behavior, and should
not be disallowed by the sandbox.  In other words, the functions in
"unnamed code X" should be included in the whitelist.

My only point is that, in order to put the whitelisting idea into
practice, one needs to be able to include in the whitelist functions
belonging to either of the two categories described above.

Currently I know how to include functions of the first category in the
whitelist, not those of the second category.  Maybe the
whitelist-based sandbox idea can only be implemented properly in C?

kj


Re: On implementing a functions whitelist for a sandbox

Kynn Jones
Here is an implementation of the example I described in my previous
message, hopefully making clear the distinction I am making between
directly and indirectly invoked functions on the one hand, and
"unnamed code X" functions on the other.

    #!/usr/bin/env lua5.3
    -- Filename: example.lua

    function A (x) return B(x) end

    function B (x) return math.random(x) end

    -- this environment is probably more generous than it needs to be for this
    -- example, but just in case...
    local environment = {
                          A = A,
                          B = B,
                          math = math,
                          print = print,
                        }

    local snippet = string.format('print(A(%s))', arg[1])
    local loaded_chunk = assert(load(snippet, "=(load)", "t", environment))

    local whitelist = {
      [loaded_chunk] = true,
      [A] = true,
      [B] = true,
      [math.random] = true,
      [print] = true,
      [tostring] = true,
    }

    local function callhook (event)
      local info = debug.getinfo(2, "fnS")
      print(string.format("%-15s\t%-15s\t%s",
                          info.source, info.name or "?", info.func))
      if not whitelist[info.func] then
        error(string.format("calling disallowed function (%s:%d): %s (%s)",
                            info.short_src, info.linedefined, (info.name or "?"),
                            info.func))
      end
    end

    debug.sethook(callhook, "c")
    loaded_chunk()

If I run this script with argument, say, "1000", I get the following
output:

    =(load)            loaded_chunk       function: 0x5588e62a2b80
    @./example.lua     A                  function: 0x5588e62a2260
    @./example.lua     ?                  function: 0x5588e62a2780
    =[C]               random             function: 0x5588e59bba80
    =[C]               print              function: 0x5588e59b7cc0
    =[C]               ?                  function: 0x5588e59b7830
    841

...where 841 is, basically, `math.random(1000)`.  This confirms that
the loaded chunk does not invoke, *directly or indirectly*, any
function that is not in `whitelist`.

But if I run the script with argument "-1", I get the following
output

    =(load)            loaded_chunk       function: 0x5565a23f3b80
    @./example.lua     A                  function: 0x5565a23f3260
    @./example.lua     ?                  function: 0x5565a23f3780
    =[C]               random             function: 0x5565a10c9a80
    =[C]               ?                  function: 0x5565a10adef0
    lua5.3: ./example.lua:34: calling disallowed function ([C]:-1): ? (function: 0x5565a10adef0)
    stack traceback:
        [C]: in function 'error'
        ./example.lua:34: in function <./example.lua:29>
        [C]: in ?
        [C]: in function 'math.random'
        ./example.lua:6: in function 'B'
        (...tail calls...)
        (load):1: in local 'loaded_chunk'
        ./example.lua:41: in main chunk
        [C]: in ?

Whatever is triggering the error whose traceback is shown above is a
call to some "unnamed code X" function, which occurred when Lua tried
to evaluate `math.random(-1)`.

My one and only point is that in order to properly implement a
whitelist strategy, one needs to include functions like this "unnamed
code X" function in `whitelist`.  I am trying to find out whether this
is at all possible, and if so, how to do it.
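
One possible way around the unnamed function, offered as a sketch and not as the book's method: run the chunk through xpcall() with a message handler written in Lua, so that the function invoked when an error is raised is a value the script owns and can whitelist. This rests on an assumption the thread does not confirm, namely that the unnamed C function is a message handler installed by whatever runs the script. The names run_whitelisted, handler and allowed are hypothetical.

```lua
-- Run `chunk` under a call hook that rejects any function not in `whitelist`.
local function run_whitelisted(chunk, whitelist)
  -- Our own message handler: a plain Lua function that can be whitelisted.
  local function handler(msg)
    debug.sethook()            -- stop checking once an error is in flight
    return msg
  end

  local allowed = {
    [chunk] = true,
    [handler] = true,
    [xpcall] = true,           -- called below while the hook is active
    [debug.sethook] = true,    -- needed to switch the hook off again
  }
  for fn in pairs(whitelist) do allowed[fn] = true end

  local function callhook()
    local info = debug.getinfo(2, "f")   -- level 2: the function being called
    if not allowed[info.func] then
      debug.sethook()
      error("calling disallowed function", 0)
    end
  end

  debug.sethook(callhook, "c")
  local ok, result = xpcall(chunk, handler)
  debug.sethook()
  return ok, result
end

-- Usage: the chunk's own runtime error is reported as itself.
local bad = assert(load("nonexistent()", "=(load)", "t", {}))
print(run_whitelisted(bad, {}))
```

With this arrangement the "attempt to call a nil value" message reaches the caller unchanged, while a call to a function outside the table still yields "calling disallowed function". Functions defined by the chunk itself are not in the table either, so they would be rejected as well.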

kj




Re: On implementing a functions whitelist for a sandbox

Coda Highland
In reply to this post by Kynn Jones




I'm pretty sure we get what you're trying to do, and why you're trying to do it. We're trying to say you shouldn't be doing that. You don't NEED to do it.

If there's no way for the user code to invoke the function in the first place, why do you need to be so aggressive about detecting it? Moreover, with your scheme, it means that users can't even define functions for their own use, even if it's just "function double(x) return 2 * x end".

/s/ Adam

 

Re: On implementing a functions whitelist for a sandbox

Sean Conner
In reply to this post by Kynn Jones
It was thus said that the Great Kynn Jones once stated:

> On Wed, Aug 7, 2019 at 12:22 AM Sean Conner <[hidden email]> wrote:
>
> > It was thus said that the Great Kynn Jones once stated:
> > > Regarding your question, it seems to me that the whitelist should
> > > include (a) functions that the loaded code invokes, directly or
> > > indirectly; (b) functions that get called by the interpreter in the
> > > process of running the loaded code (e.g. functions that intercept
> > > errors in the loaded code).
> >
>
> Let me elaborate on this a bit.  What I meant to say is that the
> functions included in the whitelist can be classified into two groups.
>
> The first group consists of functions that we have included in the
> whitelist because we consider them integral to what we have decided to
> allow the loaded code to do (for example perform mathematical
> operations); these are functions that the loaded code invokes directly
> or indirectly.
>
> The second group of functions in the whitelist are other functions,
> which are *not* invoked directly or indirectly by the loaded code, but
> that get called nonetheless by the Lua interpreter in the process of
> running the loaded code.  These include functions that get invoked
> when the loaded code has a runtime error.

  The first thing to understand is scope---there are three concepts here:
globals, locals and upvalues.

        a = 1
        local b = 2

        local function f(c)
          local d = a + b + c
          return d
        end

Here, a is a global, b is local.  By declaring function f() (which itself is
local), we are creating a new scope for the locals c (even though it's a
parameter, it's still a local) and d.  From the point of view of f(), a is
still global, c and d are locals, and b is an upvalue.  An upvalue is a local
variable from an outer scope.
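
The upvalues of f() can be inspected with debug.getupvalue(). This is a small sketch assuming Lua 5.2 or later and a chunk that has not been stripped of debug information; the names table is only there for illustration.

```lua
a = 1
local b = 2

local function f(c)
  local d = a + b + c
  return d
end

-- Collect the names of f's upvalues.
local names = {}
local i = 1
while true do
  local name = debug.getupvalue(f, i)
  if not name then break end
  names[#names + 1] = name
  i = i + 1
end

-- Expected to list _ENV (through which the global a is reached) and b;
-- the parameter c and the local d are not upvalues.
print(table.concat(names, ", "))
```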

> For example, if functions A and B are defined as
>
>     function A (x) return B(x) end
>
>     function B (x) return math.random(x) end

  I'm going to change things up a bit here.

        local function B(x) return math.random(x) end
        local function A(x) return B(x) end

        local test = [[print(A(1/0))]]

test will contain the code we're going to load and execute, and to do so, it
needs a reference to print() and A().  This we can do:

        local env  = { print = print , A = A }
        local f    = load(test,"=(load)","t",env)
        f()

        stdin:2: bad argument #1 to 'random' (number has no integer representation)
        stack traceback:
                [C]: in function 'math.random'
                stdin:2: in function <stdin:2>
                (...tail calls...)
                (load):1: in local 'f'
                stdin:7: in main chunk
                [C]: in ?

  This is as expected.  Notice how we have given a limited "global"
environment to the loaded code that consists of print() and A().  print() is
a Lua function written in C, and it has access to everything it needs.  With
respect to A(), B() is an upvalue, and A() has a reference to it, so no need
to include B() in the limited "global" environment.
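
A short sketch of that point, assuming Lua 5.3 or later. math.floor stands in for math.random so the result is deterministic, and the chunk string is made up for illustration: the chunk can call A(), yet it sees neither B nor math.

```lua
local function B(x) return math.floor(x) end
local function A(x) return B(x) end   -- B is an upvalue of A

-- Only A is exposed; B and math stay on the host side.
local env = { A = A }
local f = assert(load("return A(2.7), math, B", "=(load)", "t", env))

local result, sandbox_math, sandbox_B = f()
-- result is 2; math and B are both nil from inside the chunk
print(result, sandbox_math, sandbox_B)
```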

  Now B().  In B(), math is a global and now I have to go into a slight
digression about how globals work in Lua.  Each function has an implicit
upvalue to a table of global variables.  It is defined by the Lua system on
behalf of functions.  It has a name (_ENV) but it is not necessarily the
first upvalue, nor is it the first upvalue that's a table.  So B() has
access to math via its implicit upvalue for _ENV---its global state.
Upvalues can be shared among functions:

        local function C(x) return math.sin(x * 2) end
        local function D(x) return math.cos(x / 2) end

  Here, both C() and D() share the same _ENV upvalue (normally, all
functions share the same _ENV upvalue, unless you overwrite it).  So A() has
a reference to B(), and B() has a reference to math, and everything works
out fine.  If, instead of giving the chunk a global environment with print()
and A(), you give it an empty global environment, you'll see:

        (load):1: attempt to call a nil value (global 'A')
        stack traceback:
                (load):1: in local 'f'
                stdin:6: in main chunk
                [C]: in ?
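
The sharing of _ENV between C() and D() mentioned earlier can be checked with debug.upvalueid(), which is available from Lua 5.2 on. A minimal sketch; the variable names c_name, d_name and shared are illustrative.

```lua
local function C(x) return math.sin(x * 2) end
local function D(x) return math.cos(x / 2) end

-- Each function has a single upvalue here: _ENV, used to reach math.
local c_name = debug.getupvalue(C, 1)
local d_name = debug.getupvalue(D, 1)

-- Equal ids mean both closures refer to the very same upvalue.
local shared = debug.upvalueid(C, 1) == debug.upvalueid(D, 1)
print(c_name, d_name, shared)
```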

  If A() and B() were global variables, then you would need to include both
A() and B() in the limited "global" environment for the code to work.  And
for the following chunk of code:

        test = [[
        local function B(x) return math.random(x) end
        local function A(x) return B(x) end
       
        print(A(1/0))
        ]]

  You'll need print() and math in the limited "global" environment for this
to work:

        local env = { print = print , math = math }
        local f   = load(test,"=(load)","t",env)
        f()

        (load):1: bad argument #1 to 'random' (number has no integer representation)
        stack traceback:
                [C]: in function 'math.random'
                (load):1: in function <(load):1>
                (...tail calls...)
                (load):4: in local 'f'
                stdin:10: in main chunk
                [C]: in ?

> ...and the loaded code is the string 'print(A(1/0))', then `print`,
> `A`, and integer division are being *directly* invoked by the loaded
> code, while `B` and `math.random` are being *indirectly* invoked by
> the loaded code.
>
> On the other hand, the "unnamed code X" responsible for catching the
> ensuing error
>
>     bad argument #1 to 'random' (number has no integer representation)
>
> ...and printing the useful error message for it is not being invoked,

 Which happens.  Try running the code presented.

  -spc



Re: On implementing a functions whitelist for a sandbox

Kynn Jones
Hi Sean,

Thank you for the detailed explanation.

I think that the disconnect here is that I am trying (**really,
really, really hard**) to implement what Roberto Ierusalimschy
recommends in chapter 25 (pp. 261-264) of *Programming in Lua*, 4th
ed., while you (and Coda Highland) are of the opinion (I think) that
such a strategy is unnecessary.  (Apologies if I misunderstood.)

In other words, I am not trying to solve any practical problem; I am
only trying to understand the book's approach to sandboxing.  At the
moment, I think that I still don't understand it (because I cannot put
it into practice satisfactorily).

kj


Re: On implementing a functions whitelist for a sandbox

Sean Conner
It was thus said that the Great Kynn Jones once stated:
> Hi Sean,
>
> Thank you for the detailed explanation.

  You're welcome.

> I think that the disconnect here is that I am trying (**really,
> really, really hard**) to implement what Roberto Ierusalimschy
> recommends in chapter 25 (pp. 261-264) of *Programming in Lua*, 4th
> ed., while you (and Coda Highland) are of the opinion (I think) that
> such a strategy is unnecessary.  (Apologies if I misunderstood.)

  A sandbox is just a limited environment in which you run a program.  For
Lua, this can be anything from an environment devoid of *any* prewritten
functions to one with most functions available.  Here's a table of Lua
functions that is still useful, but prevents the program from creating
new files, executing other programs, or accessing some low-level Lua
functions:

        newenv =
        {
          -- include these functions
         
          _VERSION = _VERSION,
          assert   = assert,
          error    = error,
          ipairs   = ipairs,
          pairs    = pairs,
          next     = next,
          pcall    = pcall,
          print    = print, -- we can write to stdout
          select   = select,
          tonumber = tonumber,
          tostring = tostring,
          type     = type,
          xpcall   = xpcall,
         
          -- these modules in their entirety
         
          math   = math,
          table  = table,
          string = string,
          utf8   = utf8,
         
          -- and these with limited functionality
         
          io =
          {
            stdin  = io.stdin, -- keep this around
            stdout = io.stdout, -- and this
            stderr = io.stderr, -- and this
          },
         
          os =
          {
            clock     = os.clock,
            date      = os.date,
            difftime  = os.difftime,
            exit      = os.exit,
            getenv    = os.getenv,
            setlocale = os.setlocale,
            time      = os.time,
          },
        }
       
        newenv._G = newenv
       
        sandboxcode = assert(loadfile(untrustedcode,'t',newenv))
        sandboxcode()
       
  Not all functions are included, and certainly, arguments could be made for
including stuff I left out, or excluding stuff I left in.  But there's still
a lot of Lua code that could conceivably run with just the functions listed
above.  For instance, I left out require(), but it's easy to provide one
that restricts which modules can be used in the sandboxed code:

        local okaymods =
        {
          ['lpeg']           = true, -- just a sample of useful modules
          ['org.conman.env'] = true,
          ['argparse']       = true,
         
          ['math']           = true, -- some code might require these
          ['table']          = true, -- standard modules, so let them
          ['string']         = true, -- through even though they're
          ['utf8']           = true, -- already available "globally"
        }
       
        function newenv.require(modname)
          if okaymods[modname] then
            return require(modname)
          elseif modname == 'io' then -- in case some code is trying to
            return newenv.io          -- be cute ...
          elseif modname == 'os' then
            return newenv.os
          else
            error(string.format("Module %s not allowed",modname))
          end
        end
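
  To see the environment restriction in action without a separate file,
here is a minimal sketch that loads source from a string.  It assumes Lua
5.2 or later (for the four-argument load()); the tiny `env` table and the
helper name `run_sandboxed` are made up for illustration and are much
smaller than the `newenv` table above.

```lua
-- Hypothetical mini-whitelist, far smaller than newenv above.
local env = { print = print, tostring = tostring }
env._G = env

-- run_sandboxed is an illustrative helper, not a standard function.
local function run_sandboxed(source)
  -- mode "t" rejects precompiled (binary) chunks; env becomes the chunk's _ENV
  local chunk, err = load(source, "=(sandbox)", "t", env)
  if not chunk then
    return false, err            -- syntax error in the untrusted source
  end
  return pcall(chunk)            -- run it, trapping runtime errors
end

-- Whitelisted functions work as usual ...
local ok1, res1 = run_sandboxed("return tostring(1 + 1)")

-- ... while anything left out of env is simply nil inside the sandbox,
-- so this fails with an "attempt to index a nil value" style error.
local ok2, err2 = run_sandboxed("os.exit(1)")

print(ok1, res1)
print(ok2, err2)
```

  The point of the design is that the sandboxed chunk never sees the real
global table at all, so no hook is needed to police which functions it
calls.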

> In other words, I am not trying to solve any practical problem; I am
> only trying to understand the book's approach to sandboxing.  At the
> moment, I think that I still don't understand it (because I cannot put
> it into practice satisfactorily).

  I can't help with that, as I do not own the book.  But I am familiar with
the concept of sandboxing, as are some other people on this mailing list.
And depending upon your threat model, sandboxing ranges from pretty easy to
very darn difficult.

  -spc



Re: On implementing a functions whitelist for a sandbox

Dennis Fischer
In reply to this post by Kynn Jones
Greetings mailing list,

first of all, I'd like to point out that this has already been discussed
(with limited success) on
[stackoverflow](https://stackoverflow.com/questions/57325179/).

Considering all the available information it seems that:

- Whenever an error happens in Lua, some unknown function X is called
internally
- This function X does not appear on the function whitelist, thus
causing another error
- The new error shadows the previous error message, making sandboxed
code near impossible to debug

As OP pointed out in another reply,
the purpose of this exercise seems to be getting a better understanding
of the Lua language,
not to satisfy any real world use-case.

I personally also find this problem very interesting.

With the hook set, calling `error 'foobar'` changes the error only
slightly:

"calling bad function ([C]:-1): ?"

Some logical conclusions:

- `error` seems to be written in C; as such we won't be able to just
peek into its upvalues
- X is most likely not a plain C function, but a C closure within the
Lua state, since the hook is called
- X has no meaningful name here because it's called from a C function



Re: On implementing a functions whitelist for a sandbox

Sean Conner
It was thus said that the Great Dennis Fischer once stated:
> Greetings mailing list,
>
> first of all, I'd like to point out that this has already been discussed
> (with limited success) on
> [stackoverflow](https://stackoverflow.com/questions/57325179/).

  Ah, now that I see what code was presented in _Programming in Lua_ for
sandboxing, I can see where the confusion comes from.  The approach in the
book is *not* one I would take for sandboxing---not only is it slow, but it
can have weird results (although I see it was also attempting to limit the
amount of CPU time the sandboxed code would run, so perhaps that's why that
particular approach was taken [1]).
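
  For reference, an instruction-count hook is one portable way to bound
running time.  The following is only a minimal sketch of that idea, not a
reproduction of the book's code; the helper name `run_limited`, its
parameters, and the step size of 1000 are assumptions chosen for
illustration.

```lua
-- run_limited and its parameters are illustrative names.
local function run_limited(chunk, max_instructions)
  local step_size = 1000        -- call the hook every 1000 VM instructions
  local executed = 0
  local function step()
    executed = executed + step_size
    if executed > max_instructions then
      error("script uses too much CPU", 0)   -- level 0: no position prefix
    end
  end
  debug.sethook(step, "", step_size)   -- empty mask: count events only
  local ok, err = pcall(chunk)
  debug.sethook()                      -- always remove the hook afterwards
  return ok, err
end

-- An endless loop is interrupted once the budget is exhausted.
local ok, err = run_limited(function() while true do end end, 1e6)

-- A short function finishes well within the same budget.
local ok_small = run_limited(function() return 1 end, 1e6)

print(ok, err, ok_small)
```

  Two caveats: the hook only counts VM instructions, so time spent inside
a single long-running C function is not limited, and sandboxed code that
has access to pcall() can catch the error (although the hook will raise it
again at the next count event).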

> Considering all the available information it seems that:
>
> - Whenever an error happens in Lua, some unknown function X is called
> internally

  I was able to track down said function: msghandler(), a static function in
lua.c (the stand-alone interpreter), so it's not part of the public API.
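
  The effect can be reproduced from plain Lua, assuming that a message
handler passed to xpcall() is invoked the same way msghandler() is.  In this
sketch the names `handler`, `hook` and `calls` are illustrative; it records
every function the call hook sees and then checks whether the handler was
among them.

```lua
local calls = {}                       -- functions seen by the call hook

local function handler(msg)            -- stands in for lua.c's msghandler()
  return msg
end

local function hook()
  local info = debug.getinfo(2, "f")   -- level 2: the function being called
  if info then calls[info.func] = true end
end

debug.sethook(hook, "c")
-- Calling a nil value raises a runtime error, which invokes the handler.
local ok, err = xpcall(function() local f; f() end, handler)
debug.sethook()

print(ok, calls[handler])
```

  If the handler shows up in `calls`, a whitelist-based call hook has to
allow whatever message handler is in effect, or the handler's own call
raises a second error that replaces the first.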

> - This function X does not appear on the function whitelist, thus causing
> another error
> - The new error shadows the previous error message, making sandboxed
> code near impossible to debug

  Yup.

> As OP pointed out in another reply, the purpose of this exercise seems to
> be getting a better understanding of the Lua language, not to satisfy any
> real world use-case.
>
> I personally also find this problem very interesting.

  Sandboxing?

> With the hook set, calling `error 'foobar'` changes the error only
> slightly:
>
> "calling bad function ([C]:-1): ?"
>
> Some logical conclusions:
>
> - `error` seems to be written in C; as such we won't be able to just
> peek into its upvalues

  Actually, you can with debug.getupvalue() but error() doesn't have any
upvalues.
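
  A quick check, assuming the reference implementation of Lua 5.3, where the
base library functions are registered without upvalues.  The Lua closure
`peek` and the variable `secret` are made up for contrast.

```lua
-- For a C function without upvalues, debug.getupvalue() returns nothing.
local n_error = select("#", debug.getupvalue(error, 1))

-- Contrast: a Lua closure that captures one local variable.
local secret = 42
local function peek() return secret end
local name, value = debug.getupvalue(peek, 1)

print(n_error, name, value)
```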

> - X is most likely not a plain C function, but a C closure within the
> Lua state, since the hook is called
> - X has no meaningful name here because it's called from a C function

  Yup.

  -spc

[1] There are other ways to limit the CPU time, but they tend to be
        system dependent.