Make "package.loaded" weak

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Make "package.loaded" weak

Martin
Hello all,

I offer to make values of table "package.loaded" weak to free
unreachable data at garbage collection cycle.

My usage case is many (over 350) small (about 30 lines) files which
is lua code which typically returns table or function. Conceptual
problem is that result of each of that loaded files occupies
"package.loaded" forever. I think it is not very clean from abstract
point of view.

>From the other side I understand that after making "package.loaded"
weak, multiple loads of the same code may occur. For my usage case
this is not a problem, but you guys may have serious objections
which I'd like to hear.

---
Here is artificial example.

Suppose we have code chunk that creates some big table and uses it
in work:

  local do_work =
    function()
      local result = {}
      for i = 1, 1e6 do result[i] = {} end
      return result
    end

  print(collectgarbage('count'))

  do_work()
  collectgarbage()

  print(collectgarbage('count'))

It does not create memory problems yet.

But now we wish to move do_work() to separate module

  --[[ "garbage_making_module" ]]--
  local do_work =
    function()
      local result = {}
      for i = 1, 1e6 do
        result[i] = {}
      end
      return result
    end

  return do_work()

  --[[ main ]]--
  print(collectgarbage('count'))
  do
    local garbage = require('garbage_making_module')
  end
  collectgarbage()
  print(collectgarbage('count'))

Now no memory is freed!

But if we add

  setmetatable(package.loaded, {__mode = 'v'})

in start of the "main" code, it fixes issue.

-- Martin

Reply | Threaded
Open this post in threaded view
|

Re: Make "package.loaded" weak

Dirk Laurie-2
2017-02-19 4:24 GMT+02:00 Martin <[hidden email]>:

> For my usage case this is not a problem, but you guys may have
> serious objections which I'd like to hear.

Couldn't you just do this?

   setmetatable(package.loaded,{__mode="v"})

Reply | Threaded
Open this post in threaded view
|

Re: Make "package.loaded" weak

Rena
In reply to this post by Martin
On Feb 18, 2017 10:24, "Martin" <[hidden email]> wrote:
Hello all,

I offer to make values of table "package.loaded" weak to free
unreachable data at garbage collection cycle.

My usage case is many (over 350) small (about 30 lines) files which
is lua code which typically returns table or function. Conceptual
problem is that result of each of that loaded files occupies
"package.loaded" forever. I think it is not very clean from abstract
point of view.

>From the other side I understand that after making "package.loaded"
weak, multiple loads of the same code may occur. For my usage case
this is not a problem, but you guys may have serious objections
which I'd like to hear.

---
Here is artificial example.

Suppose we have code chunk that creates some big table and uses it
in work:

  local do_work =
    function()
      local result = {}
      for i = 1, 1e6 do result[i] = {} end
      return result
    end

  print(collectgarbage('count'))

  do_work()
  collectgarbage()

  print(collectgarbage('count'))

It does not create memory problems yet.

But now we wish to move do_work() to separate module

  --[[ "garbage_making_module" ]]--
  local do_work =
    function()
      local result = {}
      for i = 1, 1e6 do
        result[i] = {}
      end
      return result
    end

  return do_work()

  --[[ main ]]--
  print(collectgarbage('count'))
  do
    local garbage = require('garbage_making_module')
  end
  collectgarbage()
  print(collectgarbage('count'))

Now no memory is freed!

But if we add

  setmetatable(package.loaded, {__mode = 'v'})

in start of the "main" code, it fixes issue.

-- Martin


This would cause modules to be potentially reloaded every time require() is called, which might cause some interesting issues if a module is doing things like creating files or setting variables in its init.
Reply | Threaded
Open this post in threaded view
|

Re: Make "package.loaded" weak

Francisco Olarte
In reply to this post by Martin
On Sun, Feb 19, 2017 at 3:24 AM, Martin <[hidden email]> wrote:
> I offer to make values of table "package.loaded" weak to free
> unreachable data at garbage collection cycle.

I do not think this is nice. In some of my code I explicitly do some
requires for the sole purpose of populating package.loaded, and
sometimes I populate it directly ( although I'm switching to
package.preload which is cleaner ).

> My usage case is many (over 350) small (about 30 lines) files which
> is lua code which typically returns table or function. Conceptual
> problem is that result of each of that loaded files occupies
> "package.loaded" forever. I think it is not very clean from abstract
> point of view.

Your problem seems to be in the design of your code. If you do not
know the required lifetime of the result of the code, return a
function which calculates it.

> >From the other side I understand that after making "package.loaded"
> weak, multiple loads of the same code may occur. For my usage case
> this is not a problem, but you guys may have serious objections
> which I'd like to hear.

I would not like lua recompiling my modules each time I require them.
I think your very peculiar case is better solved by nil-ing
package.loaded when needed.

> Here is artificial example.

Certainly artificial and purposely built to try to make a point.

> Suppose we have code chunk that creates some big table and uses it
> in work:
>   local do_work =
>     function()
>       local result = {}
>       for i = 1, 1e6 do result[i] = {} end
>       return result
>     end

FALSE, it creates (uses it in work ) AND returns a big table. Using it
to work will be something like an array used in sieving for primes,
which gets discarded when the primes list is returned.

>   print(collectgarbage('count'))
>   do_work()

So you make a function which sole purpose is calculating a big array
and then discard its result?

>   collectgarbage()
>
>   print(collectgarbage('count'))
>
> It does not create memory problems yet.



> But now we wish to move do_work() to separate module
>   --[[ "garbage_making_module" ]]--
>   local do_work =
>     function()
>       local result = {}
>       for i = 1, 1e6 do
>         result[i] = {}
>       end
>       return result
>     end
>
>   return do_work()

BAD dessign, you shall return do_work, so caller chooses calling time
( and wheter to keep the result as in the example above )

>   --[[ main ]]--
>   print(collectgarbage('count'))
>   do
>     local garbage = require('garbage_making_module')

And here do require(...)(), call require return value.

>   end
>   collectgarbage()
>   print(collectgarbage('count'))

And the result would be the same.

> Now no memory is freed!

Neither would be if you had stored do_work() result above. I do not
think this code proves any potential problem with require. You may
have that pattern in some code due to technical debt, or impositions
from above or whichever, but I do not think modifying package.loaded
is the way to go. Where I forced to use some untouchable modules
writen as above I would probably do a weak_require wrapper to load it
without caching, seemd easy to do by just requiring and erasing from
p.l before return or just using package.searchers.

Summarising, I do not think making package.loaded weak is a good idea,
and I think your example does not merit it. Maybe I could be convinced
by a better example.

Francisco Olarte.

Reply | Threaded
Open this post in threaded view
|

Re: Make "package.loaded" weak

Martin
On 02/18/2017 09:29 AM, Francisco Olarte wrote:
> Maybe I could be convinced
> by a better example.

Thank you for analysis and reply!

Let's try more "real" example.

Suppose we're implementing CRC calculation algorithm with predefined
polynomial (0xEDB88320). By trading space for time calculation may be
sped up by using precalculated values for any given byte of input
(256 values), or two bytes of input (65536 values) and so forth.

Say I do not wish to store 65536 elements table in same source
file with calculating function (which is near dozen lines). And
also I do not wish to calculate this table at load time (as it
is constant and may be precalculated and saved as separate file
by stand-alone code). Moreover, it can be saved as lua code
"return {<crc-values>}".

So in CRC calculation routine I just wish to say something like
"local crc_table = require('CRC_2bytes_EDB88320')" and then start
xor-ing input with it.

Yes, I may nullify "package.loaded" record for that module after
work but it seems like struggling with consequences, not eliminating
reason.

-- Martin

Reply | Threaded
Open this post in threaded view
|

Re: Make "package.loaded" weak

Sean Conner
It was thus said that the Great Martin once stated:

> On 02/18/2017 09:29 AM, Francisco Olarte wrote:
> > Maybe I could be convinced
> > by a better example.
>
> Thank you for analysis and reply!
>
> Let's try more "real" example.
>
> Suppose we're implementing CRC calculation algorithm with predefined
> polynomial (0xEDB88320). By trading space for time calculation may be
> sped up by using precalculated values for any given byte of input
> (256 values), or two bytes of input (65536 values) and so forth.

  First off, I would measure the time difference between a 256 entry table
and a 65536 entry table.  At work, we use this CRC (I think that's the
polynomial used---the code was writeen some seven years ago and hasn't been
touched since) with 256 precalculated values and never had an issue with
respect to performance [1].

> Say I do not wish to store 65536 elements table in same source
> file with calculating function (which is near dozen lines). And
> also I do not wish to calculate this table at load time (as it
> is constant and may be precalculated and saved as separate file
> by stand-alone code). Moreover, it can be saved as lua code
> "return {<crc-values>}".
>
> So in CRC calculation routine I just wish to say something like
> "local crc_table = require('CRC_2bytes_EDB88320')" and then start
> xor-ing input with it.
>
> Yes, I may nullify "package.loaded" record for that module after
> work but it seems like struggling with consequences, not eliminating
> reason.

  So ... you do a one time calculation (or few) and that's it?  Once?  No
more during the lifetime of the program?  Seems like an odd case (or another
weak example).  Because if that's the case, then I would do something like:

        local get_crc = require "get_crc"

        local crc_table = get_crc("CRC_2bytes_CRC_2bytes_EDB88320")

  And get_crc() will load up the precalucated table (say, with loadfile())
and return it.  So you could, for example, later on do:

        local crc_table = get_crc("CRC_2bytes_CRC_2bytes_12345678")
        -- not sure if that's a good CRC or not, it's just an example

  Here, let me try for a better argument---I want to use a single function
out of some larger package (say, on the size of PenLight [2]).  I like the
idea of using a tested and maintained function, but *not* of having to load
a ton of unused functions.  This not only wastes memory needlessly, but also
leaves a larger attack surface for potential exploits [3].  Weakly loading
the larger module means that all the unused portions of the large package
will go away, saving memory and reducing the attack surface of the program.

  -spc

[1] Our code is in the call path for one of the Monopolistic Phone
        Companies here in the US.  The protocols used for the backend use
        this CRC code.  

[2] But PenLight allows you to load smaller subsets of itself.

[3] How?  I don't know, but those crackers are nothing if persistent.
        Non-executable stacks?  Okay, we'll use ROP.  Oh, randomize all
        function addresses?  We can work around that too (in JavaScript no
        less [4]).

[4] https://www.vusec.net/projects/anc/

Reply | Threaded
Open this post in threaded view
|

Re: Make "package.loaded" weak

Jerome Vuarand
In reply to this post by Martin
2017-02-19 7:03 GMT+00:00 Martin <[hidden email]>:
> So in CRC calculation routine I just wish to say something like
> "local crc_table = require('CRC_2bytes_EDB88320')" and then start
> xor-ing input with it.
>
> Yes, I may nullify "package.loaded" record for that module after
> work but it seems like struggling with consequences, not eliminating
> reason.

If you weren't using require in the first place, you wouldn't have to
struggle with consequences. Just use dofile+searchpath. The only
difference with require is that require does caching, which apparently
you don't want.

Reply | Threaded
Open this post in threaded view
|

Re: Make "package.loaded" weak

Daurnimator
In reply to this post by Martin
On 19 February 2017 at 13:24, Martin <[hidden email]> wrote:
> I offer to make values of table "package.loaded" weak to free
> unreachable data at garbage collection cycle.
... snip ...
>   setmetatable(package.loaded, {__mode = 'v'})

I like the idea; though there is the relatively major issue of modules
that don't like being reloaded.

Infact, *most* C modules don't cope well with being reloaded in the
general case.
e.g. a module that registers a metatable with `luaL_setmetatable`
inside of it's `luaopen_` function.
On a reload, this would replace the current metatable in the registry
with a new one.
This would mean that object instances created before the module was
reloaded would not pass luaL_checkudata, even for their own methods!

This fact means that the idea shouldn't be implemented with the
current status-quo for C modules.
But it does bring up the issue I mentioned above, and there *are* ways
to fix it going forward:
Instead of using luaL_checkudata, you can store the metatable as an
upvalue for your C functions, and check that, this has the added
benefit of being *faster*, it also has the benefit of not requiring an
allocation.
A few existing modules do this, e.g. cqueues:
https://github.com/wahern/cqueues/blob/00ad5b9425ba863ed74f3bcbea2049eedcb6c53b/src/cqueues.h#L313

I wonder if future lua versions could encourage this pattern with the
right helpers e.g.
  - A luaL_newlib-like that has an #upvalues argument (like
luaL_setfuncs already does)
  - A luaL_testudata-like that takes an upvalue index to use instead
of a registry key

Reply | Threaded
Open this post in threaded view
|

Re: Make "package.loaded" weak

Martin
In reply to this post by Sean Conner
On 02/18/2017 01:50 PM, Sean Conner wrote:
>   So ... you do a one time calculation (or few) and that's it?  Once?  No
> more during the lifetime of the program?  Seems like an odd case (or
another
> weak example).  Because if that's the case, then I would do something
like:
>
>   local get_crc = require "get_crc"
>
>   local crc_table = get_crc("CRC_2bytes_EDB88320")
>
>   And get_crc() will load up the precalucated table (say, with loadfile())
> and return it.  So you could, for example, later on do:
>
>   local crc_table = get_crc("CRC_2bytes_12345678")

Yes, this is reliable design solution. More rugged than flexible.
But what if I wish maximum flexibility Lua can offer? CRC calculation
loop by nature dont care about contents of CRC-table. It only has to
be indexable and contain xor-able values.

So my intention for this abstract example is to provide CRC-table
as independent argument (of class field).

  local crc
  do
    local custom_tbl = {<values created by hand>}
    crc = require('get_crc')(custom_tbl)
  end
  print('crc', crc)

But code with mix of constants and logic for me looks ugly and
unflexible. So my intention is to separate constants. And moving
constants to separate module looks natural idea. But then memory
pollution starts.


Probably I cannot afford another more "strong" example.

For me even a one unneeded record in memory is a reason to think how
it occurred there. Why modules that was called in load stage of
code execution (say some format parser intrinsics) still in memory
in save stage? Example with CRC table pollution may be avoided
by design or by implementation. But my intention is not to force
some changes by providing unsolvable case but only point to and
discuss some topics.


On 02/18/2017 01:50 PM, Sean Conner wrote:
>  those crackers are nothing if persistent.
>   Non-executable stacks?  Okay, we'll use ROP.  Oh, randomize all
>   function addresses?  We can work around that too (in JavaScript no
>   less [4]).

They don't create problems, they demonstrate it.

--

I'll try to summarize my views and recommendations received:


Why use "require()" not "dofile()", "load()" or "io.read()"?

  1. With "require()" you do not have to think about paths for
    "dofile()" or string source for "load()" or create custom format
    loader for "io.read()".
  2. "require()" may (or must?) return cache of loaded data for
    repeated calls. This eliminates redundant file I/O operations.

Why not nil "package.loaded"?

  1. Appropriate solution if implement it as "weak_require()" wrapper,
    as Francisco recommended.

Why not make "package.loaded" weak?

  1. Daurminator mentioned possible problems with C modules.
  2. Rena mentioned possible problems with modules creating files
    or setting global variables at load.

-- Martin

Reply | Threaded
Open this post in threaded view
|

Re: Make "package.loaded" weak

Daurnimator
In reply to this post by Daurnimator
On 19 February 2017 at 18:06, Daurnimator <[hidden email]> wrote:
> e.g. a module that registers a metatable with `luaL_setmetatable`
> inside of it's `luaopen_` function.

Oops, I meant luaL_newmetatable here.

Reply | Threaded
Open this post in threaded view
|

Re: Make "package.loaded" weak

Daurnimator
On 19 February 2017 at 20:09, Daurnimator <[hidden email]> wrote:
> On 19 February 2017 at 18:06, Daurnimator <[hidden email]> wrote:
>> e.g. a module that registers a metatable with `luaL_setmetatable`
>> inside of it's `luaopen_` function.
>
> Oops, I meant luaL_newmetatable here.

And now reading the docs
(http://www.lua.org/manual/5.3/manual.html#luaL_newmetatable) I
noticed that luaL_newmetatable *doesn't* do anything if the key
already exists.
This is slightly better for the reloading situation: it means that
things won't break as I mentioned earlier with luaL_checkudata.

Reply | Threaded
Open this post in threaded view
|

Re: Make "package.loaded" weak

Francisco Olarte
In reply to this post by Martin
On Sun, Feb 19, 2017 at 8:03 AM, Martin <[hidden email]> wrote:
> On 02/18/2017 09:29 AM, Francisco Olarte wrote:
>> Maybe I could be convinced by a better example.
> Thank you for analysis and reply!

"A mandar!", that's what we're here for.


> Let's try more "real" example.
... <aside>I have not timed lua, but in every code I've used to do
CRC's calculating
the table is way faster than mmaping ( and do not even try loading )
it from disk </aside>
....
> So in CRC calculation routine I just wish to say something like
> "local crc_table = require('CRC_2bytes_EDB88320')" and then start
> xor-ing input with it.

This is more convincing. But you realize the module which *uses*
crc_table will tie it in memory? So either you have your routine
inlined and you nullify it or, if you are requiring the crc routines,
you provide code to init the table and re-read it on demmand?

A 64k entries data table is not too much in C, it may be way bigger in
lua, although there are string tricks to alleviate it. But I see in
similar algorithms with much better tables you can get much bigger
usage.

> Yes, I may nullify "package.loaded" record for that module after
> work but it seems like struggling with consequences, not eliminating
> reason.

But there is a reason to make require cache its contents, otherwise
the Lua team would have opted for the easier route of just loading the
code from require each time. A couple of my usages come to mind. One
is a pretty printing function, huge, which I only use in debug runs.
As such when I log I just do <require
"debug_dumper".dump(data_value)"in debug builds, inside the innermost
scope, to avoid loading until needed. But when needed I do not want to
have it garbage-collected, I like to know it's cached.

All in all I think making package.loaded weak-refed will not be good,
as it would change the semantics of the language ( and it can make
programs fail. Currently you can require a lot of modules and chroot,
your approach will make programs doing that fail ). I think it would
be a much useful approach to split require in two parts. My C is
really rusty, but looking at [hidden email] it seems the
searcher exercising code can easily by split to a separate function
and then used to build an ll_require_load or whichever function which
could be exported for cases like yours, I want a function and I want
to find it using the same mechanisms as require ( which the added
bonus that having the result as a function opens the way for some
interesting usages ).

Francisco Olarte.

Reply | Threaded
Open this post in threaded view
|

Re: Make "package.loaded" weak

William Ahern
In reply to this post by Daurnimator
On Sun, Feb 19, 2017 at 08:12:39PM +1100, Daurnimator wrote:

> On 19 February 2017 at 20:09, Daurnimator <[hidden email]> wrote:
> > On 19 February 2017 at 18:06, Daurnimator <[hidden email]> wrote:
> >> e.g. a module that registers a metatable with `luaL_setmetatable`
> >> inside of it's `luaopen_` function.
> >
> > Oops, I meant luaL_newmetatable here.
>
> And now reading the docs
> (http://www.lua.org/manual/5.3/manual.html#luaL_newmetatable) I
> noticed that luaL_newmetatable *doesn't* do anything if the key
> already exists.
> This is slightly better for the reloading situation: it means that
> things won't break as I mentioned earlier with luaL_checkudata.

IIRC, the problem with cqueues that you ran into before was that some
cqueues Lua script modules interpose C-based methods to provide enhanced
behavior (particularly yielding for Lua 5.1). That interposition caused
headaches when reloading because when the metatable already existed the
internal routine short-circuited. The cqueues' metatable loading code did

  if (luaL_newmetatable()) {
    luaL_setfuncs();
  }

instead of

  luaL_newmetatable();
  luaL_setfuncs();

However, the Lua script-based code unconditionally interposed methods,

  local _accept; _accept = socket.interpose("accept", function ()
     _accept()
  end)

On every reload (which in your case was part of some testing framework,
IIRC) the interposition stack kept growing, sometimes resulting in odd or
broken behavior. (No infinite loops; I think side effects sometimes
conflicted.)

The short-term fix was to unconditionally reload C routines into metatables
in the C code, just as the related Lua code unconditionally interposed
methods. Long-term fix requires, IMO, rethinking some ancillary issues, not
just namespacing and interposition.

The reason I originally short-circuited was because of metatable
cross-dependencies. Particularly for the optimization of caching a metatable
as an upvalue, a submodule might need to initialize the metatable for a
related submodule at load-time. Rather than loading the submodule outright,
which is awkard from C (especially before Lua 5.2's luaL_requiref, but also
if require is overloaded to do network I/O or otherwise yield), I just
called the internal C routine that specifically loaded the particular
metatable. That routine, which might be invoked indirectly many times, was
written to be a noop after the first invocation. The design of
luaL_newmetatable seems to have contemplated this conditional behavior.

Side note on reloading: I originally wrote cqueues modules to be able to
reload themselves without having access to the original .lua file.

  -- foo.lua
  local loader = function(loader, ...)
    local foo = {} -- foo module
    foo.loader = loader
    return foo
  end
  return loader(loader, ...)

I did this so that if an application wanted to create new Lua VMs after
sandboxing itself (e.g. after entering a chroot jail), then foo.loader could
be serialized and copied into the new VM.

  local loader_s = string.dump(foo.loader)
  ... -- create new Lua VM and copy serialized code
  local loader_f = loadstring(loader_s)
  package.loaded["foo"] = loader_f(loader_f, "foo")

C-based modules would already be available inside the sandbox as POSIX
requires that implememtations cache and index shared libraries by file path,
and the implementations I tested didn't care whether the file remained
accessible or not.

The benefit of this approach was that it worked without relying on or
modifying the application environment; it's entirely self-contained. The
disadvantage is that it's confusing and only works for modules that follow
the pattern--i.e. basically only internal modules.