Pooling of strings is good

Re: Pooling of strings is good

steve donovan
On Sat, Aug 23, 2014 at 9:47 AM, Coroutines <[hidden email]> wrote:
> All I'm thinking about is things like this:
> string.sub(io.stdout, 1) -- dumping the luaL_Stream struct into a Lua string

Ah, but then you're depending on the hidden binary representation of
userdata. In this case, files are represented differently in Lua 5.1,
5.2 and LuaJIT 2 (which is why using Lua file objects from C is a
delicate business). I think this encapsulation is a feature, not a
bug;  otherwise the representation cannot change or evolve since
client code has come to depend on particular byte layouts.

> On another side-topic: It makes me really unhappy that
> lua_lock/unlock() are macros and not dummy functions.

Au contraire, it makes me happy, because then Lua is not slowed down
by having to call those dummies.  What is so hard about recompiling
Lua?  Granted, one cannot then do one's favourite tricks with the
stock Lua interpreter, but excessive flexibility hurts general
performance.
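
A minimal sketch of that trade-off in plain C. Lua's own source defines lua_lock/lua_unlock as `((void) 0)` unless the build overrides them; the names `demo_lock`, `demo_unlock`, `demo_state` and `demo_push` below are hypothetical stand-ins, not Lua's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for lua_lock/lua_unlock: unless the build
   defines both of them, they expand to a no-op expression and cost
   nothing at run time. */
#ifndef demo_lock
#define demo_lock(s)   ((void) 0)
#define demo_unlock(s) ((void) 0)
#endif

typedef struct demo_state {
    int values[8];
    size_t top;
} demo_state;

/* Every API entry point brackets its work with the lock macros.
   With the defaults, the compiler sees only the body in between. */
static int demo_push(demo_state *s, int v)
{
    int ok = 0;
    assert(s != NULL);
    demo_lock(s);
    if (s->top < sizeof s->values / sizeof s->values[0]) {
        s->values[s->top++] = v;
        ok = 1;
    }
    demo_unlock(s);
    return ok;
}
```

Because the default macros leave no call and no symbol in the compiled binary, a threaded build has to supply real definitions at compile time; there is nothing for a preloaded library to intercept afterwards.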

steve d.


Re: Pooling of strings is good

Jay Carlson
In reply to this post by Coroutines


On Aug 22, 2014 7:13 AM, "Coroutines" <[hidden email]> wrote:
>
> On Fri, Aug 22, 2014 at 3:59 AM, Jay Carlson <[hidden email]> wrote:
> > Mutable strings are...a folly of a quaint, bygone era. They are warts on
> > otherwise timeless designs like Scheme and Smalltalk.
> >
> > Get a mutable octet-vector object type. You'll be much happier.
>
> Again I get this feeling like people love to speak in generalizations
> on this list :\ 

> Perhaps you know this:

Yes. "I am aware of all Internet traditions."

> In networking/socket stuff
> it's useful to reuse a buffer if you know you're receiving frames of a
> fixed size, packets of a fixed size

Yes. It is also a relatively common error to reuse bufs/pool members before all references to them have dropped.
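
A minimal sketch of that error, assuming a hypothetical one-slot pool (`pool_slot`, `recv_frame` and `recv_frame_copy` are made-up names, and the "receive" is simulated with a copy rather than a real socket call):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_SIZE 8

/* A hypothetical one-slot "pool": every receive reuses the same storage. */
static char pool_slot[FRAME_SIZE];

/* Simulates recv() into the pooled buffer and returns a pointer to it.
   The caller gets a reference to shared storage, not a private copy. */
static const char *recv_frame(const char *wire_data)
{
    assert(wire_data != NULL);
    size_t n = strlen(wire_data);
    if (n > FRAME_SIZE - 1)
        n = FRAME_SIZE - 1;
    memset(pool_slot, 0, FRAME_SIZE);
    memcpy(pool_slot, wire_data, n);
    return pool_slot;
}

/* Safe variant: copy the frame out before the slot is reused. */
static void recv_frame_copy(const char *wire_data, char out[FRAME_SIZE])
{
    memcpy(out, recv_frame(wire_data), FRAME_SIZE);
}
```

A caller that keeps the pointer returned by `recv_frame` across a second receive will silently see the second frame's bytes; `recv_frame_copy` avoids that at the cost of the extra copy the pool was meant to save.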

> I work
> with other libraries that expect strings

...and may break with a mutable string type.

Changing Lua 6.0 to include a mutable string type would in turn force those libraries to change.

> Strings are a thin abstraction over userdata

No. Strings have value semantics. From a semantic point of view they are not allocated or deallocated. Nobody owns them.

IIRC a long time ago (4.0), userdata acted more like strings. You could not distinguish two userdatas if the C contents were == at the C level; however, userdata was a void*, not a block of memory.

> If I
> wanted to minimally change Lua how it is now, I'd make it so userdata
> can be used in the same places strings can.

Any userdata? Or just a specific mutable octet vector type?

Not my problem; go for it!

Jay


Re: Pooling of strings is good

Javier Guerra Giraldez
In reply to this post by Coroutines
On Sat, Aug 23, 2014 at 12:59 AM, Coroutines <[hidden email]> wrote:
> So then you know what a pain it is to not be able to represent a
> simple ring buffer, or provide DMA access to the NIC's ring buffer
> hosted through userdata to Lua


hey, i do that!

using LuaJIT's FFI, of course, which isn't exactly userdata... but
when i've used any buffer-like userdata object, it's trivially easy to
add a :substr([i,[j]]) method and a :setslice(s[,i[,j]]) too, if i
want it to be mutable.  the [,j] optional argument only goes there if
it's legal to shift content around, obviously.
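
A rough sketch of the core logic behind two such methods, in plain C without the Lua binding layer. The type `bytebuf` and the functions `bytebuf_substr` and `bytebuf_setslice` are hypothetical; indices are 1-based and inclusive as in Lua, and negative indices are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity byte buffer, the kind of payload a
   buffer-like userdata might wrap. */
typedef struct bytebuf {
    size_t len;
    unsigned char data[64];
} bytebuf;

/* Copy bytes i..j (1-based, inclusive, clamped to the buffer) into out.
   Returns the number of bytes copied; at most cap bytes are written. */
static size_t bytebuf_substr(const bytebuf *b, size_t i, size_t j,
                             unsigned char *out, size_t cap)
{
    assert(b != NULL && out != NULL);
    if (i < 1) i = 1;
    if (j > b->len) j = b->len;
    if (i > j) return 0;
    size_t n = j - i + 1;
    if (n > cap) n = cap;
    memcpy(out, b->data + (i - 1), n);
    return n;
}

/* Overwrite bytes starting at position i (1-based) with the n bytes of
   src, in place. Refuses (returns 0) rather than grow or shift the
   buffer; returns 1 on success. */
static int bytebuf_setslice(bytebuf *b, size_t i,
                            const unsigned char *src, size_t n)
{
    assert(b != NULL && src != NULL);
    if (i < 1 || i > b->len || n > b->len - (i - 1)) return 0;
    memcpy(b->data + (i - 1), src, n);
    return 1;
}
```

This version never shifts content, so the slice setter needs only a start position; a buffer that allowed shifting would also take an end index.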

--
Javier


Re: Pooling of strings is good

Tim Hill
In reply to this post by Sean Conner

On Aug 23, 2014, at 12:11 AM, Sean Conner <[hidden email]> wrote:


I wish userdata could be used in the same
place a string is expected.  I would love an "invisible" memcmp() if
it's a userdata-string comparison (==) or to be able to directly use
the string.*() functions on them.


One of the basic goals of userdata is that it NOT be mutable (or even viewable) from Lua. The contract is that C code can let Lua act as custodian for some state which it knows Lua *cannot* access; it is opaque to Lua by design. Of course, C code can expose the data indirectly, but that is the business of the C code.

—Tim




Re: Pooling of strings is good

Coroutines
On Sat, Aug 23, 2014 at 9:58 AM, Tim Hill <[hidden email]> wrote:

> One of the basic goals of userdata is that it NOT be mutable (or even
> viewable) from Lua. The contract is that C code can let Lua act as custodian
> for some state which it knows Lua *cannot* access; it is opaque to Lua by
> design. Of course, C code can expose the data indirectly, but that is the
> business of the C code.

I would just like to [politely] point out that I stated earlier in
this chain (or the other thread?) that I realize userdata is used for
encapsulation in many ways.  I recognize that it would be an
insecurity to expose the contents of every userdata, so I wish for a
way to control this through the debug interface -- I don't know what
that would look like.


Re: Pooling of strings is good

Coroutines
In reply to this post by Jay Carlson
On Sat, Aug 23, 2014 at 4:28 AM, Jay Carlson <[hidden email]> wrote:

> Yes. It is also a relatively common error to reuse bufs/pool members before
> all references to them have dropped.

This is one of those "I know better than you -- we should avoid the
potential for users to screw themselves." retorts.

Why do you care if I shoot my foot?  It's not yours :p

> Changing Lua 6.0 to include a mutable string type would in turn force those
> libraries to change.

Oh god, not change!  I can't haaaandle the change!  -- no but really I
was entertaining the idea that this could be a transparent change.

>> Strings are a thin abstraction over userdata
>
> No. Strings have value semantics. From a semantic point of view they are not
> allocated or deallocated. Nobody owns them.

I shall take a step back here -- I was incorrect when I said that.

> Any userdata? Or just a specific mutable octet vector type?

I would like the option of peering into and modifying any userdata
from Lua as if it were a string, with the string functions.  This
would break security so I'd want to control this ability through the
debug library.

> Not my problem; go for it!

Well that's a semi-polite way to say go f#ck yourself :p  You're not
wrong though.


Re: Pooling of strings is good

Coroutines
In reply to this post by steve donovan
On Sat, Aug 23, 2014 at 1:11 AM, steve donovan
<[hidden email]> wrote:

> Ah, but then you're depending on the hidden binary representation of
> userdata. In this case, files are represented differently in Lua 5.1,
> 5.2 and LuaJIT 2 (which is why using Lua file objects from C is a
> delicate business). I think this encapsulation is a feature, not a
> bug;  otherwise the representation cannot change or evolve since
> client code has come to depend on particular byte layouts.

I understand that platform and architectural differences exist, I
simply want the freedom to shoot myself in the foot without touching C
:p

PS: dumping the FILE* struct is just an example, I'd want it to be
that easy -- I have other userdata I'd like to introspect that would
go much faster in Lua if they could be treated like strings (with a
debug library switch for security reasons).

> Au contraire, it makes me happy, because then Lua is not slowed down
> by having to call those dummies.  What is so hard about recompiling
> Lua?  Granted, one cannot then do one's favourite tricks with the
> stock Lua interpreter, but excessive flexibility hurts general
> performance.

One cannot package a threading library that uses LD_PRELOAD against
the distributed stock Lua -- one has to convince the user to recompile
their Lua :p  One is quite sad.


Re: Pooling of strings is good

Coroutines
In reply to this post by Javier Guerra Giraldez
On Sat, Aug 23, 2014 at 8:31 AM, Javier Guerra Giraldez
<[hidden email]> wrote:

> using LuaJIT's FFI, of course, which isn't exactly userdata... but
> when i've used any buffer-like userdata object, it's trivially easy to
> add a :substr([i,[j]]) method and a :setslice(s[,i[,j]]) too, if i
> want it to be mutable.  the [,j] optional argument only goes there if
> it's legal to shift content around, obviously.

Ahhh I wish I could enjoy luajit, most of my friends are on "stock
Lua" though :(  I really need to take a look at the ffi alternatives
:3


Re: Pooling of strings is good

Vadi
In reply to this post by Coroutines
I'd just like to throw in a point that Lua is used by a lot of non-professional programmers in various environments. For them, good defaults which achieve performance are pretty important, such as pooling of strings by default.

Re: Pooling of strings is good

Coroutines
On Sun, Aug 24, 2014 at 3:04 PM, Vadim Peretokin <[hidden email]> wrote:
> I'd just like to throw in a point that Lua is used by a lot of
> non-professional programmers in various environments. For them, good
> defaults which achieve performance are pretty important, such as pooling of
> strings by default.

My position has changed as this thread has continued -- I am not asking
for a default.  I've been thinking of functions I might add to the
debug library to read userdata, but I cannot think of a way to trick
the string library functions to think they are operating on strings
(rather than userdata).  Currently the only method I have is to
promote the userdata into a string, and this double-allocation pains
me.


Re: Pooling of strings is good

Tim Hill
In reply to this post by Coroutines

On Aug 23, 2014, at 12:47 AM, Coroutines <[hidden email]> wrote:

>
> I'm saying I want to be able to write 'cat' == io.stdout and have it
> do a memcmp() of their contents (the data section after the header).
>

So override __tostring() in your userdata metatable. And if it’s a userdata supplied by someone else then they probably don’t want you messing around inside it. There is good reason for this; userdata is MEANT to be opaque to Lua code.

—Tim




Re: Pooling of strings is good

Sean Conner
In reply to this post by Coroutines
It was thus said that the Great Coroutines once stated:
> On Sat, Aug 23, 2014 at 4:28 AM, Jay Carlson <[hidden email]> wrote:
>
> > Yes. It is also a relatively common error to reuse bufs/pool members before
> > all references to them have dropped.
>
> This is one of those "I know better than you -- we should avoid the
> potential for users to screw themselves." retorts.
>
> Why do you care if I shoot my foot?  It's not yours :p

  Reusing a buffer was the cause of the recent Heartbleed bug that affected
OpenSSL and just about everything that relied upon it.

  You want to shoot your own foot off?  Have at it.  But hopefully shooting
off your foot won't lead to much collateral damage.

> > Any userdata? Or just a specific mutable octet vector type?
>
> I would like the option of peering into and modifying any userdata
> from Lua as if it were a string, with the string functions.  This
> would break security so I'd want to control this ability through the
> debug library.

  Okay, so you can iterate over the actual bytes that make up a userdata.
Fine.  But what, exactly, are you attempting to do, other than "because I
can"?  

  Fine, let's say you can magically do this:

        x = io.stdout:sub(1,4)

  What do you have?  On a 32-bit system, you have the raw FILE * that you
really can't do anything with from Lua, and modifying the raw bytes of the
userdata would cause problems.  There is no facility to follow such a
pointer from Lua, and even if there were, again, what's the point?  You have
to know the structure layout at such a low level that any code you write is
not really portable in any meaningful sense.  And on a 64-bit system, you
have half a FILE * pointer, but unless you know the platform, you can't know
if it's the upper half (big endian system) or lower half (little endian
system).
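
  A small sketch of the 64-bit case, using a fixed 8-byte value in place of a real pointer (the function names are hypothetical, and the bit pattern is chosen only so the halves are easy to tell apart):

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Reads the first four bytes of an 8-byte value, the way a
   hypothetical userdata:sub(1,4) would see a 64-bit pointer. */
static uint32_t first_four_bytes(uint64_t pointer_bits)
{
    uint32_t head;
    memcpy(&head, &pointer_bits, sizeof head);
    return head;
}
```

  For the value 0x1122334455667788, a little-endian host yields 0x55667788 (the lower half) and a big-endian host yields 0x11223344 (the upper half), so the same Lua code would read different things on different platforms.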

  The issue I have with this (and I can't speak for others) is that it
doesn't really *buy* you anything but pointless complexity.

  -spc


Re: Pooling of strings is good

Javier Guerra Giraldez
On Sun, Aug 24, 2014 at 9:25 PM, Sean Conner <[hidden email]> wrote:
>   Reusing a buffer was the cause of the recent Heartbleed bug that affected
> OpenSSL and just about everything that relied upon it.


I think it was more like failing to sanitize an input (http://xkcd.com/1354/)
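
A minimal sketch of the kind of check that was missing, assuming a hypothetical echo handler (`echo_payload` is a made-up name and this is not OpenSSL's code): the peer sends a payload together with a length field, and the handler must not trust that field.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical echo handler. `payload` holds the `actual_len` bytes that
   really arrived; `claimed_len` is the length field the peer sent.
   Returns the number of bytes written to `reply`, or 0 if the request
   is rejected. */
static size_t echo_payload(const unsigned char *payload, size_t actual_len,
                           size_t claimed_len,
                           unsigned char *reply, size_t reply_cap)
{
    /* The bounds check: never copy more than was actually received,
       and never more than the reply buffer can hold. */
    if (claimed_len > actual_len || claimed_len > reply_cap)
        return 0;
    memcpy(reply, payload, claimed_len);
    return claimed_len;
}
```

Without the first comparison, a request claiming 500 bytes for a 3-byte payload would make the copy read whatever happens to sit after the payload in memory.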

--
Javier


Re: Pooling of strings is good

Sean Conner
In reply to this post by Coroutines
It was thus said that the Great Coroutines once stated:
> On Sat, Aug 23, 2014 at 12:11 AM, Sean Conner <[hidden email]> wrote:
>
> >   It doesn't bother me, because the convenience it affords is worth the
> > price (in my opinion).
>
> What convenience?  I'm saying the interface doesn't exist to make
> mutable data (userdata) as accessible as immutable, pooled strings.

  The convenience of string manipulations.  
 
> >   I swear, the way you are talking, it's as if you equate "userdata" with
> > "Lua's internal string representation" and it jars me every time.  I have
> > plenty of userdata types that have nothing to do with strings; where
> > "userdata == string" just does not make any semantic sense what-so-ever [1].
>
> I should word this more carefully: I mean to say that userdata and
> strings (the internal structures and how they are handled in Lua) are
> very similar.

  And in C, structures and strings are similar as well (a contiguous
sequence of bytes) but heaven help you if you mix the two up.

> >   I use 22 userdata types at work (just counted).  It's not a small concern.
>
> I mean to say -- how  many of those are you defining personally?  

  22.  And I now realized I forgot a few from some third party code I'm also
using.  So, say, 25 or 26.

> How
> many are brought in from third parties?  I don't think you'd have much
> trouble deciding the names for each..

  No, for the ones I defined, the names are based off the module they're in,
so I have (for example):

        org.conman.net:addr
        org.conman.net:sock
        org.conman.signal:sigset
        org.conman.fsys:dir
        org.conman.fsys:expand

  It doesn't bother me that they're long, because 1) they're #defined in the
C code and 2) that's the only place they're exposed (except you can see them
in the registry with debug.getregistry() from Lua).

> Related: It would be nice to be able to change the typename at runtime
> (even so that checkudata passes) -- without having to recompile the
> third-party library that defines the userdata type.

  Good lord!  Is there nothing you don't want to change?  Yes, I realize
that every computer science problem can be solved by another layer of
indirection [1] but it does slow things down.

> >   I never said it was static.  That "buffer" there is an auto---declared on
> > the stack of the socklua_recv() function [5].  No static buffer here.
>
> My bad, I thought I read static :\  Stack overflows are fun... :p  I
> was just thinking offhandedly that you could create a mmap, expose it
> to Lua as a file to read and write from -- and pass it to the
> networking lib as the buffer to recv into.  

  Um, you map files into memory with mmap().  You can also mmap() memory
without a file to allocate memory.  You do one or the other.

> Hmm.  And it could be
> shared depending on the threading setup -- but that's an afterthought.

  By definition, all threads in a process share memory; there's nothing to
set up.  It's sharing memory among processes that's tough.

  -spc

[1] "All problems in computer science can be solved by another level of
        indirection." --Butler Lampson


Re: Pooling of strings is good

Dirk Laurie-2
2014-08-25 5:06 GMT+02:00 Sean Conner <[hidden email]>:

>> I mean to say -- how  many of those are you defining personally?
>
>   22.

I knew there had to be a catch somewhere :-)


Re: Pooling of strings is good

Axel Kittenberger
In reply to this post by Javier Guerra Giraldez
coroutines> This is one of those "I know better than you -- we should avoid the
coroutines>potential for users to screw themselves." retorts.
coroutines>
coroutines>Why do you care if I shoot my foot?  It's not yours :p

However, in that analogy, what you are asking is: "remove all safeguards on firearms for everybody, because I want to be free to shoot myself in the leg as fast as possible, without having to alter the gun myself".

If you absolutely must have mutable buffers, you are free to do exactly that with userdata: provide a metatable in C for all operations, and either copy the buffer to a string when interacting with other Lua APIs or override functions like print() yourself to support your mutable type directly.

Making a general memory inspection call for userdata is also quite easy; you only have to understand a little C. I estimate it at around a single screen page of code. The actual function that inspects the memory and returns it as a string, an array of bytes, or whatever could even be a one-liner. And sorry, you won't get around it. At most, if you are absolutely unwilling to write it yourself, find someone to write a library with that C call for you that you can require(). But there is no need to put it in stock Lua for your special case.

javier> I think it was more like failing to sanitize an input (http://xkcd.com/1354/)

Regardless of the actual bug, if newly allocated memory had been zeroed, the bug would not have mattered much.

This is one of those examples. What is wrong with saying: "After coding for a few decades and experiencing a lot, I have concluded that it is wise to generally follow these guidelines: [point because this], [point because this], [point because this]..."?

Of course, for every generally wise guideline there is that one extreme (constructed) case where it's better to break it. I won't come up with an analogy from everyday life, but it's much the same.

One of the things I'd say: if you write security-related software and you already have a custom allocator, for the love of security, zero any newly allocated memory, even though it should not be needed if everything were bug-free.
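
A minimal sketch of that guideline, assuming a hypothetical fixed-size block pool (`block_pool`, `pool_acquire` and `pool_release` are made-up names, not taken from any real allocator):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 32
#define BLOCK_COUNT 4

/* Hypothetical fixed-size block pool that recycles storage. */
typedef struct block_pool {
    unsigned char blocks[BLOCK_COUNT][BLOCK_SIZE];
    int in_use[BLOCK_COUNT];
} block_pool;

/* Hand out a free block, zeroed first, so whatever the previous owner
   left behind is never visible to the next one. NULL if exhausted. */
static unsigned char *pool_acquire(block_pool *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            memset(p->blocks[i], 0, BLOCK_SIZE);
            return p->blocks[i];
        }
    }
    return NULL;
}

/* Return a block to the pool. It must have come from pool_acquire(). */
static void pool_release(block_pool *p, unsigned char *block)
{
    assert(p != NULL);
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        if (p->blocks[i] == block) {
            p->in_use[i] = 0;
            return;
        }
    }
}
```

With zeroing on acquire, an over-read of a freshly handed-out block can only return zeros rather than a previous owner's data, at the cost of one memset per allocation.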


Re: Pooling of strings is good

Coroutines
In reply to this post by Tim Hill
On Sun, Aug 24, 2014 at 6:44 PM, Tim Hill <[hidden email]> wrote:

> So override __tostring() in your userdata metatable. And if it’s a userdata supplied by someone else then they probably don’t want you messing around inside it. There is good reason for this; userdata is MEANT to be opaque to Lua code.

I'm not sure how many times I can tell different people I understand
that userdata is meant to be opaque before it gets annoying :p

I will look into what I can do with __tostring, but right now I think
I still have to convert it *to* a string....  hrrm.


Re: Pooling of strings is good

Coroutines
In reply to this post by Sean Conner
On Sun, Aug 24, 2014 at 7:25 PM, Sean Conner <[hidden email]> wrote:

>   The issue I have with this (and I can't speak for others) is that it
> doesn't really *buy* you anything but pointless complexity.

As I said in another post, looking at the contents of the io.stdout
userdata was just an example.  Ideally I'd want to present a buffer I
recv() into from my socket library to Lua to be used like a string --
without becoming a string -- so I can avoid reallocating for that
buffer over and over.

This discussion is kind of cycling now...


Re: Pooling of strings is good

Coroutines
In reply to this post by Javier Guerra Giraldez
On Sun, Aug 24, 2014 at 7:50 PM, Javier Guerra Giraldez
<[hidden email]> wrote:
> On Sun, Aug 24, 2014 at 9:25 PM, Sean Conner <[hidden email]> wrote:
>>   Reusing a buffer was the cause of the recent Heartbleed bug that affected
>> OpenSSL and just about everything that relied upon it.
>
>
> I think it was more like failing to sanitize an input (http://xkcd.com/1354/)

On a related note, I think because of Heartbleed we should never reuse
buffers ever again.  Seems logical, right?  Can't trust programmers
anyway, we're all just simple beings that can't figure out better...

(I'm not mad at you, I think Sean is getting ridiculous -- Appeal to
Fear and all that...)


Re: Pooling of strings is good

Coroutines
In reply to this post by Sean Conner
On Sun, Aug 24, 2014 at 8:06 PM, Sean Conner <[hidden email]> wrote:

>   Good lord!  Is there nothing you don't want to change?  Yes, I realize
> that every computer science problem can be solved by another layer of
> indirection [1] but it does slow things down.

You can "rename" modules at runtime but not userdata internal
typenames -- you would have to recompile whatever defines those types.
This seems like a blind spot to me..

>   Um, you map files into memory with mmap().  You can also mmap() memory
> without a file to allocate memory.  You do one or the other.

Yes, I was thinking you could create a page-sized buffer with mmap(),
then wrap the fd in a luaL_Stream -- then write to that memory with
recv() -- and be able to :read() from it in Lua -- now you're dealing
with the file type that Lua has a friendly interface to.  Not a simple
black box that is "normal userdata".  It's just sad that :read() would
create strings for you from that -- I still want to use the string.*()
functions on userdata somehow.
