Alternative string concat (was "out of memory ")

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Alternative string concat (was "out of memory ")

Thomas Wrensch-2
This is in response to the fragement of the "out of memory" thread that
is discussing the problems with doing string concatenation and possible
alternatives.

Since the speed of the string .. operation is slow (a problem shared by
several languages) you could fix the problem by increasing the speed of
the existing .. operation or you could use some other, faster technique.

The problem with increasing the speed of .. is that it is hard given the
current structure of Lua strings, garbage collection, etc. The problem
with the second is that you have to do it yourself.

My suggestion would be to write a function "join" or some such that does
just what you want: it takes some number of strings as arguments and
returns a string equal to all those strings concatenated together.

Thus, rather than doing this:

text = "You peek carefully around the corner "
    .. "hoping\n to see the reason for the "
    .. "strange sounds.\n and you see "
    .. description(obj2)

You do this:

text = join(
    "You peek carefully around the corner ",
    "hoping\n to see the reason for the ",
    "strange sounds.\n and you see ",
    description(obj2))

If join was written in C it could do the concatenation
quickly, and then you'd only be interning the result string. It should
speed things up considerably. You could even write it in Lua and still
cut down on the
number of strings interned using a binary tree approach (I think Roberto
posted something like that six or eight months ago).

  - Tom


>>> [hidden email] 10/07/02 06:23 AM >>>
> What's wrong with my suggestion to use extra allocation and reference
> counting for strings?

Extra allocation: you can only use extra allocation if you allocate
space
for variables, not for values. In Perl, each variable has its own
buffer,
and a string assignment actually copies the string from one buffer to
the other. So it can do extra allocation in a variable's buffer. In Lua,
each variable points to a string; assignment assigns the pointer. So,
it doesn't make sense to have "extra allocation" in the space used by
a fixed string.

The scheme used by Perl makes some string operations much faster than
in Lua, but makes several other operations much slower. For instance,
the simple creation/release of local variables take some time. (If you
simply declare a dummy `my' variable in a function it slowdowns the call
by more than 10%).

Reference counting: Lua is not statically typed. To do reference
counting for strings, you will have the overhead of reference counting
everywhere. Then it is better to simply change the GC mechanism to
reference counting.

-- Roberto


Reply | Threaded
Open this post in threaded view
|

Re: Alternative string concat (was "out of memory ")

Luiz Henrique de Figueiredo
>Since the speed of the string .. operation is slow (a problem shared by
>several languages) you could fix the problem by increasing the speed of
>the existing .. operation or you could use some other, faster technique.

The string .. operation is not slow in Lua. It's actually very fast, even
for multiple concatenations. The virtual machine operation is actually for
multiple concatenation. Here is the code for
	return "a" .."b" .."c" .."d"

        1       [2]     LOADK           0 0     ; "a"
        2       [2]     LOADK           1 1     ; "b"
        3       [2]     LOADK           2 2     ; "c"
        4       [2]     LOADK           3 3     ; "d"
        5       [2]     CONCAT          0 0 3

So, this is a linear (not quadratic) operation.
Bottom line: there's no reason not to use multiple concatenations for long
strings. (Except that there is a limit on the number of concatenations.)
--lhf

Reply | Threaded
Open this post in threaded view
|

Re: Alternative string concat (was "out of memory ")

Enrico Colombini
>The string .. operation is not slow in Lua. It's actually very fast, even
>for multiple concatenations. The virtual machine operation is actually for
>multiple concatenation. Here is the code for
>	return "a" .."b" .."c" .."d"
>
>        1       [2]     LOADK           0 0     ; "a"
>        2       [2]     LOADK           1 1     ; "b"
>        3       [2]     LOADK           2 2     ; "c"
>        4       [2]     LOADK           3 3     ; "d"
>        5       [2]     CONCAT          0 0 3
>
>So, this is a linear (not quadratic) operation.
>Bottom line: there's no reason not to use multiple concatenations for long
>strings. (Except that there is a limit on the number of concatenations.)

Very interesting. It seems I tried to solve a non-problem :-)
What is the limit?

Thanks to all for the information, suggestions... and patience.

  Enrico



Reply | Threaded
Open this post in threaded view
|

Re: Alternative string concat (was "out of memory ")

Philippe Lhoste
In reply to this post by Thomas Wrensch-2
On lun. 07 oct. 2002 22:09:40, "Thomas Wrensch" <[hidden email]> wrote:

> My suggestion would be to write a function "join" or some such that does
> just what you want: it takes some number of strings as arguments and
> returns a string equal to all those strings concatenated together.
> 
> Thus, rather than doing this:
> 
> text = "You peek carefully around the corner "
>     .. "hoping\n to see the reason for the "
>     .. "strange sounds.\n and you see "
>     .. description(obj2)
> 
> You do this:
> 
> text = join(
>     "You peek carefully around the corner ",
>     "hoping\n to see the reason for the ",
>     "strange sounds.\n and you see ",
>     description(obj2))
> 
> If join was written in C it could do the concatenation
> quickly, and then you'd only be interning the result string. It should
> speed things up considerably. You could even write it in Lua and still
> cut down on the
> number of strings interned using a binary tree approach (I think Roberto
> posted something like that six or eight months ago).

Lua 5.0's table.concat{"Foo", "Bar", "Doh"} is already doing this, if I 
understand correctly what you are saying.
Of course, for static strings like those I give, using .. is more efficient, 
as Luiz answered. This function is more useful when you put programmatically 
the strings in a table, then concatenate them at once, with an optional 
separator.

I was thinking about this issue since the LTN of Roberto, wanting to write 
something about it but always postponing. Seems like a good opportunity to 
give a go :-)

If compared to C, which is unfair (to compare a low level compiled language 
to a high level scripting one), Lua's *core* support of strings is strangely 
lacunar. I don't know for other scripting languages, my experience is too 
limited here.

Basically, you can declare and set a string, use it as index, compare two 
strings (case-sensitive), concatenate them.
That's about all. No base support for getting the length of a string, 
nothing to access an individual character, even less, given the immutable 
nature of strings, the capability to change a given char.
So, unlike C, you can't write a string library in Lua, you have to rely on C 
to build it.

Now, these are limitations we can live with, I understand (more or less) the 
deep nature of string in Lua doesn't allow it, I don't ask to change that...

But perhaps the support of a core "buffer" type would be useful.
I don't dream, this is hightly unlikely it will ever happen, but still, it 
may be useful to have it in the base Lua, allowing semi-transparent 
conversions between this type and the string type, and (perhaps) avoiding 
the overhead of function calls (using eg. stringId[5, 9] to get a sub-
string).
Plus it may allow to avoid duplicating functions for regular strings and 
buffers (libraries check what they get), literal strings used only in buffer 
manipulations would not be hashed/garbage collected (ie. would exists only 
at source/bytecode level), etc.

Now, realistically, this would be a library, the buffer being a userdata.
Such a buffer would be a C-like (or more probably C++-like) string: you 
define it with a given size, but it can grow automatically (or not, if told 
not to).
You can get its size and length, (safely) concatenate strings to it, access 
any char or sub-string inside (nil if not yet set? or initialize at 
predefined value) as read or write (behavior to define when setting a byte 
inside an undefined area), read and write part or whole buffer from/to a 
file, etc.
It would be perfect to build a string (which can be binary) piece by piece 
(eg. creating an image before writing it to a file, or the Life sample).

Perhaps the need for such a feature isn't general, ie. some applications may 
have no use for it. So putting it in the core isn't such a brilliant idea 
(and probably not doable anyway).

But to have such a library would be a good idea. I may give a try someday, 
if nobody does it before, and if nobody destroys the idea as useless :-)
I have first to write detailled specifications, to have an idea of where I 
go.

-- 
--=#=--=#=--=#=--=#=--=#=--=#=--=#=--=#=--=#=--
Philippe Lhoste (Paris -- France)
Professional programmer and amateur artist
http://jove.prohosting.com/~philho/



Reply | Threaded
Open this post in threaded view
|

Re: Alternative string concat (was "out of memory ")

Luiz Henrique de Figueiredo
In reply to this post by Thomas Wrensch-2
>>(Except that there is a limit on the number of concatenations.)
>
>What is the limit?

255 concatenations at present.
--lhf

Reply | Threaded
Open this post in threaded view
|

Re: Alternative string concat (was "out of memory ")

Thomas Wrensch-2
Agreed that the concatination operation is not particularly slow (I ran
a few tests, its more fun to argue with real data). Also the way
multiple concatenation works makes the join function I suggested
completely unnecessary.

However, there will be cases where using concatenation is not
appropriate, and does not produce overall linear (as in O(N)) behavior.
This is really only noticable for large numbers of concatenations.

There are two reasons why a group of N iterations might run in greater
than O(N) time.

First, in a situation like this:

    local str = ""
    for i=1,100000 do
        str = str.."x"
    end

The size of the string that must be created and copied grows as N grows.
With a sufficiently large N this should approach quadratic (as in
O(N*N)) time.

Running the above code for different values of N gave me these times:

    Iterations    Time
    ----------    ------
     10000         0.13
     20000         0.84
     30000         1.92
     40000         7.95
     50000        13.46
     60000        22.13
     70000        30.86
     80000        41.75
     90000        54.35
    100000        68.43

(I thought the jump from 30000 to 40000 was particularly interesting. I
was able to repeat the same behavior, thought the times varied up to 10%
or so).

I wish the above case could be called pathological, but it is not
unreasonable to do something like this to build up a string from
individual characters being read from somewhere. Note that in the above
case my join()
function would have exactly the same problem, but a userdata that
implemented a string buffer could operate in linear time.

Another possible case (I didn't collect data on this one) is when each
new string is kept around. Such as:

    names = {}
    for i=1,10000 do
        names[i] = names..tostring(i)
    end

In this case all operations are (I think) linear, with the exception of
interning the string. If I understand the way it is done, that would be
O(N log N) where N is the number of strings in the system.

I don't think the above case can really be called pathological either,
though it seems less useful than the first case.

In the first case some kind of string buffering system would speed
things up considerably. In the second case I think you're stuck, since
Lua requires strings to be interned and by definition you're keeping
them all around and available.

  - Tom Wrensch

>>> [hidden email] 10/07/02 13:38 PM >>>
>Since the speed of the string .. operation is slow (a problem shared by
>several languages) you could fix the problem by increasing the speed of
>the existing .. operation or you could use some other, faster
technique.

The string .. operation is not slow in Lua. It's actually very fast,
even
for multiple concatenations. The virtual machine operation is actually
for
multiple concatenation. Here is the code for
	return "a" .."b" .."c" .."d"

        1       [2]     LOADK           0 0     ; "a"
        2       [2]     LOADK           1 1     ; "b"
        3       [2]     LOADK           2 2     ; "c"
        4       [2]     LOADK           3 3     ; "d"
        5       [2]     CONCAT          0 0 3

So, this is a linear (not quadratic) operation.
Bottom line: there's no reason not to use multiple concatenations for
long
strings. (Except that there is a limit on the number of concatenations.)
--lhf