string buffers using string.gsub faster than table.concat

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

string buffers using string.gsub faster than table.concat

David Manura
Are there faster ways to concatenate a large number of strings in Lua
than the usually recommended table.concat approach described in PiL
2nd 11.6 "String Buffers"?  The following tests show that a trick with
string.gsub can be a little faster:

-- test.lua
local mode = arg[1] or 'A'
local N = arg[2] or 1000000
local M = arg[3] or 1
local result
if mode == 'A' then
  for i=1,M do
    result =  ((' '):rep(N):gsub('() ', function(i)
      return string.char(i % 256)
    end))
  end
elseif mode == 'B' then
  for i=1,M do
    local ts = {}
    for i=1,N do ts[#ts+1] = string.char(i % 256) end
    result = table.concat(ts)
  end
elseif mode == 'C' then
  for i=1,M do
    local ts = {}
    for i=1,N do
      ts[#ts+1] = string.char(i % 256)
      while #ts >= 2 and #ts[#ts] ==  #ts[#ts-1] do
        ts[#ts-1] = ts[#ts-1] .. ts[#ts]
        ts[#ts] = nil
      end
    end
    for i=#ts,2,-1 do
      ts[i-1] = ts[i-1] .. ts[i]
      ts[i] = nil
    end
    result = ts[1]
  end
elseif mode == 'D' then
  for i=1,M do
    local s = ""
    for i=1,N do s = s .. string.char(i % 256) end
    result = s
  end
end

# results for Lua 5.1, x86
$ for x in A B C D; do time test.lua $X; done
A: 0.72 0.79 0.78 s
B: 1.2 1.4 1.4 s
C: 3.0 2.6 2.6 s
D: (very long)

# for luajit 2.0-beta10
A: 0.49 0.49 0.47
B: 0.65 0.90 0.65
C: 1.7 1.9 1.8
D: (very long)

Reply | Threaded
Open this post in threaded view
|

Re: string buffers using string.gsub faster than table.concat

Javier Guerra Giraldez
I think you're benchmarking more than just the string concatenation,
and the overloading of the 'i' variable makes the code do a very
different thing on each 'mode'.

your code (Lua 5.1.4, amd64):

A: 0.481
B: 0.532
C: 1.142
D: (very long)

just replacing each inner 'i' by 'j', and using it for the table index
instead of #ts+1, i got:

A: 0.449
B: 0.293
C: 0.469
D: (still too long)

--
Javier

Reply | Threaded
Open this post in threaded view
|

Re: string buffers using string.gsub faster than table.concat

David Manura
On Sun, Aug 26, 2012 at 1:29 AM, Javier Guerra Giraldez
<[hidden email]> wrote:
> just replacing each inner 'i' by 'j', and using it for the table index
> instead of #ts+1, i got:

The outer loop i's (provided merely for benchmarking) are not
referenced anywhere, but you're right about the #ts+1 overhead being
significant.  Here's an updated benchmark:

-- test.lua
local mode = arg[1] or 'A'
local N = arg[2] or 1000000
local M = arg[3] or 10
local result
if mode == 'A' then
  for _=1,M do
    result =  ((' '):rep(N):gsub('().', function(i) return
string.char(i % 256) end))
  end
elseif mode == 'B' then
  for _=1,M do
    local ts = {}
    for i=1,N do ts[i] = string.char(i % 256) end
    result = table.concat(ts)
  end
end

Results: (xeon/cygwin/make posix)
A: 5.3 5.4 5.3 (lua 5.2.1)
A: 5.1 5.1 5.1 (lua 5.1.5)
B: 4.6 4.6 4.6 (lua 5.2.1)
B: 4.5 4.4 4.4 (lua 5.1.5)

By further moving the `(' '):rep(N)` out of the loop, we get only a
small improvement:

A: 5.2 5.1 5.3 (lua 5.2.1)
A: 5.0 4.9 5.0 (lua 5.1.5)
B: 4.5 4.6 4.6 (lua 5.2.1)
B: 4.4 4.4 4.4 (lua 5.1.5)

So, the string.gsub approach is not faster, but it's close.  One thing
can be said: it has a more functional style.


This also reminds me that string regular expression libraries can be
abused in other ways, such as to solve Boolean satisfiability problems
(SAT) [1-2], but Lua's pattern matching library lacks support for
alternation on backreferences.

[1] http://nikic.github.com/2012/06/15/The-true-power-of-regular-expressions.html
[2] http://perl.plover.com/NPC/NPC-3SAT.html