Dump large tables quickly

Dump large tables quickly

Ravi Joshi
Hi all,

I am using Lua 5.2. I have large tables (1-dimensional arrays) of about 800,000 elements each, and I want to dump them to a file quickly. I found an article on the wiki titled "Save Table To File" [1] and tried it, but found it not up to the mark. The tables saved using this method, i.e., table.save( table , filename ), are shared in my Dropbox here [2].

Since my primary concern is speed, I am ready to adopt binary file serialization if such a thing exists.

References-
[1] http://lua-users.org/wiki/SaveTableToFile
[2] https://www.dropbox.com/s/efdfvgy1n4m0zbx/table.dump?dl=0


-
Thanks
Ravi Joshi


Re: Dump large tables quickly

Niccolo Medici
On 6/30/18, Ravi Joshi <[hidden email]> wrote:

> Hi all,
>
> I am using Lua 5.2. I have large tables (1-dimensional arrays) of size
> 800,000. I want to dump these tables quickly. I found an article on Wiki
> titled "Save Table To File" [1] and used it but found not up to the mark.
> The tables saved using this method, i.e., table.save( table , filename ) are
> shared in my DropBox here [2].
>
> Since my primary concern is speed, hence I am ready to adopt binary file
> serialization if such exists.
>
> References-
> [1] http://lua-users.org/wiki/SaveTableToFile
> [2] https://www.dropbox.com/s/efdfvgy1n4m0zbx/table.dump?dl=0


On my ancient computer, the following code takes 8.5 seconds:

    local t = assert(loadfile('table.dump'))()[1]

    for i = 1, #t do
      print(t[i], ",")
    end

And using this simple-minded C code, it takes 1.3 seconds:

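    /* Prints the array at the top of the stack as a Lua table literal. */
    static int
    l_dump(lua_State * L)
    {
       int len;
       int idx;

       printf("{");
       /* lua_rawlen() is Lua 5.2+; on Lua 5.1 use lua_objlen() instead. */
       len = (int) lua_rawlen(L, -1);
       for (idx = 1; idx <= len; idx++) {
         lua_rawgeti(L, -1, idx);   /* push t[idx] */
         /* Note: "%g" keeps only 6 significant digits. */
         printf("%g,\n", (double) lua_tonumber(L, -1));
         lua_pop(L, 1);             /* pop t[idx] */
       }
       printf("}");

       return 0;
    }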

That's my first try: I guess this code can be made faster (and you
should make it a luarock, and also make it portable among the Lua
versions). But first search in luarocks.org to see if somebody has
done this already.
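
A caveat on the "%g" format used in the C snippet above: by default it prints six significant digits, so a value such as 0.123456789 would not survive a dump and reload. The sketch below, which assumes IEEE-754 doubles, checks whether a value round-trips at a given precision. `round_trips` is a hypothetical helper name used only for illustration; it is not part of any Lua API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if x survives being printed with
   `digits` significant digits and parsed back with strtod(). */
int
round_trips(double x, int digits)
{
  char buf[64];

  assert(digits >= 1 && digits <= 17);
  snprintf(buf, sizeof buf, "%.*g", digits, x);
  return strtod(buf, NULL) == x;
}
```

With `digits` set to 6 (the "%g" default), 0.123456789 is printed as 0.123457 and does not round-trip; with 17 digits it does. A dump function that must preserve values exactly could therefore use "%.17g", at the cost of longer output.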


Re: Dump large tables quickly

Ravi Joshi
Thanks, Niccolo Medici. It was wonderful to hear your suggestion.

I extended your suggestion and made a sample Lua C binding. Below is the Lua code:

    my_table = {p = {11, 22, 33, 44}, q = {0.12, 0.23, 0.34, 0.45, 0.56}}
    require "savetable"
    mytask.do_it(my_table)

Please note that in the above code, for debugging purposes, I have defined two very small 1-dimensional arrays. In reality, the table contains two 1-dimensional arrays, each of which holds approximately 800,000 elements.

I am not including the C code in this message since it would take many lines. Instead, please check the following Pastebin links to get the code:

(1) savetable.c: https://pastebin.com/sBWA3uaM
(2) wrapper.lua: https://pastebin.com/uC4vpacC

I am looking for suggestions, to make the table serialization much faster. I am using Lua 5.1 (sorry, I said 5.2 previously by mistake) on 64 bit Ubuntu PC. 

PS: I am new to the Lua community, and I couldn't find much help on luarocks.org. Do you know of any such library?

Thanks again

-
Regards
Ravi









Re: Dump large tables quickly

Niccolo Medici
On 7/3/18, Ravi Joshi <[hidden email]> wrote:
>
> my_table = {p = {11, 22, 33, 44}, q = {0.12, 0.23, 0.34, 0.45, 0.56}}
> require "savetable"
> mytask.do_it(my_table)
>
> Please note that in the above code [...] in reality,
> the table contains two 1-dimensional arrays [...]

Tip: in C you can write just the function that serializes the
"primitive" array. If such arrays are contained inside more complex
tables, then you can provide a utility function, written in Lua (as
it's easier), that serializes the whole table (I sent you an email
with a rock that demonstrates how to bundle Lua code together with
your C code). You can detect a "primitive" array using heuristics, for
example, a table whose first element (t[1]) is a number.

>
> (1) savetable.c: https://pastebin.com/sBWA3uaM

It'd be more useful if your function wrote the data into a string and
returned it (strings are binary-safe). Then the programmer will be free
to use it however she wants.
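
As a rough illustration of building the output in memory instead of printing it, the C side might look like the sketch below. It is plain C and makes assumptions: `format_array` is a hypothetical helper, and a real binding would read the numbers from the Lua table rather than from a C array.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: serializes n doubles as "{v1,v2,...,}" into a
   freshly malloc'ed, NUL-terminated string. Stores the string length
   in *out_len. Returns NULL on allocation failure; the caller frees
   the result. */
char *
format_array(const double *a, size_t n, size_t *out_len)
{
  /* "%.17g" yields at most 24 characters, so 32 per element leaves
     room for the comma. The extra 3 bytes hold "{", "}" and the NUL. */
  size_t cap = n * 32 + 3;
  size_t len = 0;
  size_t i;
  char *buf;

  assert(out_len != NULL);
  buf = malloc(cap);
  if (buf == NULL)
    return NULL;

  buf[len++] = '{';
  for (i = 0; i < n; i++)
    len += (size_t) snprintf(buf + len, cap - len, "%.17g,", a[i]);
  buf[len++] = '}';
  buf[len] = '\0';

  *out_len = len;
  return buf;
}
```

In a Lua binding, the result could then be handed to Lua with lua_pushlstring(L, buf, len) before freeing the buffer, so the caller decides whether to write it to a file, send it over a socket, or compress it. The trailing comma before the closing brace is valid in a Lua table constructor.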

>
> I am using Lua 5.1 (sorry, I said 5.2 previously
> by mistake)

There are small differences between the APIs. Test your rock with all
the versions you want to target. You can make your C code portable by
using "#if LUA_VERSION_NUM < ..." (LUA_VERSION_NUM is defined in lua.h:
501 for Lua 5.1, 502 for Lua 5.2).

>
> I am looking for suggestions, to make the table serialization much faster.

The bottleneck is extracting the numbers from the table, and I guess
you can't do too much about this.

You can implement the array data-structure in C, and expose it as a Lua
userdata. You don't need much code for this. Then serialization should
be fast (as you'd store it in memory as a linear sequence of doubles).
It's hard for me to believe that nobody has done this already (numpy
anyone?). So, again: check in luarocks.org.
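
To illustrate why a C-side array serializes quickly: if the numbers already sit in memory as a linear sequence of doubles, the whole array can be written with a single fwrite(). The sketch below is plain C and leaves out the Lua userdata binding entirely. `dump_doubles` and `load_doubles` are hypothetical names, and the output is in the host's byte order.

```c
#include <assert.h>
#include <stdio.h>

/* Writes n doubles to fp as raw bytes; returns 1 on success. */
int
dump_doubles(FILE *fp, const double *a, size_t n)
{
  assert(fp != NULL);
  return fwrite(a, sizeof a[0], n, fp) == n;
}

/* Reads n doubles from fp into a; returns 1 on success. */
int
load_doubles(FILE *fp, double *a, size_t n)
{
  assert(fp != NULL);
  return fread(a, sizeof a[0], n, fp) == n;
}
```

For 800,000 elements this writes about 6.4 MB in one call, with no per-element formatting. The file should be opened in binary mode ("wb" / "rb").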

>
> on 64 bit Ubuntu PC.

BTW, if you want the serialized output to be portable among systems
you need to consider endianness (google it).
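
As a rough illustration of the endianness point, the sketch below encodes a double as eight bytes in little-endian order regardless of the host's byte order. It assumes an IEEE-754 64-bit double whose byte order matches the host's integer byte order; `encode_double_le` and `decode_double_le` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encodes x into out[0..7], least significant byte first. */
void
encode_double_le(unsigned char out[8], double x)
{
  uint64_t bits;
  int i;

  assert(sizeof(double) == sizeof(uint64_t));
  memcpy(&bits, &x, sizeof bits);   /* copy the bit pattern of x */
  for (i = 0; i < 8; i++)
    out[i] = (unsigned char) (bits >> (8 * i));
}

/* Decodes eight little-endian bytes back into a double. */
double
decode_double_le(const unsigned char in[8])
{
  uint64_t bits = 0;
  double x;
  int i;

  for (i = 0; i < 8; i++)
    bits |= (uint64_t) in[i] << (8 * i);
  memcpy(&x, &bits, sizeof x);
  return x;
}
```

Because the shifts operate on the integer value rather than on memory, the same bytes come out on little-endian and big-endian hosts. The price is a per-element loop instead of one bulk write.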

>
> (1) savetable.c: https://pastebin.com/sBWA3uaM

I have criticism for your C code, of course, but I used up all my time
already ;-) If people here don't comment, email me in a few days with
your updated code and I'll comment on it.

(BTW, your email ended up in my spam folder (I'm using gmail). Maybe
that's why you haven't got much replies. Things to try: don't send
HTML, don't top-post.)


Yahoo addresses are broken (was: Dump large tables quickly)

Jonathan Goble
On Tue, Jul 3, 2018 at 10:31 AM Niccolo Medici <[hidden email]> wrote:
> (BTW, your email ended up in my spam folder (I'm using gmail). Maybe
> that's why you haven't got much replies. Things to try: don't send
> HTML, don't top-post.)

This is a known issue with Yahoo email addresses, which Ravi is using. Yahoo set a DMARC reject policy several years ago, which breaks mailing lists. [1]

The only workaround on Ravi's end is to not use Yahoo with mailing lists. There are a few workarounds that can be applied by the list administrators to fix the issue, none of them ideal. [2]



Re: Yahoo addresses are broken (was: Dump large tables quickly)

Pierre-Yves Gérardy
On Tue, Jul 3, 2018 at 8:44 PM, Jonathan Goble <[hidden email]> wrote:
> The only workaround on Ravi's end is to not use Yahoo with mailing lists.
> There are a few workarounds that can be applied by the list administrators
> to fix the issue, none of them ideal. [2]

It can be fixed on the receiving end by setting up a filter that
prevents lua-l messages from being marked as spam. The list is well
managed, and no actual spam comes through it anyway, so it is safe to
do.

—Pierre-Yves