Optimizing Csv to Field Table

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Optimizing Csv to Field Table

Billy
Anyone know how i can optimize this? Or if i even wrote this correctly. I wanted
to create a function that would return the table corresponding with the fields
that the csv was created

fromCSV function can be found at http://lua-users.org/wiki/CsvUtils

-- Converts CSV file to a Field Table
-- example
-- Name,Class,FormID
-- Warrior,CombatWarrior1H,0005003
-- Archer,CombatArcher,0005002
-- returns {
-- [1] = { Name = Warrior, Class = CombatWarrior1H, FormID = 0005003 },
-- [2] = { Name = Archer, Class = CombatArcher, FormID = 0005002}
-- }
function CSVToTable(file_in)
        local t = {}
        local lineCount = 1
        local count = 1

        local fieldCount = 0
        local fields = {}
        for line in io.lines(file_in) do
                local csv = fromCSV(line)
                if lineCount == 1 then
                        fieldCount = #csv
                        if fieldCount == 0 then -- incase table size is nil,
lets recount
                                for k,v in pairs(csv) do
                                        fieldCount = fieldCount + 1
                                end
                        else
                                for k,v in pairs(csv) do
                                        fields[k] = Trim(v)
                                end
                        end
                else
                        if not t[count] then
                                t[count] = {}
                        end
                        for k,v in pairs(csv) do
                                t[count][fields[k]] = csv[k]
                        end
                        count = count + 1
                end


                lineCount = lineCount + 1
        end
                return t
end


Reply | Threaded
Open this post in threaded view
|

Re: Optimizing Csv to Field Table

David Favro
On 12/16/2012 08:31 PM, Billy wrote:
> Anyone know how i can optimize this? Or if i even wrote this correctly. I wanted
> to create a function that would return the table corresponding with the fields
> that the csv was created
>
> fromCSV function can be found at http://lua-users.org/wiki/CsvUtils

There are a lot of things that you could do to optimize this function (and
simplify it), but I'd guess that the single best thing you could do is to
just make a modified version of fromCSV() that directly constructs your
table, rather than using its output table to construct another one.

That said, if you want to simplify your function, a few things off the top
of my head:

* Get rid of linecount and just read the first line outside the loop

* Get rid of that "let's recount" code and just abort (or continue) if
fieldcount==0

* Replace all of pairs() with ipairs()

* Get rid of count and references to t[count] (just construct each new
record as a local table, then append it to t, e.g. with table.insert)

* You need some kind of handling of the case where the number of fields in a
detail line doesn't match the number of fields in the header, especially if
it's larger.

-- David

Reply | Threaded
Open this post in threaded view
|

Re: Optimizing Csv to Field Table

Michael Wolf
In reply to this post by Billy
2012/12/17 Billy <[hidden email]>
Anyone know how i can optimize this? Or if i even wrote this correctly. I wanted
to create a function that would return the table corresponding with the fields
that the csv was created

If you don't need a table with actual fields but only a way to access the fields by name you could set a metatable for each row where the __index does the named field lookup. As you don't need to create another table for each row and copy every field its much faster to create and also uses half the memory in my testcase.

function CSVToTableMT(file_in)
local file=assert(io.open(file_in,"r"))
local line=assert(file:read("*l"))
local header=fromCSV(line)
local headerlookup={}
for i,field in ipairs(header) do
headerlookup[field]=i
end
local mt={
__index=function(tbl,key)
local idx=headerlookup[key]
if idx==nil then
return nil
else
return tbl[idx]
end
end
}
local tbl={}
line=file:read("*l")
while line~=nil do
table.insert(tbl,setmetatable(fromCSV(line),mt))
line=file:read("*l")
end
file:close()
return tbl
end

Test:
Windows xp with luajit 2.0
30MB csv file with 19 columns and 400000 rows
reading all rows in a table and then writing all out again, indexing each field of each row by name.

Your approach:
7.54 seconds 400MB memory usage

With optimisations mentioned by  David Favro:
7.27 seconds 398MB memory usage

Metatable solution:
5.9 seconds 177MB memory usage

so at least under LuaJit 2.0 ist the fastest approach which also uses much less memory.

LG,
Michael

ps:
thats the optimized function with the original approach i used:

function CSVToTable2(file_in)
local file=assert(io.open(file_in,"r"))
local line=assert(file:read("*l"))
local header=fromCSV(line)
local tbl={}
line=file:read("*l")
while line~=nil do
local fields={}
for i,field in ipairs(fromCSV(line)) do
fields[header[i]]=field
end
table.insert(tbl,fields)
line=file:read("*l")
end
file:close()
return tbl
end
Reply | Threaded
Open this post in threaded view
|

Re: Optimizing Csv to Field Table

Billy
Thank you for your help, never thought of using metatable, What profiling
program did you use to compare memory usage / execution time?



Reply | Threaded
Open this post in threaded view
|

Re: Optimizing Csv to Field Table

Michael Wolf
For runtime i used a high precision timer luajit library I wrote and for memory usage i used ProcessExplorer a advanced TaskManager under Windows.

Cheers,
Michael



2012/12/18 Billy <[hidden email]>
Thank you for your help, never thought of using metatable, What profiling
program did you use to compare memory usage / execution time?