Repeated processing of large datasets


John Logsdon
Greetings to all

I am processing some large datasets that are currently stored as .csv
files, and I can slurp them all into memory; however, I only want specific columns.

The datasets are typically a few million records, each with up to 100
columns of which I am only interested in 20 or so.

So the first thing I do is to slurp it all into memory and discard the
unwanted data thus:

local function readValues(f)
  local Line = f:read("*l")
  if Line ~= nil then
    -- strip stray CR/LF, collapse runs of spaces, then split on commas
    -- (split is a user-supplied helper, not part of the Lua standard library)
    Line = split(string.gsub(string.gsub(Line, "[\n\r]", ""), " +", " "), ",")
    return {Line[i1], Line[i2], Line[i3]}
  end
end
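Since split is not part of the Lua standard library, one possible sketch of such a helper (the post does not show its own definition) splits on a literal separator with string.find in plain mode, keeping empty fields, which matters for CSV records with blank columns:

```lua
-- Hypothetical split helper: splits s on the literal separator sep,
-- preserving empty fields.
local function split(s, sep)
  local fields, pos = {}, 1
  while true do
    local i = string.find(s, sep, pos, true)  -- plain find, no patterns
    if not i then
      fields[#fields + 1] = string.sub(s, pos)
      return fields
    end
    fields[#fields + 1] = string.sub(s, pos, i - 1)
    pos = i + 1
  end
end

-- split("a,,b", ",") returns {"a", "", "b"}
```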

where i1, i2, i3, etc. have been pre-calculated from the header line.
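A sketch of that pre-calculation might look like the following; columnIndices and the example column names are hypothetical, and it assumes header fields are non-empty:

```lua
-- Hypothetical sketch: map wanted column names to their 1-based positions
-- in the header line. The column names used here are illustrative only.
local function columnIndices(headerLine, wanted)
  local position = {}
  local col = 0
  for name in string.gmatch(headerLine, "([^,]+)") do
    col = col + 1
    position[name] = col
  end
  local out = {}
  for k, name in ipairs(wanted) do
    out[k] = position[name]
  end
  return out
end

local idx = columnIndices("date,open,close,volume", {"date", "close"})
-- idx[1] == 1, idx[2] == 3
```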

Then in the main program I read one line at a time:
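A minimal, self-contained sketch of such a loop: the split helper, the column positions i1..i3, and the sample data below are illustrative stand-ins, while readValues follows the post.

```lua
-- Illustrative column positions; the post computes these from the header.
local i1, i2, i3 = 1, 3, 4

-- Stand-in split helper (not part of the Lua standard library).
local function split(s, sep)
  local fields, pos = {}, 1
  while true do
    local i = string.find(s, sep, pos, true)
    if not i then
      fields[#fields + 1] = string.sub(s, pos)
      return fields
    end
    fields[#fields + 1] = string.sub(s, pos, i - 1)
    pos = i + 1
  end
end

-- readValues, following the post.
local function readValues(f)
  local Line = f:read("*l")
  if Line ~= nil then
    Line = split(string.gsub(string.gsub(Line, "[\n\r]", ""), " +", " "), ",")
    return {Line[i1], Line[i2], Line[i3]}
  end
end

-- Stand-in for the real .csv: a temporary file with a header and two records.
local f = assert(io.tmpfile())
f:write("a,b,c,d\n1,2,3,4\n5,6,7,8\n")
f:seek("set")

f:read("*l")                      -- skip the header line
local rows = {}
while true do
  local row = readValues(f)
  if row == nil then break end    -- readValues returns nil at end of file
  rows[#rows + 1] = row           -- or process row[1], row[2], row[3] here
end
f:close()
-- rows[1] is {"1", "3", "4"}; rows[2] is {"5", "7", "8"}
```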



Best wishes

John

John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675