Imitation of AWK in Lua 5.1 using a PC.

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Imitation of AWK in Lua 5.1 using a PC.

LunaticCoder
Hi,

This is my first post, please be gentle. I am trying to make a simple
imitation of AWK in Lua 5.1 using a PC.

My aim is to write a table of instructions each comprising a pattern and
action couple - written in Lua. I am not aiming to emulate the full AWK
syntax or make Lua read AWK scripts.

>From AWK it seems sensible to recreated BEGIN,END,RS,FS,ORS,OFS,FNR and
I have added FN - a Lua table to hold field names.

I would then pass the instructions table to awk.lua with a source for
processing. e.g.

instructions = {
['BEGIN'] = "print('Test run') print('\\n')",
['RS'] = "\r?\n",
['FS'] = "%s",
['ORS'] = "+",
['OFS'] = "_",
['END'] = "print('\\nCompleted @'..os.date():sub(9))",
-- {"",""}
{[[(string.match(line,"roof") and string.match(line,"On"))]] ,
[[print("\n"..FNR.."\t"..line)]]},
}

print(awk("The Cat sat\nOn the roof\nAnd smiled",instructions))
print(awk({"The Cat sat","On the roof","And frowned"},instructions))
print(awk("input.txt",instructions))

As you can see I would like to read the source from a string, table or
file. I will assume the table is correctly broken into records and
ignore the RS (input record separator).

This script needs a string.split() function (which I have slightly
modified to return a table and the number of fields) and an io.exists()
function:

function string.split(inputstr, sep)
-- https://stackoverflow.com/questions/1426954/split-string-in-lua
    if sep == nil then
        sep = "%s"
    end

    local t={}
    local i=1

    for str in string.gmatch(inputstr, "([^"..sep.."]+)") do
       t[i] = str
       i = i + 1
    end

    if i>1 then
        return t,i
    else
        return inputstr
    end
end

function io.exists(filename)
-- Source:
http://stackoverflow.com/questions/4990990/lua-check-if-a-file-exists
    if type(filename)~="string" then
       return false
    end
       return os.rename(filename,filename) and true or false
    end
end

So to my first attempt at awk.lua.

The pattern() function may return any true value to trigger the action,
as in AWK it doesn't have to be a text match.

For default output and print(), I have used a custom function extending
io.write() to accept multiple parameters, add ORS, suppress the extra
EOL character and to allow redirection.

I have coded the default instructions {"",""} and {"","print(line)"}.
You will see I have altered the "line" variable in the ENV for action(),
to substitute the FS with OFS, this may cause problems if you manipulate
the line variable in an action() and expect it to be the original from
the input stream.

Other actions written in Lua should work within the assigned ENV.

        function awk(source,instructions,output)
        -- awk source as string, table or file, instructions as table,
optional output

            -- Check for instructions table
            if type(instructions)~="table" then
                print("Error: Instructions need to be in a table")
                return
            end

            -- Check for special table entries or apply defaults
            RS  = instructions.RS or "\r?\n"            -- Try to
match awk default
            FS  = instructions.FS or "%s"               -- Try to
match awk default
            ORS = instructions.ORS or "\n"
            OFS         = instructions.OFS or " "
            BEGIN = instructions.BEGIN or "return "
            END         = instructions.END or "return "
            BEGIN = assert(loadstring(BEGIN)) or nil
            END         = assert(loadstring(END))  or nil
            FN  = instructions.FN or {}         -- Extra variable for
field names e.g. CSV data

            -- Check type of source and define iterator
            if io.exists(source) then
                FILENAME = source
                -- Load the whole file
                local f = assert(io.open(source, "r"))
                local t = io.read(f,"*all")
                io.close(f)
                -- Split using RS
                source = table.split(t,RS)
            else
                FILENAME = "Lua string"
            end

            if type(source)=="string" then
                source = string.split(source,RS)
            end
            if type(source)=="table" then
                -- Leave: assumes RS is correct
            end

            -- Check output
            if output then
                if io.exists(output) then
                    assert(io.open(outout, "w"))
                else
                    print("Can't find the output file: "..output)
                    return
                end
            end

            -- Set up the sandbox for pattern and actions functions
            --   the print function could be changed to concat using OFS
            awkENV = {
                ['ORS']         = ORS,
                ['OFS']         = OFS,
                ['output']      = output,
                ['print']       = print,
                ['string']      = string,
                ['math']        = math,
                ['os']          = os,
                ['io']          = io,
                ['table']       = table,
            }

            -- Call the BEGIN Function
            if type(BEGIN)=="function" then
                setfenv(BEGIN,awkENV)
                BEGIN()
            end

            -- Process the source lines
            for FNR,line in ipairs(source) do
                -- Handle empty lines


                -- Process the line into a table of fields
                field, NF = string.split(line,FS)

                -- Add to the sandboxed ENV
                awkENV.FNR = FNR
                awkENV.line = line
                awkENV.field = field

                -- Apply the instructions if pattern == true
                for i,v in ipairs(instructions) do

                    -- Set default for empty pattern
                    pattern     = v[1]
                    if pattern=="" then
                        pattern="true"
                    end

                    -- Set default for empty action
                    action = v[2]
                    if action =="" then
                        action ="print(table.concat(field,OFS))"
                    end

                    -- Check for last source record and don't print ORS
                    if next(source,FNR) ==nil then
                        awkENV['print'] = function(...)
                                local arg={...}
                                local text = table.concat(arg,"\t")
                                io.write(text)
                        end
                        awkENV['io.write']      = function(...)
                                local arg={...}
                                local text = table.concat(arg,"\t")
                                io.write(text)
                        end
                    else
                        awkENV['print'] = function(...)
                                local arg={...}
                                local text = table.concat(arg,"\t")
                                io.write(text..ORS)
                        end
                        awkENV['io.write']      = function(...)
                                local arg={...}
                                local text = table.concat(arg,"\t")
                                io.write(text..ORS)
                        end
                    end

                    -- Set the environment for pattern
                    pattern = assert(loadstring("return "..pattern),
"Could not load pattern: "..pattern)
                    awkENV.line = line
                    setfenv(pattern,awkENV)

                    -- Set the environment for actions - not echange to "line"
                    action = assert(loadstring("return "..action),
"Could not load action: "..action)
                    awkENV.line = table.concat(field,OFS)
                    setfenv(action,awkENV)

                    -- Evaluate each pattern and act
                    if pattern() then
                        action()
                    end

                end
            end

            -- Call the END function
            if type(END)=="function" then
                setfenv(END,awkENV)
                END()
            end

            io.close()

        end

I would be grateful for some advice about how to manage the ORS. You
will see that for the last source line I avoid appending the ORS. However,
if the last line doesn't generate any output I can't get back the previously
output ORS!

For a generalized approach I would also value some advice about creating
an iterator to process a line at a time, while preserving the ability to
send a string or table or file name.

Kind Regards

Gavin

Reply | Threaded
Open this post in threaded view
|

Re: Imitation of AWK in Lua 5.1 using a PC.

Sean Conner
It was thus said that the Great Gavin Holt once stated:
> Hi,
>
> This is my first post, please be gentle. I am trying to make a simple
> imitation of AWK in Lua 5.1 using a PC.

  [ snip ]
 
> I would be grateful for some advice about how to manage the ORS. You
> will see that for the last source line I avoid appending the ORS. However,
> if the last line doesn't generate any output I can't get back the previously
> output ORS!
>
  You could try the following (simplified to make it easier to follow):

        awkENV.ORS = "" -- initially nothing

        for _,pattern i pairs(instructions) do
          if pattern[1](...) then
            local output = pattern[2](...)
            if output then
              -- write output
              -- write awkENV.ORS
              awkENV.ORS = instructions.ORS or "\n" -- set ORS for next
interation
            end
          end
        end

  You always send the ORS, but the first time through, it's an empty string.
On each subsequent pass, it's set to the value you want.  Yes, it's an extra
assignment on each pass, but it avoids quite a bit of tricky logic, so I
think it's a win overall.

> For a generalized approach I would also value some advice about creating
> an iterator to process a line at a time, while preserving the ability to  
> send a string or table or file name.

  An interator is a function that returns a function and some state data.
Something like:

function next_record(source,separator)
  separator = separator or "\n"
 
  -- -----------------------------------------------------------------------
  -- our state is the source table; var is the record count.  We can't just
  -- use next() because the traveral of keys is unordered, even for
  -- numerical indicies.  So let's just do it ourselves.
  -- -----------------------------------------------------------------------
 
  local next_from_table(state,var)
    var = var + 1
    if state[var] then
      return var,state[var]
    end
  end
 
  -- -----------------------------------------------------------------------
  -- Here, our state is a function that returns the next record, and again,
  -- var is just the record count.
  -- -----------------------------------------------------------------------
 
  local next_from_file_or_string(state,var)
    var = var + 1
    local rec = state()
    if rec then
      return var,state()
    end
  end
 
  -- -----------------------------------------------------------------------
  -- This is used to handle error cases---there's nothing to iterate, so it
  -- just exits.
  -- -----------------------------------------------------------------------
 
  local next_nothing()
    return nil
  end
 
  -- ---------------------------------------------
  -- See section 2.4.5 (Lua 5.1)
  --             3.3.5 (Lua 5.2)
  --             3.3.5 (Lua 5.3)
  -- for more information about iterators
  -- ---------------------------------------------
 
  if type(source) == 'table' then
      return next_from_table,source,0
 
  -- ----------------------------------------------------------------------
  -- For a string, if we can open it, treat it as a file.  If the input
  -- separator is "\n", use io.lines() for state (a function that will
  -- return the next line); otherwise, read the entire input, and use
  -- str:gmatch() (a function that will return the next match) as the state.
  -- If we can't open it, assume a non-filename, and use str:gmatch() as the
  -- state.
  -- -----------------------------------------------------------------------
 
  elseif type(source) == 'string' then
    if file_exists(source) then
      if separator == '\n' then
        return next_from_file_or_string,io.lines(source),0
      else
        local f = io.open(source,"r")
        if f then
          local d = f:read("*a")
          f:close()
          return next_from_file_or_string,d:gmatch("[^" .. separator .. "]+"),0
        else
          return next_nothing
        end
      end
    end
   
    return next_from_file_or_string,source:gmatch("[^" .. separator .. "]+"),0
    end
 
  -- ---------------------------------
  -- For any other source, just bail
  -- ---------------------------------
 
  else
    return next_nothing
  end
end

  The code is untested, but it *should* work (or be close enough to get it
working).  It's to be used like:

        -- source can be a table, a filename or a string
       
        for recnum,record in next_record(source,"\n") do
          ...
        end
       
  -spc (Hope this helps some)
 
 

Reply | Threaded
Open this post in threaded view
|

Re: Imitation of AWK in Lua 5.1 using a PC.

Jay Carlson
In reply to this post by LunaticCoder
On Jul 10, 2017, at 5:08 PM, Gavin Holt <[hidden email]> wrote:

> I would be grateful for some advice about how to manage the ORS. You
> will see that for the last source line I avoid appending the ORS. However,
> if the last line doesn't generate any output I can't get back the previously
> output ORS!

Here’s a closure version? It seemed easier than the object version.

function ors_stream(ORS, output)
    local needs_ors = false;
    return function(s)
        if needs_ors then output(ORS) end
        needs_ors=true
        output(s)
    end
end

my_stream = ors_stream("\n----\n", io.write)
my_stream("hello, world")
my_stream("hello, again")

Jay