quoting unquoted token?

quoting unquoted token?

Petite Abeille
Hello,

What would be a reasonable way to, hmmm, quote unquoted tokens?

Say a token is an uninterrupted sequence of alphanumeric characters (%w) or a quoted token. A quoted token is any sequence of characters inside quotes, minus any quote characters.

For example:

'foo "hello world" bar'

How to turn the above into:

'"foo" "hello world" "bar"'

Thoughts?





Re: quoting unquoted token?

Peter Odding-3
> Hello,
>
> What would be a reasonable way to, hmmm, quote unquoted tokens?
>
> Say a token is an uninterrupted sequence of alphanumeric characters
> (%w) or a quoted token. A quoted token is any sequence of characters
> inside quotes, minus any quote characters.
>
> For example:
>
> 'foo "hello world" bar'
>
> How to turn the above into:
>
> '"foo" "hello world" "bar"'
>
> Thoughts?

You can do this in plain Lua but it's not nice, I would prefer LPeg:

----

lpeg = require 'lpeg'
whitespace = lpeg.S'\r\n\f\t '^1
unquoted = lpeg.R('AZ', 'az', '09')^1
single_quoted = "'" * (1 - lpeg.P"'")^0 * "'"
double_quoted = '"' * (1 - lpeg.P'"')^0 * '"'
any = lpeg.C(whitespace + unquoted + single_quoted + double_quoted)

function quote_tokens(input)
   local i = 1
   local output = {}
   while true do
     local match = any:match(input, i)
     if not match then
       break
     else
       i = i + #match
       if match:find '%S+' then
         if match:find '^[A-Za-z0-9]+$' then
           match = '"' .. match .. '"'
         end
         output[#output + 1] = match
       end
     end
   end
   return table.concat(output, ' ')
end

assert(quote_tokens 'foo bar baz' == '"foo" "bar" "baz"')
assert(quote_tokens 'foo "bar" baz' == '"foo" "bar" "baz"')
assert(quote_tokens "foo 'bar' baz" == '"foo" \'bar\' "baz"')

----

One notable advantage of LPeg is that it's dead easy to extend this
example with support for escape sequences and stuff like that :-)

  - Peter
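
As a rough sketch of the kind of extension mentioned above (this is not Peter's code), backslash escapes inside double-quoted tokens could be handled along these lines. The function name quote_tokens_escaped and the choice of backslash as the escape character are assumptions made for illustration. Single-quoted tokens are left out to keep it short.

```lua
local lpeg = require 'lpeg'
local P, R, S, C, Ct = lpeg.P, lpeg.R, lpeg.S, lpeg.C, lpeg.Ct

local whitespace = S'\r\n\f\t '^0
-- Assumption: a backslash escapes whatever single character follows it.
local escape = P'\\' * P(1)
-- A double-quoted token may contain escapes or any character except " and \.
local double_quoted = P'"' * (escape + (1 - S'"\\'))^0 * P'"'
-- Bare alphanumeric runs are wrapped in double quotes as they are captured.
local unquoted = R('AZ', 'az', '09')^1 / function(s) return '"' .. s .. '"' end
local token = C(double_quoted) + unquoted
-- Collect all tokens into a table; -1 requires the whole input to be consumed.
local tokens = Ct(whitespace * (token * whitespace)^0) * -1

-- Hypothetical name; returns nil when the input cannot be tokenized.
function quote_tokens_escaped(input)
  local t = tokens:match(input)
  return t and table.concat(t, ' ')
end
```

Because the grammar ends with -1, malformed input such as an unterminated quote yields nil instead of a partial result.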


Re: quoting unquoted token?

Martijn Hoekstra
In reply to this post by Petite Abeille
Depends on how much sophistication you want. Something like this could
probably be done fastest with a quick simple parser. Runtime
performance wouldn't be great, but it would be fairly straightforward.

Something like:

function parse_string(mystring)
  local tokens = {}
  local next_token
  repeat
    -- consume_token returns the next token and the remaining string
    next_token, mystring = consume_token(mystring)
    tokens[#tokens + 1] = next_token
  until next_token == nil
  return tokens
end

and write the consume_token ad hoc: switch on the first character; if it
is a ", consume and add all letters to the return value until you find
the closing ", then return your value and the stripped string.

The back-and-forth string walking here is horrible, as will runtime
complexity be, but for shortish tokens it should be doable.
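
One possible reading of the consume_token description above, as a minimal self-contained sketch. The exact return convention (token, rest of string, or nil when nothing more can be read) and the helper quote_all are assumptions, not part of the original post.

```lua
-- Hypothetical helper: returns the next token and the rest of the string,
-- or nil plus the unread remainder when no further token can be read.
local function consume_token(str)
  str = str:match('^%s*(.*)$')            -- strip leading whitespace
  local token, rest
  if str:sub(1, 1) == '"' then
    -- Quoted token: everything up to the next double quote.
    token, rest = str:match('^"([^"]*)"(.*)$')
  else
    -- Unquoted token: a run of alphanumeric characters.
    token, rest = str:match('^(%w+)(.*)$')
  end
  if token == nil then return nil, str end
  return token, rest
end

local function parse_string(str)
  local tokens = {}
  local token
  repeat
    token, str = consume_token(str)
    tokens[#tokens + 1] = token           -- assigning nil is a no-op
  until token == nil
  return tokens
end

-- Hypothetical wrapper that re-quotes every token.
function quote_all(str)
  local quoted = {}
  for i, token in ipairs(parse_string(str)) do
    quoted[i] = '"' .. token .. '"'
  end
  return table.concat(quoted, ' ')
end

print(quote_all('foo "hello world" bar'))  --> "foo" "hello world" "bar"
```

Each call copies the remainder of the string, which is the repeated string walking the post warns about; it is acceptable for short inputs only.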


On Wed, Oct 5, 2011 at 8:02 PM, Petite Abeille <[hidden email]> wrote:

> Hello,
>
> What would be a reasonable way to, hmmm, quote unquoted tokens?
>
> Say a token is an uninterrupted sequence of alphanumeric characters (%w) or a quoted token. A quoted token is any sequence of characters inside quotes, minus any quote characters.
>
> For example:
>
> 'foo "hello world" bar'
>
> How to turn the above into:
>
> '"foo" "hello world" "bar"'
>
> Thoughts?


Re: quoting unquoted token?

Petite Abeille
In reply to this post by Peter Odding-3

On Oct 5, 2011, at 10:25 PM, Peter Odding wrote:

> You can do this in plain Lua but it's not nice, I would prefer LPeg:

Thank you very much for the example. And yes, LPeg is on my "to learn" list :)

Here is my feeble attempt in plain Lua:

local aLine = 'foo "hello world" bar'

print( 1, aLine )

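-- Step 1: protect spaces inside quoted tokens by encoding them as the literal text \032
aLine = aLine:gsub( '(%b"")', function( aValue ) return ( aValue:gsub( ' ', '\\032' ) ) end )

-- Step 2: quote every space-separated run that is not quoted yet; decode \032 in the quoted ones
aLine = aLine:gsub( '([^ ]+)', function( aValue )
  if not aValue:find( '^"' ) then
    return ( '"%s"' ):format( aValue )
  else
    return ( aValue:gsub( '\\032', ' ' ) )
  end
end )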

print( 2, aLine )

for aToken in aLine:gmatch( '(%b"")' ) do
  print( 3, aToken )
end

> 1 foo "hello world" bar
> 2 "foo" "hello world" "bar"
> 3 "foo"
> 3 "hello world"
> 3 "bar"

In other words, first, encode all the quoted space - assuming the space character is the token separator. Second, quote all the sequences between spaces, if not already quoted.

Rather clunky, but it seems to be doing the bare minimum.

> One notable advantage of LPeg is that it's dead easy to extend this example with support for escape sequences and stuff like that :-)

Yes, LPeg looks very powerful. Need to invest some time in it :))



Re: quoting unquoted token?

Duncan Cross
In reply to this post by Martijn Hoekstra
On Wed, Oct 5, 2011 at 9:29 PM, Martijn Hoekstra
<[hidden email]> wrote:
> Depends on how much sophistication you want. Something like this could
> probably be done fastest with a quick simple parser. Runtime
> performance wouldn't be great, but it would be fairly straightforward.

LPEG has already been suggested, and I would definitely agree with
that. However, if sticking to standard functions is preferable, here
is my attempt at a generic iterator-based tokenizer:


 local function itokens_aux(str, startpos)
   local token, nextpos = string.match(str, '^%s*"(.-)"()', startpos)
   if not token then
     token, nextpos = string.match(str, '^%s*(%w+)()', startpos)
   end
   return nextpos, token
 end

 function itokens(str)
   return itokens_aux, str, 1
 end


Note that the first value generated by the iterator is not useful and
should be ignored by assigning it to _.
So, an example of usage:


 t = {}
 for _, token in itokens [[foo "hello world" bar]] do
   t[#t+1] = [["]] .. token .. [["]]
 end
 print(table.concat(t, ' '))


-Duncan
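
To illustrate how the generic for drives the stateless iterator above, here is a sketch that calls itokens_aux by hand. It also puts the otherwise ignored position value to use, to report whether the whole input was tokenized. The name quote_tokens_checked and its second return value are assumptions for illustration; itokens_aux is repeated from the post so the block runs on its own.

```lua
local function itokens_aux(str, startpos)
  local token, nextpos = string.match(str, '^%s*"(.-)"()', startpos)
  if not token then
    token, nextpos = string.match(str, '^%s*(%w+)()', startpos)
  end
  return nextpos, token
end

-- Hypothetical wrapper: returns the re-quoted string and a flag telling
-- whether only whitespace was left when the tokenizer stopped.
function quote_tokens_checked(str)
  local out, pos = {}, 1
  while true do
    -- This is the call the generic for makes on every iteration.
    local nextpos, token = itokens_aux(str, pos)
    if not nextpos then break end
    out[#out + 1] = '"' .. token .. '"'
    pos = nextpos
  end
  local complete = str:find('^%s*$', pos) ~= nil
  return table.concat(out, ' '), complete
end

print(quote_tokens_checked('foo "hello world" bar'))
```

With an unterminated quote such as 'foo "bar', the flag comes back false, whereas the plain for loop would stop silently after the first token.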


Re: quoting unquoted token?

Luiz Henrique de Figueiredo
In reply to this post by Peter Odding-3
> You can do this in plain Lua but it's not nice, I would prefer LPeg:

That's one reason I wrote my lcl in C. See
        http://www.tecgraf.puc-rio.br/~lhf/ftp/lua/#lcl

(BTW, I have been working on a new version of lcl but it seems no one
is using it, so I never finished it. Perhaps I will when 5.2 is out.)


RE: quoting unquoted token?

Dirk Laurie
In reply to this post by Peter Odding-3
Peter Odding wrote

>> For example:
>>
>> 'foo "hello world" bar'
>>
>> How to turn the above into:
>>
>> '"foo" "hello world" "bar"'
>>
>> Thoughts?
>
> You can do this in plain Lua but it's not nice, I would prefer LPeg:

If you demand full Lua quoting inside your source, certainly, it's not nice.  If you use only
the double-quote as in your example, I prefer Lua.

~~~~
local append=table.insert

function tokenize(s)
-- Turn 'foo "hello world" bar' into {'"foo"','"hello world"','"bar"'}
    local t={}
    while #s>0 do
        -- (.-) is non-greedy so that the leftmost quoted token is found first;
        -- a greedy (.*) would jump to the last one and mangle earlier quotes
        local start, quoted, stop = s:match('(.-)(%b"")(.*)')
        if not stop then break end
        for item in start:gmatch('%S+') do append(t,'"'..item..'"') end
        append(t,quoted)
        s = stop
        end
    for item in s:gmatch('%S+') do append(t,'"'..item..'"') end
    return t
    end

print (table.concat(tokenize 'foo "hello world" bar', ' ')) -- "foo" "hello world" "bar"
~~~~

Dirk