Finding end of string


Finding end of string

Soni "They/Them" L.
local m, pos
repeat -- TODO fix
   m, pos = string.match(word_eol[2],
"%f[\\"..cw:sub(1,1).."](\\*)"..cw:sub(1,1).."()", pos)
until #m % 2 == 0

I'm having issues making this work.

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.



Re: Finding end of string

Dirk Laurie-2
2017-10-10 14:52 GMT+02:00 Soni L. <[hidden email]>:
> local m, pos
> repeat -- TODO fix
>   m, pos = string.match(word_eol[2],
> "%f[\\"..cw:sub(1,1).."](\\*)"..cw:sub(1,1).."()", pos)
> until #m % 2 == 0
>
> I'm having issues making this work.

The title of the post describes a trivial problem (answer: #str+1).

I can't work out by reading it what problem the code tries to solve.


Re: Finding end of string

Soni "They/Them" L.


On 2017-10-10 02:44 PM, Dirk Laurie wrote:

> 2017-10-10 14:52 GMT+02:00 Soni L. <[hidden email]>:
>> local m, pos
>> repeat -- TODO fix
>>    m, pos = string.match(word_eol[2],
>> "%f[\\"..cw:sub(1,1).."](\\*)"..cw:sub(1,1).."()", pos)
>> until #m % 2 == 0
>>
>> I'm having issues making this work.
> The title of the post describes a trivial problem (answer: #str+1).
>
> I can't work out by reading it what problem the code tries to solve.
>

Sorry, I meant end of string literal.

After a few hours I came up with this, which works:

local m, pos
repeat
   m, pos = string.match(word_eol[2], "(\\*)"..cw:sub(1,1).."()", pos or 2)
until m == nil or #m % 2 == 0

(cw:sub(1,1) being one of " or ')




Re: Finding end of string

Dirk Laurie-2
function end_of_string_literal(word,quote)
  local m,pos
  repeat
    m, pos = string.match(word, "(\\*)"..quote.."()", pos or 2)
  until m == nil or #m % 2 == 0
  return pos
end

end_of_string_literal("abc'defgh'ijkl","'")
5

Should not the answer rather be 11?

And would not

function end_of_string_literal(word,quote)
  local pattern = '()%b' .. quote:rep(2) ..'()'
  return select(2,word:match(pattern))
end

be somewhat simpler?





Re: Finding end of string

Soni "They/Them" L.


On 2017-10-10 04:13 PM, Dirk Laurie wrote:

> function end_of_string_literal(word,quote)
>    local m,pos
>    repeat
>      m, pos = string.match(word, "(\\*)"..quote.."()", pos or 2)
>    until m == nil or #m % 2 == 0
>    return pos
> end
>
> end_of_string_literal("abc'defgh'ijkl","'")
> 5
>
> Should not the answer rather be 11?
>
> And would not
>
> function end_of_string_literal(word,quote)
>    local pattern = '()%b' .. quote:rep(2) ..'()'
>    return select(2,word:match(pattern))
> end
>
> be somewhat simpler?

It says "end". It implies you already know the start.

"\""
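To make the objection concrete, here is a small sketch that runs the %b-based helper and the loop side by side on a literal containing an escaped quote. The names end_by_balance and end_by_loop are made up for this illustration, and the literal is assumed to start at index 1 of the input.

```lua
-- The %b-based helper, as posted above (renamed for the comparison).
local function end_by_balance(word, quote)
  local pattern = '()%b' .. quote:rep(2) .. '()'
  return select(2, word:match(pattern))
end

-- The loop from earlier in the thread, wrapped in a function.
-- Assumption: the opening quote sits at index 1, so the search starts at 2.
local function end_by_loop(word, quote)
  local m, pos
  repeat
    -- m is the run of backslashes directly before a quote, pos the index after it
    m, pos = string.match(word, "(\\*)" .. quote .. "()", pos or 2)
  until m == nil or #m % 2 == 0   -- an even run means the quote is not escaped
  return pos
end

local src = [["a\"b" rest]]        -- the literal "a\"b" followed by other text
print(end_by_balance(src, '"'))    --> 5  (stops at the escaped quote)
print(end_by_loop(src, '"'))       --> 7  (index just past the real closing quote)
```

With identical open and close characters, %b has no notion of escaping, so it closes on the first quote it meets.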




Re: Finding end of string

Egor Skriptunoff-2
On Tue, Oct 10, 2017 at 10:19 PM, Soni L. wrote:
> After a few hours I came up with this, which works:
>
> local m, pos
> repeat
>    m, pos = string.match(word_eol[2], "(\\*)"..cw:sub(1,1).."()", pos or 2)
> until m == nil or #m % 2 == 0
>
> (cw:sub(1,1) being one of " or ')

> > Should not the answer rather be 11?
> >
> > And would not
> >
> > function end_of_string_literal(word,quote)
> >    local pattern = '()%b' .. quote:rep(2) ..'()'
> >    return select(2,word:match(pattern))
> > end
> >
> > be somewhat simpler?
>
> It says "end". It implies you already know the start.
>
> "\""



function end_of_string_literal (text, start_pos, quote)
   return text:gsub("\\?.", {[quote]="\0"}):match("%z()", start_pos)
end
 
print(end_of_string_literal ([[a="\"A",b="\\\"B\"",c="C"]], 4,  '"'))  --> 8
print(end_of_string_literal ([[a="\"A",b="\\\"B\"",c="C"]], 12, '"'))  --> 20
print(end_of_string_literal ([[a="\"A",b="\\\"B\"",c="C"]], 24, '"'))  --> 26



Re: Finding end of string

Martin
On 10/12/2017 07:56 PM, Egor Skriptunoff wrote:
> function end_of_string_literal (text, start_pos, quote)
>    return text:gsub("\\?.", {[quote]="\0"}):match("%z()", start_pos)
> end
>  
> print(end_of_string_literal ([[a="\"A",b="\\\"B\"",c="C"]], 4,  '"'))  --> 8
> print(end_of_string_literal ([[a="\"A",b="\\\"B\"",c="C"]], 12, '"'))  --> 20
> print(end_of_string_literal ([[a="\"A",b="\\\"B\"",c="C"]], 24, '"'))  --> 26

What task does this code solve?

-- Martin


Re: Finding end of string

John Logsdon
In reply to this post by Egor Skriptunoff-2
What is wrong with

str:find("$")-1

or string.find(str,"$")-1

?
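For reference, a quick check of what that expression evaluates to. string.find returns two values (start and end of the match), and the subtraction keeps only the first, so the result is the length of the whole string rather than the end of a quoted literal inside it. The sample string is arbitrary.

```lua
local str = "hello"

-- "$" matches the empty string at the very end, so find reports
-- a start index one past the last character and an end index before it.
print(str:find("$"))       --> 6  5

-- In an arithmetic expression only the first result is used.
print(str:find("$") - 1)   --> 5
print(#str)                --> 5
```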





Best wishes

John

John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675



Re: Finding end of string

Egor Skriptunoff-2
In reply to this post by Martin
On Thu, Oct 12, 2017 at 10:16 PM, Martin wrote:
> On 10/12/2017 07:56 PM, Egor Skriptunoff wrote:
> > function end_of_string_literal (text, start_pos, quote)
> >    return text:gsub("\\?.", {[quote]="\0"}):match("%z()", start_pos)
> > end
> >
> > print(end_of_string_literal ([[a="\"A",b="\\\"B\"",c="C"]], 4,  '"'))  --> 8
> > print(end_of_string_literal ([[a="\"A",b="\\\"B\"",c="C"]], 12, '"'))  --> 20
> > print(end_of_string_literal ([[a="\"A",b="\\\"B\"",c="C"]], 24, '"'))  --> 26
>
> What task does this code solve?


The task is to simplify Soni's code:

local m, pos
repeat
  m, pos = string.match(word_eol[2], "(\\*)"..cw:sub(1,1).."()", pos or 2)
until m == nil or #m % 2 == 0

(cw:sub(1,1) being one of " or ')

What does this code do?
It is probably a part of some parser (or should I say scanner?).
The text (in the variable word_eol[2]) starts with a quote-delimited string literal (the quote is cw:sub(1,1)).
This code finds the position where the string literal terminates.
The string literal syntax implied here allows backslash escaping.
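
The task described above can also be spelled out as an explicit scanner, which may be easier to follow than either pattern-based version. This is only a sketch: scan_string_literal is a hypothetical name, and start_pos is assumed to be the index of the first character after the opening quote (the same convention as the end_of_string_literal calls posted earlier in the thread).

```lua
-- Hypothetical helper: walk forward from start_pos and return the index
-- just past the closing quote, or nil if the literal never terminates.
local function scan_string_literal(text, start_pos, quote)
  local i = start_pos
  while i <= #text do
    local c = text:sub(i, i)
    if c == "\\" then
      i = i + 2          -- skip the backslash and the character it escapes
    elseif c == quote then
      return i + 1       -- index just past the closing quote
    else
      i = i + 1
    end
  end
  return nil             -- unterminated literal
end

local src = [[a="\"A",b="\\\"B\"",c="C"]]
print(scan_string_literal(src, 4,  '"'))   --> 8
print(scan_string_literal(src, 12, '"'))   --> 20
print(scan_string_literal(src, 24, '"'))   --> 26
```

It builds no modified copy of the input and does not depend on any sentinel character.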

Re: Finding end of string

Dirk Laurie-2
2017-10-14 7:38 GMT+02:00 Egor Skriptunoff <[hidden email]>:

> The task is to simplify Soni's code:

I would have participated more enthusiastically if the OP had provided
(a) a decent description of what the code is supposed to do
(b) a nontrivial example of input and expected output
(c) an indication of what word_eol[2] means

You have now provided (a) and (c), but my enthusiasm flickered
out long ago.



Re: Finding end of string

Sean Conner
In reply to this post by Egor Skriptunoff-2
It was thus said that the Great Egor Skriptunoff once stated:

> On Thu, Oct 12, 2017 at 10:16 PM, Martin wrote:
>
> > On 10/12/2017 07:56 PM, Egor Skriptunoff wrote:
> > > function end_of_string_literal (text, start_pos, quote)
> > >    return text:gsub("\\?.", {[quote]="\0"}):match("%z()", start_pos)
> > > end
> > >
> > > print(end_of_string_literal ([[a="\"A",b="\\\"B\"",c="C"]], 4,  '"'))
> > --> 8
> > > print(end_of_string_literal ([[a="\"A",b="\\\"B\"",c="C"]], 12, '"'))
> > --> 20
> > > print(end_of_string_literal ([[a="\"A",b="\\\"B\"",c="C"]], 24, '"'))
> > --> 26
> >
> > What task does this code solve?
> >
> >
> The task is to simplify Soni's code:
>
> local m, pos
> repeat
>   m, pos = string.match(word_eol[2], "(\\*)"..cw:sub(1,1).."()", pos or 2)
> until m == nil or #m % 2 == 0
>
> (cw:sub(1,1) being one of " or ')
>
> What does this code do?
> It is probably a part of some parser (or should I say *scanner*?)
> The text (in the variable word_eol[2]) starts with a quote-delimited string
> literal (the quote is cw:sub(1,1)).
> This code finds the position where the string literal terminates.
> The string literal syntax implied here allows backslash escaping.

  First off, %z was deprecated in Lua 5.2 (see section 8.2), and it's not
mentioned at all in the Lua 5.3 manual (although my version of Lua 5.3
does run the above code).  Here is some code that works (and maybe even Soni
would like it, as it's not limited to '"' as the quote character---it can be
any string, and said string can appear escaped!):

local lpeg = require "lpeg"
local Carg = lpeg.Carg
local Cmt  = lpeg.Cmt
local P    = lpeg.P
local S    = lpeg.S
local V    = lpeg.V

-- **********************************************************************
-- Compare the next bit of input with our quote character.  We can't use
-- string.find(), as that scans ahead in the string.  I don't use
-- string.match() because otherwise, I would have to scan the quote string
-- and escape any special characters.
-- **********************************************************************

local function mq(subject,position,quote)
  if quote == subject:sub(position,position + #quote - 1) then
    return position + #quote
  end
end

-- **********************************************************************
-- Our LPeg grammar.  It expects a <quotechar> (passed in to the grammar),
-- and a sequence of characters.  The <quotechar> (which can be any length)
-- can be escaped by '\'.  Try that using normal Lua patterns!
-- **********************************************************************  

local qs = P {
  "string",
  char = P[[\]] * Cmt(Carg(1),mq) -- match \<quotechar>
       * V"qs"                    -- and qs (forward reference)
       * P[[\]] * Cmt(Carg(1),mq) -- and end with \<quotechar>
       + P[[\]] * P(1)            -- or escape char
       + (P(1) - (P[[\]] + Cmt(Carg(1),mq))), -- or character other than \ or <quotechar>
       
  qs = V"char"^0, -- any number of chars (see above)
 
  string = Cmt(Carg(1),mq)  -- match our quote character
         * V"qs"            -- plus a qs
         * Cmt(Carg(1),mq), -- and finally our quote character
 
}

function eosl(text,pos,quote)
  return qs:match(text,pos or 1,quote or '"')
end

print(eosl [["This" should return 7]])
print(eosl([[<q>This<q> should return 11]],1,"<q>"))
print(eosl([[<q>a\<q>b\<q>c<q>d]],1,"<q>"))
print(eosl [["This \"string\" here" should return 23]])
print(eosl [["This \"really \\\"embedded\\\" string\" here" should return 47]])
print(eosl([[<q>This \<q>embedded\<q> string<q> returns 35]],1,"<q>"))
print(eosl([[This here "string" should return 19]],11))
print(eosl [["This""is" 7]])

  -spc (But I again failed to read Soni's mind so this is probably incorrect
        somehow ...)


Re: Finding end of string

Egor Skriptunoff-2
On Sat, Oct 14, 2017 at 11:13 AM, Sean Conner wrote:
>   First off, %z was deprecated in Lua 5.2 (see section 8.2), and it's not
> mentioned at all in the Lua 5.3 manual (although my version of Lua 5.3
> does run the above code).

I hope the %z and %Z patterns will stay in Lua forever:
1) They are very handy (binary zero is a special enough character to deserve its own pattern).
2) Without %z and %Z we will not be able to write a pattern involving binary zeroes that works in both Lua 5.1 and Lua 5.3.

Removing %z and %Z from Lua has only disadvantages.
We would have to complicate our scripts to check the Lua version at runtime and apply a version-specific pattern to make our code work in all Lua versions.
That's a headache.

I see no benefit of removing %z and %Z from Lua.
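
For illustration, the kind of runtime check described above might look like the sketch below. It rests on two assumptions: Lua 5.1 patterns cannot contain an embedded "\0" (so %z is needed there), while Lua 5.2 and later accept a literal "\0" inside a pattern. The names NUL and first_nul are hypothetical.

```lua
-- Pick a pattern item that matches a NUL byte on the running interpreter.
-- Assumption: only Lua 5.1 (and implementations reporting that version)
-- needs %z; later versions accept "\0" directly in a pattern.
local NUL = (_VERSION == "Lua 5.1") and "%z" or "\0"

-- Hypothetical helper: index of the first NUL byte at or after init, or nil.
local function first_nul(s, init)
  return (s:find(NUL, init or 1))   -- parentheses keep only the start index
end

print(first_nul("ab\0cd"))          --> 3
print(first_nul("ab\0cd\0", 4))     --> 6
print(first_nul("abcd"))            --> nil
```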

 
>   Here is some code that works (and maybe even Soni
> would like it, as it's not limited to '"' as the quote character---it can be
> any string, and said string can appear escaped!):


Your code does not look like a simplification of Soni's code  ;-)


Re: Finding end of string

Soni "They/Them" L.
In reply to this post by Dirk Laurie-2


On 2017-10-14 05:08 AM, Dirk Laurie wrote:

> 2017-10-14 7:38 GMT+02:00 Egor Skriptunoff <[hidden email]>:
>
>> The task is to simplify Soni's code:
> I would have participated more enthusiastically if the OP had provided
> (a) a decent description of what the code is supposed to do
> (b) a nontrivial example of input and expected output
> (c) an indication of what word_eol[2] means
>
> You have now provided (a) and (c), but my enthusiasm flickered
> out long ago.

The Lua test suite is a good place to find nontrivial examples of input
and expected output, at least once the string goes through the rest of
the parser, which turns the string literal into a string.





Re: Finding end of string

Sean Conner
In reply to this post by Egor Skriptunoff-2
It was thus said that the Great Egor Skriptunoff once stated:

> On Sat, Oct 14, 2017 at 11:13 AM, Sean Conner wrote:
> >
> >   First off, %z was deprecated in Lua 5.2 (see section 8.2), and it's not
> > mentioned at all in the Lua 5.3 manual (although my version of Lua 5.3
> > does run the above code).
>
> Removing %z and %Z from Lua has only disadvantages.
> We would have to complicate our scripts to check the Lua version at
> runtime and apply a version-specific pattern to make our code work in all
> Lua versions.
> That's a headache.
>
> I see no benefit of removing %z and %Z from Lua.

  I have no opinion on this.  I've never used %z/%Z in my code.  Just a data
point in this discussion.

> >   Here is some code that works (and maybe even Soni would like it, as
> > it's not limited to '"' as the quote character---it can be any string,
> > and said string can appear escaped!):
> >
> Your code does not look like a simplification of Soni's code  ;-)

  If by "simplification" you mean "shorter than"
 
  text:gsub("\\?.", {[quote]="\0"}):match("%z()", start_pos)
 
then yes, that is true.  But there are issues with that code fragment.
First, you are generating a modified string of the original data.  Second,
there are failure modes, for instance:

        print(end_of_string_literal ('"\0\0" is the string',2,'"'))

should print 5, not 3.  Admittedly, this is a corner case, but the code I
presented does handle it correctly.  Third, I don't think the code as
presented is all that difficult.  Sure, it may look unfamiliar, but with the
LPeg manual [1] as reference, one should be able to puzzle out how it works
(like I had to do with your version).
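
The corner case is easy to reproduce. The sketch below repeats the pattern-based helper from earlier in the thread and adds a hypothetical wrapper, end_of_string_literal_checked, that refuses input which already contains a NUL byte. It assumes the interpreter still accepts the deprecated %z class.

```lua
-- The sentinel-based helper, as posted earlier in the thread.
local function end_of_string_literal(text, start_pos, quote)
  return text:gsub("\\?.", {[quote] = "\0"}):match("%z()", start_pos)
end

-- Hypothetical wrapper: a NUL byte already present in the input cannot be
-- told apart from the "\0" sentinel, so report it instead of guessing.
local function end_of_string_literal_checked(text, start_pos, quote)
  if text:find("\0", 1, true) then          -- plain search, no pattern magic
    return nil, "input contains a NUL byte"
  end
  return (end_of_string_literal(text, start_pos, quote))
end

local src = '"\0\0" is the string'
print(end_of_string_literal(src, 2, '"'))                --> 3 (5 was wanted)
print(end_of_string_literal_checked(src, 2, '"'))        --> nil, plus the message
print(end_of_string_literal_checked('"ab" x', 2, '"'))   --> 5
```

The guard does not make the sentinel approach handle such input; it only turns a silent wrong answer into an explicit failure.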

  -spc (The LPeg version has the added benefit of being composable with
        other LPeg expressions, something that the one based on Lua patterns
        cannot do.)

[1] http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html