Special characters in patterns: what about \?


Frank Küster
Hi,

is this a bug or am I misunderstanding anything?  According to the
reference manual, 

,----
| x: (where x is not one of the magic characters ^$()%.[]*+-?)
|    represents the character x itself.
`----

so \ is not mentioned as a special character.  However, in the "not set"
pattern [^\] it seems to "escape" the closing square bracket:

***********************
#!/usr/bin/lua

function deluaficate(oldpat)
   local newpat
   newpat = string.gsub(oldpat,'([^\])%-','%1%%%-')
   print (newpat)
   return newpat
end

deluaficate('lm-hist')
***********************

    (This is supposed to be the first step in using a user-supplied
    filename, which might contain a hyphen, as a pattern for
    string.match(). I want to give power users full access to Lua patterns,
    so '\-' should not be replaced.)

Surprisingly, this gives:

$ ./deluaficate 
/usr/bin/lua: ./deluaficate:5: malformed pattern (missing ']')
stack traceback:
	[C]: in function 'gsub'
	./deluaficate:5: in function 'deluaficate'
	./deluaficate:10: in main chunk
	[C]: ?
~$ lua -v
Lua 5.1.2  Copyright (C) 1994-2007 Lua.org, PUC-Rio


What am I missing?

Regards, Frank

-- 
Frank Küster
Single Molecule Spectroscopy, Protein Folding @ Inst. f. Biochemie, Univ. Zürich
Debian Developer (teTeX/TeXLive)



Re: Special characters in patterns: what about \?

David Kastrup
Frank Küster <[hidden email]> writes:

> Hi,
>
> is this a bug or am I misunderstanding anything?  According to the
> reference manual, 
>
> ,----
> | x: (where x is not one of the magic characters ^$()%.[]*+-?)
> |    represents the character x itself.
> `----
>
> so \ is not mentioned as a special character.  However, in the "not set"
> pattern [^\] it seems to "escape" the closing square bracket:
>
> ***********************
> #!/usr/bin/lua
>
> function deluaficate(oldpat)
>    local newpat
>    newpat = string.gsub(oldpat,'([^\])%-','%1%%%-')
>    print (newpat)
>    return newpat
> end

dak@lisa:~$ lua
Lua 5.1.1  Copyright (C) 1994-2006 Lua.org, PUC-Rio
> print('([^\])%-')
([^])%-
> 
dak@lisa:~$ 

The backslash is already gone from the string before the pattern gets
interpreted, and ] as the first element in a set is taken literally.
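
The two layers can be made visible with a small sketch. It assumes the Lua 5.1 behaviour shown in the session above, where an unrecognized escape such as \] silently loses its backslash. As far as I know, Lua 5.2 and later reject that literal with an "invalid escape sequence" error instead, so the sketch compiles the literal at run time rather than writing it inline. The names src, compile, chunk and fixed are only illustrative.

```lua
-- Source text of a chunk containing the problematic literal.
-- A long string keeps the backslash intact in the *source text*.
local src = [[return '([^\])%-']]

-- loadstring exists in Lua 5.1; later versions accept a string in load.
local compile = loadstring or load
local chunk, err = compile(src)

if chunk then
  -- Lua 5.1 (assumed): the lexer drops the backslash, leaving ([^])%-
  print(chunk())
else
  -- Newer versions (assumed): the literal is rejected at compile time.
  print(err)
end

-- Portable across versions: a doubled backslash yields one character,
-- so this string is the 8 characters ( [ ^ \ ] ) % -
local fixed = '([^\\])%-'
print(fixed, #fixed)
```

Either way, the pattern functions only ever see the string after the lexer has finished with it.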

-- 
David Kastrup



Re: Special characters in patterns: what about \?

Jools
In reply to this post by Frank Küster

I think the problem is that you need to escape the \ with another \.


'([^\])%-','%1%%%-' needs to be '([^\\])%-','%1%%%-'

You can see the logic for this in the read_string() function in llex.c.
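
Applied to the function from the original post, the fix might look like the sketch below. One detail is my own change and not from the thread: the replacement is written '%1%%-' instead of '%1%%%-', because to my knowledge Lua 5.2 and later raise an error for '%-' in a replacement string, while '%%-' produces the same two characters on every version.

```lua
-- Escape every hyphen that is not already preceded by a backslash.
local function deluaficate(oldpat)
  -- '\\' in the source is ONE backslash in the string, so the set
  -- seen by gsub is [^\], "any character except a backslash".
  -- '%%' in the replacement yields a literal %, followed by a plain -.
  local newpat = string.gsub(oldpat, '([^\\])%-', '%1%%-')
  return newpat
end

print(deluaficate('lm-hist'))    --> lm%-hist
print(deluaficate('lm\\-hist'))  --> lm\-hist (left alone)
```

Note that the pattern needs a character in front of the hyphen, so a hyphen at the very start of the string is not escaped by this version.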

Cheers,

--Jools



Re: Special characters in patterns: what about \?

Mauro Iazzi
In reply to this post by Frank Küster
The backslash is still treated as an escape character at the string-literal
level, so it is discarded before the string reaches gsub. gsub never knows
it was there: it sees [^ followed directly by ], takes that ] as the first
member of the negated set (a set needs at least one), and then cannot find
a closing ]. Whenever you want a backslash in a string, just write it twice.
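
A minimal sketch of what happens at the pattern level, assuming the string has already lost its backslash as described. The variable names are illustrative, and pcall is used only so the error can be printed instead of aborting the script.

```lua
-- The pattern as gsub actually receives it (backslash already gone).
local broken = '([^])%-'
local ok, err = pcall(string.gsub, 'lm-hist', broken, '%1')
print(ok, err)  -- ok is false; err carries the "malformed pattern" message

-- With one more ']' the set closes: [^]] means "any character except ]".
local head = string.match('a]b', '[^]]+')
print(head)     --> a
```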


Re: Special characters in patterns: what about \?

Philippe Lhoste
On 27/06/2007 17:14, Mauro Iazzi wrote:
> The backslash is still treated as an escape character at the string-literal
> level, so it is discarded before the string reaches gsub. gsub never knows
> it was there: it sees [^ followed directly by ], takes that ] as the first
> member of the negated set (a set needs at least one), and then cannot find
> a closing ]. Whenever you want a backslash in a string, just write it twice.

Or just use literal strings, they are great for such use (and for Windows paths!)
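
For instance, a rough sketch using the [[...]] bracket form (the variable names and the sample path are made up for illustration):

```lua
-- Bracketed strings do no escape processing at all.
local short = '([^\\])%-'   -- backslash doubled for the lexer
local long  = [[([^\])%-]]  -- written exactly as the pattern reads
print(short == long)        --> true

-- Windows-style path: no doubling needed.
local path = [[C:\temp\new]]
print(path, #path)

-- If the text itself ends in ']' or contains ']]', add a level with '=':
local set = [=[[^\]]=]
print(set)                  --> [^\]
```

The one thing to watch is the closing delimiter: a pattern that ends in ] would run into the closing ]], which is what the [=[ ... ]=] form is for.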

--
Philippe Lhoste
--  (near) Paris -- France
--  http://Phi.Lho.free.fr
--  --  --  --  --  --  --  --  --  --  --  --  --  --



Re: Special characters in patterns: what about \?

Frank Küster
In reply to this post by Mauro Iazzi
"Mauro Iazzi" <[hidden email]> wrote:

> The backslash is still treated as an escape character at the string-literal
> level, so it is discarded before the string reaches gsub. gsub never knows
> it was there: it sees [^ followed directly by ], takes that ] as the first
> member of the negated set (a set needs at least one), and then cannot find
> a closing ]. Whenever you want a backslash in a string, just write it twice.

Ah, indeed.  I should once more read the manual from the beginning...

Regards, Frank

-- 
Frank Küster
Single Molecule Spectroscopy, Protein Folding @ Inst. f. Biochemie, Univ. Zürich
Debian Developer (teTeX/TeXLive)



Re: Special characters in patterns: what about \?

Luiz Henrique de Figueiredo
In reply to this post by Philippe Lhoste
> Or just use literal strings, they are great for such use (and for 
> Windows paths!)

You mean "long strings", which are one form of literal string. The other
form is "short strings", which are quoted with "..." or '...'.


Re: Special characters in patterns: what about \?

Tony Finch
In reply to this post by Mauro Iazzi
On Wed, 27 Jun 2007, Mauro Iazzi wrote:

> The backslash is still treated as an escape character at the string-literal
> level, so it is discarded before the string reaches gsub. gsub never knows
> it was there: it sees [^ followed directly by ], takes that ] as the first
> member of the negated set (a set needs at least one), and then cannot find
> a closing ]. Whenever you want a backslash in a string, just write it twice.

As an aside, I have to say that the idea of using % for escaping in Lua
patterns is a pretty neat way of reducing the multi-layer escape problem
that's common in other languages (such as emacs lisp and exim configs).
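
A small sketch of the point, using made-up sample strings. In a language where the pattern escape is also a backslash, a literal dot inside a quoted string typically has to be written as '\\.'; in Lua the two layers use different characters, so they rarely collide.

```lua
-- Pattern magic is escaped with %, which the string lexer ignores,
-- so a literal dot needs only one layer of escaping.
print(string.find('lm.hist', '%.'))         --> 3  3

-- Backslash is not magic in patterns; only the lexer cares about it,
-- so matching one backslash takes '\\' (a one-character pattern).
print(string.find('lm\\hist', '\\'))        --> 3  3

-- Hyphen is magic in patterns, so it is escaped with % as well.
print(string.match('lm-hist', 'lm%-hist'))  --> lm-hist
```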

Tony.
-- 
f.a.n.finch  <[hidden email]>  http://dotat.at/
SOUTHEAST ICELAND: NORTH 6 OR 7, OCCASIONALLY GALE 8 AT FIRST. ROUGH OR VERY
ROUGH. SHOWERS. GOOD.


Re: Special characters in patterns: what about \?

Philippe Lhoste
In reply to this post by Luiz Henrique de Figueiredo
On 27/06/2007 18:17, Luiz Henrique de Figueiredo wrote:
>> Or just use literal strings, they are great for such use (and for Windows paths!)
>
> You mean "long strings", which are one form of literal string. The other
> form is "short strings", which are quoted with "..." or '...'.

Ah, I have always used 'literal string' as a synonym for 'long string', because long strings take everything you put inside them 'literally', without converting escape sequences or anything else. If that usage is incorrect, I should avoid it.

The expression is used in the Lua lexer of Scintilla, which is the probable source of my confusion. Indeed, I see in the manual:
"Literal strings can be delimited by matching single or double quotes"
I suppose it means strings given explicitly in the script, as opposed to built strings.

OK, so from now on I will say "long strings" for this construct.
Thanks for pointing out my incorrect usage. I will suggest corrections to the related Lua material in Scintilla. (Along with a fix for the bug where a keyword is not highlighted if it ends the file, which was recently raised again on the Scintilla list.)

--
Philippe Lhoste
--  (near) Paris -- France
--  http://Phi.Lho.free.fr
--  --  --  --  --  --  --  --  --  --  --  --  --  --