Ambiguous syntax


Ambiguous syntax

Dirk Laurie-2
Can someone who understands the matter please explain
why Lua 5.1 sometimes gave the message

    ambiguous syntax (function call x new statement)

whereas Lua 5.2 never does?


Re: Ambiguous syntax

Coda Highland
On Tue, Dec 11, 2012 at 10:00 PM, Dirk Laurie <[hidden email]> wrote:
> Can someone who understands the matter please explain
> why Lua 5.1 sometimes gave the message
>
>     ambiguous syntax (function call x new statement)
>
> whereas Lua 5.2 never does?
>

I think I remember hearing about 5.2 arbitrarily defining a resolution
for the grammatical ambiguity. Do you have an example of code that
triggers this?

/s/ Adam


Re: Ambiguous syntax

Peter Loveday
This does it in 5.1:

print
("hello")

- Peter



Re: Ambiguous syntax

Sven Olsen
There's no honest ambiguity in the syntax -- it's just a question of protecting against likely bugs.

f()
(g or h)()

is naturally read as 2 statements, though given Lua's grammar it should be parsed as just one.  Lua 5.1 was newline sensitive: it would throw an error if an otherwise complete expression was continued by a newline, followed by an open paren.  Lua 5.2 removed the check.

My feeling is that this was a bad change -- I've never come across a case where the error was triggered by bug-free code.  I've actually changed my own 5.2 parser to restore 5.1's newline sensitivity, though, rather than throwing an error, I just terminate the expression -- and I wrote a long post about the pros and cons of this idea, a few months back :)

-Sven
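A minimal sketch of how the two versions treat Sven's snippet, assuming hypothetical globals f, g and h, where f returns a function so that the single-statement reading can actually run. The snippet is compiled from a string with loadstring (Lua 5.1) or load (5.2 and later) instead of being written directly in the file, so the file itself loads under either version.

```lua
-- Lua 5.1 compiles strings with loadstring; in 5.2+ load accepts a string.
local compile = loadstring or load

local log = {}

-- Hypothetical globals standing in for f, g and h from the post.
function g() log[#log + 1] = "g called" end
h = nil
function f()
  log[#log + 1] = "f called"
  -- Return a function, so "(g or h)" can be applied to f's result.
  return function(arg)
    log[#log + 1] = "f's result called with g: " .. tostring(arg == g)
    -- Return another function for the trailing "()" to call.
    return function() log[#log + 1] = "final call" end
  end
end

local chunk, err = compile("f()\n(g or h)()")
if chunk then
  -- Lua 5.2 and later: one statement, f()(g or h)(); g itself never runs.
  chunk()
  print(table.concat(log, "; "))
else
  -- Lua 5.1: rejected with "ambiguous syntax (function call x new statement)".
  print("compile error: " .. err)
end
```

Under 5.2 and later the log shows g being passed as an argument to f's result and never called, which is the silent misreading that the 5.1 check turned into a compile error.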


Re: Ambiguous syntax

Roberto Ierusalimschy
> There's no honest ambiguity in the syntax -- it's just a question of
> protecting against likely bugs.
>
> f()
> (g or h)()
>
> is naturally read as 2 statements, though given Lua's grammar it should be
> parsed as just one.

I think Lua's grammar allows that code to be read as two statements; it
is a very "honest" ambiguity.

-- Roberto


Re: Ambiguous syntax

Sven Olsen

> I think Lua's grammar allows that code to be read as two statements; it is a very "honest" ambiguity.

Yes, it looks like I am, in fact, exactly wrong about this.  My example is plenty ambiguous -- though I think Peter's isn't?

I'm still curious why the check was taken out in 5.2.  It did seem like a useful feature.

-Sven
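A small sketch bearing on the question about Peter's example (f, g and h are hypothetical names; the snippets are only compiled, never run). A bare name is not a valid statement in any Lua version, so "print" on its own line cannot end a statement and "print\n(...)" has only one reading, while each line of Sven's example is a complete statement by itself.

```lua
local compile = loadstring or load

-- The first line of Peter's example on its own: not a statement.
local name_alone = compile("print") ~= nil

-- Each line of Sven's example on its own: both are complete statements.
local call_alone = compile("f()") ~= nil
local paren_alone = compile("(g or h)()") ~= nil

print(name_alone, call_alone, paren_alone)  -- false  true  true
```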

Re: Ambiguous syntax

Roberto Ierusalimschy
> I'm still curious why the check was taken out in 5.2.  It did seem like a
> useful feature.

As shown here, the check was confusing anyway. There are several valid
situations where you may want a newline between the function and the
parameters.  Moreover, in 5.2, you can always add a semicolon before the
statement to avoid ambiguity:

;(g or h)()   -- always valid in 5.2, not in 5.1

-- Roberto
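A minimal sketch of the semicolon workaround, assuming hypothetical globals f, g and h where f returns nothing. With the leading ';' the two lines are two statements, so g really is called; without it, 5.2 would read f()(g or h)() and fail at run time trying to call the nil returned by f. This particular placement, right after a statement, is also accepted by Lua 5.1, which allows an optional ';' after any statement.

```lua
local compile = loadstring or load
local log = {}

-- Hypothetical globals: f returns nothing, g records that it ran.
function f() log[#log + 1] = "f" end
function g() log[#log + 1] = "g" end
h = nil

-- The ';' ends the first statement, so "(g or h)()" is a statement of its own.
local chunk = assert(compile("f()\n;(g or h)()"))
chunk()
print(table.concat(log, ", "))  -- f, g
```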


Re: Ambiguous syntax

Javier Guerra Giraldez
On Wed, Dec 12, 2012 at 10:17 AM, Roberto Ierusalimschy
<[hidden email]> wrote:
> As shown here, the check was confusing anyway. There are several valid
> situations where you may want a newline between the function and the
> parameters.  Moreover, in 5.2, you can always add a semicolon before the
> statement to avoid ambiguity:

so, the new syntax means

f()
(g or h)()

would always compile to a single statement without error, and (likely)
fail at runtime if f() doesn't return a function.  right?

if so, what would be the rule of thumb to prevent this?  i'm guessing
something like "if your statement line starts with an opening
parenthesis, better add a semicolon before it"

--
Javier


Re: Ambiguous syntax

Roberto Ierusalimschy
> if so, what would be the rule of thumb to prevent this?  i'm guessing
> something like "if your statement line starts with an opening
> parenthesis, better add a semicolon before it"

Yes.

-- Roberto


Re: Ambiguous syntax

Rapin Patrick
In reply to this post by Sven Olsen

> f()
> (g or h)()
>
> is naturally read as 2 statements, though given Lua's grammar it should be parsed as just one.  Lua 5.1 was newline sensitive: it would throw an error if an otherwise complete expression was continued by a newline, followed by an open paren.  Lua 5.2 removed the check.
>
> My feeling is that this was a bad change -- I've never come across a case where the error was triggered by bug-free code.

Well, in my LuaBrainFuck library [1] the Lua 5.2 disambiguation is heavily used!
I don't claim this library is very useful for everyday programming, though.
Look at the "99 bottles of beer" bracket mode code [2] for example.
There are a number of lines ending with a closing parenthesis while the following lines open a new parenthesis.

[1] https://github.com/prapin/LuaBrainFuck
[2] https://github.com/prapin/LuaBrainFuck/blob/master/test/99-bottles-bracket.lua

--
-- Patrick Rapin
-- coauthor of "Le guide de Lua et ses applications", D-BookeR
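The style Patrick describes, with a line ending in ')' and the next line opening '(', works whenever each call returns another function. A minimal sketch with a hypothetical emit function that returns itself (it is not LuaBrainFuck's actual API); the snippet is compiled from a string so that the file still loads under 5.1.

```lua
local compile = loadstring or load

-- Hypothetical chaining function: records its argument and returns itself,
-- so any number of calls can be chained.
local out = {}
function emit(x)
  out[#out + 1] = x
  return emit
end

-- Each line after the first opens with '('. Lua 5.2 and later read this as
-- the single statement emit("a")("b")("c"); Lua 5.1 rejects it.
local chunk, err = compile('emit("a")\n("b")\n("c")')
if chunk then
  chunk()
  print(table.concat(out))  -- abc
else
  print("compile error: " .. err)
end
```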



Re: Ambiguous syntax

Pierre-Yves Gérardy
In reply to this post by Roberto Ierusalimschy
On Wed, Dec 12, 2012 at 4:17 PM, Roberto Ierusalimschy
<[hidden email]> wrote:
> ;(g or h)()   -- always valid in 5.2, not in 5.1

... except as the first statement of a block in Lua 5.1.

-- Pierre-Yves


Re: Ambiguous syntax

Roberto Ierusalimschy
> On Wed, Dec 12, 2012 at 4:17 PM, Roberto Ierusalimschy
> <[hidden email]> wrote:
> > ;(g or h)()   -- always valid in 5.2, not in 5.1
>
> ... except as the first statement of a block in Lua 5.1.

Also after a line ending with semicolon:

  a = 1;
  ;(g or h)()  -- valid in 5.2, not in 5.1

-- Roberto
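The two placements mentioned in this exchange can be checked by compiling them without running them (g and h are hypothetical names and are never called). Under Lua 5.2 and later both snippets compile, because ';' on its own is an empty statement; Lua 5.1 only accepts ';' as an optional separator directly after a statement, so it rejects both.

```lua
local compile = loadstring or load

-- ';' as the first statement of a block.
local first_in_block = compile("do ;(g or h)() end") ~= nil

-- ';' right after a statement that already ended with ';'.
local after_semicolon = compile("a = 1;\n;(g or h)()") ~= nil

-- true, true on Lua 5.2 and later; false, false on PUC-Rio Lua 5.1.
print(_VERSION, first_in_block, after_semicolon)
```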


Ambiguous syntax

Sven Olsen
In reply to this post by Roberto Ierusalimschy

> There are several valid situations where you may want a newline between the function and the parameters.

Certainly, but 5.1's ambiguous syntax check rarely got in the way of this.  It was safe to write:

    object.member_function(
        some, collection, of, parameters)

The case that triggered a false positive was:

    object.member_function
        (some, collection, of, parameters)

I haven't often seen the second pattern used in the wild -- though for programmers who prefer it, I'm sure 5.1's ambiguous syntax error would have been an annoyance.

The need to scatter semicolons through otherwise very clean Lua code is also an annoyance, though.  While it was certainly less elegant, I think 5.1's handling had real practical advantages over 5.2's: if you forget about the newline handling in 5.1, you'll get a parse error, but if you forget to add a semicolon in 5.2, you'll get a runtime bug.  Parse errors tend to be far easier to detect and fix.

-Sven

Re: Ambiguous syntax

Robert Virding
Of course, a simple way to remove this ambiguity would be to require ';' between statements, or after every statement as a terminator. Of course, this would make people's files much larger, but it might be worth it to remove the ambiguity.

Luerl parses them as one call, strictly left-to-right until there is nothing more which can be added to the call.

Robert



Re: Ambiguous syntax

Coda Highland
On Thu, Dec 13, 2012 at 9:04 AM, Robert Virding
<[hidden email]> wrote:
> Of course a simple way to remove this ambiguity would be to require ';'
> between statements or to terminate statements. Of course this would require
> people to make large increases in the size of their files but it might be
> worth it to remove the ambiguity.

Which would break source compatibility for virtually all scripts. WAY
too late in the language's development for that kind of proposal.

> Luerl parses them as one call, strictly left-to-right until there is nothing
> more which can be added to the call.

Which is what 5.2 does as well.

/s/ Adam


Re: Ambiguous syntax

Miles Bader-2
In reply to this post by Sven Olsen
Sven Olsen <[hidden email]> writes:
> The case that triggered a false positive was:
>
>     object.member_function
>         (some, collection, of, parameters)
>
> I haven't often seen the second pattern used in the wild -- though
> for programmers who prefer it, I'm sure 5.1's ambiguous syntax error
> would have been an annoyance.

I've seen such syntax fairly often in C code (obviously not in Lua
code!)...

-miles

--
Bore, n. A person who talks when you wish him to listen.


Re: Ambiguous syntax

Dan
In reply to this post by Sven Olsen


2012/12/13 Sven Olsen <[hidden email]>
>     object.member_function
>         (some, collection, of, parameters)
>
> I haven't often seen the second pattern used in the wild -- though for programmers who prefer it, I'm sure 5.1's ambiguous syntax error would have been an annoyance.

I haven't seen it as shown above but I've seen the below quite often (especially when there are lots of verbose arguments)

object:DrawText2d
(
    0,
    0,
    "Hello world",
    Color:Create(1,1,1,1),
    Align(center, center)
)

The soil image loading library has this type of call in its documentation http://www.lonesock.net/soil.html (of course it's not Lua, otherwise the author would have run into it being invalid in 5.1)

I prefer 5.2 only because it seems more consistent. Previously you could call a function that takes a table as its argument like this:

myfunction
{
    arg1 = "value",
    arg2 = "value"
}

but you wouldn't be able to format it the same way if you just had a list of arguments.
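A small sketch of the inconsistency Dan points out, using a hypothetical myfunction that only counts its arguments. The 5.1 check only fired on a '(' at the start of a line, so a table-constructor argument on the next line was always accepted, while a parenthesised argument list in the same position was not.

```lua
local compile = loadstring or load

-- Hypothetical stand-in for myfunction: records how many arguments it got.
local seen = {}
function myfunction(...) seen[#seen + 1] = select("#", ...) end

-- Table constructor on the next line: accepted by 5.1 and by 5.2+.
local table_chunk = assert(compile(
  "myfunction\n{\n  arg1 = 'value',\n  arg2 = 'value'\n}"))
table_chunk()

-- Argument list on the next line: accepted by 5.2+, rejected by 5.1.
local paren_chunk, err = compile("myfunction\n(\n  'value',\n  'value'\n)")
if paren_chunk then paren_chunk() else print("compile error: " .. err) end

print(seen[1], seen[2])  -- 1  2 on Lua 5.2 and later
```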

Re: Ambiguous syntax

Sven Olsen

> I've seen the below quite often (especially when there are lots of verbose arguments)

Well, I can see why the check was removed -- these do look like significant use cases.  But maybe there's a way that we could have the best of both worlds? Perhaps a more cautious check would cut down on the number of troublesome false positives, but still have good odds catching ambiguous syntax bugs?

For example, after matching a prefixexp(), it's easy enough to count the number of additional patterns matched before closing the primaryexp().  So, for very little cost, we could keep the 5.1 check in the parser, but only trigger it in cases where we're extending a primary expression that's already a complete function call or table index.

With that type of check, 

  object:DrawText2d
  (
      0,
      0,
      "Hello world",
      Color:Create(1,1,1,1),
      Align(center, center)
  )

parses just fine, but

  f()
  (g or h)()

still throws an error.  Of course, it's still possible to get a false positive, something like:

FunctionObject.new()
  (args)

will trigger an error, even though it clearly ought to be parsed as a single statement.  But such cases are rare. 

Even so, one might want to change the error message -- just to make it clear that the check can be triggered by false positives.  Something like:  

  "dangerous formatting: function call followed by newline followed by '('."

-Sven
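A rough, line-based approximation of this check, written as a standalone lint rather than a parser change: it warns when a line starts with '(' and the previous non-blank line ends in ')' or ']', meaning the new '(' would extend an expression that already ends in a call or an index. The name dangerous_lines is hypothetical, and the sketch ignores comments, strings and ".name" field access, so it only illustrates the idea.

```lua
-- Hypothetical lint: returns the line numbers where a line starts with '('
-- and the previous non-blank line ends with ')' or ']'.
local function dangerous_lines(src)
  local warnings = {}
  local prev = nil  -- previous non-blank line, trailing whitespace removed
  local n = 0
  for line in (src .. "\n"):gmatch("(.-)\n") do
    n = n + 1
    if prev and line:match("^%s*%(") and prev:match("[%)%]]$") then
      warnings[#warnings + 1] = n
    end
    if line:match("%S") then
      prev = line:match("^(.-)%s*$")
    end
  end
  return warnings
end

-- The cases discussed in the post.
local draw = table.concat({
  "object:DrawText2d",
  "(",
  "    0,",
  "    0,",
  '    "Hello world",',
  "    Color:Create(1,1,1,1),",
  "    Align(center, center)",
  ")",
}, "\n")

print(#dangerous_lines(draw))                                                -- 0
print(table.concat(dangerous_lines("f()\n(g or h)()"), ","))                 -- 2
print(table.concat(dangerous_lines("FunctionObject.new()\n  (args)"), ","))  -- 2
```

As in the post, the FunctionObject.new() case is flagged even though it was meant as a single call.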

Re: Ambiguous syntax

Sven Olsen

> So, for very little cost, we could keep the 5.1 check in the parser, but only trigger it in cases where we're extending a primary expression that's already a complete function call or table index.

Here's something that may be a better idea:  Only trigger the error once the primary expression reaches some threshold of complexity.  For example, 2+ function calls, one of which has its '(' at the start of a new line.  Then you could error on cases like:

  local a = t[k]
  (f or g)()

but only emit the error when you start parsing the second '('.

Without backtracking, I don't think there's any way to guarantee that the check won't sometimes be triggered by false positives.  But, I do think keeping a little logic in the parser to check for particularly dangerous formatting is sensible.

-Sven
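The threshold rule can be sketched as a function over a hypothetical description of the suffixes that follow the start of a primary expression (index, field or call, plus whether a call's '(' begins a new line). It returns the position of the suffix at which the error would be reported, or nil. A real implementation would live in the parser; this only illustrates the counting.

```lua
-- Each suffix has kind = "index", "field" or "call"; a call may also carry
-- newline = true when its '(' starts a new line. A method call such as
-- obj:name(...) counts as a single "call" suffix.
-- Rule: report once the expression holds 2+ calls and at least one call's
-- '(' began a new line. Returns the suffix index where the error fires.
local function threshold_check(suffixes)
  local calls, newline_call = 0, false
  for i, s in ipairs(suffixes) do
    if s.kind == "call" then
      calls = calls + 1
      newline_call = newline_call or s.newline == true
      if calls >= 2 and newline_call then
        return i
      end
    end
  end
  return nil
end

-- local a = t[k]
-- (f or g)()
print(threshold_check({
  { kind = "index" },
  { kind = "call", newline = true },
  { kind = "call" },
}))  -- 3: the error fires at the second '('

-- object:DrawText2d
-- ( ... )
print(threshold_check({
  { kind = "call", newline = true },
}))  -- nil: only one call, so no error
```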

Re: Ambiguous syntax

Coda Highland
On Sun, Dec 16, 2012 at 11:08 AM, Sven Olsen <[hidden email]> wrote:

>
>> So, for very little cost, we could keep the 5.1 check in the parser, but
>> only trigger it in cases where we're extending a primary expression that's
>> already a complete function call or table index.
>
>
> Here's something that may be a better idea:  Only trigger the error once the
> primary expression reaches some threshold of complexity.  For example, 2+
> function calls, one of which has its '(' at the start of a new line.  Then
> you could error on cases like:
>
>   local a = t[k]
>   (f or g)()
>
> but, only emit the error when you start parsing the second '('.
>
> Without backtracking, I don't think there's any way to guarantee that the
> check won't sometimes be triggered by false positives.  But, I do think
> keeping a little logic in the parser to check for particularly dangerous
> formatting is sensible.
>
> -Sven

Your example is too simple for my tastes. In fact, it seems the entire
heuristic is wrong -- I would be more likely to pull this stunt
BECAUSE the expression is too complicated and thereby needs to be
broken across lines. In a case as simple as this example, I almost
certainly mean that I want (f or g)() to be evaluated as a separate
statement, but:

local a = module.factories[factoryType].create
  (1, 2, 3,
   4, 5, 6)

seems like a much more likely thing in my coding style.

(Then again, my coding style would put the ( after create and the ) on
its own line -- by MY coding style I would be arguing that the
ambiguous case should ALWAYS resolve to two statements. But I'm not
advocating for forcing my style onto others -- I'm joining in the
discussion on the assumption that there are people who USE this
style.)

Treated as a context-sensitive grammar instead of a context-free one,
it would be theoretically possible to make the decision at run-time
based on the value of the first expression, but this is a Bad Idea. So
throw that one out too.

An error based on ambiguous syntax with valid uses for either form is
also a Bad Idea -- it means that BOTH use cases get hard-flagged as an
error, which just makes for annoyance.

This leaves two variations on one solution: choose one form as the
canonical choice and offer syntax to explicitly select the other.
Either always interpret it as two statements and either offer a
line-extension character or force the use of attached braces, or
always interpret it as one statement and offer a statement-separator
character.

Since Lua already had a statement separator character, it was a clear
choice for PUC-Rio to make.

It would be useful to have some sort of syntax check mode for the Lua
parser that outputs warnings for ambiguous constructions without
actually executing the code; you wouldn't use this for run-time code
but as a testing step for the appropriate audience it would be useful.

/s/ Adam
