Suggestion: names-isolation-statement

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

Suggestion: names-isolation-statement

Egor Skriptunoff-2
A new syntax:
   with white-list-of-names do
      ....
   end
This statement makes all names from the outer scope (with exception of white-listed names) invisible in the inner block.
The statement works for all names independently of what a name is: local/upvalue/global.

Usage example:

   local upv1, upv2 = 111, 222
   local function f()
      local x1, x2 = 11, 22
      with x1, upv1, print do -- hide all outer names except listed here
         print(x1)            -> 11
         print(x2)            -- compile-time error: "x2" is not defined
         print(upv1)          -> 111
         print(upv2)          -- compile-time error: "upv2" is not defined
         print(math.pi)       -- compile-time error: "math" is not defined
         with x1 do           -- nested "with" statement
            x1 = x1 + 1       -- OK
            print(x1)         -- compile-time error: "print" is not defined
         end
      end
      print(x2)               -> 22  (beyond the "with")
   end


Idiom for complete isolation (white-list is empty):

   with do
      -- all outer locals, upvalues and globals are inaccessible here
   end


The white-list must contain only visible names, otherwise compilation error is raised.

   local x
   with a, b, c do  -- OK: varibles a,b,c are not defined, but all globals are visible at this line
      -- "x" is invisible here (neither local not global) because "x" wasn't white-listed
      with x do     -- compile-time error: "x" is not defined
         -- "x" is invisible here because inner "with" can't unhide a name hidden by outer "with"
      end
   end


You can cache/decache globals in the outer scope without warying about the rest of the code
(because the "with" statement doesn't make a distinction between locals/upvalues/globals):

   local print = print  -- you may include or exclude this line
   with print do        -- this statement works correctly independently of whether "print" is global, local or upvalue
      print(42)
   end


Sergey Kovalev complained recently about lack of syntax for defining "pure functions" in Lua.
Due to "with" statement we could define them easily (3 variants):

   Variant #1

      local function pure_func(x, y)
         with x, y do    -- if the function must be recursive, add its name to this white-list
            -- all upvalues and globals are inaccessible here
            ...
         end
      end

   Variant#2

      local pure_func
      with pure_func do  -- we need to include something in the white-list to be able to pass the constructed function value outside the block
         function pure_func(x, y)
            -- all upvalues and globals are inaccessible here
            ...
         end
      end

   Variant#3

      with do      -- white-list is empty, the function value is passed outside the block by "return" statement
         local function pure_func(x, y)
            -- all upvalues and globals are inaccessible here
            ...
         end
         return pure_func
      end


How "with" statement should work with "_ENV" name:
1) If "_ENV" is written explicitly (user wants to directly access the upvalue "_ENV") then such "_ENV" is visible only if it's in white-list.
2) If "_ENV" is used implicitly (user accesses global variable "var", and the preprocessor replaces "var" with "_ENV.var") then this implicit "_ENV" is always accessible, even when "_ENV" is not white-listed.


Some time ago I complained about lack of Lua syntax for solving typo-in-local problem (I suggested "$" and "\" prefixes for globals).
Due to "with" statement we could write code with strict control of globals usage:

   with _ENV do
      -- all outer locals and upvalues (except _ENV) are inaccessible here
      -- global variables can be accessed only as "_ENV.varname"
   end

This solves the "typo-in-local problem": a typo would raise a compile-time error.



I believe the identifier "with" is used not very frequently in Lua programs because it is not a noun/adjective/verb,
so there would be not very much old Lua code broken by introducing this new keyword.
Reply | Threaded
Open this post in threaded view
|

Re: Suggestion: names-isolation-statement

Sergey Kovalev
пт, 12 июл. 2019 г. в 00:16, Egor Skriptunoff <[hidden email]>:
>
> A new syntax:
>    with white-list-of-names do
>       ....
>    end

May be word "explicit" or "only_with" better describe what's going on.
But idea is good.

And instead local <toclose> noname=setmetatable({},{__close=function()
body end})
defer do
  body
end

v
Reply | Threaded
Open this post in threaded view
|

Re: Suggestion: names-isolation-statement

v
In reply to this post by Egor Skriptunoff-2
On Fri, 2019-07-12 at 00:15 +0300, Egor Skriptunoff wrote:
> This statement makes all names from the outer scope (with exception
> of white-listed names) invisible in the inner block.
> The statement works for all names independently of what a name is:
> local/upvalue/global.

To put it the other way: only specified free names are accessible.

Suggestion looks fairly good for me and I think shouldn't be too hard
to implement (no VM tweaks required as it's purely syntactic sugar).
One possible extension to it: add <const> (and probably <toclose> as
well?) specifier, so you can make sure that var isn't changed by
accident. Again, as we already have <const> locals, this probably
shouldn't be too hard to add into existing code.


--
v <[hidden email]>


Reply | Threaded
Open this post in threaded view
|

Re: Suggestion: names-isolation-statement

Viacheslav Usov
In reply to this post by Egor Skriptunoff-2
On Thu, Jul 11, 2019 at 11:16 PM Egor Skriptunoff <[hidden email]> wrote:

   with white-list-of-names do
>      ....
>   end
This statement makes all names from the outer scope (with exception of white-listed names) invisible in the inner block.

This might solve the long-standing problem with globals.

To avoid introducing the new keyword, and to make things a little less verbose in typical use cases, we could consider a syntax like 

[a. b, c]

that introduces a white list of names, which can appear anywhere a new block (of Lua's grammar) begins, for example:

do [a. b, c]
end

function(x, y, z) [a. b, c]
end

if x < y then [a. b, c]
else [a. b, c]
end

I have chosen square brackets because such a syntax would not be valid in previous versions of Lua, there are certainly other syntactical options to the same effect.

The names thus introduced in the beginning of a block are effective in the entire block, and the effect is otherwise identical with that of Egor's proposal.

Note that every Lua chunk is a block, so [...] in the beginning of a chunk would also work, with the obvious implications on the behaviour of global variables.

Cheers,
V.
Reply | Threaded
Open this post in threaded view
|

Re: Suggestion: names-isolation-statement

pocomane


Il sab 13 lug 2019, 12:53 Viacheslav Usov <[hidden email]> :

This might solve the long-standing problem with globals.

What about something like the following?

x = 0
y = 1
local print, x = print, x
do <error-on-free-name>
  print(x) -- no issue
  print(_ENV.y) -- no issue
  print(y) -- error at compile time
end

I.e. inside the annotated do/end block, any free name raises an error at compile time, instead of being translated to _ENV.y .

Reply | Threaded
Open this post in threaded view
|

Re: Suggestion: names-isolation-statement

Egor Skriptunoff-2
In reply to this post by Viacheslav Usov
On Sat, Jul 13, 2019 at 1:53 PM Viacheslav Usov wrote:
to make things a little less verbose in typical use cases, we could consider a syntax like 
[a. b, c]
that introduces a white list of names, which can appear anywhere a new block (of Lua's grammar) begins


May be the syntax should be more verbose (to correspond to overall language look-and-feel)?
   @visible(x, y)
Such annotation might appear even in the middle of a Lua's grammar block
(it is active from the line where it is defined to the end of the current block)
Reply | Threaded
Open this post in threaded view
|

Re: Suggestion: names-isolation-statement

nobody
In reply to this post by Egor Skriptunoff-2
On 11/07/2019 23.15, Egor Skriptunoff wrote:
> A new syntax:
>     with white-list-of-names do
>        ....
>     end
> This statement makes all names from the outer scope (with exception of
> white-listed names) invisible in the inner block.

Interesting idea, might be fun.

But these two things in combination create a problem:

> The statement works for all names independently of what a name is:
> local/upvalue/global.

> The white-list must contain only visible names, otherwise
> compilation error is raised.

What about to-be-defined "globals"?  (Will you have to "pre-declare" them?)

And what about this?

   Point2D.length =
     function( _ENV )  with x, y do  return (x^2+y^2)^0.5  end  end

(which could (more or less) equivalently be written as

   function Point2D:length( )  return (self.x^2+self.y^2)^0.5  end

but is shorter / more readable this way and already (without with)
gives the guarantee that you'd at most pollute the object you're
working on, not the outer environment.)

Or even

   do
     local print = print
     local _ENV = { x = 23 }
     function _G.foo( )  with print, x do  print( x )  end  end
     -- up to this point you cannot even tell that _ENV might change
     -- again (so this would break Lua's single-pass compilation)
     function _G.bar( newenv )  _ENV = newenv  end
     -- only now it's clear that _ENV might be changed
   end
   bar { y = 5 }
   foo ()

Because _ENV can change dynamically, you cannot determine at compile
time what names will be valid, so you cannot always raise a compile time
error.

> Sergey Kovalev complained recently about lack of syntax for defining
> "pure functions" in Lua.
> Due to "with" statement we could define them easily (3 variants):
>
>     Variant #1
>
>        local function pure_func(x, y)
>           with x, y do    -- if the function must be recursive, add its
> name to this white-list
>              -- all upvalues and globals are inaccessible here
>              ...
>           end
>        end

This isn't ideal for "syntactical sandboxing" (you have to look inside
the function to check that it's safe) and you have to write all
arguments twice…

>     Variant#2
>
>        local pure_func
>        with pure_func do  -- we need to include something in the
> white-list to be able to pass the constructed function value outside the
> block
>           function pure_func(x, y)
>              -- all upvalues and globals are inaccessible here
>              ...
>           end
>        end

This is awfully verbose.  (You have to give the name THREE times!?!)

>     Variant#3
>
>        with do      -- white-list is empty, the function value is passed
> outside the block by "return" statement
>           local function pure_func(x, y)
>              -- all upvalues and globals are inaccessible here
>              ...
>           end
>           return pure_func
>        end

This needs an outer (function() … end)() wrapper or has to live in its
own file, again awfully clunky.

I'd prefer that to work like

   local x, y, z with a, b, c do
     … (both x,y,z AND a,b,c are visible here) …
   end

(but I already have local x,y,z do … end blocks in my code.)  Because it
wouldn't be possible to disambiguate that from

   local x, y, z;
   with a, b, c do … end

without an explicit semicolon, I'd restrict it to always have the
`'local' <NAMELIST> 'with' <NAMELIST> 'do' <BLOCK> 'end'` form.  That
would force blocks with purely external side-effects to take the form

   local _ with io, self, … do  …block…  end

which I personally wouldn't mind… but at that point I'm left wondering:
What exactly is gained over the current possibilities of _ENV?

I start my modules (whether they're files or blocks) with something like

   -- (_M is the exposed module, _MENV / _ENV is the environment)
   local _ENV = setmetatable( { _M = { } }, { __index = _ENV or _G } )
   if setfenv then  setfenv( 1, _ENV )  end
   _M._MENV = _ENV          -- be REPL-/testing-/monkey-patching-friendly
   package.loaded[...] = _M -- no `return _M` needed

which is 5.1-thru-(probably)-5.4-compatible.  Because everything has
_ENV, there's essentially no place where actual global variables can
accidentally be created – everything ends up in a "local environment" at
worst.  Also, (unless I need that tiny bit of extra speed for
recursion,) I don't `local` my internal functions, I intentionally let
them go to the module's environment (for easier monkey patching,
testing, etc.).  Tests can easily check whether any "local global" was
accidentally changed (per module, snapshot the initial foo._MENV after
loading, run all the tests, compare with final state & flag any changes
not explicitly whitelisted – this also catches updates to existing
fields, unlike __newindex-based stuff.)

Instead of going { __index = _ENV or _G } for a new _ENV, I could be
more specific, which would get me the white-listing of "globals".  And
because I don't have to `local` the module-internal state (as I have an
actual environment), I actually don't have that many local variables
outside of functions (and those that exist are `do`…`end`-scoped), so I
don't need the hiding of locals at all.  (Stuff that I want to sandbox
goes into its own file anyway and gets its _ENV set via load(), so I
have the guarantee that there won't be any accidental upvalues / locals.)

So… I guess… have you tried working with _ENV before?  Why was it
insufficient for your purposes?



> How "with" statement should work with "_ENV" name:
> 1) If "_ENV" is written explicitly (user wants to directly access the
> upvalue "_ENV") then such "_ENV" is visible only if it's in white-list.
> 2) If "_ENV" is used implicitly (user accesses global variable "var",
> and the preprocessor replaces "var" with "_ENV.var") then this implicit
> "_ENV" is always accessible, even when "_ENV" is not white-listed.

This just sounds scary.

> I believe the identifier "with" is used not very frequently in Lua
> programs because it is not a noun/adjective/verb,
> so there would be not very much old Lua code broken by introducing this
> new keyword.

Seems correct, I have an awful lot of zipWith, withOpenFile and other
withFOOs… but only two places where a function parameter is called
'with'… (not related to the above functions).

-- nobody

Reply | Threaded
Open this post in threaded view
|

Re: Suggestion: names-isolation-statement

Viacheslav Usov
On Mon, Jul 15, 2019 at 10:55 AM nobody <[hidden email]> wrote:

> What about to-be-defined "globals"?  (Will you have to "pre-declare" them?)

Globals are not defined and not declared in Lua. They always exist, with nil values unless assigned something else. Which is precisely why they are a problem.

Because they are not declared, they cannot be pre-declared. You could probably say that this proposal makes it possible to have globals declared (without "pre") before they are used (even though it is not solely about globals). Which is good, because it solves the problem.

> Because _ENV can change dynamically, you cannot determine at compile time what names will be valid

Regardless of the proposal, the compiler knows statically what names are global names; those names will be statically converted to _ENV.names. The run-time value of _ENV of irrelevant, the proposal is about statically permitted names.

> What exactly is gained over the current possibilities of _ENV?

A. Static (compile-time) guarantees.

B. Usability. I am not sure you will easily agree, but the _ENV gyrations in your post are magic incantations to a typical Lua user.

Cheers,
V.
Reply | Threaded
Open this post in threaded view
|

Re: Suggestion: names-isolation-statement

Viacheslav Usov
In reply to this post by Egor Skriptunoff-2
On Mon, Jul 15, 2019 at 7:50 AM Egor Skriptunoff <[hidden email]> wrote:

> May be the syntax should be more verbose (to correspond to overall language look-and-feel)?

As I said, the syntax I proposed is just one possibility.

> Such annotation might appear even in the middle of a Lua's grammar block 

I have realised that the semantics I proposed would have a problem in this case:

function(x, y, z) [a, b, c] end

Because it makes x, y, z inaccessible in the function's body, and having them repeat on the [] declaration would be ugly. A similar problem exists in the original proposal, too.

It could be solved by allowing [...] after the function keyword:

function[a, b, c](x, y, y) end

Then all of a, b, c, x, y, and z would be available within the function's body. If the function is named, then:

function [a, b, c]foo(x, y, y) end

which makes foo also available within the function. A syntax like function foo[a, b, c](x, y, z), which makes foo inaccessible within the function, is not necessary, because it would be equivalent to

foo = function[a, b, c](x, y, z)  end

In principle, this proposal makes the keyword function redundant, so, for example, a pure function could be unambiguously defined via []f(x, y, z) end  --:)

With respect to letting [...] (or its equivalent) appear anywhere in a block, I like the idea of annotating each syntactical block with its external dependencies at its start. This is why I find a terse syntax like [a, b,c ] acceptable, because it is always linked to some more verbose construct. External dependencies introduced in a middle of a block, not so much.

One final thing to consider is whether we would need a syntax to cancel all of those restrictions. An obvious candidate for this would be something like [*], but the question is whether it is really a good idea conceptually.

Cheers,
V.
Reply | Threaded
Open this post in threaded view
|

Re: Suggestion: names-isolation-statement

nobody
In reply to this post by Viacheslav Usov
On 15/07/2019 12.09, Viacheslav Usov wrote:

> On Mon, Jul 15, 2019 at 10:55 AM nobody <[hidden email]
> <mailto:nobody%[hidden email]>> wrote:
>
> > What about to-be-defined "globals"?  (Will you have to "pre-declare"
> them?)
>
> Globals are not defined and not declared in Lua. They always exist, with
> nil values unless assigned something else. Which is precisely why they
> are a problem.
>

(By "pre-declare" I meant doing stuff like you need for `strict.lua` or
other checks – explicitly `rawset` the variable or initialize it to e.g.
`false` before using it or something like that…)

What I was trying to get at is that if you say `with x do … end` and x
is not a local variable or upvalue, how do you do the check that

>>> The white-list must contain only visible names, otherwise a
>>> compilation error is raised.
?  Is any global variable (including one that's `nil`) acceptable?
(Then all names are always visible & this condition is vacuously true,
so this doesn't get you any compile-time check at all.)

Otherwise, what is the condition for a global variable to be treated as
"visible"?

(The more I think about this, the more I'm convinced that this cannot
possibly work in any way that deserves to be called "working" – the
parser cannot possibly get that information.  A variable may be set in a
different file, before or after parsing this file, made visible via
__index; the environment may change when/after parsing or during
execution, the environment's metatable may change, etc.…)

>  > Because _ENV can change dynamically, you cannot determine at compile
> time what names will be valid
>
> Regardless of the proposal, the compiler knows statically what names are
> global names; those names will be statically converted to _ENV.names.
> The run-time value of _ENV of irrelevant, the proposal is about
> statically permitted names.

It only knows what variables are (not) locals/upvalues, everything else
is treated as global / coming from the environment by default.

If you use the same mechanism, that means that `with` again allows
arbitrary undefined variables, so it doesn't really improve the
situation much w.r.t. accidental globals.

>  > What exactly is gained over the current possibilities of _ENV?
>
> A. Static (compile-time) guarantees.

As re-iterated above, I don't see that's possible.

> B. Usability. I am not sure you will easily agree, but the _ENV
> gyrations in your post are magic incantations to a typical Lua user.

Sure, that's fairly high-magic code – but it could just be what the new
`module` function (if there ever is one again) does internally, and then
you don't have to see this or care about it.  In the simplest non-magic
case, all you say is

   _ENV = module( ... )

and you're set up.  If `module` is allowed to use some debug magic
internally, it could set the loading file's newest _ENV variable and
then you'd have the good old

   module( ... )

with no user-facing mention of _ENV at all.  (Only that you can't
accidentally set globals anymore, you'll have to explicitly prefix with
_G now.)

-- nobody

Reply | Threaded
Open this post in threaded view
|

Re: Suggestion: names-isolation-statement

nobody
On 15/07/2019 14.50, nobody wrote:
> If `module` is allowed to use some debug magic internally, it could
> set the loading file's newest _ENV variable and then you'd have the
> good old
>
>   module( ... )
>
> with no user-facing mention of _ENV at all.  (Only that you can't
> accidentally set globals anymore, you'll have to explicitly prefix
> with _G now.)

(Or rather, you'd neither have everything be visible through the module
table (as with `package.seeall`) nor need to explicitly pre-`local` all
the stuff that you need to access.  The old `module` already prevented
accidental globals, it just didn't create an environment (only a module
table) and made things public-by-default, so you'd still accidentally
pollute your module if you forgot to `local`.)

-- nobody

Reply | Threaded
Open this post in threaded view
|

Re: Suggestion: names-isolation-statement

Viacheslav Usov
In reply to this post by nobody
On Mon, Jul 15, 2019 at 2:50 PM nobody <[hidden email]> wrote:

> Is any global variable (including one that's `nil`) acceptable?

Yes, but see below.

> If you use the same mechanism, that means that `with` again allows arbitrary undefined variables, so it doesn't really improve the situation much w.r.t. accidental globals.

The outermost with-declaration can mention any name. After that, any name, in an inner with-declaration or otherwise, must either be one in the innermost with-declaration or be defined locally in the innermost with-scope. So your room for accidental globals is very, very small.

> In the simplest non-magic case, all you say is   _ENV = module( ... )

See, you think this is not magic :)

Cheers,
V.
Reply | Threaded
Open this post in threaded view
|

Re: Suggestion: names-isolation-statement

Sergey Kovalev
In reply to this post by Egor Skriptunoff-2
I patched lua 5.4. it has new syntax for pure functions.
https://pastebin.com/7CHqFk7a

So if you want to experiment you may try it: (ps: pastebin replace \n
to \r\n so dos2unix need to fix it)
wget https://pastebin.com/raw/7CHqFk7a -O- | dos2unix > build-pure-functions.sh
bash build-pure-functions.sh

Usage example:
fn=function(x) print(x) end -- normal function
fn=function(x,G) pure G.print(x) end -- pure function. with out access
to upvalues and globals
fn(10,_G)

пт, 12 июл. 2019 г. в 00:16, Egor Skriptunoff <[hidden email]>:
>
> A new syntax:
>    with white-list-of-names do
>       ....
>    end

Reply | Threaded
Open this post in threaded view
|

Re: Suggestion: names-isolation-statement

Coda Highland
In reply to this post by Viacheslav Usov


On Mon, Jul 15, 2019 at 5:58 AM Viacheslav Usov <[hidden email]> wrote:
In principle, this proposal makes the keyword function redundant, so, for example, a pure function could be unambiguously defined via []f(x, y, z) end  --:)

Coincidentally, this is basically C++'s syntax for the same task.

I can't say for sure if that's a good thing...

/s/ Adam
Reply | Threaded
Open this post in threaded view
|

Re: Suggestion: names-isolation-statement

Viacheslav Usov
On Mon, Jul 15, 2019 at 7:21 PM Coda Highland <[hidden email]> wrote:

> Coincidentally, this is basically C++'s syntax for the same task.

The syntax with the function keyword omitted was more of an insider joke, even though etymologically I did lift [] directly from C++ lambdas when I paused to think of invalid syntax to introduce a "capture list" in Lua.

But that was before I realised a twist would be required for functions.

Perhaps not fully coincidentally, the syntax addresses a problem that C++ still has with recursive lambdas.

Cheers,
V.
Reply | Threaded
Open this post in threaded view
|

Re: Suggestion: names-isolation-statement

Sergey Kovalev
In reply to this post by Sergey Kovalev
new syntax for pure function (v2) - https://www.lua.org/work/doc/manual.html#9
...
funcbody ::= ‘(’ [parlist] ‘)’ [pure] block end
...

[build.sh]
#!/bin/sh

wget -c https://www.lua.org/work/lua-5.4.0-alpha.tar.gz
wget https://raw.githubusercontent.com/kov-serg/lua-aux/master/pure54.patch
tar xf lua-5.4.0-alpha.tar.gz
cd lua-5.4.0-alpha
patch -p 1 < ../pure54.patch
make linux-readline
cp src/{lua,luac} ..
cd ..

[test-pure.lua]
x="global"
local y="upvalue"

function fn(G,z) pure
  local a
  G.print(z)
  G.print(G.x)
  function a() G.print(y) end -- error: no access to 'y'
end

fn(_G,"arg")

пн, 15 июл. 2019 г. в 19:46, Sergey Kovalev <[hidden email]>:
>
> I patched lua 5.4. it has new syntax for pure functions.

Reply | Threaded
Open this post in threaded view
|

Re: Suggestion: names-isolation-statement

Egor Skriptunoff-2
In reply to this post by nobody
On Mon, Jul 15, 2019 at 11:55 AM nobody wrote:
But these two things in combination create a problem:

> The statement works for all names independently of what a name is:
> local/upvalue/global.

> The white-list must contain only visible names, otherwise
> compilation error is raised.

What about to-be-defined "globals"?  (Will you have to "pre-declare" them?)


I'm aware of two ways to turn on strict control over globals usage at compile-time:
1) separate the globals' namespace from the other namespaces (for example, by using prefix "$" or "_ENV.")
or
2) explicitly declare (white-list) globals you want to use in your code

This suggestion implements both these approaches:

1) You can separate namespaces (by using prefix "_ENV.")
   with _ENV do
      _ENV.print(_ENV.math.pi)
      _ENV.non_existing_global = 42
   end

2) You can white-list all the globals you need
   with print, math, non_existing_global do
      print(math.pi)
      non_existing_global = 42
   end

3) You can combine these two approaches:
most frequently used globals are white-listed, others are prefixed
   with print, _ENV do
      print(_ENV.math.pi)
      _ENV.non_existing_global = 42
   end

It's your own decision: to pre-declare to-be-defined globals or to use "_ENV." prefix for them.



 
And what about this?

   Point2D.length =
     function( _ENV )  with x, y do  return (x^2+y^2)^0.5  end  end

(which could (more or less) equivalently be written as

   function Point2D:length( )  return (self.x^2+self.y^2)^0.5  end

but is shorter / more readable this way and already (without with)
gives the guarantee that you'd at most pollute the object you're
working on, not the outer environment.)



Nice example!
Yes, it would be more readable in "Point2D:length()" to use "self" instead of "with".
This example shows that for some (but not all) situations Lua already has a trick to write a "pure function".
But "with" statement gives more universal control over visibility of variables, and it also protects against typos.
If the price of an error is quite high and the inner block is much longer than "(x^2+y^2)^0.5" then you would probably prefer to use "with".

 
   do
     local print = print
     local _ENV = { x = 23 }
     function _G.foo( )  with print, x do  print( x )  end  end
     -- up to this point you cannot even tell that _ENV might change
     -- again (so this would break Lua's single-pass compilation)
     function _G.bar( newenv )  _ENV = newenv  end
     -- only now it's clear that _ENV might be changed
   end
   bar { y = 5 }
   foo ()

Because _ENV can change dynamically, you cannot determine at compile
time what names will be valid, so you cannot always raise a compile time
error.



What does "valid" mean?
Only visibility does matter.
The "with" statement doesn't care about whether a variable contains nil or non-nil value.
Absolutely all globals (including non-existing) are considered visible outside of the most outer "with" statement.



 
>     Variant #1
>
>        local function pure_func(x, y)
>           with x, y do   
>              -- all upvalues and globals are inaccessible here
>              ...
>           end
>        end

you have to write all arguments twice…


Yes, that's significant inconvenience.
See "suggestion #2" below to find how it could be fixed.

 
I'd prefer that to work like

   local x, y, z with a, b, c do
     … (both x,y,z AND a,b,c are visible here) …
   end



OK, I see your point: it would avoid typing variable names twice as in the following code:
   local x, y, z
   with a, b, c, x, y, z do
      -- both x,y,z AND a,b,c are visible here
   end

OK, let's try to avoid this unneeded typing.

-----------------------------------------------------
Suggestion #2, improved:
-----------------------------------------------------

Let "with" be not a statement, but an optional prefix for arbitrary Lua statement.
   with white-list-names inner-statement

If inner statement is "do...end" then we are getting our previous form
   with x, y
      do
         ...
      end

If inner statement is a local variables definition then we are getting exactly what "nobody" wants (don't typing local variables twice):
   with x, y
      local sum, diff = x+y, x-y
   -- variables "sum" and "diff" are accessible outside "with"
   print(sum, diff)

If inner statement is a local function definition then we are getting compact code to define a "pure function" (don't typing function parameters twice):
   with math
      local function pure_func(x)
         return math.sin(x)/x
      end
   -- local variable "pure_func" is accessible outside "with"
   print(pure_func(1))

Inner statement could be just anything:
   with x
      return x, x+1, x+2

   with a, i, j
      i, j, a[i], a[j] = a[i]~a[j], j+1&255, a[j], a[i]

Linebreaks in these examples are just for readability, spaces could be used instead.

It's impossible to use empty white-list because of the following ambiguity:
   with a (b or c)[1] = 42
Is it
   with
      a(b or c)[1] = 42
or
   with a
      (b or c)[1] = 42

An underscore (or could be used instead of empty white-list (but only for most outer "with").
   with _
      do
         ....
      end

   with _
      local function f(n)
         return n*(n+1)/2
      end

   with _
      local t = {
         pure_func1 = function(x) return (x+1)/(x-1) end,
         pure_func2 = function(x, y) return x*y+x+y end,
      }

Please note the difference between defining a new local and defining a new global:
   with x
      local new_var = x*x     -- new local

   with x, new_var
      new_var = x*x           -- new global

This is because the local variable is actually being created by the statement,
but globals are considered already created (just like already existing local variable).
   local old_var
   with x, old_var
      old_var = x*x           -- old local