microlight review

David Manura
A couple concerns on initial review of microlight [1]...

(1) The names ml.extend, ml.inject, and ml.delete relate only loosely
to purpose.  I don't get the sense from the names that "delete" (as in
ml.delete) and "remove" (as in table.remove) are very similar in
behavior apart from removing a *range* of values vs. a *single* value.
The name "extend" might sound like class inheritance or
JQuery.extend().  ml.extend can be written in terms of ml.inject, so
there is overlap, and ml.inject could be presented as an analogue of
table.insert for multiple elements, except for the `overwrite`
argument in `ml.inject(dest,index,src,overwrite)`, which makes the
function behave more like a memcpy/bitblt than an insert.  I'm kind-of
expecting a set of related functions like

  table.insert(list, [pos,] value)   <-->   ml.inject(t, [pos,] t2)
  table.remove(list [, pos])         <-->   ml.remove(t [, pos1 [, pos2]])
  lua_copy                           <-->   ml.copy(t1, pos1, t2, pos2, len)

except the ml.copy might be too trivial and uncommon for microlight,
whereas ml.inject/ml.remove at least are O(N^2) in their naive
implementation.  remove may even be merged into inject, as in the
versatile Perl splice [2].
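
To make the splice idea concrete, here is a minimal sketch of a single function that covers both the insert-range and remove-range cases. The name `splice` and its argument order are assumptions for illustration only, not part of microlight. It shifts the tail once per call, so each call is linear rather than repeating table.insert or table.remove per element.

```lua
-- Hypothetical splice(t, pos, count, src), NOT a microlight function:
-- removes `count` items starting at `pos`, inserts the items of `src`
-- at that position, and returns the removed items.
local function splice(t, pos, count, src)
  src = src or {}
  local n = #t
  -- clamp count so that we never remove past the end of the list
  count = math.max(0, math.min(count or 0, n - pos + 1))
  local removed = {}
  for i = 1, count do removed[i] = t[pos + i - 1] end
  local shift = #src - count
  if shift > 0 then
    -- list grows: move the tail up, starting from the end
    for i = n, pos + count, -1 do t[i + shift] = t[i] end
  elseif shift < 0 then
    -- list shrinks: move the tail down, then clear the leftover slots
    for i = pos + count, n do t[i + shift] = t[i] end
    for i = n + shift + 1, n do t[i] = nil end
  end
  for i = 1, #src do t[pos + i - 1] = src[i] end
  return removed
end

local t = {1, 2, 3, 4, 5}
splice(t, 2, 2, {'a', 'b', 'c'})   -- replace 2,3 with a,b,c
print(table.concat(t, ','))        --> 1,a,b,c,4,5
splice(t, 2, 3)                    -- pure removal of a range
print(table.concat(t, ','))        --> 1,4,5
```

With `count` of 0 this behaves like a multi-element insert, and with no `src` like a range remove.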

(2) Assuming tstring is for debugging not serialization, we probably
want to escape unreadable chars.  Try

  ml.tstring(string.char(unpack(ml.range(0,255))))
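
As a rough illustration of what escaping could look like, the sketch below quotes a string for debug output and rewrites control characters, backslashes, double quotes, and bytes from 127 upward as escapes. The helper name `quote_debug` is hypothetical and this is not how ml.tstring is implemented. Escaping every high byte is an assumption that also rewrites UTF-8 text, which may or may not be wanted.

```lua
-- Hypothetical helper, not part of microlight.
-- Produces a double-quoted string containing only printable ASCII.
local function quote_debug(s)
  local escaped = s:gsub('[%c\\"\127-\255]', function(c)
    if c == '\\' or c == '"' then
      return '\\' .. c                       -- keep these readable
    end
    return string.format('\\%03d', c:byte()) -- three-digit decimal escape
  end)
  return '"' .. escaped .. '"'
end

print(quote_debug('tab\there "quoted" \\ nul\0end'))
```

Three-digit decimal escapes are used so that a following digit in the data cannot be misread as part of the escape.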

(3) I see some functions in microlight of common utility
(escape/expand/readfile/exists/tstring/indexof/range/invert/collect/...and
probably class).  Others I'm not so sure about.  Others seem missing
like memoize, findbin [5], trim [8], and writefile [3] or could be
more complete [3].  Others like split, TMTOWTDI.  I planned to at some
point announce my own set of very small focused libraries for core
things [3-7].

[1] github.com/stevedonovan/Microlight
[2] http://perldoc.perl.org/functions/splice.html
[3] [https://github.com/davidm/lua-file-slurp lua-file-slurp /
file_slurp] - Easily read/write entire files or pipes from/to a string
[4] [https://github.com/davidm/lua-require-any lua-require-any /
requireany] - require any one of the listed modules
[5] [https://github.com/davidm/lua-find-bin lua-find-bin / findbin] -
Locate directory of original perl script
[6] [https://github.com/davidm/lua-lib lua-lib / lib] - Simple
insertion of directories in package search paths
[7] [https://github.com/davidm/lua-compat-env lua-compat-env /
compat_env] - Lua 5.1/5.2 environment compatibility functions
(load/loadfile/getfenv/setfenv)
[8] http://lua-users.org/wiki/StringTrim

Re: microlight review

steve donovan
On Sat, Apr 28, 2012 at 7:07 AM, David Manura <[hidden email]> wrote:
> (1) The names ml.extend, ml.inject, and ml.delete relate only loosely
> to purpose.

The word 'extend' comes from Python usage (add things to this list)
and the dict-equivalent used to be ml.update, but somehow I got
persuaded to use ml.import. (Both names make sense, in context, and
here an alias will do fine.)

>  I don't get the sense from the names that "delete" (as in
> ml.delete) and "remove" (as in table.remove) are very similar

'inject' (also in my humble opinion) is poorly named. I think we
called the equivalent 'insertvalues' in Penlight, and something like
that would do better (like 'insertrange', 'removerange', etc)

> (2) Assuming tstring is for debugging not serialization, we probably
> want to escape unreadable chars.

An interesting question I've been discussing with Vadim; tstring
should be as correct as possible, as long as it remains a simple
robust debugging/light serialization tool.  I'm not entirely a fan of
'big serialization', from years of maintaining MFC applications which
were crippled by their data format mirroring the internal
implementation, hence breaking encapsulation and generally making
upgrading a bitch.  (I think Alexander has a more industrial table
serializer available)

That being said, unprintable chars should be escaped, perhaps in
'stupid' mode (which is another misnomer)

> (3) I see some functions in microlight of common utility
> (escape/expand/readfile/exists/tstring/indexof/range/invert/collect/...and
> probably class).  Others I'm not so sure about.

There will always be a few such odd children in any big family.  Part
of it comes from my fluctuating enthusiasm for functional forms like
map, filter, compose and bind.

> like memoize, findbin [5], trim [8], and writefile [3] or could be
> more complete

There was a request for trim, and personally I've missed writefile;
memoize was part of Jay's original proposal and it should go in.

I'm becoming aware of trying to keep some backward compatibility,
since there are apparently a few people who are already using it. But
(always) there's the story of Stu Feldman, who invented make and made
it require tabs; by the time he realized this mistake, he already had
a few dozen users.

steve d.

Re: microlight review

David Manura
Proper 5.1/5.2 compatibility may be another reason people would want
to use Microlight rather than roll their own functions.  As a case in
point: ml.escape is currently incompatible with 5.1 due to \0
handling.  Test case to add:

  local ml = require 'ml'
  local unpack = table.unpack or unpack  -- 5.2 moved unpack into table
  local charmap = string.char(unpack(ml.range(0,255)))
  assert(("aa"..charmap.."bb"):match(ml.escape(charmap)) == charmap)

On my suggestion about merging ml.extend and ml.inject (and perhaps
even ml.delete, like splice), I see there was already some discussion
("Is there really an benefit for conflating insert() and append()"
[1]), though there was also a goal of keeping the number of functions
under 30 (e.g. no "prepend()").

On Sat, Apr 28, 2012 at 4:04 AM, steve donovan
<[hidden email]> wrote:
> somehow I got persuaded to use ml.import. (Both names make
> sense, in context, and here an alias will do fine.)

Speaking of conflating, ml.import is described as both a primitive
table operation (which perhaps should separately be ml.update) as well
as something more involved dealing with packages/require, including
the deprecated getfenv (that users may prefer to avoid even in 5.1),
plus _ENV == _G here.

Common usage in existing programs rather than invention should dictate
design choices, but I'm not sure about the
ifind/indexof/imap*/ifilter functions.  ml.ifind and ml.indexof are
related, differing in that one accepts a `pred` while the other
accepts a `value`, but they also differ in that one returns a key and
the other returns a value.  Finding whether an element exists in a
collection like an array (or, more completely, its index to operate
on it) is common, so I do at least want ml.indexof.  With
imap*/ifilter, there are some design questions (e.g. in-place operations
vs. returning copies), though the copying approach chosen is probably
the safer option in the common case.  We may still want a shallow
table copy operation that handles non-sequences.

> function ml.callable (obj)
>     return type(obj) == 'function' or getmetatable(obj) and
>       getmetatable(obj).__call
> end

It could be noted that this may fail if __metatable is set, at least
without a __metatable.__call.  BTW, the special case of mt.__index ==
mt (as imposed by ml.class) implies `obj.__call` can also be tested
(and likewise for other operators).
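
The `__metatable` caveat can be reproduced with a short sketch. The first function is the quoted `callable`; `callable_raw` is a hypothetical alternative that assumes the `debug` library is available, since `debug.getmetatable` ignores the `__metatable` field.

```lua
-- The quoted definition, reproduced for the demonstration.
local function callable(obj)
  return type(obj) == 'function' or getmetatable(obj) and
    getmetatable(obj).__call
end

-- An object that is callable but hides its metatable.
local locked = setmetatable({}, {
  __call = function() return 'called' end,
  __metatable = 'locked',
})

-- getmetatable(locked) returns the string 'locked'; indexing a string
-- falls through to the string library, which has no __call field.
print(locked())          --> called
print(callable(locked))  --> nil

-- Hypothetical alternative (an assumption, not microlight code).
local function callable_raw(obj)
  if type(obj) == 'function' then return true end
  local mt = debug.getmetatable(obj)
  return mt ~= nil and rawget(mt, '__call') ~= nil
end

print(callable_raw(locked))  --> true
```

If `__metatable` were set to a number instead, the quoted version would raise an error on the index rather than return nil.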

I suppose it should also be noted that ml array operations are not
raw, like `table.*` functions are.

Maybe ml.count_keys should be ml.kcount or just ml.count, analogous to
naming conventions like in ml.imap and `ipairs/pairs`.  Like Dirk
mentioned, "subset/countset/equalset" are more obvious to me than
"*_keys" since these will *usually* be used for sets.  It may be
mentioned in the tutorial that `if next(t)` or `if next(t) ~= nil` has
been one idiom to test for empty set.  Another idiom: `not not o ==
ml.truth(o)`.

The docs could mention that ml.collect works only on stateful
iterators and collects the first value.  ml.collect(pairs{1,2,3})
would fail.

TODO: Add precise semantics/tests of ml.split.  ml.split(s, '') can be
ambiguous but presently has the Python, not Perl, behavior.  The
obscure case ml.split("asdfsadf", '.-') hangs, unlike Python.
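
One way to avoid the hang, sketched below, is to stop splitting as soon as the delimiter pattern matches an empty string. This is only an illustration of the guard; the function is a stand-in and does not claim to reproduce ml.split or the Python or Perl semantics.

```lua
-- Stand-in split with a guard against empty matches (an assumption,
-- not the microlight implementation).
local function split(s, pattern)
  local parts, pos = {}, 1
  while true do
    local first, last = s:find(pattern, pos)
    -- last < first means the pattern matched the empty string, which
    -- would never advance `pos`; treat it as "no further delimiter".
    if not first or last < first then break end
    parts[#parts + 1] = s:sub(pos, first - 1)
    pos = last + 1
  end
  parts[#parts + 1] = s:sub(pos)
  return parts
end

print(#split('asdfsadf', '.-'))              --> 1
print(table.concat(split('a,b,,c', ','), '|'))  --> a|b||c
```

Raising an error on an empty match would be an equally defensible choice; the point is only that the loop must not spin.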

> An interesting question I've been discussing with Vadim; tstring
> should be as correct as possible, as long as it remains a simple
> robust debugging/light serialization tool.

These things were probably discussed at length before, but my
experience is to often want a dumper for debugging and occasionally
for basic serialization.  Rarely are cycles needed for basic
serialization (e.g. they are not in json or XML), but I may want them
for debugging since they can occur in memory (intentionally or
accidentally).  Therefore, if we're talking about covering 95% of use
cases, I think Perl's Data::Dumper default is reasonable:

perl -E 'use Data::Dumper; $t = []; $t->[0]=[$t], $t->[1]=$t->[0];
print Dumper($t)'

        $VAR1 = [
          [
            $VAR1
          ],
          $VAR1->[0]
        ];

  Quote: "The default output of self-referential structures can be
evaled, but the nested references to $VARn will be undefined, since a
recursive structure cannot be constructed using one Perl statement.
You should set the Purity flag to 1 to get additional statements that
will correctly fill in these references." -- [2]
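
A Lua dumper with the same default behavior might look roughly like the sketch below: a table seen before is printed as the path where it first appeared, instead of being expanded again. The function name `dump` and the `VAR1` root label are assumptions chosen to mirror the Perl output; this is not ml.tstring.

```lua
-- Hypothetical debug dumper (not microlight code). Repeated or cyclic
-- tables are replaced by the path of their first occurrence.
local function dump(value, path, seen)
  path, seen = path or 'VAR1', seen or {}
  if type(value) ~= 'table' then
    if type(value) == 'string' then return string.format('%q', value) end
    return tostring(value)
  end
  if seen[value] then return seen[value] end  -- back-reference
  seen[value] = path
  -- sort keys by their string form so the output is stable
  local keys = {}
  for k in pairs(value) do keys[#keys + 1] = k end
  table.sort(keys, function(a, b) return tostring(a) < tostring(b) end)
  local parts = {}
  for _, k in ipairs(keys) do
    local kstr = type(k) == 'string' and string.format('%q', k) or tostring(k)
    local kpath = path .. '[' .. kstr .. ']'
    parts[#parts + 1] = '[' .. kstr .. ']=' .. dump(value[k], kpath, seen)
  end
  return '{' .. table.concat(parts, ',') .. '}'
end

-- Same structure as the Perl example above.
local t = {}
t[1] = {t}
t[2] = t[1]
print(dump(t))  --> {[1]={[1]=VAR1},[2]=VAR1[1]}
```

As with the Data::Dumper default, the output is readable for debugging but cannot be loaded back as-is when it contains back-references.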

[1] http://lua-users.org/lists/lua-l/2012-02/msg00731.html
[2] http://search.cpan.org/perldoc?Data%3A%3ADumper

Re: microlight review

steve donovan
On Sat, Apr 28, 2012 at 10:45 PM, David Manura <[hidden email]> wrote:
> to use Microlight rather than roll their own functions.  As a case in
> point: ml.escape is currently incompatible with 5.1 due to \0
> handling.  Test case to add:

Not sure how to fix this, because the Lua 5.1 handling of \0 is not consistent:

> =('a\000'):find'\000'
2       2
> =#('a\000'):match'\000'
0

i.e. works fine for _finding_, but you get an 'empty' string when matching.

> On my suggestion about merging ml.extend and ml.inject (and perhaps
> even ml.delete, like splice), I see there was already some discussion

The latest push does some renaming: 'inject' has become insertvalues
and 'delete' has become removerange.

count_keys becomes simply count, and has_keys becomes issubset.

> Speaking of conflating, ml.import is described as both a primitive
> table operation (which perhaps should separately be ml.update) as well
> as something more involved

I'm increasingly convinced that ml.update and ml.import are different
functions (in fact, is import actually used?) As a temporary stand-by,
update is a synonym for import.

> .  ml.ifind and ml.indexof are
> related, differing in that one accepts a `pred` while the other
> accepts a `value`, but they also differ in that one returns a key and
> other returns a value.

I've taken the liberty of giving indexof an optional extra argument
which is the function to test for equality. Given that, my feeling is
that ifind should be deprecated.
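
A minimal sketch of that shape, assuming the comparison function receives the list element first and the searched value second (the actual argument order in microlight is not stated here):

```lua
-- Sketch of indexof with an optional equality function; the argument
-- order passed to `eq` is an assumption.
local function indexof(t, value, eq)
  for i = 1, #t do
    local item = t[i]
    if eq then
      if eq(item, value) then return i end
    elseif item == value then
      return i
    end
  end
  return nil
end

local function same_ignoring_case(a, b)
  return a:lower() == b:lower()
end

print(indexof({10, 20, 30}, 20))                          --> 2
print(indexof({'Alpha', 'Beta'}, 'beta', same_ignoring_case))  --> 2
```

With the comparison pluggable, a predicate search can be expressed through `eq`, which is the argument for retiring a separate ifind.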

> We may still want a shallow table copy operation that handles non-sequences.

ml.update({},t) does that job currently.
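
For readers without the source at hand, the idiom relies on a function of roughly this shape. The definition below is a stand-in written for illustration, not copied from microlight.

```lua
-- Stand-in for ml.update: copies all key/value pairs of src into dest.
local function update(dest, src)
  for k, v in pairs(src) do dest[k] = v end
  return dest
end

local original = {1, 2, name = 'x', nested = {}}
local copy = update({}, original)

print(copy ~= original)               --> true  (a new table)
print(copy.nested == original.nested) --> true  (shallow: same inner table)
```

Because `pairs` visits every key, this handles non-sequence tables as well as arrays.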

> The docs could mention that ml.collect works only on stateful
> iterators and collects the first value.  ml.collect(pairs{1,2,3})
> would fail.

I was bothered by this, especially when remembering that LuaJIT
implements io.lines() as a stateless iterator (it returns a function
and the file object). So now ml.collect() works with io.lines and
pairs/ipairs.  In the process, it had to lose some optional (and
somewhat confusing) arguments.  (For the pairs/ipairs pattern of
key/value iteration, it collects the value, otherwise the first loop
variable)
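
The rule described above can be sketched as follows. How microlight actually distinguishes the key/value pattern is not stated in this thread, so the test used here (a non-nil second loop variable) is an assumption.

```lua
-- Sketch of a collect that accepts the full iterator triple, so it
-- works with stateless iterators such as ipairs as well as closures.
local function collect(iter, state, ctrl)
  local res = {}
  for a, b in iter, state, ctrl do
    -- assumption: a second loop variable means key/value iteration
    if b ~= nil then
      res[#res + 1] = b
    else
      res[#res + 1] = a
    end
  end
  return res
end

print(table.concat(collect(ipairs{10, 20, 30}), ','))      --> 10,20,30
print(table.concat(collect(('a b c'):gmatch('%a')), ','))  --> a,b,c
```

Passing `ipairs{...}` directly works because the call expands to the iterator function, the table, and the initial control value.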

There is now a collect_until(pred,iter) which also works with all
iterators, with pred either being a number or a predicate. I'm not
entirely convinced it's a good idea to override the argument like
this, and perhaps collect_n should be a separate function.  The
'thirty function' limit is arbitrary, of course.

> These things were probably discussed at length before, but my
> experience is to often want a dumper for debugging and occasionally
> for basic serialization.

That's my experience as well. To go beyond that, one needs a
configurable serializer and I think there are some more industrial
modules out there for this specific purpose.

I've put in memoize, which was part of Jay's original proposal.  I'm
itching a little to add writefile as well as readfile, and there has
been a request for a good trim function.

A design decision that needs review is that all functions returning
arrays label them explicitly as Array objects.  It is trivial to give
them a new identity but I'm wondering if this is a wise move for
generic functions.

Again, thanks for this detailed review; it's really what every module
writer needs, and is lucky to get.

steve d.

Re: microlight review

Dirk Laurie-2
Steve D responding to David M:

>> On my suggestion about merging ml.extend and ml.inject (and perhaps
>> even ml.delete, like splice), I see there was already some discussion
>
> The latest push does some renaming: 'inject' has become insertvalues
> and 'delete' has become removerange.

Easy enough to say e.g.
    local inject = ml.insertvalues
In such cases Microlight should prefer long but descriptive
to snappy but cryptic function names.

> I'm increasingly convinced that ml.update and ml.import are different
> functions (in fact, is import actually used?) As a temporary stand-by,
> update is a synonym for import.

Yes, without thinking about it, I always use the idiom

    require"ml".import()

> The 'thirty function' limit is arbitrary, of course.

There's a tradeoff between two extremes.  You can have a hundred
conflation-free functions like Python and need to remember which
of two almost synonymous English words like `extend` and `append`
is appropriate, or you can have one super-conflated function like
a typical GUI library and remain blissfully unaware of features you
never use.  The optimum is somewhere in between, which means that
you can't afford to be dogmatic about conflation.

Conflation is much more acceptable with table argument lists. E.g.
   import{journal,into=ledger}
   update{ledger,from=journal}
would do the same thing and confuse no reader.  Positional arguments
are efficient, sure, and reminiscent of the "good" old Fortran days,
true, but actually make for hard-to-find bugs in a language where
names don't imply types and missing arguments default to nil.
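
The two call forms can be sketched as thin wrappers over one copying helper. Everything below is hypothetical: the standalone `import` and `update` functions and the helper `copy_pairs` exist only to illustrate table argument lists, and they are not the ml functions of the same name.

```lua
-- Hypothetical wrappers illustrating named arguments.
local function copy_pairs(dest, src)
  for k, v in pairs(src) do dest[k] = v end
  return dest
end

local function import(args)
  -- a missing named argument is reported instead of silently being nil
  assert(type(args.into) == 'table', "import: missing 'into' table")
  return copy_pairs(args.into, args[1])
end

local function update(args)
  assert(type(args.from) == 'table', "update: missing 'from' table")
  return copy_pairs(args[1], args.from)
end

local journal = {rent = 100}
local ledger = {cash = 5}
import{journal, into = ledger}
print(ledger.rent, ledger.cash)  --> 100     5
```

The extra table per call is the cost of this style; the explicit check on the named field is what it buys.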

>
> I've put in memoize, which was part of Jay's original proposal.  I'm
> itching a little to add writefile as well as readfile, and there has
> been a request for a good trim function.

There has also been a suggestion, complete with proof-of-concept
implementation, for a wrap function.

> A design decision that needs review is that all functions returning
> arrays label them explicitly as Array objects.  It is trivial to give
> them a new identity but I'm wondering if this is a wise move for
> generic functions.

The less imposing-of-arbitrary-standards there is to Microlight's
class implementation, including Array, the more likely it is to
be acceptable to users.

On another issue:  One of the reasons why Python can get away with
a maze of twisty little passages, all alike, is the self-documenting
system.

If I can't remember which of `append` or `extend` adds several
values in one go, I can type `help(list.extend)` and find out.

Now implementing this for general user-defined functions with a
Python-style docstring belongs to Ilua, not to Microlight, but
implementing it for Microlight functions is trivial: you keep a
table with functions (not names) as keys.  E.g.

--- in ml.lua

local helptable={}

function ml.help(fn,noprint)
   local helptext=helptable[fn]
   if helptext==nil then
      local tfunc = type(fn)
      if tfunc~='function' then
         helptext="bad argument #1 to 'help' (function expected, got "
            ..tfunc..")"
      else
         helptext = "No help available for this function"
      end
   end
   if noprint then return helptext else print(helptext) end
end

helptable[ml.wrap]=[[
wrap(str,[length=72]) Returns a string in which some other whitespace
characters have been replaced by newline characters, so that the
distance between newline characters is close to `length`.]]

--- in the user's session

> require"ml".import()
> inject = insertvalues
> help(inject)
insertvalues(tbl,dest,index,src,overwrite) Copies values from `src` into
`dest` starting at `index`, moving up present values of `dest` unless
`overwrite` tests `true`.

Dirk

Re: microlight review

Dirk Laurie-2
> Now implementing this for general user-defined functions with a
> Python-style docstring belongs to Ilua, not to Microlight,

But a compromise using a function `addhelp` like Pari/GP would be easy:
instead of `helptable[ml.wrap]=[[...]]` put `addhelp(ml.wrap,[[...]])`
and give the ML user access to addhelp.
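
A self-contained sketch of that compromise follows. The weak-keyed table is an addition not mentioned in the thread, made on the assumption that help text should not keep otherwise unused functions alive.

```lua
-- Sketch of addhelp/help as free functions; in microlight they would
-- presumably live in the ml table.
local helptable = setmetatable({}, { __mode = 'k' })  -- weak keys (assumption)

local function addhelp(fn, text)
  assert(type(fn) == 'function', 'function expected')
  helptable[fn] = text
  return fn
end

local function help(fn, noprint)
  local text = helptable[fn] or 'No help available for this function'
  if noprint then return text else print(text) end
end

-- A user documenting their own function:
local function double(x) return 2 * x end
addhelp(double, 'double(x) Returns twice `x`.')
help(double)
```

Returning `fn` from `addhelp` lets a definition and its help text be registered in one expression.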

Re: microlight review

Daurnimator
In reply to this post by steve donovan
On 2 May 2012 18:32, steve donovan <[hidden email]> wrote:

> On Sat, Apr 28, 2012 at 10:45 PM, David Manura <[hidden email]> wrote:
>> to use Microlight rather than roll their own functions.  As a case in
>> point: ml.escape is currently incompatible with 5.1 due to \0
>> handling.  Test case to add:
>
> Not sure how to fix this, because the Lua 5.1 handling of \0 is not consistent:
>
>> =('a\000'):find'\000'
> 2       2
>> =#('a\000'):match'\000'
> 0
>
> i.e. works fine for _finding_, but you get an 'empty' string when matching.

You have to use %z to match embedded nulls; an unexpected lesson I
learnt long ago.
On the other hand, everything works fine with classes (i.e., %S will match \0).

D

Re: microlight review

steve donovan
In reply to this post by Dirk Laurie-2
On Wed, May 2, 2012 at 12:17 PM, Dirk Laurie <[hidden email]> wrote:
> Easy enough to say e.g.
>    local inject = ml.insertvalues
> In such cases Microlight should prefer long but descriptive
> to snappy but cryptic function names.

That's the idea - let the user get as cryptic as they like, but this
is not Fortran or Unix on a teleprinter.

>    require"ml".import()

I think it's fine for that role, it's just one of those functions that
does two somewhat different things as David observes. update and
import should be separate functions for that reason (I have a tendency
to pack too much in a single function)

> Conflation is much more acceptable with table argument lists. E.g.
>   import{journal,into=ledger}
>   update{ledger,from=journal}

I do worry about the need for an extra table creation here, since esp.
update needs to be practically a primitive.

> There has also been a suggestion, complete with proof-of-concept
> implementation, for a wrap function.

How do others feel about this? And Bertrand's suggested trim() function?

> On another issue:  One of the reasons why Python can get away with
> a maze of twisty little passages, all alike, is the self-documenting
> system.

Sure, a doc-string convention isn't difficult, maybe I'm being
over-anxious about adding documentation as live strings, just for the
occasional interactive convenience; perhaps there should be an
ml_help separate module[1]  (I agree that a big beast like ldoc is
overkill for the job [2])   It's definitely a thing that I appreciated
when learning Python.

steve d.

[1] a clever Lua REPL could load such doc modules automatically, if a
convention was followed.
[2] but one of its output formats could be such a 'doc module'

Re: microlight review

steve donovan
In reply to this post by Daurnimator
On Wed, May 2, 2012 at 1:08 PM, Daurnimator <[hidden email]> wrote:
> You have to use %z to match embedded nulls; an unexpected lesson I
> learnt long ago.

Thanks, I _knew_ it had something to do with z ;)  Well then I can
escape \000 as %z, but only for 5.1
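
A version-dependent escape along those lines could be sketched like this. The function is a stand-in for illustration and may differ from what ml.escape ends up doing; the check on `_VERSION` is an assumption about how the 5.1 case would be detected.

```lua
-- Stand-in escape: makes a string safe to use as a literal Lua pattern.
local function escape(s)
  -- prefix every magic character with %
  local res = s:gsub('([%^%$%(%)%%%.%[%]%*%+%-%?])', '%%%1')
  if _VERSION == 'Lua 5.1' then
    -- 5.1 patterns stop at a literal \0, so spell it as the %z class
    res = res:gsub('%z', '%%z')
  end
  return res
end

-- The test case proposed earlier in the thread, without the ml dependency.
local bytes = {}
for i = 0, 255 do bytes[#bytes + 1] = string.char(i) end
local charmap = table.concat(bytes)
assert(('aa' .. charmap .. 'bb'):match(escape(charmap)) == charmap)
```

This only covers literal text; patterns that contain character sets need separate care.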

steve d.

Re: microlight review

Roberto Ierusalimschy
> On Wed, May 2, 2012 at 1:08 PM, Daurnimator <[hidden email]> wrote:
> > You have to use %z to match embedded nulls; an unexpected lesson I
> > learnt long ago.
>
> Thanks, I _knew_ it had something to do with z ;)  Well then I can
> escape \000 as %z, but only for 5.1

Beware that something like "[\0-\31]" cannot be translated to "[%z-\31]";
it must be written as "[%z\1-\31]".
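
To illustrate the set form, the sketch below picks the pattern by version and counts bytes in the range 0 to 31. The names `low_byte_set` and `count_low_bytes` are hypothetical, and selecting on `_VERSION` is an assumption about how a library would branch.

```lua
-- Choose a character set for bytes 0..31 that works on the running Lua.
local low_byte_set
if _VERSION == 'Lua 5.1' then
  low_byte_set = '[%z\1-\31]'  -- %z stands for \0; the range starts at \1
else
  low_byte_set = '[\0-\31]'    -- later versions accept \0 inside patterns
end

local function count_low_bytes(s)
  local _, n = s:gsub(low_byte_set, '')
  return n
end

print(count_low_bytes('a\0b\1c\31d '))  --> 3
```

The space at the end of the sample string is byte 32 and is deliberately outside the set.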

-- Roberto

Re: microlight review

Jay Carlson
In reply to this post by steve donovan
On May 2, 2012, at 7:24 AM, steve donovan wrote:

> On Wed, May 2, 2012 at 12:17 PM, Dirk Laurie <[hidden email]> wrote:
>> Easy enough to say e.g.
>>    local inject = ml.insertvalues
>> In such cases Microlight should prefer long but descriptive
>> to snappy but cryptic function names.
>
> That's the idea - let the user get as cryptic as they like, but this
> is not Fortran or Unix on a teleprinter.

Actually, it is. See the recently mentioned string.find, string.match, string.gmatch. I think there is room for arguments, but an appeal to match the existing spirit of Lua's library is not a good argument.

I am comfortable with "you just have to learn some words" if there are not a lot of them and they are used frequently. You have to learn the semantics of the operations as well; people learn in different ways, but I think the name is a small problem in this case.

Jay