The += operator, again


Felipe Tavares
Hi!

I am writing a lightweight (and bare-bones) implementation of the += operator.

I already got it working, many thanks to the power patch by Olsen (http://lua-users.org/files/wiki_insecure/power_patches/5.2/compound-5.2.2.patch).

Unlike Olsen, I added an extra token (TK_PLUSEQ) to the parser, and then in the assignment routine in lparser.c I did this:

static void assignment (LexState *ls, struct LHS_assign *lh, int nvars) {
  expdesc e;
  check_condition(ls, vkisvar(lh->v.k), "syntax error");
  ...
  /* Language extension here */
  else if (testnext(ls, TK_PLUSEQ)) {
    // Get the rvalue expression
    expr(ls, &e);
    // Add it with the lvalue of the assignment
    luaK_infix(ls->fs, OPR_ADD, &e);
    luaK_posfix(ls->fs, OPR_ADD, &e, &lh->v, ls->linenumber);
    // Store it back
    luaK_storevar(ls->fs, &lh->v, &e);
    return;  /* avoid default */
  }

...

Now, this works, but not in all cases:

Works:

local x = 1
x += 1

Doesn't (has no effect on the value of y.y):

local y = {
  y = 1
}
y.y += 1

Doesn't (has no effect on z.z.z):

local z = {
  z = {
    z = 1
  }
}

z.z.z += 1

I have no idea why it works in some cases and in others it doesn't; I am guessing it has something to do with the way the variable (lh->v) is actually represented?

Re: The += operator, again

Gé Weijers

+= and similar operators are not simple to implement if you want to keep the regular Lua semantics. For instance, in this case:

t[f()] += 1

This has to be translated into:

local tmp = f()
t[tmp] = t[tmp] + 1
tmp = nil

If 't' has a metatable it may trigger __index and __newindex metamethod calls, so you can't just obtain the location of t[tmp] and update it.

It may be exceedingly tricky to efficiently generate this kind of code in a single-pass top-down translator, but I have not studied that in detail so I don't know.
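
The translation above can be sketched in plain C. This is only a toy model, not Lua's code generator: `ToyTable`, `toy_get`, `toy_set`, `next_key` and `compound_add` are hypothetical names, and the two counters merely stand in for `__index` and `__newindex` calls. It shows the two properties the translation has to keep: the key expression runs exactly once, and the update is one read followed by one write rather than an in-place modification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy "table": a fixed-size int array whose reads and writes
   are counted, standing in for __index / __newindex metamethod calls. */
#define TOY_SIZE 8

typedef struct {
  int slots[TOY_SIZE];
  int index_calls;     /* number of reads  (models __index)    */
  int newindex_calls;  /* number of writes (models __newindex) */
} ToyTable;

int key_calls = 0;  /* how many times the key expression was evaluated */

/* Models f(): a key expression with an observable side effect. */
size_t next_key(void) {
  key_calls++;
  return 3;
}

int toy_get(ToyTable *t, size_t k) {
  assert(k < TOY_SIZE);
  t->index_calls++;
  return t->slots[k];
}

void toy_set(ToyTable *t, size_t k, int v) {
  assert(k < TOY_SIZE);
  t->newindex_calls++;
  t->slots[k] = v;
}

/* Models `t[f()] += delta`. */
void compound_add(ToyTable *t, size_t (*key_expr)(void), int delta) {
  size_t tmp = key_expr();                   /* local tmp = f()          */
  toy_set(t, tmp, toy_get(t, tmp) + delta);  /* t[tmp] = t[tmp] + delta  */
}
```

Calling `compound_add(&t, next_key, 1)` on a table whose slot 3 holds 10 leaves 11 there, with `key_calls`, `index_calls` and `newindex_calls` each equal to 1. A naive expansion into `t[f()] = t[f()] + 1` would evaluate the key twice.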



Re: The += operator, again

Felipe Tavares
The assignment operator is implemented in this same function, as:

  else {  /* assignment -> '=' explist */
    int nexps;
    checknext(ls, '=');
    nexps = explist(ls, &e);
    if (nexps != nvars)
      adjust_assign(ls, nvars, nexps, &e);
    else {
      luaK_setoneret(ls->fs, &e);  /* close last expression */
      luaK_storevar(ls->fs, &lh->v, &e);
      return;  /* avoid default */
    }
  }

If you take a closer look, the important part is:

luaK_storevar(ls->fs, &lh->v, &e);

which is a function in lcode.c, documented as follows:

/*
** Generate code to store result of expression 'ex' into variable 'var'.
*/
void luaK_storevar (FuncState *fs, expdesc *var, expdesc *ex) {

So if I could just update this expression `e` with an extra addition operation, it should work and keep all the syntax etc., because it's built with the same tools as the rest of the language instead of attacking the problem from the ground up.

That is, of course, unless I am missing a big point here.


Re: The += operator, again

Philippe Verdy
In reply to this post by Felipe Tavares
This has to do with the way expressions are parsed, using operator priorities: the other operators generate their own code in an order you do not expect.


Re: The += operator, again

Felipe Tavares
This was not the case at all. It was related to the order of code generation, but not to parsing or operators.

I got it to work after some more debugging, and the issue was simply that `luaK_posfix` calls (indirectly) `freereg` on indexed variables. So my references became invalid when I tried to assign data to indexed variables.

The solution was to mess around a bit so that, in this case, I can free the registers after assigning the new data to the indexed variable.


Re: The += operator, again

Domingo Alvarez Duarte
Hello Felipe!

Take a look at https://github.com/mingodad/ljs, where I took ideas and
code from https://github.com/ex/Killa and
https://github.com/sajonoso/jual and fixed the code for "+= -= *= /= %="
to work with indexed variables.

See table-decl.ljs.

There is a program to translate ".lua" to ".ljs" in the folder lua2ljs.
It uses lemon and re2c to parse Lua files and convert them to ljs (Lua
with C/C++/Java/JavaScript syntax). There is no reason for the
extensions to also work with the original Lua syntax.

I also have it done for LuaJIT, but it's not yet published.

If anyone wants to help, I'm now looking into how to implement pre/post
increment/decrement "++"/"--" operators and a class system.

Cheers!





Re: The += operator, again

Felipe Tavares
Hey Domingo, I took a look and the solution in there is indeed a bit better than mine!

Instead of changing the source code to not free the registers, we reserve a new register before parsing (lparser.c, assign_compound()):


/* store compound results in a new register (needed for nested tables) */
luaK_reserveregs(fs, 1);
/* parse right-hand expression */
nexps = explist(ls, &rh);
check_condition(ls, nexps == 1, "syntax error in right hand expression in compound assignment");
infix = lh->v;
luaK_infix(fs,op,&infix);
luaK_posfix(fs, op, &infix, &rh, line);
luaK_storevar(fs, &(lh->v), &infix);

`++` and `--` are pretty easy! I think prefix is the easier one to start with, because you can put the change in the code that handles unary operations for things like `-1`. For suffix, I think it would go after the code for indexing. The implementation itself is easy: you just have to construct an `expdesc` for the value 1, add it to the variable, and store it back. It should mostly reuse the code for compound assignment.

Thanks a lot for these repos, they're gold!


Re: The += operator, again

Domingo Alvarez Duarte

Hello!

I've updated the repositories with fixes for compound operators on upvalues and also added pre/post increment and decrement operators "++"/"--".

Cheers!


Re: The += operator, again

Sam Putman


If you don't mind my asking, what does this do?

tab[++#tab] = val

Let's say #tab was 3: what happens?

Re: The += operator, again

David Favro
In reply to this post by Domingo Alvarez Duarte


On December 5, 2018 11:59:47 AM UTC, Domingo Alvarez Duarte <[hidden email]> wrote:
>I've updated the repositories with fixes for compound operators on
>upvalues and also added pre/pos increment operators "++/--".

How do you differentiate "--" as an operator from a comment-start?

I have been an advocate for the += operator, which I think is sorely lacking in Lua, for many years [1].  But in my opinion ++ and -- are much less useful, and when used give fewer advantages [2], and pose more issues, such as already being used for comments.  Yet for some reason people can't help lumping them all together and then using the issues with the unary operators as a purported complication of the binary operator, as well as increasing the overall size of the addition to the language, leading to objections of 'bloat'.

-- David

[1]: I mean in core Lua.  I consider patches and forks to be vastly less useful.

[2]: ++ and -- are very frequently used in languages such as C for loop iteration.  In Lua, iteration takes place much more often via ipairs(), and even the numeric for loop does not require explicit increment. Furthermore, the ++ operator in C often compiles to a distinct hardware instruction, while in the Lua VM it would presumably generate code no different than "x+=1".


Re: The += operator, again

Dirk Laurie-2
On Thu, 6 Dec 2018 at 00:51, David Favro <[hidden email]> wrote:

> I have been an advocate for the += operator, which I think is sorely lacking in Lua, for many years [1].

Are you an advocate only for +=, or for similar operators for all
binary operators?


Re: The += operator, again

Frank Kastenholz-2
In reply to this post by David Favro
One place where I think “++” or an equivalent might be useful is as syntactic sugar for adding something to the end of a list:

    Fubar[++]=...
As shorthand for
    Fubar[#Fubar+1]=...

But it’s not that big a deal.

Frank




Re: The += operator, again

Philippe Verdy-2
Shouldn't that simply be Foobar.add(value), if we assume there's no member named "add" but it is defined via the metatable's __index metamethod? And shouldn't there be a way to cleanly refer to a member via its __index metamethod rather than its defined index, using a simpler form such as Foobar->add(value), which would ignore any value assigned to Foobar['add']? We could then have a distinction between actual members and metamembers coming from a prototype. This would mean adding support for __proto in the metatable, allowing us to support not just classes but interfaces.


Re: The += operator, again

Domingo Alvarez Duarte
In reply to this post by Sam Putman

Hello Sam!

First of all, thanks for commenting!

I hadn't tried something like this before, so I wrote this small program and tried to compile and run it:
====
var val = 3;
var tab = [1,2];
tab[++#tab] = val;
====
Output:
====
ljs: test-pp.ljs:3: unexpected symbol near '#'
====
But it does work when written this way (showing a bug?):
====
var val = 3;
var tab = [1,2];
tab[++(#tab)] = val;
====
Output:
====
ljs test-pp.ljs
1    1
2    3
====

The way it's implemented right now rejects it, but the way I usually use it in situations like this is as follows:
====
var tab = {"one": 1, "two" : 2, "three": 3, "four": 4};
var ary = [];
var idx = 0;
for(k,v in pairs(tab)) ary[++idx] = [k,v];
table.sort(ary, function(a,b){ return a[2] < b[2];});

for( idx, v in ipairs(ary)) print(idx, v[1], v[2]);
====
Output:
====
ljs test-pp2.ljs
1    one    1
2    two    2
3    three    3
4    four    4
====

I appreciate your comments.

Again, thanks in advance for your time and attention!


Re: The += operator, again

Domingo Alvarez Duarte
In reply to this post by David Favro
Hello David!

I would appreciate it if you could take a look at
https://github.com/mingodad/ljs and give your point of view, given
that you have been advocating for such features for so long.

I understand you don't like patches and/or forks, but I would still
appreciate it if you could find some spare time and give it a try.

Thanks again for your time and attention!


Re: The += operator, again

David Favro
In reply to this post by Dirk Laurie-2


On December 5, 2018 11:57:25 PM UTC, Dirk Laurie <[hidden email]> wrote:
>On Thu, 6 Dec 2018 at 00:51, David Favro <[hidden email]> wrote:
>
>Are you an advocate only for +=, or for similar operators for all
>binary operators?

Just +=, as it comes up for me from time to time, although it's not at the top of my list of desired features (numeric pattern captures and a true ternary conditional operator easily edge it out).

But the point of my message was not so much to "advocate" as to point out that the additive assignment operator should not be conflated with the unary increment/decrement as they are often treated as a group.

OTOH, your comment has provoked a little thought journey for me into places like "..=".  But I won't let you drag me into group 3! :-)

-- David


Re: The += operator, again

David Favro
In reply to this post by Domingo Alvarez Duarte


On December 6, 2018 10:43:51 AM UTC, Domingo Alvarez Duarte <[hidden email]> wrote:
>Hello David !
>
>I would appreciate if you could give a look at
>https://github.com/mingodad/ljs and give your point of view, knowing
>that you have been advocating for such features so long.

It looks interesting although I don't have time for a detailed look until later... but from a quick look it seems that you are using curly-bracket syntax, so I guess Lua-style comments do not even apply.  I didn't realize it was a completely different syntax, I should have read your original message a little more carefully before I even mentioned it.

>> How do you differentiate "--" as an operator from a comment-start?


Re: The += operator, again

Sven Olsen
In reply to this post by Dirk Laurie-2
> Are you an advocate only for +=, or for similar operators for all
> binary operators?

As someone who's implemented and used `..=`, my 2 cents would be that adding it to the language is actually unwise.  If you're going to create a large string through repeated concatenations, you should almost certainly be using some kind of pattern that leverages table.concat().  It's far more efficient because it avoids the allocation of intermediate strings.  `..=` just creates the temptation to build long strings in a way that wastes a lot of memory and processing time. I'm still cleaning up ancient parts of my own codebase where I used `..=` when I should have used table.concat().

Arithmetic compound assignment ops are nice to have, but the generalization to a compound assignment for all binary ops is a bad idea, imo.
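
To put rough numbers on the cost of intermediate strings, here is a small sketch in plain C. It is not how Lua implements `..` or table.concat(); `concat_stepwise` and `concat_once` are hypothetical helpers that only model the two allocation patterns and count the bytes each one copies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Builds the result by allocating a fresh string at every step, the way
   repeated concatenation of immutable strings behaves. Adds the number of
   bytes copied to *copied. Caller frees the result; NULL on failure. */
char *concat_stepwise(const char **parts, size_t count, size_t *copied) {
  assert(parts != NULL && copied != NULL);
  char *acc = malloc(1);
  if (acc == NULL) return NULL;
  acc[0] = '\0';
  size_t len = 0;
  for (size_t i = 0; i < count; i++) {
    size_t n = strlen(parts[i]);
    char *next = malloc(len + n + 1);  /* new intermediate string */
    if (next == NULL) { free(acc); return NULL; }
    memcpy(next, acc, len);            /* re-copy everything built so far */
    memcpy(next + len, parts[i], n);
    next[len + n] = '\0';
    *copied += len + n;
    free(acc);
    acc = next;
    len += n;
  }
  return acc;
}

/* Measures the total length first, then copies each part exactly once. */
char *concat_once(const char **parts, size_t count, size_t *copied) {
  assert(parts != NULL && copied != NULL);
  size_t total = 0;
  for (size_t i = 0; i < count; i++) total += strlen(parts[i]);
  char *out = malloc(total + 1);
  if (out == NULL) return NULL;
  size_t pos = 0;
  for (size_t i = 0; i < count; i++) {
    size_t n = strlen(parts[i]);
    memcpy(out + pos, parts[i], n);
    pos += n;
  }
  out[total] = '\0';
  *copied += total;
  return out;
}
```

For four two-byte parts the stepwise version copies 2 + 4 + 6 + 8 = 20 bytes and makes five allocations, while the single-pass version copies 8 bytes with one allocation. The gap grows quadratically with the number of parts.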

-Sven

Re: The += operator, again

William Ahern

FWIW, string concatenation operators in Perl, for example, are very
efficient because Perl strings are mutable, and the parser and interpreter
are very clever about optimizing string ops, including implementing CoW.
Code like

  $foo .= $a . $b . $c

is basically a memcpy from $a into $foo, $b into $foo and then $c into $foo.
Much of Perl's crazy syntax, semantics, and insanely ugly internals is a
product of the emphasis on optimizing string operations.

For a language like Lua I think a better solution to performant string
construction would be something like Java's StringBuffer. This is trivial to
do in Lua because the language emphasizes making use of the C API, and
userdata objects are first-class in terms of language treatment. The real
issue is that the Lua ecosystem isn't a batteries-included kind of
environment.
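
As a rough illustration of the buffer idea, here is a minimal growable string buffer in plain C. The names (`StrBuf`, `strbuf_init`, `strbuf_append`, `strbuf_free`) are made up for this sketch, and the Lua binding itself (wrapping the struct in a userdata and exposing methods through the C API) is deliberately left out. Size overflow is not checked.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer; all names are illustrative. */
typedef struct {
  char *data;  /* NUL-terminated contents, or NULL before the first append */
  size_t len;  /* bytes used, not counting the terminator */
  size_t cap;  /* bytes allocated */
} StrBuf;

void strbuf_init(StrBuf *b) {
  assert(b != NULL);
  b->data = NULL;
  b->len = 0;
  b->cap = 0;
}

/* Appends n bytes from s. Capacity doubles when it runs out, so a long run
   of appends copies each byte an amortized constant number of times.
   Returns 0 on success, -1 if allocation fails (buffer left unchanged). */
int strbuf_append(StrBuf *b, const char *s, size_t n) {
  assert(b != NULL && s != NULL);
  size_t need = b->len + n + 1;  /* +1 for the terminator */
  if (need > b->cap) {
    size_t cap = b->cap ? b->cap : 16;
    while (cap < need) cap *= 2;
    char *p = realloc(b->data, cap);
    if (p == NULL) return -1;
    b->data = p;
    b->cap = cap;
  }
  memcpy(b->data + b->len, s, n);
  b->len += n;
  b->data[b->len] = '\0';
  return 0;
}

/* Releases the storage and resets the buffer to its empty state. */
void strbuf_free(StrBuf *b) {
  assert(b != NULL);
  free(b->data);
  strbuf_init(b);
}
```

Typical use is `strbuf_init(&b)`, any number of `strbuf_append(&b, s, strlen(s))` calls, one read of `b.data` at the end, then `strbuf_free(&b)`. The doubling policy is one common choice; other growth factors trade memory slack against the number of reallocations.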



Re: The += operator, again

Philippe Verdy
I do agree that Lua should have an additional type for mutable strings (or, more generally, vectors, not just tables). This could be done using the internal representation of tables, which already optimizes the dense index of integer keys, internally using three arrays: one for the dense part, one for the hashmap, and one for the values mapped from the hashmap, but with a slightly deeper representation so that the dense part cannot contain any nil value.

This does not necessarily need a new exposed type: it could still be exposed as a Lua table with an internal subtype, autodetected by keys forming a contiguous range of integers.
The values array in that case can directly contain the 21-bit code points if this is restricted to valid UTF-32, or could do as JavaScript does, using an array of 16-bit integers, compatible with UTF-16 even if it allows unpaired surrogates, or an array of 32-bit integers, compatible with UTF-32 even if it contains out-of-range UCS-4 non-characters outside the valid 17 planes of the standard UCS.

When this is done, there's a way to create strings that would be "automutable", and a way to transform a mutated string into an atomized string in the global shared table of immutable strings. Likewise, any string could be derived: extracting substrings would just reuse the internal arrays, and only an assignment at specific key positions (or other functions like insertion/replacement) would duplicate the string to make it mutable.

Anyway, the problem is not only there: incremental concatenation also has a cost because there are multiple reallocations. These reallocations are not optimal when the results are immediately "atomized" to immutable strings, causing a lot of work in the garbage collector; reallocations made on mutable strings would be lazier, allocating lengths with some slack at the end (which the garbage collector could still reduce when needed).

This optimization should be generalized to tables (and all arrays), allowing them to become atomized on demand to shared immutable instances (but still without preventing them from being de-atomized when modified).

The complication, however, is in the GC: it would have to handle both mutable and immutable tables, and mutable tables would need to be collected by compressing their unused part (and possibly also recomputing the best "dense" part of their integer keys).

There should also be an optimization hint about how many "nil" keys are allowed in the dense part. For a string this does not normally make sense, as its character indexes form a continuous range from 1 to #string, but we can imagine situations where #string counts only the first part of the string, where all keys in 1..N are assigned a non-nil value, while other indexes can still be assigned in the dense part without being part of the canonical string.

The hint could, for example, allow up to 25% of nil keys in the dense part, storing other keys in the hashmap, or no more than 0% for a strictly dense representation; my opinion is that 50% is enough to avoid most pathological reallocations. We could still repack the unused part on demand, notably at the end of strings/arrays, if we need to save memory, but the GC could do that work itself without any explicit request from the Lua code, if it is instructed to repack strings using a smaller threshold such as 12.5%, or even 0% if this only repacks the starting or ending range of integer keys, because that implies no growth of the hashmap for the non-dense part.

