Why no for table-iterator like e. g. "hashpairs"?

classic Classic list List threaded Threaded
31 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Why no for table-iterator like e. g. "hashpairs"?

bil til
This post was updated on .
Additionally to pairs and ipairs, why does there not exist an interator
called e. g. "hashpairs"?

I assume quite often a very long listed/indexed table might have
additionally some further key-value pairs with "hash-keys". E. g. a
polyline/xy-curve with very many points might additionally have a name field
and maybe some further info fields.

If I want to iterate over such "further info fields" in a fast way in lua
(and I do NOT know the key names, or I possibly would like to check such
"further key names"), do I really have to iterate over the complete table
with pairs? (is this not a complete waste of time, as I then possibly get
also all the ipairs entries of the table for iteration in my for loop?).

(If I look at the Table struct in lobject.h, then it should be quite
straight forward to start iteration at the element "node" (I assume this is
the first hash key - value pair...?), and in such of such tables with large
array part (and I assume it is very common that the array part is large?)
this would speed up the for loop then tremendeously?)


PS: If pairs (or next) it used in a for loop, the book "Programming in Lua"
gives a warning, that the indexed and hashed keys might arrive in
"accidential order" (chapter "Table Traversal") - is this reality or does
this just describe some "very cautious programming practice?

( ... As I understand the pairs C source code, it should be clear that first
all indexed key-value pairs 1, 2, ...#t should appear, and then then
the hash pairs (those then might arrive in "accidential order" I assume ..)
 ... is this correct understanding from my side?)

--
Sent from: http://lua.2524044.n2.nabble.com/Lua-l-f2524044.html

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

Daurnimator
On Fri, 1 Nov 2019 at 19:55, bil til <[hidden email]> wrote:
> If I want to iterate over such "further info fields" in a fast way in lua
> (and I do NOT know the key names, or I possibly would like to check such
> "further key names"), do I really have to iterate over the complete table
> with pairs? (is this not a complete waste of time, as I then possibly get
> also all the ipairs entries of the table for iteration in my for loop?).

No. See `next` (and infact pairs uses next under the covers).

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

bil til
This post was updated on .
THank you for your answer.

Yes I understand, in the manual it says "The order in which the indices are
enumerated is not specified, even for numeric indices."

... I just somehow fail to believe this can happen...

I just tried the following table in lua:
   t= { 1, 2, "test", 4, x=5, y= 6}
   for k, v in pairs( t) do
      print( k, v)
   end

The resulting printout ALWAYS gives the index/list part of the table (1, 2,
"test", 4) in CORRECT order ... only the hashed x and y can appear in
reversed order.

This is what I would also expect when I look at the lua c code ... mixing
up the index/list part of the table I think is nearly NOT possible.
 Possibly this Reference manual warning is only some "historic warning"
which is a remainder from some previous lua versions, where the indexed
 part had not been organized as linear array?

If I can be sure that the indexed part (1, 2, ...#t ) is handled first by next,
then I could at least use next(#t) to start iteration with the hash part...
(if I would have a "stupid table" where the index part contains nil's, then
it would presumably start after this first nil, but I would be fine with this...)


... although such a for-iterator "hashpairs" of course would be really
much nicer... (an alternative might be to allow pairs( t, #t)).





--
Sent from: http://lua.2524044.n2.nabble.com/Lua-l-f2524044.html

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

Luiz Henrique de Figueiredo
> "The order in which the indices are
> enumerated is not specified, even for numeric indices."
>
> ... I just somehow fail to believe this can happen...

For me (using Lua 5.3.5), the code below shows that this can indeed
happen. Run it a few times if needed.

t= { 1, 2, "test", 4, x=5, y= 6}
for n=1,1000 do t[math.random(10)]=n end
for k, v in pairs( t) do print( k, v) end

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

bil til
This post was updated on .
... this is a nasty example, as then the table contains nil...

Could we perhaps change the code slightly, to make my
point more clear:

t= {1, 2, 3, x=10000, y= 11111} -- just to have some nice table start
for i= 1, 1000 do
  local n= math.random(1000)
  t[n]= n
end

print( #t)

for i= 1, 10 do
  print( i, t[i])
end

This produced the following output for #t and the table t in my case:
440
1       1
2       2
3       3
4       nil
5       5
6       6
7       7
8       8
9       nil
10      10

So this is a "really nasty table" now... it has "nil-holes"
distributed nicely in its key-value range 1...1000,
and #t just gives some sort of
"estimated length".

To confirm that 440 is at least "quite reasonable", you can
print the table range 335...445, then I got:
435     nil
436     nil
437     nil
438     438
439     439
440     440
441     nil
442     nil
443     443
444     nil
445     nil

... so this confirms, that by some "reaonably fast checking of #t"
(= avoiding time-eating linear search) 440 might be some
"reaonable value for #t".

... ok ...

but anyway:
here ALL the numbered keys appear nicely ordered if
I iterate with for using the pairs iterator:

I would be happy already concerning the first 440 ones,
but in fact you can test the complete table like this:

k_last= 0
for k, v in pairs(t) do
  if type(k) == 'number' then
    if k <= k_last then
      print ("sorting error in 1..#t: " .. k)
    end
    k_last= k
  end
end
print( k_last)

... and the result will be:
1000

... so this ran nicely up to 1000 without
any error message, all nicely sorted for the
number indices...

(but I am happy to restrict our discussion for the
range 1...#t
 I am happy if we should agree,
that in this range FOR SURE the keys will arrive
in sorted order... and that these keys 1...#t
will appear, BEFORE any other keys / any hash
keys will appear...

this is really helpful to know definitely... this is NOT
an academic question without real practical output,
I hope you also agree in this.)


PS: I felt a bit insure about my interpretation
above concerning this number #t = 440 ...
To check I just counted all "number types"
in the range 1...1000:

> ncount= 0
> for k, v in pairs(t) do
>> if type(k) == 'number' then
>>   ncount= ncount + 1
>> end
>> end
> print( ncount)
656

... so there are 656 non-nils ... so my
interpretation of 440 above seems to be correct.

... but this is NOT my crucial point.

I only want to be sure about the following:

If I iterate a table with pairs (or next), then
the iteration will ALWAYS start with the
listed / indexed range 1...#t, and this range
will ALWAYS appear "nicely ordered".

--
Sent from: http://lua.2524044.n2.nabble.com/Lua-l-f2524044.html

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

Roberto Ierusalimschy
In reply to this post by Luiz Henrique de Figueiredo
> > "The order in which the indices are
> > enumerated is not specified, even for numeric indices."
> >
> > ... I just somehow fail to believe this can happen...
>
> For me (using Lua 5.3.5), the code below shows that this can indeed
> happen. Run it a few times if needed.
>
> t= { 1, 2, "test", 4, x=5, y= 6}
> for n=1,1000 do t[math.random(10)]=n end
> for k, v in pairs( t) do print( k, v) end

Another example is this:

$ lua -e 't={x=1,y=2,[1]=3,[2]=4,[3]=5}; for k,v in pairs(t) do print(k,v) end'

This also may need a few runs to show the behavior.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

Francisco Olarte
In reply to this post by bil til
On Fri, Nov 1, 2019 at 4:27 PM bil til <[hidden email]> wrote:
> ... this is a nasty example, as then the table contains nil...

Tables in lua can not contain nil. This one of the things
differentiating local variables and table "slots" ( globals are inside
a table ). In fact, you assign nil to a table index to delete it.

> So this is a "really nasty table" now... it has nils distributed nicely in
> its key-value range 1...1000, and #t just gives some sort of
> "estimated length".

That is because it is not a sequence. Anyway, #t does not give an
stimated length, it gives what is defined in the manual:

>>>
The length operator applied on a table returns a border in that table.
A border in a table t is any natural number that satisfies the
following condition:

     (border == 0 or t[border] ~= nil) and t[border + 1] == nil

In words, a border is any (natural) index in a table where a non-nil
value is followed by a nil value (or zero, when index 1 is nil).
<<<

And, at the start of the next line:

"A table with exactly one border is called a sequence."

......
> ... so this ran nicely up to 1000 without
> any error message, all nicely sorted for the
> number indices...

I think this is because you have a "dense" table and, IIRC, in lua 5.3
the hash value of an integer is itself ( which is open to some sort of
attacks, but good for many things ). OTOH this has the nice property
of easy enumeration of relatively dense tables.

Trying to prove my point I did a simple check in5.3.3 ( its the one
which comes with debian, but I suspect results would be the same for
5.3.5 ) and succeeded at the firs attempt not only to create an out of
order enumeration, but curiously an strictly reversed one:

>>>>
$ lua
Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
> t={[11111]=1,[22222]=2,[33333]=3,[44444]=4}
> for k,v in pairs(t) do print(k,v) end
44444    4
33333    3
22222    2
11111    1
<<<<

I suspect a simple analysis can reveal I'm either getting collisions (
unlikely ) or they are their own hashes and the table size is 4 or
8:
>>>
> for k,v in pairs(t) do print(k,v, k%4, k%8, k%16) end
44444    4    0    4    12
33333    3    1    5    5
22222    2    2    6    14
11111    1    3    7    7
<<<

Reading the source, it's quite commented, or asking a more
knowledgeable person may shed more light.

Francisco Olarte.

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

Francisco Olarte
Replying to myself, I was curious and did some more experiments.
Sending them to try to convince people to not trust order in
non-sequences:

On Fri, Nov 1, 2019 at 5:47 PM Francisco Olarte <[hidden email]> wrote:
> >>>>
> > for k,v in pairs(t) do print(k,v, k%4, k%8, k%16) end
> 44444    4    0    4    12
> 33333    3    1    5    5
> 22222    2    2    6    14
> 11111    1    3    7    7
> <<<

It seems to be 4:> t={[3]=1,[6]=2,[9]=3,[12]=4}
> for k,v in pairs(t) do print(k,v, k%4, k%8, k%16) end
12    4    0    4    12
9    3    1    1    9
6    2    2    6    6
3    1    3    3    3

And, as I suspected, insertion order matters:

> t={[3]=1,[6]=2,[9]=3,[13]=4}
> for k,v in pairs(t) do print(k,v, k%4, k%8, k%16) end
13    4    1    5    13
9    3    1    1    9
6    2    2    6    6
3    1    3    3    3
> t={[3]=1,[6]=2,[13]=4,[9]=3}
> for k,v in pairs(t) do print(k,v, k%4, k%8, k%16) end
9    3    1    1    9
13    4    1    5    13
6    2    2    6    6
3    1    3    3    3
>
( This seems to indicate modulo 2^n, open hashing, I did not remember
the details, read it many versions ago )

Francisco Olarte.

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

Roberto Ierusalimschy
In reply to this post by Roberto Ierusalimschy
> Another example is this:
>
> $ lua -e 't={x=1,y=2,[1]=3,[2]=4,[3]=5}; for k,v in pairs(t) do print(k,v) end'

Note that, in this example, the table is a proper sequence since the
beginning.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

bil til
This post was updated on .
In reply to this post by Francisco Olarte
Thank you for your answers.

@Francisco:

But these tables you cite here have NO indexed / list part.

(as they do NOT start at 1).

So for your tables print(#t) gives zero ... they are
"hash only" tables ...

... I want only a clear statement about the indexed /
list part of the tables ... if a table has NO indexed part,
then it is ok for me, that the order will be "accidential".

@ Roberto:
This example is really disturbing I must admit.
Here really #t will give 3, but strangely the pair
iterator strangely starts with x, y, 3, 1, 2... .

Very odd!

But you have to admit, that this is a "somehow
bizarre" constructor, using a "one-line-constructor"
which lists the indexed items "at the end".

If you do it in a "more normal way", also possibly
the hashed key first, but then the indexed table
in a separate command, then all fine:

> t={x=1,y=2,};
> for i= 1, 3 do t[i]= i end
> for k,v in pairs(t) do print(k,v) end
1       1
2       2
3       3
y       2
x       1

So could we agree for the following statement:
The indexed part of a "regularly constructed table"
should be constructed separately from the
hash part. Or if one constructor command is used,
the constructor must start with the indexed part.

If I iterate such a "regularly constructed table"
with pairs (or next), then
the iteration will ALWAYS start with the
listed / indexed range 1...#t, and this range
will ALWAYS appear "nicely ordered".

Or do you have also a counter-example against this?

PS: Sorry, found myself:

 t={[1]=3,[2]=4,[3]=5,x=1,y=2};

...also fails...

But if I would do it the "normal way", as every
lazy person would do it, and I hope all people
ARE lazy:

     
t={3,4,5,x=1,y=2}; for k,v in pairs(t) do print(k,v) end
   
Then it woks NICELY... .

So one further try to get an agreement.
If I state:
The indexed part of a "regularly constructed table"
should be constructed separately from the
hash part. Or if one constructor command is used,
the constructor must start with the indexed part
as colon enumeration (without specifying the
index keys [1], [2], [3], ...).

If I iterate such a "regularly constructed table"
with pairs (or next), then
the iteration will ALWAYS start with the
listed / indexed range 1...#t, and this range
will ALWAYS appear "nicely ordered".

Or do you have also a counter-example against this?

PPS:
 t={x=1,y=2,3,4,5}; for k,v in pairs(t) do print(k,v) end

also works...

So I can formulate my definition for "regularly
constructed table" more wide:

The indexed part of a "regularly constructed table"
should be constructed separately from the
hash part. Or if one constructor command is used,
the indexed part must NOT specify the keys
explicitly (do NOT use [1]=..., [2]=..., [3]=..., ...).

If I iterate such a "regularly constructed table"
with pairs (or next), then
the iteration will ALWAYS start with the
listed / indexed range 1...#t, and this range
will ALWAYS appear "nicely ordered".

Or do you have also a counter-example against this?

--
Sent from: http://lua.2524044.n2.nabble.com/Lua-l-f2524044.html

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

Roberto Ierusalimschy
> So could we agree for the following statement:
> The indexed part of a "regularly constructed table"
> should be constructed separately from the
> hash part. Or if one constructor command is used,
> the constructor must start with the indexed part.
>
> If I iterate such a "regularly constructed table"
> with pairs (or next), then
> the iteration will ALWAYS start with the
> listed / indexed range 1...#t, and this range
> will ALWAYS appear "nicely ordered".
>
> Or do you have also a counter-example against this?

If you define a "regularly constructed table" as one which iteration
shows integer keys nicely ordered, then for sure there won't be counter
examples :-)

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

bil til
This post was updated on .
Thank you for this confirmation.

But then you should adapt the first example in your chapter "Table
Traversal" in your book "Programming in Lua", which really is super-nice
(but possibly this example is from times of lua 2.0?). Presumably I will
order a T-Shirt "Found ONE error in 'Programming in Lua'", and if the tale
about the "Valiant little Tailor" makes any sense in my understanding, then
next time when I arrive at SFO airport with this T-Shirt I will presumably
marry the princess of Silicon Valley and become the king of the Silicon
Valley :).

So the ONLY possibility to create such a non-nice table would be this
constructor definition line for tables?

And is it not somehow quite easy to "educate this constructor-line command"
to just handle such indices of type [.. some small integer number 1,2,3 ...]
in a more nice / compliant way?

Because I am really humbly quite sure, that MANY MANY people would like it
very much, if you could state at the pairs/ next iterator, that it can be
guaranteed that the ipairs "shoot first", and in regular order as to be
expected for ipairs... .

PS: Or possilby to write some function table.regularize or so?

(There is not possibly some "ingenious trick" who does this already?
... somehow at the least the table seems to know quite well, that its
index part has a "reasonable range" 1...#t, otherwise #t would not
return 3???)

PPS: One further question: So in this irregular table, in the Table
struct of lobject.h, then the array pointer still is zero, and the keys
1..3 are kept hashed?

PPPS: Just found one trick, but this is really so dirty, that it is not
really usable I think. I can use a for loop to double the amount of
elements in the table (but unfortunately #t new elements is NOT
sufficient, I really need to increase to the next binary exponent
I assume, so in this case to 16...).
Then as expected the table re-organizes and
suddenly pairs works nicely with "ipairs shooting first".
And then I remove the last elements again with table.remove()...

So here my super-ingeious educational code:
> t={x=1,y=2,[1]=3,[2]=4,[3]=5};
> t_len=#t
> -- now calculate somehow the next 2-exponent number,
> -- but too lazy for this here, so let's "know" that this is 16
> for i= 1,16-t_len do t[i+t_len]= i end
> for i= 1,16-t_len do table.remove( t) end
>  for k,v in pairs(t) do print(k,v) end
1       3
2       4
3       5
y       2
x       1
-- Heureka - the table has been educated!


Sent from: http://lua.2524044.n2.nabble.com/Lua-l-f2524044.html

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

Philippe Verdy
In reply to this post by bil til
#t in your case should only return 3 (i.e. all integer indexes between 1 and 3 are set to non-nil and index 4 is nil) not 440 (which would be correct if all indexes from 1 to 440 were not nil, but this is false here for index 4; all we know is that index 441 is not-nil, but it is not the highest index whose next is not nul, as you can see at [443]).

How the "#t" operator is computed is crazy when tables are filled in random order, or reverse order, even when all indexes are integers only...).
Ideally there can exist an optimization in the table implementation where an array (indexed by hashes) is used for collision lists of (key,value) pairs, and a second array is used to contain values directly without any collision (but possibly containing nil values): if the second direct array of values can contain nil values, then you save a lot when setting values in the table, without having t oscan the array constantly when one allocated position is set to nil (including the case where some slots may be preallocated to nil values to save the cost of reallocations): setting a non-nil value at a positive integer key higher than the last estimation would set increase an hidden "length-max" field, setting a nil value to a positive integer key value lower than the last estimation would decrease an hidden "length-min" field.

But the operator "#t" should then be able to scan from "lengh-min" to "length-max" to determine the first positive integer position with a non-nil value which is followed by nil-value: this is the only case where a table scan would be needed, and then it would set "length-min" and "lengh-max" to that determined position, and it would be very fast.

But this also means that the integer-index can then be sparse, and the standard hash table would be used for all other keys that are not positive integers or for integers that are too large without creating a large gap. The table implementation would determine when to move some integer keys from the hash table to the array, based on a simple "fill ratio" metric (the table can knows the current maximum size of the array indexed by integer, and how many positions have non-nil values, it can also count the number of collisions per slot in the hash table and when as long this is low, there's no need to extend the number of slots, but above some fill level for collision lists, the integers present in the hash table could be moved to the integer-ba

Le ven. 1 nov. 2019 à 16:27, bil til <[hidden email]> a écrit :
... this is a nasty example, as then the table contains nil...

Could we perhaps change the code slightly, to make my
point more clear:

t= {1, 2, 3, x=10000, y= 11111} -- just to have some nice table start
for i= 1, 1000 do
  local n= math.random(1000)
  t[n]= n
end

print( #t)

for i= 1, 10 do
  print( i, t[i])
end

This produced the following output for #t and the table t in my case:
440
1       1
2       2
3       3
4       nil
5       5
6       6
7       7
8       8
9       nil
10      10

So this is a "really nasty table" now... it has nils distributed nicely in
its key-value range 1...1000, and #t just gives some sort of
"estimated length".

To confirm that 440 is at least "quite reasonable", you can
print the table range 335...445, then I got:
435     nil
436     nil
437     nil
438     438
439     439
440     440
441     nil
442     nil
443     443
444     nil
445     nil

... so this confirms, that by some "reaonably fast checking of #t"
(= avoiding time-eating linear search) 440 might be some
"reaonable value for #t".

... ok ...

but anyway:
here ALL the numbered keys appear nicely ordered if
I iterate with for using the pairs iterator:

I would be happy already concerning the first 440 ones,
but in fact you can test the complete table like this:

k_last= 0
for k, v in pairs(t) do
  if type(k) == 'number' then
    if k <= k_last then
      print ("sorting error in 1..#t: " .. k)
    end
    k_last= k
  end
end
print( k_last)

... and the result will be:
1000

... so this ran nicely up to 1000 without
any error message, all nicely sorted for the
number indices...

(but I am happy to restrict our discussion for the
range 1...#t
 I am happy if we should agree,
that in this range FOR SURE the keys will arrive
in sorted order... and that these keys 1...#t
will appear, BEFORE any other keys / any hash
keys will appear...

this is really helpful to know definitely... this is NOT
an academic question without real practical output,
I hope you also agree in this.)




--
Sent from: http://lua.2524044.n2.nabble.com/Lua-l-f2524044.html

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

Francisco Olarte
In reply to this post by bil til
On Fri, Nov 1, 2019 at 6:24 PM bil til <[hidden email]> wrote:
> @Francisco:
> But these tables you cite here have NO indexed / list part.

Yes they do, it just happens to be empty.

> (as they do NOT start at 1).

> So for your tables print(#t) gives zero ... they are
> "hash only" tables ...

> ... I want only a clear statement about the indexed /
> list part of the tables ... if a table has NO indexed part,
> then it is ok for me, that the order will be "accidential".

What you are not accounting for is that, after a few manipulations, a
table might end with some unexpected values in the hash part. Lua
tables are just dictionaries with an optimization for small ( defined
as "starting at 1 in this case") integer indexes. They also overload
the # operator to work correctly in sequences, but that is all. If you
want to enumerate in your way, just define your datatype and you'll be
right. Do something like rereserving the fields __I__MIN,  __I__MAX in
the hash part and enumerate with them. Hide them if you want using any
of the many tricks there are.

....... ( to @roberto)
> But you have to admit, that this is a "somehow
> bizarre" constructor, using a constructor which
> lists the indexed items "at the end".
> If you do it in a "more normal way", also possibly
> the hashed key first, but then the indexed table
> in a separate command, then all fine:

Those statements of yours sound like "do my homework". Translated as
"instead of coding my problem in <insert language name here> lets rant
about how the language is bad and should include the magic
"solve_bil_til_problem_xxxx" instruction. I'm not accusing, just
pointing the impression it gives me.

Also, if you persist on sending those long messages, a little bit of
properly quoted context would help a lot with the understanding.

Francisco Olarte.

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

bil til
This post was updated on .
In reply to this post by Philippe Verdy
Hi Phillipe,
yes this is correct, very strictly spoken 3 would be correct for this
"random hole 1...1000 table".

The Lua code uses some fast exponential search algorithm which starts with
the current size of their table array part, and then goes to half, and half
again, and looks for points, were index i is still ok, and i+1 missing /
nil. (as 1000 is about 2^10, it needs only roughly 10 iterations so come to
some "senseful" value ... this is very efficient... it would only need
maximum about 32 such tries, if the table has a size of about 2GB... ).

Of course you can call this "crazy" ... but such a language of course always
has to find some compromise between time and the accuracy.

But what here mainly dseerves to be called crazy, is of course this table
itself ... a table with "random holes" of course is something very fuzzy ...
and you can start discussing, whether for #t the value 3 (the max non-nil
index) here makes more sense, or 656 would make more sense (the number of
non-nils), so 440 might even be some sort of "crazy compromise for crazy
table / crazy conditions".

Seen from fy side this is fine.

I just think, that 99% of large table applications for "exhausting for
operations" will have 99% index part and only <1% hash part or even less.
And in this case in the current situation the iterators "pair" and "next"
behave extremely nasty to check the table hash part ... . They require to
walk through a complete huge key assembly of "boring numbers", only to get
then the link to this "hash side info" - which in case of tables of course
often IS crucial... .

"For loops" in non-JIT interpreter languages anyway always are quite a waste
of time, and I think that an interpreter language always should support the
programmer as good as possible to avoid "hanging around in for loops"
unnecessarily.

(I did not look in detail at luaJIT yet (and I am not really interested in
this, as my applications concentrate on low resource (ROM/RAM) consumption
and flexibility... there lua I think is VERY optimum...), but I am quite
sure that all these examples from the luaJIT homepage where they state to
have 20-100 times speed increase, will be some "crazy for loop number
crunching" applications ... you could of course also try to do gigantic
table crunching in lua, like jpeg compression or so, but this no normal
person would try usualy in an interpreter language as lua I think ... just
anyway "for loops" ARE of course anyway very important for fexibility).

@Francesco:
I do not mind with the problem, that #t sometimes might not be perfectly
correct, this is fine for me.

I just look for some EFFICIENT mechanism to look
through the hash part of the table (without having to walk through all the
index part also...). I do NOT bother about SOME unexpected further
"exceptional integers" in the hash part... this is ok for me, this is NO speed
concern then, i can ignore SOME... .

Sorry if my messages are long sometimes, but yours also are NOT the
shortest ones...

But looking in an efficient way at the hash part of a table, is for SURE
not some "strange_bil_til_problem" - this I have to refuse. I am VERY sure
that this is a PERFECTLY TYPICAL applicatoin problem.

 (Though I came to this when I thought how to include tables in the possibility
of string.pack/string.unpack most efficiently ... this I will tell you some
later time in more detail in my unpack/pack post (and I am frightened again
in some futher LONGER post...), but I first have to think a bit about it).








--
Sent from: http://lua.2524044.n2.nabble.com/Lua-l-f2524044.html

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

nobody
In reply to this post by bil til
On 01/11/2019 09.55, bil til wrote:

> I assume quite often a very long listed/indexed table might have
> additionally some further key-value pairs with "hash-keys". E. g. a
> polyline/xy-curve with very many points might additionally have a
> name field and maybe some further info fields.
>
> If I want to iterate over such "further info fields" in a fast way in
> lua (and I do NOT know the key names, or I possibly would like to
> check such "further key names"), do I really have to iterate over the
> complete table with pairs? (is this not a complete waste of time, as
> I then possibly get also all the ipairs entries of the table for
> iteration in my for loop?).

If you want to iterate over different collections of fields, use
multiple tables and put the <LOTS of stuff that you only want SOMETIMES>
into a separate table.  Then you can iterate quickly over either the
list or the extra fields.  E.g.:

   curve = { { 23, 42 }, { 21, 37 }, metadata = { input = "mouse" } }

[ipairs gets you just the list, pairs on .metadata is the other fields]

or

   curve = { input = "mouse", data = { { 23, 42 }, { 21, 37 }, … }, … }

[tradeoff: different order, there's ONLY data in .data, but you'll have
to manually skip .data when doing pairs() on the outer container.]

or maybe even

   curve = { input = "mouse", x = { 23, 21, … }, y = { 42, 37, … }, … }

[split/"planar" representation tends to save A LOT of memory, if your
code can handle it]

(See also http://lua-users.org/lists/lua-l/2016-08/msg00256.html for
some more bad ideas.)

-- nobody

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

Philippe Verdy
It just proves that Lua, when it defined tables to be working both as
associative array (hashed dictionnary) and as indexed arrays (vectors,
i.e. "sequences" in Lua terminology) chose the wrong implementation
approach.

For now, the #t operator is completely broken. Tables have to be
segregated in applications between those used as integer-indexed
sequences (almost only data), and those using hashed dictionnaries
(notably all objects for their properties).

Something is severely wrong in Lua, compared to what Javascript and
PHP do, with similar constraints: the underlying implementation of
tables should NEVER affect what #t returns, and the API was not
consistantly written. The result in applications is unpredictable (and
why tables in Lua are so slow and memory-inefficient compared to
Javascript or PHP, even without their modern JIT compilers like v8 for
Javascript or Zend for PHP, or modern implementations of Perl or
Python?)

Tables are so universally used in Lua that this part should be
urgently refactored for efficiency, stability, and predictability.
Without it, Lua will remain a "toy", used in niche projects, it will
not compete in the same playground: Python rocks! Javascript rocks
too! Lua does not. Various projects that initially started with Lua
integration have renounced and disabled this feature and prefered
using other scripting engines (most frequently now they use
Javascript/ECMAscript, often though node.js and jQuery).

May be one way to revive Lua would be to have an implementation in
Javascript (I'm sure it will perform better and more safely than the
current implementation in C/C++, and Javascript engines are now much
easier to integrate and offer better integration with many more
libraries, including with other languages like Python, Perl, SQL,
PHP)...

Is there some stable port of Lua in Javascript (excluding external
LuaC: the standard library of Lua can perfectly be written in
Javascript with its own standard libary) that we could recommend ? or
in PHP ? or in Java (for server-side integration, including in
database engines, GIS applications, UI frameworks like the Android
SDK, the iOS SDK, or the Windows UWP, or UX components for Mozilla
browsers ?).

As well the default implementation in LuaC still has lot of progresses
to do for its memory manager and the garbage collector, as well in
terms of security (Lua VMs are easily broken by malicious scripts that
can crash it and crash all other competing threads, it's not safe for
use in multi-user contexts, as its memory usage is out of control: all
that is proposed is to create many separate instances of the Lua VM,
but this is also very inefficient and requires big servers with lot of
memory to serve many people; the instanciation of a new VM is much too
slow and CPU-intensive).


2019-11-04 13:30 UTC+01:00, nobody <[hidden email]>:

> On 01/11/2019 09.55, bil til wrote:
>> I assume quite often a very long listed/indexed table might have
>> additionally some further key-value pairs with "hash-keys". E. g. a
>> polyline/xy-curve with very many points might additionally have a
>> name field and maybe some further info fields.
>>
>> If I want to iterate over such "further info fields" in a fast way in
>> lua (and I do NOT know the key names, or I possibly would like to
>> check such "further key names"), do I really have to iterate over the
>> complete table with pairs? (is this not a complete waste of time, as
>> I then possibly get also all the ipairs entries of the table for
>> iteration in my for loop?).
>
> If you want to iterate over different collections of fields, use
> multiple tables and put the <LOTS of stuff that you only want SOMETIMES>
> into a separate table.  Then you can iterate quickly over either the
> list or the extra fields.  E.g.:
>
>    curve = { { 23, 42 }, { 21, 37 }, metadata = { input = "mouse" } }
>
> [ipairs gets you just the list, pairs on .metadata is the other fields]
>
> or
>
>    curve = { input = "mouse", data = { { 23, 42 }, { 21, 37 }, … }, … }
>
> [tradeoff: different order, there's ONLY data in .data, but you'll have
> to manually skip .data when doing pairs() on the outer container.]
>
> or maybe even
>
>    curve = { input = "mouse", x = { 23, 21, … }, y = { 42, 37, … }, … }
>
> [split/"planar" representation tends to save A LOT of memory, if your
> code can handle it]
>
> (See also http://lua-users.org/lists/lua-l/2016-08/msg00256.html for
> some more bad ideas.)
>
> -- nobody
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

bil til
Sorry, but I have to defend lua here. I think it is super-great and flexible.

Just when thinking of making such a more general function (like a nicer
pack/unpack) which wants to operate on any "reasonable table", then such
"hashpairs" iterator really would be VERY nice.

When I thought why lua might put possilbe "low-indexed" keys 1,2,3... into
the hash part, I first though this somehow might happen through the garbage
collector, this then of course might get really heavy to change. But if if
this only this { "key"=value1, [1]=value2, ...} constructor problem, then it
would really VERY easy to change this I think, just by invoking some sort of
"table.normalize()" after this {} constructor (and this would be necessary
only for constructors which contain [1]... with indeces in the range 1..#t
... this should be an easy check at the end of this {} constructor function
I hope).

(A further very nice point would be, if in pairs/next iterators the hash
keys appear in the order as they were defined in {...}. Now they just seem
to be reversed. I think this might be, because the keys are a
single-linked-list ... in such a list if you create a new element, the most
fast way is to create it is at the end ... And if you created 1000 such
hashed key 20 years ago on a MHz processor, this would then have needed the
order of 1000*500usec = 0.5sec, which is quite much ... but nowadays in GHz
times, this would need only the order of 1000*500nsec, and this then is only
0.5milliseconds ... this also would be very nice for many users I think...).

Then in pairs and next it could be states, that the ipairs "fire first", and
the hash pairs "fire in the order as they were added to the table". This
would really be VERY nice for very many applications, I am very sure. (And
it would be NO restriction at all for exisitng applications...).




--
Sent from: http://lua.2524044.n2.nabble.com/Lua-l-f2524044.html

Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

Tim Hill
In reply to this post by Philippe Verdy


> On Nov 4, 2019, at 5:05 AM, Philippe Verdy <[hidden email]> wrote:
>
> For now, the #t operator is completely broken. Tables have to be
> segregated in applications between those used as integer-indexed
> sequences (almost only data), and those using hashed dictionnaries
> (notably all objects for their properties).

While I have my issues with the design of the # operator, its NOT completely broken; it behaves exactly as specified by the manual. Why do you claim that tables have to be segregated? And if so, why is this a problem?

> Tables are so universally used in Lua that this part should be
> urgently refactored for efficiency, stability, and predictability.
> Without it, Lua will remain a "toy", used in niche projects, it will
> not compete in the same playground: Python rocks! Javascript rocks
> too! Lua does not. Various projects that initially started with Lua
> integration have renounced and disabled this feature and prefered
> using other scripting engines (most frequently now they use
> Javascript/ECMAscript, often though node.js and jQuery).

You are of course free to choose any of these languages for your projects.

> May be one way to revive Lua would be to have an implementation in
> Javascript (I'm sure it will perform better and more safely than the
> current implementation in C/C++, and Javascript engines are now much
> easier to integrate and offer better integration with many more
> libraries, including with other languages like Python, Perl, SQL,
> PHP)...

Er, seriously? Compiled JavaScript is faster than C? How is the current version of Lua not “safe”? It’s one of the most stable code bases I’ve ever encountered.

> Is there some stable port of Lua in Javascript (excluding external
> LuaC: the standard library of Lua can perfectly be written in
> Javascript with its own standard libary) that we could recommend ? or
> in PHP ? or in Java (for server-side integration, including in
> database engines, GIS applications, UI frameworks like the Android
> SDK, the iOS SDK, or the Windows UWP, or UX components for Mozilla
> browsers ?).

So basically you dont like Lua. Fine, use JavaScript. Or do you want to add everything you like in JavaScript/Python/etc to Lua? In which case it would no longer be Lua. Ever seen a language that tried to be everything to everyone? Think Ada, or PL/1. Is that what you want?

—Tim



Reply | Threaded
Open this post in threaded view
|

Re: Why no for table-iterator like e. g. "hashpairs"?

Coda Highland
On Mon, Nov 4, 2019 at 6:29 PM Tim Hill <[hidden email]> wrote:
> May be one way to revive Lua would be to have an implementation in
> Javascript (I'm sure it will perform better and more safely than the
> current implementation in C/C++, and Javascript engines are now much
> easier to integrate and offer better integration with many more
> libraries, including with other languages like Python, Perl, SQL,
> PHP)...

Er, seriously? Compiled JavaScript is faster than C? How is the current version of Lua not “safe”? It’s one of the most stable code bases I’ve ever encountered.

The performance claim might actually have some merit, actually. The current Lua interpreter is a very lean, efficient interpreter, to be sure, but it's only an interpreter. Modern Javascript implementations have sunk ridiculous amounts of effort into JIT compilation. A Lua-to-JS transpiler would almost certainly outperform Lua's C implementation at runtime, if the only metric you cared about was the speed of pure Lua code execution. An implementation of Lua in Javascript may or may not end up outperforming the C implementation depending on how it's done and how the JIT decides to approach it. (PROBABLY not, but I've seen some impressive stuff before -- it's possible that a sufficiently clever implementation might be able to pull it off.)

I can't speak to the safety claim. It doesn't seem to make a whole lot of sense. I can only assume it has something to do with VM sandboxing, but once you start putting in bindings to native code any safety promises from the VM become irrelevant.

On the other hand, I must strongly disagree with being easier to integrate. Lua is one of the best languages I've ever seen when it comes to embedding. That is, in fact, where Lua particularly shines -- it's very, very lightweight, with an API that's not too bad to work with once you get used to it. Embedding V8 isn't as nasty as it used to be but it's still not nearly as nice and it's way heavier.

/s/ Adam
12