# finding keys in a table

20 messages
Open this post in threaded view
|

## finding keys in a table

 ```Hello, ```I need a frequency list of all characters in a text file and I try to use a table for this but I'm not totally sure if this is the best way to solve my problem. Anyway I setup a table "count" with the characters as key and a counter as value. When I iterate over my data I want to: ```(1) lookup if the current character already exists as a key in the table. (2a) If it does exist I increment the value for key currentchar with 1 (2b) else I add a new entry: currentchar=1 (3) Later I sort the table and print my freq.list. ```The two questions I have are: is this the best/most efficient/fastest way to do this? Second: How do I check if a key exists? (I don't think I can use an inverted table because I'll end up with duplicate keys, or not?) ``` Thanks for some advice, Jelle ```
Open this post in threaded view
|

## Re: finding keys in a table

 ```Exactly as you list: From: "Jelle Huisman" <[hidden email]> ``````(1) lookup if the current character already exists as a key in the table. (2a) If it does exist I increment the value for key currentchar with 1 (2b) else I add a new entry: currentchar=1 (3) Later I sort the table and print my freq.list. `````` if tab[char] ~= nil then -- (1) is the char in the table? tab[char] = tab[char] + 1 -- (2a) then increment else tab[char] = 1 -- (2b) else add new entry end (It could also be written like this, for a bit more speed: ) local count = tab[char] ```tab[char] = (count or 0) + 1 -- initializes count if necessary (an explanation can be found at: http://www.lua.org/pil/3.3.html) ``` ```Because you can't sort key/value pairs, sorting is a little bit more complicated: ``` local reverse = {} for k,v in pairs(tab) do reverse[#reverse+1] = { k, v } end ```table.sort(reverse, function(a, b) a[2] < b[2] end)
Open this post in threaded view
|

## Re: finding keys in a table

 In reply to this post by Jelle Huisman Second: How do I check if a key exists? (I don't think I can use an inverted table because I'll end up with duplicate keys, or not?) You should just be able to check your_table["key"] or alternately your_table.key. If it has no value (hasn't been set yet), it'll return nil.Lua 5.1.2  Copyright (C) 1994-2007 Lua.org, PUC-Rio > chars = {}> chars.a = 5> if chars.a then print("yes") endyes> if chars.b then print("yes") end> = chars.bnil
Open this post in threaded view
|

## Re: finding keys in a table

 In reply to this post by Alex Davies ```Alex Davies wrote: ``````Exactly as you list: From: "Jelle Huisman" <[hidden email]> ``````(1) lookup if the current character already exists as a key in the table. (2a) If it does exist I increment the value for key currentchar with 1 (2b) else I add a new entry: currentchar=1 (3) Later I sort the table and print my freq.list. `````` if tab[char] ~= nil then -- (1) is the char in the table? tab[char] = tab[char] + 1 -- (2a) then increment else tab[char] = 1 -- (2b) else add new entry end (It could also be written like this, for a bit more speed: ) local count = tab[char] ```tab[char] = (count or 0) + 1 -- initializes count if necessary (an explanation can be found at: http://www.lua.org/pil/3.3.html) ``` actually, tab[char] = (count or 0) + 1 is slower than if count then tab[char] = count + 1 else tab[char] = 1 end ```(the if then variant takes 60% of the time of the or case; the interpreter needs an extra LOADK) ``` in a similar fashion, parallel assignments can be faster Hans ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl ----------------------------------------------------------------- ```
Open this post in threaded view
|

## Re: finding keys in a table

 ```local t = setmetatable({}, { __index = function() return 0 end }) for c in string.gmatch(io.read("*a"), ".") do t[c] = t[c] + 1 end table.foreach(t, print) --Mike ```
Open this post in threaded view
|

## Re: finding keys in a table

 ```On 2/21/08, Mike Pall <[hidden email]> wrote: > local t = setmetatable({}, { __index = function() return 0 end }) > for c in string.gmatch(io.read("*a"), ".") do t[c] = t[c] + 1 end > table.foreach(t, print) very neat! now, which way is the fastest? -- Javier ```
Open this post in threaded view
|

## Re: finding keys in a table

 In reply to this post by Hans Hagen ```Hmmm. I actually meant: tab[char] = (tab[char] or 0) + 1 Although it shouldn't make any difference. I'm a bit confused though, because I measured: for i = 1, 1024*1024*64 do test.a = (test.a or 0) + 1 end versus: for i = 1, 1024*1024*64 do local count = test.a if count then test.a = count + 1 else test.a = 1 end end ```And on my compiler/computer at least, the top one is consistently faster. No idea why, given that both give the same opcodes for the main loop: ``` Gettable, Test, Add, Settable ```Of course, if you mean that the first time through the loop would be faster in the bottom case you would be spot on. Still, if the loop takes any amount of time, I'd figure the first hit for each letter insignificant. Just an imo. In any case the top way is a lot neater. =) ``` Javier: Mike's way is fastest and neatest =) - Alex ```----- Original Message ----- From: "Hans Hagen" <[hidden email]> ```actually, tab[char] = (count or 0) + 1 is slower than if count then tab[char] = count + 1 else tab[char] = 1 end ```(the if then variant takes 60% of the time of the or case; the interpreter needs an extra LOADK) ``` in a similar fashion, parallel assignments can be faster Hans `````` ```
Open this post in threaded view
|

## Re: finding keys in a table

 In reply to this post by Alex Davies ```Alex Davies wrote: ``````Exactly as you list: ``````Hello Alex, and others, ```Sorry to be slow with my response: I had a meeting going on at the same time... :-) Your solution is almost the same as what I had first but I somehow managed to break my code. This gives me a new start. It's nice to have different implementations, I'll test it with a larger chuck of my data. ``` Thanks again, Jelle ```
Open this post in threaded view
|

## Re: finding keys in a table

 In reply to this post by Hans Hagen ```on 2/21/08 7:15 AM, Hans Hagen at [hidden email] wrote: > actually, > > tab[char] = (count or 0) + 1 > > is slower than > > if count then > tab[char] = count + 1 > else > tab[char] = 1 > end > > (the if then variant takes 60% of the time of the or case; the > interpreter needs an extra LOADK) > > in a similar fashion, parallel assignments can be faster That seems surprising. Admittedly for a small input set, we avoid doing the addition but I would expect in a long running case, we're mostly on the first branch of the if and since or is short-circuiting it should amount to essentially the same computation. My guess, however, would be that the __index approach is faster in the long run because while it makes the initialization case more expensive (it requires a Lua function call), it does the detection as part of the standard table fetch logic. Mark ```
Open this post in threaded view
|

## Re: finding keys in a table

 In reply to this post by Javier Guerra Giraldez ```Javier Guerra wrote: ``````On 2/21/08, Mike Pall <[hidden email]> wrote: ``````local t = setmetatable({}, { __index = function() return 0 end }) for c in string.gmatch(io.read("*a"), ".") do t[c] = t[c] + 1 end table.foreach(t, print) `````` very neat! now, which way is the fastest? `````` hard to measure, but the lpeg variant is slightly faster data = string.rep("a",10*1000*1000) collectgarbage("collect") t = os.clock() cc = setmetatable({}, { __index = function() return 0 end }) for c in data:gmatch(".") do cc[c] = cc[c] + 1 end print(os.clock()-t) collectgarbage("collect") t = os.clock() cc = setmetatable({}, { __index = function() return 0 end }) lpeg.match((lpeg.P(1) / function(c) cc[c] = cc[c] + 1 end)^0,data) print(os.clock()-t) collectgarbage("collect") t = os.clock() local function add(c) cc[c] = cc[c] + 1 end cc = setmetatable({}, { __index = function() return 0 end }) lpeg.match((lpeg.P(1) / add)^0,data) print(os.clock()-t) ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl ----------------------------------------------------------------- ```
Open this post in threaded view
|

## Re: finding keys in a table

 ```Hans Hagen wrote: > cc = [...] Do you know about the keyword "local"? If not, then do NOT post Lua benchmarks. Ever. --Mike ```
Open this post in threaded view
|

## Re: finding keys in a table

 In reply to this post by Mike Pall-82 ```Mike Pall wrote: > local t = setmetatable({}, { __index = function() return 0 end }) > for c in string.gmatch(io.read("*a"), ".") do t[c] = t[c] + 1 end > table.foreach(t, print) I like it! You could replace the middle line with: io.read("*a"):gsub("(.)", function(c) t[c] = t[c] + 1 end) Gotta love those string functions. I tried to clean up the special character output with a new last line: table.foreach(t, function (c, n) print(("%q %4d"):format(c, n)) end) I was surprised to find that "%q" did not escape the newline (linefeed) character with "\n" but with "\ ". The reference manual describes this as the proper behavior, even including an embedded newline in its example. I've read the manual front to back a number of times, yet I was still surprised by this. It seems like using "\n" would give an exact representation of the string even if it is written to a file that is shared between machines with different line termination characteristics. My experience with this sort of thing is that there was a reason for it, but the only reason I can come up with (allowing multi-line quoted strings) is easily dismissed by using '[['. Is it done this way explicitly so it can automatically pick up the LF/CR+LF of the host system when the string is re-read? It would be an interesting project (not volunteering - yet!) to create an Annotated Lua Reference Manual that includes rationales for various design decisions. Both the reference manual and PIL have a few examples of such annotations. Ada had such an annotated reference manual and it was invaluable when I was learning it. It made me think from the compiler's point of view, which is always helpful. As always, it's about the time... Doug -- Innovative Concepts, Inc. www.innocon.com 703-893-2007 x220 ```
Open this post in threaded view
|

## Re: finding keys in a table

 In reply to this post by Mike Pall-82 ```Mike Pall wrote: ``````Hans Hagen wrote: ``````cc = [...] `````` Do you know about the keyword "local"? `````` sure ``````If not, then do NOT post Lua benchmarks. Ever. `````` ```does not make much difference in the relative speed here; the lpeg one seems slightly faster (which does not surprise me that much) ``` ```(and i've done quite some benchmarking, on large data sets as well as on rather complex lua code; not that it matters; this is not production code but just a quick comparison between gmatch and lpeg) ``` ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl ----------------------------------------------------------------- ```
Open this post in threaded view
|

## Re: finding keys in a table

 In reply to this post by Doug Rogers-4 It's funny you mentioned it though, because the first time it did that to me I was surprised too. I'm guessing it's because it's easier to print that way. Imagine reading the output of an escaped 100kB text file otherwise. It definitely doesn't modify the strings \n's to the host systems variants - that would be counterproductive. (I check lstrlib just to be sure) ``` - Alex ```----- Original Message ----- From: "Doug Rogers" <[hidden email]> ``` ``````It seems like using "\n" would give an exact representation of the string even if it is written to a file that is shared between machines with different line termination characteristics. My experience with this sort of thing is that there was a reason for it, but the only reason I can come up with (allowing multi-line quoted strings) is easily dismissed by using '[['. Is it done this way explicitly so it can automatically pick up the LF/CR+LF of the host system when the string is re-read? `````` ```
Open this post in threaded view
|

## Re: finding keys in a table

 ```Alex Davies wrote: > Imagine reading the output of an escaped 100kB text file > otherwise. I guess I don't consider that a use case for s:format("%q") but rather a use case for io.write('[[' .. s .. ']]'). The former only seems useful when I want to write a short string that is quoted for later reading. Doug > ----- Original Message ----- From: "Doug Rogers" <[hidden email]> >> It seems like using "\n" would give an exact representation of the >> string even if it is written to a file that is shared between machines >> with different line termination characteristics. My experience with this >> sort of thing is that there was a reason for it, but the only reason I >> can come up with (allowing multi-line quoted strings) is easily >> dismissed by using '[['. >> >> Is it done this way explicitly so it can automatically pick up the >> LF/CR+LF of the host system when the string is re-read? -- Innovative Concepts, Inc. www.innocon.com 703-893-2007 x220 ```
Open this post in threaded view
|

## Re: finding keys in a table

 In reply to this post by Mark Hamburg-4 ```Mark Hamburg wrote: ``````My guess, however, would be that the __index approach is faster in the long run because while it makes the initialization case more expensive (it requires a Lua function call), it does the detection as part of the standard table fetch logic. `````` ```indeed, of course at the expense of an extra table, but that seldom is an issue; the __index method is of course also cleaner ``` Hans ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl ----------------------------------------------------------------- ```
Open this post in threaded view
|

## Re: finding keys in a table

 In reply to this post by Doug Rogers-4 ```> I guess I don't consider that a use case for s:format("%q") but rather a > use case for io.write('[[' .. s .. ']]'). If s comes from user input, then this is unsafe if later loaded back into Lua: s might be "]]; MALICIOUS CODE HERE ; [[" or something like that. ```
Open this post in threaded view
|

## Patch for escaping '\n' with string.format("%Q") (was Re: finding keys in a table)

 ```Luiz Henrique de Figueiredo wrote: >> I guess I don't consider that a use case for s:format("%q") but rather a >> use case for io.write('[[' .. s .. ']]'). > If s comes from user input, then this is unsafe if later loaded back into Lua: > s might be "]]; MALICIOUS CODE HERE ; [[" or something like that. It would be too complex to have a '%Q' format that automatically escaped '[-*[', otherwise Q could be q's big sister, just like [[]] are to '". For anyone interested I've attached a patch for lstrlib.c that will allow use of '%Q' to mean "escape newlines, too". Doug -- Innovative Concepts, Inc. www.innocon.com 703-893-2007 x220 ``````diff -r -u lua-5.1.3-orig/src/lstrlib.c lua-5.1.3/src/lstrlib.c --- lua-5.1.3-orig/src/lstrlib.c 2007-12-28 10:32:23.000000000 -0500 +++ lua-5.1.3/src/lstrlib.c 2008-02-21 14:01:20.000000000 -0500 @@ -692,7 +692,7 @@ #define MAX_FORMAT (sizeof(FLAGS) + sizeof(LUA_INTFRMLEN) + 10) -static void addquoted (lua_State *L, luaL_Buffer *b, int arg) { +static void addquoted (lua_State *L, luaL_Buffer *b, int arg, int nlescape) { size_t l; const char *s = luaL_checklstring(L, arg, &l); luaL_addchar(b, '"'); @@ -700,7 +700,7 @@ switch (*s) { case '"': case '\\': case '\n': { luaL_addchar(b, '\\'); - luaL_addchar(b, *s); + luaL_addchar(b, nlescape ? 'n' : *s); break; } case '\r': { @@ -789,8 +789,8 @@ sprintf(buff, form, (double)luaL_checknumber(L, arg)); break; } - case 'q': { - addquoted(L, &b, arg); + case 'q': case 'Q': { + addquoted(L, &b, arg, *(strfrmt-1) == 'Q'); continue; /* skip the 'addsize' at the end */ } case 's': { ```
 In reply to this post by Luiz Henrique de Figueiredo ```This variant takes care of those cases quite neatly, while preserving ease of use for most cases. function escape(s) return "[[" .. s:gsub("%]%]", "]]..']]'..[[") .. "]]" end On Thu, Feb 21, 2008 at 6:49 PM, Luiz Henrique de Figueiredo <[hidden email]> wrote: > > I guess I don't consider that a use case for s:format("%q") but rather a > > use case for io.write('[[' .. s .. ']]'). > > If s comes from user input, then this is unsafe if later loaded back into Lua: > s might be "]]; MALICIOUS CODE HERE ; [[" or something like that. > ```
 In reply to this post by Jelle Huisman ```For performance I'd just intialize all characters to zero: local count = {} for i = 0, 255 do count[ string.char( i ) ] = 0 end for c in string.gmatch( io.read( "*a" ), "." ) do count[ c ] = count[ c ] + 1 end -- And to iterate over them: for i = 0, 255 do local c = string.char( i ) if count[ c ] > 0 then print( c, count[ c ] ) end end -- Or just remove the zeroes for i = 255, 0, -1 do local c = string.char( i ) if count[ c ] == 0 then count[ c ] = nil end end table.foreach( count, print ) *not tested* ;) --rb On Thu, Feb 21, 2008 at 11:24 AM, Jelle Huisman <[hidden email]> wrote: > Hello, > > I need a frequency list of all characters in a text file and I try to > use a table for this but I'm not totally sure if this is the best way to > solve my problem. Anyway I setup a table "count" with the characters as > key and a counter as value. When I iterate over my data I want to: > (1) lookup if the current character already exists as a key in the table. > (2a) If it does exist I increment the value for key currentchar with 1 > (2b) else I add a new entry: currentchar=1 > (3) Later I sort the table and print my freq.list. > > The two questions I have are: is this the best/most efficient/fastest > way to do this? > Second: How do I check if a key exists? (I don't think I can use an > inverted table because I'll end up with duplicate keys, or not?) > > Thanks for some advice, > > Jelle > > ```