Long and short strings (was: Memory usage stats for 5.2 vs 5.3)

Dirk Laurie-2
2015-01-14 14:31 GMT+02:00 Roberto Ierusalimschy <[hidden email]>:

> I think we can do that following Tom's idea of storing the length of
> short strings in a byte. Long strings are not hashed, so they do not
> need the 'hnext' field. The struct could look like this:
>
> typedef struct TString {
>   CommonHeader;
>   lu_byte extra;  /* reserved words for short strings; "has hash" for longs */
>   lu_byte shtlen;  /* length for short strings */
>   unsigned int hash;
>   union {
>     size_t len;  /* length for long string */
>     struct TString *hnext;  /* linked list for hash (only for short strings) */
>   } u;
> } TString;

This structure implies that LUAI_MAXSHORTLEN must not be
set to more than 255, right?

The reference manual does not use the phrases 'short string'
and 'long string' nor the word 'internalized', so these must count
as implementation details.

I'm curious: when a long string is used as a key in a table, will
at that stage the string be internalized?

Re: Long and short strings (was: Memory usage stats for 5.2 vs 5.3)

Roberto Ierusalimschy
> This structure implies that LUAI_MAXSHORTLEN must not be
> set to more than 255, right?

Yes. (The default value is 40.)
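To make the constraint concrete, here is a minimal sketch of the proposed
layout with a compile-time guard. It is only an illustration under stated
assumptions, not the actual Lua source: CommonHeader is replaced by
stand-in fields, the 'is_short' flag and the helpers set_short_len,
set_long_len and str_len are hypothetical, and _Static_assert needs C11.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef unsigned char lu_byte;

#define LUAI_MAXSHORTLEN 40  /* default value mentioned above */

/* A short-string length is stored in one byte, so the limit must fit in it. */
_Static_assert(LUAI_MAXSHORTLEN <= UCHAR_MAX,
               "LUAI_MAXSHORTLEN must fit in lu_byte");

typedef struct TString {
  struct TString *next;  /* assumption: stand-in for CommonHeader */
  lu_byte is_short;      /* assumption: hypothetical short/long flag */
  lu_byte extra;         /* reserved words for short strings; "has hash" for longs */
  lu_byte shtlen;        /* length for short strings */
  unsigned int hash;
  union {
    size_t len;             /* length for long strings */
    struct TString *hnext;  /* hash chain (only for short strings) */
  } u;
} TString;

/* Hypothetical helper: record the length of a short string. */
static void set_short_len(TString *ts, size_t len) {
  assert(len <= LUAI_MAXSHORTLEN);
  ts->is_short = 1;
  ts->shtlen = (lu_byte)len;
}

/* Hypothetical helper: record the length of a long string. */
static void set_long_len(TString *ts, size_t len) {
  ts->is_short = 0;
  ts->u.len = len;
}

/* Hypothetical helper: read the length from whichever field holds it. */
static size_t str_len(const TString *ts) {
  return ts->is_short ? (size_t)ts->shtlen : ts->u.len;
}
```

With this layout, raising LUAI_MAXSHORTLEN above 255 fails at compile time
instead of silently truncating lengths.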


> The reference manual does not use the phrases 'short string'
> and 'long string' nor the word 'internalized', so these must count
> as implemetation details.

Yes.


> I'm curious: when a long string is used as a key in a table, will
> at that stage the string be internalized?

No. At that point, the string can already have two different instances
in Lua. Moreover, for every new long string, we would have to check
whether it has been internalized (which is the heaviest part of
internalizing).
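
One consequence of the answer above: if two instances of the same long
string can coexist, comparing long-string keys cannot rely on pointer
identity alone. The sketch below illustrates that idea with a toy type.
ToyString and toy_eq are hypothetical names for illustration, and the code
is not taken from the Lua source.

```c
#include <stddef.h>
#include <string.h>

/* Toy string object: hypothetical, much simpler than a real TString. */
typedef struct ToyString {
  int is_short;      /* nonzero if the string is short (internalized) */
  size_t len;        /* length in bytes */
  const char *data;  /* string contents */
} ToyString;

/* Equality test under the assumption that short strings are internalized
   (one instance per content) and long strings are not. */
static int toy_eq(const ToyString *a, const ToyString *b) {
  if (a->is_short != b->is_short)
    return 0;  /* a short and a long string never have the same content */
  if (a->is_short)
    return a == b;  /* internalized: same content implies same instance */
  /* long strings: same instance, or same length and same bytes */
  return a == b ||
         (a->len == b->len && memcmp(a->data, b->data, a->len) == 0);
}
```

The pointer check keeps the common case cheap; the content comparison runs
only when two distinct long-string instances are compared.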

-- Roberto