Simple Lua-only JSON decoder

Dirk Laurie-2
My primitive JSON decoder, which operates by lexically translating
a JSON code to a Lua table literal, now does three things:

1. Unicode codepoints translated, e.g. from \u00e9 to \u{00e9}.
2. List delimiters translated from […] to {…}.
3. Keys translated e.g. from "item": to ["item"]=.

local function json_decode (s)
  s = s:gsub("\\u(%d%d%d%d)","\\u{%1}")
  local function do_json_list(s,is_list)
    if is_list then s = s:sub(2,-2) end
    s = s:gsub("(%b[])()",do_json_list)
    if is_list then s = '{' .. s ..'}' end
    return s
  end
  local t = load('return '..do_json_list(s):gsub('("[%w_-]-"):','[%1]='))
  if t then return t() else return s end
end

Please show me some sample JSON code that this decoder can't handle
properly.

I already know about:
  1. Integer overflow. (Thanks, Sean!) --> won't happen
  2. JSON null is not Lua nil. --> define a suitable global called 'null'

Re: Simple Lua-only JSON decoder

Marc Balmer


Am 17.04.17 um 08:47 schrieb Dirk Laurie:

> My primitive JSON decoder, which operates by lexically translating
> a JSON code to a Lua table literal, now does three things:
>
> 1. Unicode codepoints translated, e.g. from \u00e9 to \u{00e9}.
> 2. List delimiters translated from […] to {…}.
> 3. Keys translated e.g. from "item": to ["item"]=.
>
> local function json_decode (s)
>   s = s:gsub("\\u(%d%d%d%d)","\\u{%1}")
>   local function do_json_list(s,is_list)
>     if is_list then s = s:sub(2,-2) end
>     s = s:gsub("(%b[])()",do_json_list)
>     if is_list then s = '{' .. s ..'}' end
>     return s
>   end
>   local t = load('return '..do_json_list(s):gsub('("[%w_-]-"):','[%1]='))
>   if t then return t() else return s end
> end
>
> Please show me some sample JSON code that this decoder can't handle
> properly.

Does it decode numbers to Lua numbers?  Does it handle numeric subtypes?

I have an extensive JSON test suite lying around somewhere; ping me
off-list if I should try to find it.

>
> I already know about:
>   1. Integer overflow. (Thanks, Sean!) --> won't happen
>   2. JSON null is not Lua nil. --> define a suitable global called 'null'

I let the user choose how JSON null values are handled (see
https://github.com/arcapos/luajson/blob/master/luajson.c): null can
map to a json-null object (which has a json-null metatable), to nil, or
to an empty string.  You could reuse that idea in the code above.

Re: Simple Lua-only JSON decoder

Dirk Laurie-2
2017-04-17 9:02 GMT+02:00 Marc Balmer <[hidden email]>:

>
>
> Am 17.04.17 um 08:47 schrieb Dirk Laurie:
>> My primitive JSON decoder, which operates by lexically translating
>> a JSON code to a Lua table literal, now does three things:
>>
>> 1. Unicode codepoints translated, e.g. from \u00e9 to \u{00e9}.
>> 2. List delimiters translated from […] to {…}.
>> 3. Keys translated e.g. from "item": to ["item"]=.
>>
>> local function json_decode (s)
>>   s = s:gsub("\\u(%d%d%d%d)","\\u{%1}")
>>   local function do_json_list(s,is_list)
>>     if is_list then s = s:sub(2,-2) end
>>     s = s:gsub("(%b[])()",do_json_list)
>>     if is_list then s = '{' .. s ..'}' end
>>     return s
>>   end
>>   local t = load('return '..do_json_list(s):gsub('("[%w_-]-"):','[%1]='))
>>   if t then return t() else return s end
>> end

First: thanks for giving me the opportunity for correcting an error
(the code was correct in the interpreter but not yet in the Lua source).
The first line should be

    s = s:gsub("\\u(%x%x%x%x)","\\u{%1}")

>>
>> Please show me some sample JSON code that this decoder can't handle
>> properly.
>
> Does it decode numbers to Lua numbers?  Does it handle numeric subtypes?

Yes and yes, because Lua does. The three steps are for example:

    [{"list": [1], "array":{"utf8": "\u00e9", "next": null}}]
    [{"list": [1], "array":{"utf8": "\u{00e9}", "next": null}}]
    {{"list": {1}, "array":{"utf8": "\u{00e9}", "next": null}}}
    {{["list"]= {1}, ["array"] = {["utf8"]= "\u{00e9}", ["next"]= null}}}
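The steps above can be traced with a short script. This is only a sketch: it assumes Lua 5.3 or later (the \u{...} escape needs it) and uses the %x correction. The name json_to_lua_source and the way `null` is supplied are made up for illustration and are not part of the posted code.

```lua
-- Sketch: the three textual steps, returning Lua source instead of loading it.
-- json_to_lua_source is a hypothetical name; loading the result needs Lua 5.3+.
local function json_to_lua_source(s)
  -- Step 1: \uXXXX (four hex digits) becomes \u{XXXX}.
  s = s:gsub("\\u(%x%x%x%x)", "\\u{%1}")
  -- Step 2: every balanced [...] becomes {...}; nested lists via recursion.
  local function do_json_list(str, is_list)
    if is_list then str = str:sub(2, -2) end
    str = str:gsub("(%b[])()", do_json_list)
    if is_list then str = "{" .. str .. "}" end
    return str
  end
  s = do_json_list(s)
  -- Step 3: "key": becomes ["key"]=.
  s = s:gsub('("[%w_-]-"):', "[%1]=")
  return s
end

local json = [==[[{"list": [1], "array":{"utf8": "\u00e9", "next": null}}]]==]
local src = json_to_lua_source(json)
print(src)

-- Assumption: `null` is handed to the chunk through its environment
-- (the thread uses a global of that name instead).
local null = {}
local t = load("return " .. src, "=json", "t", {null = null})()
print(t[1].list[1], t[1].array.utf8, t[1].array.next == null)
```

Returning the source text rather than the loaded table makes it easy to inspect what load() will actually see.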

>> I already know about:
>>   1. Integer overflow. (Thanks, Sean!) --> won't happen
>>   2. JSON null is not Lua nil. --> define a suitable global called 'null'
>
> I let the user choose how JSON null values are handled (see
> https://github.com/arcapos/luajson/blob/master/luajson.c): null can
> map to a json-null object (which has a json-null metatable), to nil, or
> to an empty string.  You could reuse that idea in the code above.

I think that defining a global 'null' takes care of all three possibilities.

BTW both cjson (light userdata) and rapidjson (function) provide
unique immutable null objects that cannot have their own metatable,

Re: Simple Lua-only JSON decoder

Michal Kottman
In reply to this post by Dirk Laurie-2
On Apr 17, 2017 9:05 AM, "Dirk Laurie" <[hidden email]> wrote:
> Please show me some sample JSON code that this decoder can't handle
> properly.

Probably not a problem for your use case, but:

pl.pretty.dump(json_decode'["[hey!]"]')
{
  "{hey!}"
}
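
One possible way around this, sketched here under assumptions and not taken from the thread, is to hide every string literal behind a numbered placeholder before brackets and keys are rewritten, and to restore the literals afterwards. All names are hypothetical, and escapes inside the strings (\u, \/) are left untranslated.

```lua
-- Sketch only: mask string literals, rewrite structure, unmask.
local function json_to_lua_source(s)
  local saved, out, i = {}, {}, 1
  -- Pass 1: replace each string literal with a placeholder "n".
  while i <= #s do
    local q = s:find('"', i, true)
    if not q then out[#out + 1] = s:sub(i); break end
    out[#out + 1] = s:sub(i, q - 1)
    local j = q + 1
    while j <= #s and s:sub(j, j) ~= '"' do
      j = j + (s:sub(j, j) == "\\" and 2 or 1)   -- step over escaped characters
    end
    saved[#saved + 1] = s:sub(q, j)
    out[#out + 1] = '"' .. #saved .. '"'
    i = j + 1
  end
  local masked = table.concat(out)
  -- Pass 2: structural rewrites can no longer hit string contents.
  masked = masked:gsub("%[", "{"):gsub("%]", "}")
  masked = masked:gsub('("%d+")%s*:', "[%1]=")
  -- Pass 3: put the original literals back.
  return (masked:gsub('"(%d+)"', function(n) return saved[tonumber(n)] end))
end

local src = json_to_lua_source('["[hey!]"]')
local t = load("return " .. src, "=json", "t", {})()
print(src, t[1])
```

Because keys are placeholders during pass 2, the same trick also accepts keys that contain spaces or brackets.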

Re: Simple Lua-only JSON decoder

Dirk Laurie-2
2017-04-17 12:39 GMT+02:00 Michal Kottman <[hidden email]>:
> '["[hey!]"]'

OK, the inner pair of brackets will also become braces.
Thanks for pointing it out. It is definitely worth a comment
in the code, although the solution (in cases where it is a problem)
will be to require a real json module like rapidjson. I'm setting
up the code so that this decoder is only a fallback.
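
A fallback arrangement of that kind could look roughly like this. It is a sketch: pick_decoder is a hypothetical name, and it assumes that an installed module exposes a `decode` function, as lua-rapidjson and lua-cjson do.

```lua
-- Sketch: prefer a real JSON module, fall back to the pure-Lua decoder.
local function pick_decoder(fallback)
  for _, name in ipairs{"rapidjson", "cjson"} do
    local ok, mod = pcall(require, name)   -- require raises if the module is missing
    if ok and type(mod) == "table" and type(mod.decode) == "function" then
      return mod.decode, name
    end
  end
  return fallback, "fallback"
end

-- Stand-in for the decoder from this thread (hypothetical placeholder).
local function primitive_decode(s) return s end

local decode, source = pick_decoder(primitive_decode)
print("decoding with " .. source)
```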

Re: Simple Lua-only JSON decoder

Matthew Wild
In reply to this post by Dirk Laurie-2
On 17 April 2017 at 07:47, Dirk Laurie <[hidden email]> wrote:
> Please show me some sample JSON code that this decoder can't handle
> properly.

{" a": 1 }

I don't see why you use that pattern for keys. Intentional limitation?
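
To make the limitation concrete, the sketch below compares the posted key pattern with a looser one. LOOSE is a hypothetical alternative, not something proposed in the thread, and it has its own failure mode: a string value containing an escaped quote followed by a colon is mistaken for a key.

```lua
local STRICT = '("[%w_-]-"):'     -- pattern from the posted decoder
local LOOSE  = '("[^"]*")%s*:'    -- hypothetical alternative: any characters except "

local function translate_keys(s, pattern)
  return (s:gsub(pattern, "[%1]="))
end

print(translate_keys('{" a": 1 }', STRICT))   -- key is left untouched
print(translate_keys('{" a": 1 }', LOOSE))    -- key is translated

-- The looser pattern misfires on an escaped quote inside a value:
print(translate_keys([[{"k": "x\": y"}]], LOOSE))
```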

Regards,
Matthew

Re: Simple Lua-only JSON decoder

Soni "They/Them" L.
In reply to this post by Dirk Laurie-2


On 2017-04-17 03:47 AM, Dirk Laurie wrote:

> Please show me some sample JSON code that this decoder can't handle
> properly.
>
> I already know about:
>    1. Integer overflow. (Thanks, Sean!) --> won't happen
>    2. JSON null is not Lua nil. --> define a suitable global called 'null'
>

I don't think { "\":": {} } works... Haven't tested it tho.


Re: Simple Lua-only JSON decoder

Dirk Laurie-2
2017-04-17 16:22 GMT+02:00 Soni L. <[hidden email]>:

> I don't think { "\":": {} } works... Haven't tested it tho.

It also defeats rapidjson, so at least I am in good company :-)

Re: Simple Lua-only JSON decoder

Dirk Laurie-2
In reply to this post by Matthew Wild
2017-04-17 16:08 GMT+02:00 Matthew Wild <[hidden email]>:
> On 17 April 2017 at 07:47, Dirk Laurie <[hidden email]> wrote:
>> Please show me some sample JSON code that this decoder can't handle
>> properly.
>
> {" a": 1 }
>
> I don't see why you use that pattern for keys. Intentional limitation?

The keys are all trimmed, but in a multilingual environment (which is not
uncommon on that website) non-ASCII characters could occur. I'll have
another think about that pattern, but since the ordinary string library is
UTF8-agnostic, there may not be much I can do about it. The locale
concept for %w only works in an 8-bit setting like ISO-8859.

Re: Simple Lua-only JSON decoder

Gé Weijers
In reply to this post by Dirk Laurie-2

On Sun, Apr 16, 2017 at 11:47 PM, Dirk Laurie <[hidden email]> wrote:
> My primitive JSON decoder, which operates by lexically translating
> a JSON code to a Lua table literal, now does three things:
>
> 1. Unicode codepoints translated, e.g. from \u00e9 to \u{00e9}.
> 2. List delimiters translated from […] to {…}.
> 3. Keys translated e.g. from "item": to ["item"]=.
>
> local function json_decode (s)
>   s = s:gsub("\\u(%d%d%d%d)","\\u{%1}")
 
In JSON, \u is followed by four hex digits, not decimal ones.

Using gsub to replace \u(%x%x%x%x) with \u{%1} does not account for a doubled backslash: \\u1234 does not represent the Unicode character U+1234, but the string 'backslash, lowercase u, digit 1, etc.'.
Note that \\\u1234 DOES denote a backslash followed by a Unicode character. Lua's basic pattern matching is a bit too restricted to express the pattern "odd number of backslashes followed by 'u'".
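
A pattern on its own cannot count parity, but gsub also accepts a function as the replacement, and that function can inspect the length of the backslash run. The sketch below takes that route; translate_unicode is a hypothetical name, and surrogate pairs are not treated specially.

```lua
-- Sketch: translate \uXXXX only when the run of backslashes before 'u' is odd.
local function translate_unicode(s)
  return (s:gsub("(\\+)u(%x%x%x%x)", function(slashes, hex)
    if #slashes % 2 == 1 then
      return slashes .. "u{" .. hex .. "}"
    end
    -- Even run: every backslash is itself escaped. Returning nothing
    -- makes gsub keep the matched text unchanged.
  end))
end

print(translate_unicode([[\u00e9]]))     -- escape is translated
print(translate_unicode([[\\u1234]]))    -- literal backslash + "u1234", kept
print(translate_unicode([[\\\u1234]]))   -- literal backslash, then an escape
```

The greedy (\\+) always starts at the beginning of a run of backslashes, so the parity test sees the whole run.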

In my experience it's easier in the long run to build a state machine for the lexical analysis and a recursive descent parser for the recursive structure than trying to handle all the special cases using increasingly clever hacks.
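
For comparison, a recursive descent decoder does not have to be long. The following is a minimal sketch, not a complete or validated implementation: it assumes Lua 5.3 or later (for utf8.char), does not combine surrogate pairs, and is lenient about number syntax. All names, including the NULL sentinel, are hypothetical.

```lua
local NULL = setmetatable({}, {__tostring = function() return "null" end})
local ESCAPES = {['"'] = '"', ["\\"] = "\\", ["/"] = "/",
                 b = "\b", f = "\f", n = "\n", r = "\r", t = "\t"}
local LITERALS = {["true"] = true, ["false"] = false, ["null"] = NULL}
local parse_value   -- forward declaration, defined below

local function skip_ws(s, i)
  return s:find("[^ \t\r\n]", i) or #s + 1
end

local function parse_string(s, i)       -- s:sub(i, i) is the opening quote
  local buf = {}
  i = i + 1
  while true do
    local c = s:sub(i, i)
    if c == "" then error("unterminated string") end
    if c == '"' then return table.concat(buf), i + 1 end
    if c == "\\" then
      local e = s:sub(i + 1, i + 1)
      if e == "u" then
        local hex = s:match("^%x%x%x%x", i + 2) or error("bad \\u escape at " .. i)
        buf[#buf + 1] = utf8.char(tonumber(hex, 16))
        i = i + 6
      else
        buf[#buf + 1] = ESCAPES[e] or error("bad escape at " .. i)
        i = i + 2
      end
    else
      buf[#buf + 1] = c
      i = i + 1
    end
  end
end

local function parse_array(s, i)
  local t, n = {}, 0
  i = skip_ws(s, i + 1)
  if s:sub(i, i) == "]" then return t, i + 1 end
  while true do
    local v
    v, i = parse_value(s, i)
    n = n + 1
    t[n] = v
    i = skip_ws(s, i)
    local c = s:sub(i, i)
    if c == "]" then return t, i + 1 end
    if c ~= "," then error("expected ',' or ']' at " .. i) end
    i = i + 1
  end
end

local function parse_object(s, i)
  local t = {}
  i = skip_ws(s, i + 1)
  if s:sub(i, i) == "}" then return t, i + 1 end
  while true do
    i = skip_ws(s, i)
    if s:sub(i, i) ~= '"' then error("expected string key at " .. i) end
    local k, v
    k, i = parse_string(s, i)
    i = skip_ws(s, i)
    if s:sub(i, i) ~= ":" then error("expected ':' at " .. i) end
    v, i = parse_value(s, i + 1)
    t[k] = v
    i = skip_ws(s, i)
    local c = s:sub(i, i)
    if c == "}" then return t, i + 1 end
    if c ~= "," then error("expected ',' or '}' at " .. i) end
    i = i + 1
  end
end

function parse_value(s, i)
  i = skip_ws(s, i)
  local c = s:sub(i, i)
  if c == "{" then return parse_object(s, i) end
  if c == "[" then return parse_array(s, i) end
  if c == '"' then return parse_string(s, i) end
  local word = s:match("^%a+", i)
  if word and LITERALS[word] ~= nil then return LITERALS[word], i + #word end
  local num = s:match("^-?%d+%.?%d*[eE]?[+-]?%d*", i)
  local n = num and tonumber(num)
  if not n then error("unexpected input at " .. i) end
  return n, i + #num
end

local function json_decode(s)
  local v, i = parse_value(s, 1)
  if skip_ws(s, i) <= #s then error("trailing characters at " .. i) end
  return v
end
```

Since nothing is ever passed to load(), input such as os.execute(...) is simply a syntax error, and brackets or colons inside strings need no special care.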



Re: Simple Lua-only JSON decoder

nobody
In reply to this post by Dirk Laurie-2
> Please show me some sample JSON code that this decoder can't handle
> properly.

Because it's not on your list yet:

 > json_decode '{ "fail": os.execute("echo oops!") }'
oops!
< table: 0x17c2600
<< fail true

(Where does the JSON come from?  Can you trust the sources...
   ...to not intentionally do this?
   ...to properly handle all inputs and not accidentally generate this?
   ...not to get hacked ever so no attacker will send this?)

Until now, no one said that [1] is broken.  So just replacing
load("...")() with something along those lines should be (/ is?) enough
to handle direct escape attempts.
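
As a minimal sketch of that kind of replacement (an assumption for illustration, not the code behind [1]): load the translated text as a text-only chunk with an empty environment, so that names like `os` resolve to nil. This blocks access to globals only; it does not stop string methods or runaway loops. load_data is a hypothetical name, and Lua 5.2 or later is assumed.

```lua
-- Sketch: evaluate translated source with no globals visible.
local function load_data(src)
  -- "t" rejects precompiled chunks; {} becomes the chunk's environment.
  local chunk, err = load("return " .. src, "=json", "t", {})
  if not chunk then return nil, err end
  local ok, result = pcall(chunk)
  if not ok then return nil, result end
  return result
end

local good = load_data('{ ["ok"]= 1 }')
local bad, msg = load_data('{ ["fail"]= os.execute("echo oops!") }')
print(good.ok, bad, msg)
```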

-- nobody

[1]: http://lua-users.org/lists/lua-l/2017-03/msg00232.html

Re: Simple Lua-only JSON decoder

Marc Balmer
In reply to this post by Dirk Laurie-2


Am 17.04.17 um 17:53 schrieb Dirk Laurie:

> 2017-04-17 16:22 GMT+02:00 Soni L. <[hidden email]>:
>> I don't think { "\":": {} } works... Haven't tested it tho.
>
> It also defeats rapidjson, so at least I am in good company :-)
>

Isn't the problem here that Lua sees \" as an escape?  When I decode
'{ "\\":": {} }' it works.
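
The difference lies in the Lua string literal, before any decoder runs. A small check, using only standard Lua string syntax:

```lua
local a = '{ "\":": {} }'    -- \" is a Lua escape: the backslash is dropped
local b = '{ "\\":": {} }'   -- \\ is a Lua escape for one backslash
local c = [[{ "\":": {} }]]  -- long brackets: no escapes are processed

print(a)                -- { "":": {} }
print(b)                -- { "\":": {} }
print(b == c, a == c)   -- true false
```

So a test written with single or double quotes has to double the backslash, or use long brackets, for the decoder to receive the JSON text as intended.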


Re: Simple Lua-only JSON decoder

Martin
In reply to this post by Dirk Laurie-2
On 04/16/2017 11:47 PM, Dirk Laurie wrote:

> Please show me some sample JSON code that this decoder can't handle
> properly.
>
> I already know about:
>   1. Integer overflow. (Thanks, Sean!) --> won't happen
>   2. JSON null is not Lua nil. --> define a suitable global called 'null'

I did a similar thing some years ago [1]. The problems with such an
implementation are probably the same: security and format incompatibilities.
(But this is a nice hack; it is usually faster than all other
JSON-loading tools except lua-cjson. And yes, in practice it works
in most cases. So using it is potentially risky but easy.)

Regarding format incompatibilities:

  In JSON strings both "\/" and "/" mean "/". Also, "\\" means "\".
  So we can't just replace "\/" with "/", as that converts "\\/" (which
  means "\/") to "\/" (which means "/").

[1]:
https://github.com/martin-eden/workshop/blob/master/formats/json/load/via_hack.lua#L9
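
One way to sidestep the "\\/" problem described above is to consume escapes pairwise from left to right, so that the second backslash of "\\" is never taken as the start of "\/". This is a sketch with a hypothetical function name, not the code from [1]; the rewrite is needed at all because Lua 5.2 and later reject \/ as an invalid escape sequence in string literals.

```lua
-- Sketch: rewrite "\/" to "/" without disturbing "\\".
local function unescape_solidus(s)
  return (s:gsub("\\(.)", function(c)
    if c == "/" then return "/" end
    -- Any other pair, including "\\", is kept: returning nothing
    -- tells gsub to leave the match as it was.
  end))
end

print(unescape_solidus([[\/]]))     -- the escaped slash is unwrapped
print(unescape_solidus([[\\/]]))    -- escaped backslash, then a plain slash: unchanged
print(unescape_solidus([[\\\/]]))   -- escaped backslash, then an escaped slash
```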

-- Martin

Re: Simple Lua-only JSON decoder

Dirk Laurie-2
2017-04-18 16:03 GMT+02:00 Martin <[hidden email]>:
> On 04/16/2017 11:47 PM, Dirk Laurie wrote:
>> My primitive JSON decoder, which operates by lexically translating
>> a JSON code to a Lua table literal, now does three things:
>>
>> 1. Unicode codepoints translated, e.g. from \u00e9 to \u{00e9}.
>> 2. List delimiters translated from […] to {…}.
>> 3. Keys translated e.g. from "item": to ["item"]=.

> I did a similar thing some years ago [1]. The problems with such an
> implementation are probably the same: security and format incompatibilities.
> (But this is a nice hack; it is usually faster than all other
> JSON-loading tools except lua-cjson. And yes, in practice it works
> in most cases. So using it is potentially risky but easy.)

Well, after all the very useful comments — thanks a lot — and this
bit of moral support — much appreciated — I'll reveal the identity
of the website. The typical URL with which to extract data is for example

    https://apps.wikitree.com/api.php?action=getProfile&key=Turing-3

and the API home page is at

    https://apps.wikitree.com/apps/

For getBio, which consists mainly of one long character string in MediaWiki
markup, I do not use this decoder.