On JSON parsers

On JSON parsers

Hisham
An interesting work on checking the compliance and compatibility of
JSON parsers:

http://seriot.ch/parsing_json.html

The author wrote a test suite comparing 34 different JSON parser
implementations in several languages (two of them in Lua: Jeffrey
Friedl's and lua-dkjson) and found that no two behave alike.

This reminded me of the complaint, made a few times, that the Lua
module ecosystem has "too many JSON parsers" or something like that,
so I thought it would be interesting to share this here. :)

-- Hisham


Re: On JSON parsers

Charles Heywood
This is about as common as with INI, YAML, TOML, SQL, and just about every other configuration or data-storage language. Usually the best way to work with these formats is to know your environment and use as little implementation-specific code as possible. Nice read, though.

On Sun, Oct 30, 2016 at 8:19 PM Hisham <[hidden email]> wrote:
An interesting work on checking the compliance and compatibility of
JSON parsers:

http://seriot.ch/parsing_json.html
[...]



Re: On JSON parsers

Marc Balmer
In reply to this post by Hisham
An interesting work on checking the compliance and compatibility of
JSON parsers:

http://seriot.ch/parsing_json.html
[...]


That's interesting. I will have to check out how my JSON module
(github.com/arcapos/luajson) performs compared to the "competition"
(in practice, it performs and behaves very well).

- mb


Re: On JSON parsers

Daurnimator
On 31 October 2016 at 14:18, Hisham <[hidden email]> wrote:

> An interesting work on checking the compliance and compatibility of
> JSON parsers:
>
> http://seriot.ch/parsing_json.html
> [...]


It is worth mentioning that the author used dkjson incorrectly.
https://twitter.com/daurnimator/status/791494454888673280


Re: On JSON parsers

David Heiko Kolf
On 31.10.2016 at 08:42 Daurnimator wrote:

> On 31 October 2016 at 14:18, Hisham <[hidden email]> wrote:
>> An interesting work on checking the compliance and compatibility of
>> JSON parsers:
>>
>> http://seriot.ch/parsing_json.html
>> [...]
>
> It is worth mentioning that the author used dkjson incorrectly.
> https://twitter.com/daurnimator/status/791494454888673280

I have to read it a bit more carefully, but it appears to me that a
lot of the tests check something which I didn't even design my parsers
for -- my library cannot be used for validating JSON strings. My goal
was that every valid (UTF-8) JSON string can be parsed.

If I wrote a parser for a language usually written by hand, it would
be important for the parser to reject any mistakes -- otherwise users
would trust their broken code and only see it break later when porting
to a different system. But for JSON I assumed the data mostly comes
from other encoders.

The pure-Lua parser in particular (which appears to be the one used in
the tests) accepts really ridiculous input data. The reason is that I
tried to keep the amount of code as small as possible, so it actually
uses the same function for parsing arrays and objects.
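
As a purely hypothetical illustration (this is not my actual code, and
the helper names are made up), a single structure-parsing function
shared between arrays and objects ends up accepting mixed documents:

-- Why sharing one parsing function between arrays and objects makes a
-- parser lenient: each item may or may not carry a "key:" prefix, and
-- either closing bracket ends either structure, so a mixed document
-- such as [1, "a": 2] parses without complaint. No error handling,
-- for brevity.

local function skipws(s, pos)
  return s:find("[^ \t\r\n]", pos) or #s + 1
end

local parse_structure  -- forward declaration

-- Only numbers, simple strings and nested structures, for brevity.
local function parse_value(s, pos)
  pos = skipws(s, pos)
  local c = s:sub(pos, pos)
  if c == "[" or c == "{" then
    return parse_structure(s, pos)
  elseif c == '"' then
    local closing = s:find('"', pos + 1, true)
    return s:sub(pos + 1, closing - 1), closing + 1
  else
    local _, last, num = s:find("^(%-?%d+)", pos)
    return tonumber(num), last + 1
  end
end

function parse_structure(s, pos)
  local result, n = {}, 0
  pos = pos + 1                        -- skip the opening "[" or "{"
  while true do
    pos = skipws(s, pos)
    local c = s:sub(pos, pos)
    if c == "]" or c == "}" then       -- either closer ends either structure
      return result, pos + 1
    elseif c == "," then
      pos = pos + 1
    else
      local value
      value, pos = parse_value(s, pos)
      pos = skipws(s, pos)
      if s:sub(pos, pos) == ":" then   -- looks like a key: store by name
        result[value], pos = parse_value(s, pos + 1)
      else                             -- otherwise append as an array item
        n = n + 1
        result[n] = value
      end
    end
  end
end

local t = parse_structure('[1, "a": 2]', 1)
print(t[1], t.a)                       --> 1    2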

The LPeg version is stricter and would have passed more of the tests.
I am aware that it can be a problem that my library actually has two
parsers that behave differently, although previously I thought the
differences were mostly in the error messages the library returns.

Maybe I should clarify in the description that the library is not a
validator.

I might try to rerun the tests locally to see how many of them are also
problematic in the LPeg-based parser. If that one is more accurate, my
advice would be to always use 'require "dkjson".use_lpeg()' when a
strict parser is required. Of course the trailing garbage would still be
an issue, as you described in your Twitter post, but it could be solved
by a wrapper function that checks that no trailing data is left.
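
Such a wrapper could be as simple as this sketch, relying on decode
returning the value plus the position after it on success, and nil,
position and message on errors:

local json = require("dkjson").use_lpeg()  -- stricter LPeg-based parser

-- Decode a JSON string, but reject trailing non-whitespace data,
-- which decode by itself would silently ignore. (Note that a JSON
-- "null" document also yields nil here.)
local function strict_decode(str)
  local value, pos, err = json.decode(str)
  if err then
    return nil, err
  end
  -- pos is the first position after the parsed value; anything other
  -- than whitespace from there on is trailing garbage.
  local garbage = str:find("[^ \t\r\n]", pos)
  if garbage then
    return nil, "trailing garbage at position " .. garbage
  end
  return value
end

print(strict_decode('{"a": 1}'))    --> table: 0x...
print(strict_decode('{"a": 1} x'))  --> nil   trailing garbage at position 10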

The only "red" flags I saw in the test results was the lack of UTF-16
support. Indeed, that is something I intentionally left out, as I had
never seen a UTF-16 JSON in the wild. If I did, I would probably use a
designated Unicode-library for converting it to UTF-8.
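
If one did need to handle such input without an external library, a
hand-rolled conversion is not much code. A rough sketch for Lua 5.3
(using the built-in utf8 library, and assuming well-formed input
without unpaired surrogates):

-- Convert UTF-16 text (either endianness) to UTF-8 before handing it
-- to a UTF-8-only JSON parser.
local function utf16_to_utf8(s, bigendian)
  local out, i = {}, 1
  while i + 1 <= #s do
    local a, b = s:byte(i, i + 1)
    local unit = bigendian and (a * 256 + b) or (b * 256 + a)
    i = i + 2
    if unit >= 0xD800 and unit <= 0xDBFF then
      -- High surrogate: combine with the following low surrogate.
      local c, d = s:byte(i, i + 1)
      local low = bigendian and (c * 256 + d) or (d * 256 + c)
      i = i + 2
      unit = 0x10000 + (unit - 0xD800) * 0x400 + (low - 0xDC00)
    end
    out[#out + 1] = utf8.char(unit)
  end
  return table.concat(out)
end

-- "[1]" encoded as UTF-16LE (bytes 5B 00 31 00 5D 00):
print(utf16_to_utf8("\x5B\x00\x31\x00\x5D\x00"))  --> [1]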

Best regards,

David



Re: On JSON parsers

Doug Gale
The JSON RFC is as clear as day that JSON uses hex-escaped UTF-16 surrogate pairs to encode Unicode characters outside the Basic Multilingual Plane (>= 0x10000). I am surprised that JSON parsers are completely violating the spec.
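
For reference, the combining arithmetic the RFC spells out, as a quick
Lua sketch:

-- Combine a hex-escaped UTF-16 surrogate pair into one code point,
-- e.g. "\uD83D\uDE00" in a JSON string is the single code point
-- U+1F600.
local function combine_surrogates(hi, lo)
  return 0x10000 + (hi - 0xD800) * 0x400 + (lo - 0xDC00)
end

print(string.format("U+%X", combine_surrogates(0xD83D, 0xDE00)))  --> U+1F600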

On Mon, Oct 31, 2016 at 4:05 PM, David Heiko Kolf <[hidden email]> wrote:
[...]

The only "red" flags I saw in the test results were for the lack of
UTF-16 support. Indeed, that is something I intentionally left out, as I
had never seen a UTF-16 JSON document in the wild. If I did, I would
probably use a dedicated Unicode library for converting it to UTF-8.

Best regards,

David




Re: On JSON parsers

David Heiko Kolf
On 01.11.2016 at 00:38, Doug Gale wrote:
> The JSON RFC is as clear as day that JSON uses hex-escaped UTF-16
> surrogate pairs to encode Unicode characters outside the Basic
> Multilingual Plane (>= 0x10000). I am surprised that JSON parsers are
> completely violating the spec.

Which parsers are you talking about? My parsers support escaped
surrogate pairs (as long as my test suite is correct). Most other Lua
JSON parsers do so as well (see <http://lua-users.org/wiki/JsonModules>).

In case you were referring to this:

>     The only "red" flags I saw in the test results were for the lack
>     of UTF-16 support. Indeed, that is something I intentionally left
>     out, as I had never seen a UTF-16 JSON document in the wild. If I
>     did, I would probably use a dedicated Unicode library for
>     converting it to UTF-8.

What I meant was not the escape codes but the encoding of the entire
JSON text itself.
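
To illustrate the difference: the escapes live inside an already
decoded string, while the document encoding shows up in the raw bytes.
RFC 4627, section 3 even describes how to detect it, since the first
two characters of any JSON text are ASCII. A small sketch:

-- Encoding detection from RFC 4627, section 3: the pattern of zero
-- bytes among the first four octets reveals the encoding.
local function detect_encoding(s)
  local b1, b2, b3, b4 = s:byte(1, 4)
  if b1 == 0 and b2 == 0 and b3 == 0 then return "UTF-32BE" end
  if b2 == 0 and b3 == 0 and b4 == 0 then return "UTF-32LE" end
  if b1 == 0 then return "UTF-16BE" end
  if b2 == 0 then return "UTF-16LE" end
  return "UTF-8"
end

-- The same document "[1]" in two encodings:
print(detect_encoding("[1]"))                       --> UTF-8
print(detect_encoding("\x5B\x00\x31\x00\x5D\x00"))  --> UTF-16LE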

Best regards,

David