[ANN] org.conman.parsers.url 2.0.2 released

[ANN] org.conman.parsers.url 2.0.2 released

Sean Conner
  I just released org.conman.parsers.url 2.0.2 [1], a module to parse URLs
(as defined in RFC-3986) into an easy-to-use table.  Some examples:

        url = require "org.conman.parsers.url"
        x = url:match "https://example.com/cgi-bin/search?q=foo%20bar&lang=en#anchor-2"

        x =
        {
          scheme = "https",
          host = "example.com",
          port = 443,
          query = "q=foo%20bar&lang=en",
          fragment = "anchor-2",
          path = "/cgi-bin/search",
        }

        x = url:match "https://fred:password@...:4443/one/two/three.html"

        x =
        {
          scheme = "https",
          host = "example.com",
          path = "/one/two/three.html",
          port = 4443,
          user = "fred:password",
        }

  Available via LuaRocks: "luarocks install org.conman.parsers.url".

  -spc (Enjoy)

[1]     Source code:
        https://github.com/spc476/LPeg-Parsers/blob/e9006470a020d0aa3899919617392ba74d332dae/url.lua

Re: [ANN] org.conman.parsers.url 2.0.2 released

Luiz Henrique de Figueiredo
>           query = "q=foo%20bar&lang=en",

Shouldn't query be a table?

query = { q = "foo bar", lang = "en" }

Or are there helper functions for that?

Re: [ANN] org.conman.parsers.url 2.0.2 released

Sean Conner
It was thus said that the Great Luiz Henrique de Figueiredo once stated:
> >           query = "q=foo%20bar&lang=en",
>
> Shouldn't query be a table?
>
> query = { q = "foo bar", lang = "en" }
>
> Or are there helper functions for that?

  It used to be decoded into a table, but there were issues doing that
related to CGI scripts (as defined in RFC-3875) where the following is
legal:

        http://example.com/search?look+for+this

(RFC-3875, section 4.4).  If I parsed that into a table, it would look
something like:

        q = { ['look+for+this'] = true }
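
For illustration, a naive decoder along these lines could be sketched as follows. This is only a sketch, not code from org.conman.parsers.url: the names unescape and decode_query are hypothetical, and it assumes "&" separates fields and "%XX" escapes are used. A field with no "=" has no value to store, so it ends up as a bare flag, as in the table above.

```lua
-- Hypothetical helpers for illustration; not part of org.conman.parsers.url.

-- Replace each %XX escape with the byte it encodes.
local function unescape(s)
  return (s:gsub("%%(%x%x)", function(hex)
    return string.char(tonumber(hex, 16))
  end))
end

-- Decode a raw query string into a flat table (assumes "&" separators).
local function decode_query(query)
  local result = {}
  for field in query:gmatch("[^&]+") do
    local name, value = field:match("^([^=]*)=(.*)$")
    if name then
      result[unescape(name)] = unescape(value)
    else
      -- No "=" in the field: there is no value, so store a flag instead.
      result[unescape(field)] = true
    end
  end
  return result
end

local a = decode_query("q=foo%20bar&lang=en")
-- a.q is "foo bar" and a.lang is "en"

local b = decode_query("look+for+this")
-- b["look+for+this"] is true: the whole query became a key with no value
```

The second call shows the awkward case: the CGI-style query carries no name/value structure, so the flat table has nothing sensible to hold.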

  Faced with that reality [1], I decided not to decode the query string,
instead leaving that for the user of the library.  I do have some helper
code for URLs [3], but not as a standalone module yet.  I'm still debating
about where (as in namespace wise) to put them.  I don't want to add them to
org.conman.parsers.url since each module there [4] returns an LPEG
expression, which allows for [5]:

        url = require "org.conman.parsers.url.data"   -- data: is special
            + require "org.conman.parsers.url.gopher" -- gopher: is annoying
            + require "org.conman.parsers.url.siptel" -- phone numbers
            + require "org.conman.parsers.url.tag"    -- tag: is also special
            + require "org.conman.parsers.url"        -- generic URI parsing

  I don't automatically include the modules for data:, gopher:, sip:, tel:
(siptel supports both in one module because of so much overlap between the
two) and tag: because they're rather specific in nature and don't quite
follow the normal URI scheme (although the generic parser can parse them).

  One thing I am working on is taking a decoded URL (which is a table) and
converting it into a string.  The only issue there is the other specific URL
types like data: and gopher:, which, due to their construction, have different
fields than a generic URL, and how to include specific code when required
(much like how I have specific parsers for them that can be included at
will).
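
As a rough sketch of what that conversion might look like for the generic case only, the following uses the field names from the example tables in the announcement (scheme, user, host, port, path, query, fragment). The function url_tostring and the default_ports table are assumptions made for illustration, not the author's code, and scheme-specific types such as data: or gopher: are not handled.

```lua
-- Hypothetical helper; not part of org.conman.parsers.url.
-- default_ports is an assumption so that well-known ports are omitted.
local default_ports = { http = 80, https = 443 }

-- Rebuild a string from a decoded generic URL table.
local function url_tostring(u)
  local out = { u.scheme, "://" }
  if u.user then
    out[#out + 1] = u.user .. "@"
  end
  out[#out + 1] = u.host
  -- Only emit the port when it differs from the scheme's default.
  if u.port and u.port ~= default_ports[u.scheme] then
    out[#out + 1] = ":" .. u.port
  end
  out[#out + 1] = u.path or ""
  -- The query is kept undecoded, so it can be appended as-is.
  if u.query then
    out[#out + 1] = "?" .. u.query
  end
  if u.fragment then
    out[#out + 1] = "#" .. u.fragment
  end
  return table.concat(out)
end

local s = url_tostring {
  scheme   = "https",
  host     = "example.com",
  port     = 443,
  path     = "/cgi-bin/search",
  query    = "q=foo%20bar&lang=en",
  fragment = "anchor-2",
}
print(s)
```

Because the query field is left as an undecoded string, no re-encoding step is needed here, which is one practical consequence of the design choice discussed above.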

  -spc (This was probably a longer answer than expected ... )

[1] I was implementing CGI support for the new Gemini protocol [2]:

                https://github.com/spc476/GLV-1.12556

[2] An overview of the protocol:

                https://portal.mozz.us/gemini/gemini.circumlunar.space/

[3] As part of my Gemini server:

                https://github.com/spc476/GLV-1.12556/blob/master/Lua/GLV-1/url-util.lua

[4] With one exception that I recently added.  I still may change my
        mind on that one, as it returns a table.

[5] gopher and siptel are available via LuaRocks.  data and tag are
        not currently.
 

Re: [ANN] org.conman.parsers.url 2.0.2 released

Sean Conner
It was thus said that the Great Sean Conner once stated:

> It was thus said that the Great Luiz Henrique de Figueiredo once stated:
> > >           query = "q=foo%20bar&lang=en",
> >
> > Shouldn't query be a table?
> >
> > query = { q = "foo bar", lang = "en" }
> >
> > Or are there helper functions for that?
>
>   It used to be decoded into a table, but there were issues doing that
> related to CGI scripts (as defined in RFC-3875) where the following is
> legal:
>
> http://example.com/search?look+for+this
>
> (RFC-3875, section 4.4).

  Oh, and one other reason related to CGI---the CGI script is expecting an
undecoded query string, which meant re-serializing a parsed table back into
a string.  That also influenced my decision---to avoid parsing data that
wasn't going to be used parsed.

  -spc

Re: [ANN] org.conman.parsers.url 2.0.2 released

Philippe Verdy-2
I also agree: the URL-encoding of parameters in the query string is just a convention.

Also, a simple table would break with a query string like "?q=1&q=2&r=3", which is also legal:
- If you represent it as a table, it cannot be {q = "1"; q = "2"; r = "3"}, which is legal Lua syntax but has the effect of overriding the key "q" with the second assignment, so you just get {q = "2"; r = "3"}.
- If you use a table of tables, it becomes {q = {"1", "2"}; r = {"3"}}, which preserves all values.

Note that the MIME encoding typically used for POST contents also allows duplicate assignments to the same parameter name, so these names are subindexed by a sequence number. For POST contents, several encodings are possible: the posted parameters could use many other encodings, including binary data in many supported formats (as indicated by the MIME type, and with the help of the transport syntax).

But there's still no key for a query string like "?q&r&s". You might think you could use anonymous keys, but "?=q&=r&=s" is also legal and distinct, and "?1=q&3=r&3=s" is also legal and distinct...

Finally, using "&" as a separator in query strings is just an optional convention; it is not mandatory in the HTTP(S) protocol for GET or any other verb. Likewise, URL-encoding, which interprets the "+" and "%nn" sequences in query strings, is an optional convention; the unambiguous way to represent parameters is not inside the query string of the URL, but as an attachment in the content of the request (as for most POST commands).

For the HTTP GET command, the query string is an opaque cacheable identifier that just helps identify a resource, or a different version of a resource, from an endpoint located at the given path on the specified server. Only the path part has an enforced hierarchical convention (i.e. it is splittable at each "/", and the parts "." and ".." have a special meaning in that context).

It just happens that query strings are now most often used with the URL-encoding convention.
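
One way to keep all of these cases distinct is to decode into an ordered list of name/value pairs rather than a keyed table. The sketch below is only an illustration under the usual conventions ("&" as separator, "%XX" escapes), which, as noted above, are not mandatory; the names unescape, parse_pairs and values_of are hypothetical.

```lua
-- Hypothetical helpers for illustration only.

-- Replace each %XX escape with the byte it encodes.
local function unescape(s)
  return (s:gsub("%%(%x%x)", function(hex)
    return string.char(tonumber(hex, 16))
  end))
end

-- Decode a query string into an ordered list of { name = ..., value = ... }.
-- A field without "=" keeps value == nil, so "q" and "=q" stay distinct.
local function parse_pairs(query)
  local list = {}
  for field in query:gmatch("[^&]+") do
    local eq = field:find("=", 1, true)   -- plain search for the first "="
    if eq then
      list[#list + 1] = {
        name  = unescape(field:sub(1, eq - 1)),
        value = unescape(field:sub(eq + 1)),
      }
    else
      list[#list + 1] = { name = unescape(field) }
    end
  end
  return list
end

-- Collect every value recorded for one name, in order of appearance.
local function values_of(list, name)
  local found = {}
  for _, pair in ipairs(list) do
    if pair.name == name and pair.value then
      found[#found + 1] = pair.value
    end
  end
  return found
end

local dup  = parse_pairs("q=1&q=2&r=3")   -- three pairs, both "q" values kept
local qs   = values_of(dup, "q")          -- { "1", "2" }
local bare = parse_pairs("q&r&s")         -- names only, every value is nil
local anon = parse_pairs("=q&=r&=s")      -- empty names, values "q", "r", "s"
```

The list form loses nothing from the original string, at the cost of a linear scan for lookups; a keyed table can always be built from it afterwards if an application's conventions allow it.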

Re: [ANN] org.conman.parsers.url 2.0.2 released

Petite Abeille
In reply to this post by Sean Conner

Re: [ANN] org.conman.parsers.url 2.0.2 released

Stefan-2
In reply to this post by Philippe Verdy-2


On 05.06.2020 at 06:34, Philippe Verdy wrote:

> I also agree: the URL-encoding of parameters in the query string is
> just a convention.
>
> Also, a simple table would break with a query string like "?q=1&q=2&r=3",
> which is also legal:
> - If you represent it as a table, it cannot be {q = "1"; q = "2"; r = "3"},
> which is legal Lua syntax but has the effect of overriding the key "q"
> with the second assignment, so you just get {q = "2"; r = "3"}.
> - If you use a table of tables, it becomes {q = {"1", "2"}; r = {"3"}},
> which preserves all values.
>
> Note that the MIME encoding typically used for POST contents also allows
> duplicate assignments to the same parameter name, so these names are
> subindexed by a sequence number. For POST contents, several encodings are
> possible: the posted parameters could use many other encodings, including
> binary data in many supported formats (as indicated by the MIME type, and
> with the help of the transport syntax).

Browsers produce q=1&q=2 for checkboxes in forms.
PHP uses another convention to recognize arrays: ?q[]=1&q[]=2
But it is little known and breaks many scripts that expect only string
parameters.

Re: [ANN] org.conman.parsers.url 2.0.2 released

Sean Conner
In reply to this post by Petite Abeille
It was thus said that the Great Petite Abeille once stated:
>
> > On Jun 5, 2020, at 03:06, Sean Conner <[hidden email]> wrote:
> >
> > [3] As part of my Gemini server:
> >
> > https://github.com/spc476/GLV-1.12556/blob/master/Lua/GLV-1/url-util.lua
>
> Another very cool thing Sean has done while on Gemini is porting the Urbit Sigils [1] to Lua.

  A few things you forgot to mention:

        1) it's a large 202,000+ byte script (due to the sigil information)
        2) the only dependency is Lua 5.1 or higher
        3) it works on HTTP, gopher and Gemini servers.

  And speaking of gopher, I have written a gopher server in Lua as well [2].

> Demo:
> https://portal.mozz.us/gemini/gemini.conman.org/sigil
> https://portal.mozz.us/gemini/gemini.conman.org/sigil%3Fid%3DF000F3F9
>
> Code:
> https://portal.mozz.us/gemini/gemini.conman.org/sigil-cgi.lua

  -spc

> [1] https://urbit.org/blog/creating-sigils/

[2] https://github.com/spc476/port70