Recommended way to download and parse web pages?

Recommended way to download and parse web pages?

Nereus
Hello

I'm still fairly new to Lua.

I need to fetch web pages and extract information from each of them.

I have LuaRocks installed, and was wondering what packages are
recommended for this.

lua-curl
luacurl

http-digest
httpclient
lua-http-parser
lua-resty-http

htmlparser
luahtml
lusty-html

Thank you.



Re: Recommended way to download and parse web pages?

Aapo Talvensaari
On 15 May 2015 at 16:05, Gilles <[hidden email]> wrote:
> I need to fetch web pages and extract information from each of them.

You may want to try this:

(It has C library dependencies, though.)


RE: Recommended way to download and parse web pages?

Thijs Schreijer
In reply to this post by Nereus

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On
> Behalf Of Gilles
> Sent: Friday 15 May 2015 15:06
> To: [hidden email]
> Subject: Recommended way to download and parse web pages?

I think you would need two parts: one for fetching and one for parsing. For fetching, you could use Copas [1], which recently gained asynchronous client support for HTTP(S) (LuaSec is required for the HTTPS part). See this example [2] for fetching multiple pages concurrently.

For parsing, it depends on the complexity. If it's simple, use Lua patterns. Otherwise, the proposed lua-gumbo seems a good fit (I've only read the README and have no experience with it).

Thijs

[1] https://github.com/keplerproject/copas 
[2] https://github.com/keplerproject/copas/blob/master/tests/testlimit.lua
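
For the "simple" case Thijs mentions, pulling a page's title and links out with plain Lua patterns could look like the sketch below. It assumes Lua 5.3's standard `string` library only; `extract_title`, `extract_links` and the sample page are hypothetical, and the patterns assume tidy markup (a lowercase `<title>` tag, double-quoted `href` values). Messier real-world HTML is where a proper parser such as lua-gumbo becomes the better choice.

```lua
-- Sketch: "simple" HTML scraping with Lua patterns only (Lua 5.3, no rocks).
-- Assumption: tidy markup (lowercase <title>, double-quoted href values).
-- Comments, odd quoting or unusual nesting need a real HTML parser.

-- Hypothetical helper: text of the first <title> element, or nil if none.
local function extract_title(html)
  -- "(.-)" is the lazy quantifier: match as few characters as possible
  return (html:match("<title[^>]*>(.-)</title>"))
end

-- Hypothetical helper: every href="..." value, in document order.
local function extract_links(html)
  local links = {}
  for href in html:gmatch('href="([^"]*)"') do
    links[#links + 1] = href
  end
  return links
end

-- Usage on an inline page, so the example needs no network access
local page = [[
<html><head><title>Example page</title></head>
<body>
  <a href="http://www.lua.org/">Lua</a>
  <a href="https://luarocks.org/">LuaRocks</a>
</body></html>
]]

print(extract_title(page))  --> Example page
for _, link in ipairs(extract_links(page)) do
  print(link)               -- http://www.lua.org/, then https://luarocks.org/
end
```

In a real scraper, `page` would be the response body returned by whichever fetching library you pick.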



Re: Recommended way to download and parse web pages?

Nereus
On Fri, 15 May 2015 21:56:51 +0000, Thijs Schreijer
<[hidden email]> wrote:
>I think you would need two parts: one for fetching and one for parsing. For fetching, you could use Copas [1], which recently gained asynchronous client support for HTTP(S) (LuaSec is required for the HTTPS part). See this example [2] for fetching multiple pages concurrently.
>
>For parsing, it depends on the complexity. If it's simple, use Lua patterns. Otherwise, the proposed lua-gumbo seems a good fit (I've only read the README and have no experience with it).
>
>Thijs
>
>[1] https://github.com/keplerproject/copas 
>[2] https://github.com/keplerproject/copas/blob/master/tests/testlimit.lua

Thanks for the info. Are those available on LuaRocks? That makes them
easier to install for newbies.

Are "Lua patterns" the same as regex?



Re: Recommended way to download and parse web pages?

Choonster TheMage
On 16 May 2015 at 21:06, Gilles <[hidden email]> wrote:

> Thanks for the info. Are those available on LuaRocks? That makes them
> easier to install for newbies.
>
> Are "Lua patterns" the same as regex?

Copas, LuaSocket and LuaSec are all available on LuaRocks.

Lua's patterns are similar to regular expressions, but are more limited:
http://www.lua.org/manual/5.3/manual.html#6.4.1
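
To make "similar but more limited" concrete, here are a few common regex idioms next to their Lua pattern equivalents. This is a sketch assuming Lua 5.3 string functions; `match_any` is a hypothetical helper, not part of any library.

```lua
-- Common regex idioms and their Lua pattern equivalents (Lua 5.3).
-- Lua escapes with % instead of \ and has no | alternation or {n,m} counts.

-- Regex (\d+)-(\d+)-(\d+)  ->  Lua (%d+)%-(%d+)%-(%d+)
-- ("-" is a magic character in Lua patterns, so it is escaped as "%-")
local y, m, d = ("Sent: 2015-05-15"):match("(%d+)%-(%d+)%-(%d+)")

-- Regex <b>(.*?)</b>  ->  Lua <b>(.-)</b>   ("-" is the lazy quantifier)
local first = ("<b>one</b><b>two</b>"):match("<b>(.-)</b>")  -- "one"

-- Regex \.([A-Za-z]+)$  ->  Lua %.(%a+)$
local ext = ("index.html"):match("%.(%a+)$")  -- "html"

-- No alternation: "|" is an ordinary character, so this returns nil
local alt = ("cat"):match("cat|dog")

-- Hypothetical workaround: try each alternative pattern in turn
local function match_any(s, patterns)
  for _, p in ipairs(patterns) do
    local found = s:match(p)
    if found then
      return found
    end
  end
  return nil
end

print(y, m, d)                             -- 2015, 05 and 15, tab-separated
print(first, ext, alt)
print(match_any("dog", { "cat", "dog" }))  --> dog
```

Trying alternatives one by one is the usual substitute for `|`; when the alternatives are single characters, a set such as `[ab]` does the job directly.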


Re: Recommended way to download and parse web pages?

Nereus
On Sat, 16 May 2015 21:16:36 +1000, Choonster TheMage
<[hidden email]> wrote:
>Copas, LuaSocket and LuaSec are all available on LuaRocks.
>
>Lua's patterns are similar to regular expressions, but are more limited:
>http://www.lua.org/manual/5.3/manual.html#6.4.1

Thanks.



Re: Recommended way to download and parse web pages?

Coda Highland
In reply to this post by Nereus
On Sat, May 16, 2015 at 4:06 AM, Gilles <[hidden email]> wrote:

> Thanks for the info. Are those available on LuaRocks? That makes them
> easier to install for newbies.
>
> Are "Lua patterns" the same as regex?

Not exactly regex, but conceptually similar.

For your edification: http://www.lua.org/pil/20.1.html

/s/ Adam
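
Two pattern items show where Lua goes its own way instead of copying regex: `%b` matches a balanced pair of delimiters, and `%f` is a "frontier" that behaves roughly like regex `\b`. A minimal sketch, assuming Lua 5.2 or later (where `%f` is documented); `strip_tags` and `find_word` are hypothetical helper names.

```lua
-- Lua-only pattern items (Lua 5.2+): %b for balanced pairs, %f for frontiers.

-- Hypothetical helper: drop everything between matching "<" and ">".
-- %b<> matches a "<", then text up to the ">" that balances it.
-- Crude: any "<" ... ">" span is removed, including ones in scripts or comments.
local function strip_tags(html)
  local text = html:gsub("%b<>", "")  -- gsub also returns a count; keep only the string
  return text
end

-- Hypothetical helper: find a whole word, roughly regex \bword\b.
-- %f[%w] is the point where a non-alphanumeric character (or the start of
-- the string) is followed by an alphanumeric one; %f[%W] is the reverse.
-- Assumes `word` contains no pattern magic characters.
local function find_word(s, word)
  return s:find("%f[%w]" .. word .. "%f[%W]")
end

print(strip_tags("<p>Hello <b>world</b></p>"))  --> Hello world
print(find_word("the cathedral cat", "cat"))    -- 15 and 17, tab-separated
```

Note that `find_word` skips the "cat" inside "cathedral", which a plain `find` on `"cat"` would report first.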