Question about LuaSocket and HTTP

Question about LuaSocket and HTTP

Geoff Smith

Hi


I am trying to learn a bit more about how to download a webpage using LuaSocket. What I was trying to do was download the English Premier League results from

http://www.bbc.co.uk/sport/football/results. That was easy to achieve with this code:


local http = require("socket.http")
local ltn12 = require("ltn12")

    -- open in binary mode and fail early if the file cannot be created
    local fileHandle = assert(io.open([[c:\_resultsFile.html]], "wb"))
    local url1 = [[http://www.bbc.co.uk/sport/football/results]]

    http.request(
        {
            url = url1,
            sink = ltn12.sink.file(fileHandle),
        })

This is the front page, however; what I want to get is all of the Premier League results, i.e. when navigating the webpage manually I would select "Premier League" from the leftmost drop-down. Looking at the page source, I can see this is achieved by sending a GET string.


If I sniff the data actually sent from the webpage when I click on "Premier League", I see:


GET /bbc/bbc/s?name=sport.football.results.page&select_change_option_value=competition-118996114&select_change_container=NOT-SET&ns_type=hidden&action_name=select_change&action_type=change&ml_name=webmodule&ml_version=50&app_version=2.7.654&page_type=pal_data&sp=football&bbc_site=sport&pal_route=footballResults&app_type=web&language=en-GB&pal_webapp=sport&prod_name=sport&app_name=sport&blq_s=3.5d&blq_r=3.5&blq_v=default-domestic&blq_e=pal&bbc_mc=ad1ps1pf1&screen_resolution=1324x745&ns_ti=BBC%20Sport%20-%20Football%20-%20Results&ns_c=utf-8&ns__t=1462737017061&ns_jspageurl=http%3A%2F%2Fwww.bbc.co.uk%2Fsport%2Ffootball%2Fresults&ns_referrer=http%3A%2F%2Fwww.bbc.co.uk%2Fsport%2Ffootball%2Ffixtures HTTP/1.1


where the gist of the GET selection for the Premier League appears to be select_change_option_value=competition-118996114


At this point I am stuck: how do I specify this GET selection string in the Lua example above?

Do I add it to the Lua table that is passed to http.request()? Or, more likely, do I just append it somehow to the basic site URL?


I have tried numerous variations of adding this to the URL, but so far I can't get it to download the page I want.


Any tips on how to get this http GET to work would be most appreciated. Thanks


Geoff










Re: Question about LuaSocket and HTTP

Vadim A. Misbakh-Soloviov
>
> At this point I am stuck, how do I specify this GET selection string in the
> Lua example above ?

In your current case I'd suggest something like:

    url1= [[/sport/football/results]]

    http.request(
        {
            url = [[http://www.bbc.co.uk]]..url1,
            sink = ltn12.sink.file(fileHandle),
        })

and work with the "url1" variable (the GET request will be sent with exactly its
content).
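
To make that concrete, here is a minimal sketch of appending GET parameters to url1. The helper names escape and build_url are hypothetical (they are not part of LuaSocket), and the parameter is the one from the sniffed request; nothing in this thread confirms that the BBC server honours it on this path. LuaSocket's own socket.url.escape could probably be used in place of the hand-written escape.

```lua
-- Percent-encode everything except unreserved characters (RFC 3986).
local function escape(s)
    return (tostring(s):gsub("[^%w%-%._~]", function(c)
        return string.format("%%%02X", string.byte(c))
    end))
end

-- Append a table of GET parameters to a base URL as a query string.
-- Keys are sorted so the result is deterministic.
local function build_url(base, params)
    local keys = {}
    for k in pairs(params) do keys[#keys + 1] = k end
    table.sort(keys)
    local parts = {}
    for _, k in ipairs(keys) do
        parts[#parts + 1] = escape(k) .. "=" .. escape(params[k])
    end
    if #parts == 0 then return base end
    return base .. "?" .. table.concat(parts, "&")
end

local url1 = build_url([[/sport/football/results]], {
    select_change_option_value = "competition-118996114",  -- from the sniffed GET
})
print([[http://www.bbc.co.uk]] .. url1)
```

The resulting string can be passed as the url field of the table given to http.request(), exactly as in the snippet above.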

But...

Are you sure that you want exactly luasocket and ltn12? ;)

There are the luacurl (C) and luahtmlparser (pure Lua) libraries, which, when
combined, give some magic powers to easily fetch and parse any site in
about 10 lines of code :)

--
wbr,
mva


Re: Question about LuaSocket and HTTP

Geoff Smith
Hi Vadim

Thanks for the reply. Yes, I could already get the BBC home page, so that didn't really help me solve my problem.

Try getting the http://www.bbc.co.uk/sport/football/results site and then programmatically navigating to the Premier League results page. That's where I am stuck.

I would be happy to use any Lua library that solves the above problem. LuaSocket and luacurl can both probably do the job in a few lines of code if I can figure out the GET param string and how to specify it.

Regards Geoff




Re: Question about LuaSocket and HTTP

aryajur
If you just replace the url with [[http://www.bbc.com/sport/football/results/partial/competition-118996114?structureid=5&dateTimeNow=20160511]] it should work. I think you need to update the date in the dateTimeNow field of the URL. The result is the Premier League data.

Milind
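
A small sketch of building that URL with the current date filled in. It assumes dateTimeNow is a YYYYMMDD date (as it appears in the example above) and that structureid=5 stays fixed; results_url and fetch_to_file are hypothetical helper names, and the download call is left commented out because it needs LuaSocket and network access.

```lua
-- Build the "partial" results URL for a competition id.
-- date defaults to today in YYYYMMDD form (assumed format).
local function results_url(competition, date)
    date = date or os.date("%Y%m%d")
    return ("http://www.bbc.com/sport/football/results/partial/%s?structureid=5&dateTimeNow=%s")
        :format(competition, date)
end

-- Download a URL to a file with LuaSocket.
-- Returns 1 and the HTTP status code on success, or nil and an error message.
local function fetch_to_file(url, path)
    local http = require("socket.http")
    local ltn12 = require("ltn12")
    local fileHandle = assert(io.open(path, "wb"))
    -- ltn12.sink.file closes the handle when the stream ends
    local ok, code = http.request{
        url = url,
        sink = ltn12.sink.file(fileHandle),
    }
    return ok, code
end

local url = results_url("competition-118996114")
print(url)
-- fetch_to_file(url, [[c:\_resultsFile.html]])  -- uncomment to download
```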
