lua.org down

lua.org down

Luiz Henrique de Figueiredo
It seems that the Lua site is down. There is a mirror at
        http://www.tecgraf.puc-rio.br/lua/mirror/

(But I think that this message will not be posted before lua.org comes back.)

Re: lua.org down

Daniel Silverstone
On Sun, Nov 06, 2016 at 00:15:13 -0200, Luiz Henrique de Figueiredo wrote:
> It seems that the Lua site is down. There is a mirror at
> http://www.tecgraf.puc-rio.br/lua/mirror/
>
> (But I think that this message will not be posted before lua.org comes back.)

As an outage report...

The server which hosts both lua.org and the lua-l mailing list suffered from a
severe failure approximately 14 hours ago which was not picked up by myself or
Rob until this morning at about 7am.  We have worked since then to restore
things and everything should be back just fine.  Sadly due to the nature of the
failure, there are no logs from when it happened so I cannot state what the
cause was.

We will be looking to put some external monitoring into place to help us spot
this kind of thing sooner (the machine in question was the gateway for email
so we didn't see any monitor emails until we'd fixed things).
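
A minimal sketch of the kind of external check described above, assuming it
runs on a separate machine and that alerts leave through a channel that does
not depend on the monitored server. The names `http_probe` and
`OutageDetector`, the threshold of 3, and the canned probe results are all
hypothetical; nothing here reflects the monitoring that was actually deployed.

```python
import urllib.error
import urllib.request


def http_probe(url, timeout=10.0):
    """Return True if `url` answers without an HTTP error within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, DNS failures, 4xx/5xx, bad URLs.
        return False


class OutageDetector:
    """Send one alert after `threshold` consecutive failed probes."""

    def __init__(self, probe, alert, threshold=3):
        self.probe = probe          # callable returning True when the site is up
        self.alert = alert          # callable taking a message string
        self.threshold = threshold
        self.failures = 0
        self.alerted = False

    def tick(self):
        if self.probe():
            # Site is reachable again: reset so a later outage alerts anew.
            self.failures = 0
            self.alerted = False
            return True
        self.failures += 1
        if self.failures >= self.threshold and not self.alerted:
            self.alert(f"{self.failures} consecutive probe failures")
            self.alerted = True
        return False


# Simulated run: no network access, the probe results are canned.
canned = iter([True, False, False, False, False, True])
sent = []
detector = OutageDetector(lambda: next(canned), sent.append, threshold=3)
for _ in range(6):
    detector.tick()
print(sent)
```

In real use the probe would be something like `lambda: http_probe(url)` called
on a timer, and `alert` would have to avoid the monitored host entirely (for
example, not mail that is relayed through it), which is the failure mode
described in this message.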

Sorry for the outage, and I hope you all enjoyed a quieter Saturday evening
as a result :-)

D.

--
Daniel Silverstone                         http://www.digital-scurf.org/
PGP mail accepted and encouraged.            Key Id: 3CCE BABE 206C 3B69

Re: lua.org down

Lorenzo Donati
On 06/11/2016 10:52, Daniel Silverstone wrote:
> On Sun, Nov 06, 2016 at 00:15:13 -0200, Luiz Henrique de Figueiredo wrote:

> The server which hosts both lua.org and the lua-l mailing list suffered from a
> severe failure approximately 14 hours ago which was not picked up by myself or
> Rob until this morning at about 7am.  We have worked since then to restore
> things and everything should be back just fine.  Sadly due to the nature of the
> failure, there are no logs from when it happened so I cannot state what the
> cause was.
>

From the wording above it seems you experienced some kind of "Murphy's
catastrophe", like "A rodent somehow slipped in the server room and got
fried after gnawing the mains cable and bringing the whole rack down". :-)

Just curious, could you explain in a bit more detail what happened? I
was talking about catastrophic system failures to my students [1] last
week and maybe this could make a nice case study? Of course feel free to
ignore my request if you are too busy or if you cannot disclose the details.



> We will be looking to put some external monitoring into place to help us spot
> this kind of thing sooner (the machine in question was the gateway for email
> so we didn't see any monitor emails until we'd fixed things).
>
> Sorry for the outage, and I hope you all enjoyed a quieter Saturday evening
> as a result :-)
>
> D.
>

Thanks in advance.

Cheers!

-- Lorenzo

[1] Students of a technical high school attending a course for becoming
sysadmins.

Re: lua.org down

Daniel Silverstone
On Sun, Nov 13, 2016 at 11:17:58 +0100, Lorenzo Donati wrote:
> Just curious, could you explain in a bit more detail what happened? I was
> talking about catastrophic system failures to my students [1] last week and
> maybe this could make a nice case study? Of course feel free to ignore my
> request if you are too busy or if you cannot disclose the details.

The machine in question is a virtual machine which means we actually got
to look at its console.  The console was full of messages along the lines
of:

INFO: task <process>:<pid> blocked for more than 120 seconds

In this instance, it was pretty much all the apps which suggests that the
IO subsystem for the VM had a hiccough.  The host system was fine so we
are basically only going to blame bogons for the fault.
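
A minimal sketch of how messages of that shape could be tallied per process
from a saved copy of the console output, to see whether one task or nearly
all of them were stuck. The sample lines, timestamps, process names, and the
function name `count_blocked_tasks` are hypothetical; the real console
contents were not posted.

```python
import re
from collections import Counter

# Matches "INFO: task <process>:<pid> blocked for more than <N> seconds".
# The greedy name group lets process names that contain ':' still parse,
# because the pid is taken from the last ":<digits>" before " blocked".
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def count_blocked_tasks(console_text):
    """Count hung-task reports per process name."""
    counts = Counter()
    for line in console_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


# Hypothetical console excerpt, for illustration only.
sample = """\
[ 4801.1] INFO: task apache2:1201 blocked for more than 120 seconds.
[ 4801.2] INFO: task exim4:988 blocked for more than 120 seconds.
[ 4921.1] INFO: task apache2:1201 blocked for more than 120 seconds.
[ 4921.3] some unrelated kernel message
"""

blocked = count_blocked_tasks(sample)
for name, n in blocked.most_common():
    print(name, n)
```

Many distinct process names in the tally would point at something shared,
such as storage, rather than at a single misbehaving application.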

As for not spotting it in time; that was simply human error combined with
bad error reporting design.  The box in question is our primary web server
which means the monitoring apps present their reports there; and as a human
I simply didn't try to look at anything for about 12 hours.

In mitigation, I have in the past mentioned ways to contact me out of band
(since yes, that server is also the mail delivery box) but in the end it
was simply the sort of thing that a non-professional small-time hosting
provider hits.

Sorry again for the inconvenience.

D.

--
Daniel Silverstone                         http://www.digital-scurf.org/
PGP mail accepted and encouraged.            Key Id: 3CCE BABE 206C 3B69

Re: lua.org down

Lorenzo Donati
Thank you very much for the prompt reply!

On 13/11/2016 12:10, Daniel Silverstone wrote:

> On Sun, Nov 13, 2016 at 11:17:58 +0100, Lorenzo Donati wrote:
>> Just curious, could you explain in a bit more detail what happened? I was
>> talking about catastrophic system failures to my students [1] last week and
>> maybe this could make a nice case study? Of course feel free to ignore my
>> request if you are too busy or if you cannot disclose the details.
>
> The machine in question is a virtual machine which means we actually got
> to look at its console.  The console was full of messages along the lines
> of:
>
> INFO: task <process>:<pid> blocked for more than 120 seconds
>
> In this instance, it was pretty much all the apps which suggests that the
> IO subsystem for the VM had a hiccough.  The host system was fine so we
> are basically only going to blame bogons for the fault.
>

So no dead rat entangled in mains wiring :-D, but still an interesting case.

> As for not spotting it in time; that was simply human error combined with
> bad error reporting design.  The box in question is our primary web server
> which means the monitoring apps present their reports there; and as a human
> I simply didn't try to look at anything for about 12 hours.
>
> In mitigation, I have in the past mentioned ways to contact me out of band
> (since yes, that server is also the mail delivery box) but in the end it
> was simply the sort of thing that a non-professional small-time hosting
> provider hits.
>

Don't worry, I think we all appreciate your efforts here.

BTW, it is a particularly interesting case for my students because the
local business situation here is made up primarily of small firms,
sometimes with highly varied and specialized needs (tourist
facilities, high-tech niche mechanics, civil engineering design teams,
etc.). Oftentimes a sysadmin will end up working in a very
resource-limited environment, with few personnel available for ICT things.


> Sorry again for the inconvenience.
>
> D.
>

Cheers!

-- Lorenzo


Re: lua.org down

Luiz Henrique de Figueiredo
In reply to this post by Daniel Silverstone
> it was simply the sort of thing that a non-professional small-time hosting
> provider hits.

For the record, Pepperfish has been providing outstanding free service
to the Lua community since 2004, for which we are very grateful.

Re: lua.org down

Luiz Henrique de Figueiredo
In reply to this post by Luiz Henrique de Figueiredo
It seems that the Lua site is down. There is a mirror at
        http://www.tecgraf.puc-rio.br/lua/mirror/

(But I think that this message will not be posted before lua.org comes back.)