LuaSocket and strange DNS failures

mchalkley
All,

I've written a monitoring program in Lua (running only on Windows
machines) that does server Echo tests, web site checks, etc. Over a
period of six months or so, I've noticed that the DNS on the machines
running my program eventually just stops working, and I can't get it
to start working again except by restarting the machine.

When I say the DNS quits working, I mean that the machine can no
longer communicate using hostnames. For example, once the DNS is
broken, I can open a command prompt and ping other machines by IP
address, but not by host name.

Other than this anomaly, my program runs fine for weeks, but
eventually, the machine's DNS stops working. Sometimes this happens
within a few days, and other times it might take a week or two, but
it's never been able to run more than a couple of weeks on any Windows
machine without crashing the DNS. If I reboot the machine, DNS works
fine until I start my program - then, eventually, the DNS lookups stop
working again.

Has anybody else ever seen this, or do you have any idea what could be
causing it?

Thanks,

Mark



Re: LuaSocket and strange DNS failures

Gyepi SAM
On Wed, May 08, 2013 at 11:11:41PM -0400, [hidden email] wrote:
> Other than this anomaly, my program runs fine for weeks, but
> eventually, the machine's DNS stops working.
>
> Has anybody else ever seen this, or do you have any idea what could be
> causing it?

There are a few possible causes:

1. The machine's DNS server list has changed.
2. The DNS server is either broken or is, itself, having connectivity issues.
3. The DNS server IP address is actually transient (assigned by DHCP, based on
an external IP, etc.).

Should be easy to determine the actual cause.

When the machine is rebooted, bring up a console and use

    ipconfig /all

Or whatever the current equivalent is. It's been a while since I futzed with
Windows.

Note the DNS IP addresses and determine their source and reachability.
If the DNS servers are permanent (not DHCP based and don't change) and always
reachable (within your network, etc.), then the next step is to wait until the
problem occurs again and then recheck the network configuration. If the DNS
server list has changed, you have a new path for investigation, because
something or someone has changed it. If not, you can focus on the servers: are
they still on the network (ping), what happens when you use nslookup.exe to
make a request to them, etc.
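The first check above (has the DNS server list changed?) can be automated, so the list is recorded at startup and again when lookups begin failing. Below is a minimal sketch in Lua. It assumes the English-language `ipconfig /all` layout (a "DNS Servers . . . :" line, with any additional servers on indented continuation lines); other Windows locales or versions may need a different pattern, and the function names are hypothetical.

```lua
-- Sketch: pull DNS server addresses out of `ipconfig /all` text so they can
-- be logged and compared over time. ASSUMES the English-locale layout:
--   "   DNS Servers . . . . . . . . . . . : 192.168.1.1"
--   "                                       8.8.8.8"   (continuation line)
local function parse_dns_servers(text)
  local servers, in_block = {}, false
  for line in text:gmatch("[^\r\n]+") do
    local first = line:match("DNS Servers[%s%.]*:%s*(%S+)")
    if first then
      servers[#servers + 1] = first
      in_block = true
    elseif in_block and line:match("^%s+[%x%.:]+%s*$") then
      -- indented line holding only an address: one more server
      servers[#servers + 1] = line:match("([%x%.:]+)")
    else
      in_block = false
    end
  end
  return servers
end

-- On Windows, capture the live configuration (io.popen is standard Lua).
local function current_dns_servers()
  local p = assert(io.popen("ipconfig /all"))
  local text = p:read("*a")
  p:close()
  return parse_dns_servers(text)
end
```

Logging `table.concat(current_dns_servers(), ", ")` once at startup and again on the first failed lookup would show directly whether the list changed.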

As you can tell, this isn't really related to Lua, so you'd probably be better
served by asking this question (or more specific versions of it) in a Windows
(admin?) forum.

-Gyepi


Re: LuaSocket and strange DNS failures

mchalkley
Thursday, May 9, 2013, 8:43:18 AM, you wrote:


Thanks for the suggestions, Gyepi, but it's not as simple as that. As
I mentioned, it's happening on multiple machines and workstations (8
out of 8 that I've tried, so far), and it takes a few days before the
problem manifests itself. None of the machines have any DNS issues
when my program isn't running, and all of them eventually do when it
is, so it's definitely something my program is breaking. The memory
usage doesn't go up, so it's not something like that, but it's
definitely causing all DNS lookups on the machine to start failing.

I noticed that I was doing this in the routine that calls DNS (the
only place in my program that I do):

  local socket = require("socket") -- create local instance of socket
  local ip = socket.dns.toip(machine)
  return ip

I'm also doing a require "socket" globally at the top of my code, so I
commented out the local socket = require in the routine, thinking that
maybe I was stepping on myself, or that maybe the local socket wasn't
getting disposed of properly (or something like that). It still works
fine (in the sense that it functions properly), so I guess I'll just
have to let it run a few days to see if that does any good.
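A sketch of the same routine that surfaces the resolver's error instead of silently returning nil. LuaSocket's `socket.dns.toip` returns the first address (followed by a table of resolver details) on success, or nil followed by an error message on failure. The `lookup` name and the optional `resolve` parameter, which lets a stub stand in for LuaSocket during testing, are hypothetical.

```lua
-- Sketch: resolve a host name and report WHY it failed, not just that it did.
-- `resolve` defaults to LuaSocket's socket.dns.toip; pass a stub to test.
local function lookup(host, resolve)
  if resolve == nil then
    local socket = require("socket") -- returns the cached module after the first call
    resolve = socket.dns.toip
  end
  local ip, err = resolve(host) -- on success `err` is the resolver-info table
  if not ip then
    return nil, string.format("DNS lookup for %q failed: %s", host, tostring(err))
  end
  return ip
end
```

Logging the second return value whenever `lookup` returns nil would capture the resolver's own error text once the machine starts misbehaving.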

Thanks again,

Mark



Re: LuaSocket and strange DNS failures

Rena

On 2013-05-09 11:49 AM, <[hidden email]> wrote:
> I noticed that I was doing this in the routine that calls DNS (the
> only place in my program that I do):
>
>   local socket = require("socket") -- create local instance of socket
>   local ip = socket.dns.toip(machine)
>   return ip
>
> I'm also doing a require "socket" globally at the top of my code, so I
> commented out the local socket = require in the routine, thinking that
> maybe I was stepping on myself, or that maybe the local socket wasn't
> getting disposed of properly (or something like that). It still works
> fine (in the sense that it functions properly), so I guess I'll just
> have to let it run a few days to see if that does any good.

require() normally loads a module only once; subsequent calls for the same module return the cached value (the same table, not a copy). See package.loaded.
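A minimal illustration of that caching, using a throwaway module registered through `package.preload` (the name `demo_mod` is hypothetical). The loader runs only once, and the second `require` hands back the very same table stored in `package.loaded`, so a `local socket = require("socket")` inside a function does not create a new instance of anything.

```lua
-- Register a throwaway module loader; "demo_mod" is a made-up name.
local loads = 0
package.preload["demo_mod"] = function()
  loads = loads + 1
  return { name = "demo_mod" }
end

local a = require("demo_mod") -- runs the loader, stores the result in package.loaded
local b = require("demo_mod") -- returns the cached table; loader is not run again

print(loads)                            -- 1
print(a == b)                           -- true
print(package.loaded["demo_mod"] == a)  -- true
```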


Re: LuaSocket and strange DNS failures

David Favro
On 05/09/2013 11:48 AM, [hidden email] wrote:

> Thanks for the suggestions, Gyepi, but it's not as simple as that. As
> I mentioned, it's happening on multiple machines and workstations (8
> out of 8 that I've tried, so far), and it takes a few days before the
> problem manifests itself. None of the machines have any DNS issues
> when my program isn't running, and all of them eventually do when it
> is, so it's definitely something my program is breaking. The memory
> usage doesn't go up, so it's not something like that, but it's
> definitely causing all DNS lookups on the machine to start failing.

It may be that the program is the proximate cause of the breakage, as you
say, but if so the actual problem is very likely in something else that this
program merely triggers. Regardless, that's beside the point. You haven't
posted anything about how your DNS is configured [do you run recursive
lookups locally? if not, how many and which servers do you query?], what the
nature of the failure is [does your DNS resolver report any errors to the
calling program or to a system log, and if so, what are they? you said that
you cannot ping by hostname; OK, so what error did your ping program
report?], or the results of any diagnostics run on the machine while it's
malfunctioning [did you run a more specific resolver tool than 'ping'? did
you try sniffing the wire? do DNS query packets go out? do replies come
back? what are the contents of the reply packets?]. Without that, all anyone
could do is wildly speculate, which frankly is a waste of time.  How can you
expect anyone here to help you find the cause of the failure when you've
given no indication of what the failure is, other than "ping ip-addr works,
ping hostname fails"?

This is also a little off-topic for this list, unless you can establish a
more direct link to the Lua program than your observed correlation between
the program running for a long time and the resolver failure.  It's the
resolver that's failing, so you need to diagnose it and see whether that
leads you back to the program.
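One way to start collecting the details asked for above is a periodic "canary" lookup that, on failure, appends a timestamp and the resolver's exact error text to a log file. The sketch below assumes the resolver function is injected (for example LuaSocket's `socket.dns.toip`, which returns nil plus an error message on failure); `check_resolver`, the canary host, and the log path are hypothetical placeholders.

```lua
-- Sketch: probe the resolver and record failures with a UTC timestamp so the
-- onset and the exact error text can be reported. `resolve` is injected,
-- e.g. socket.dns.toip from LuaSocket (ASSUMED setup, not the original code).
local function check_resolver(resolve, host, log_path)
  local ip, err = resolve(host)
  if ip then
    return true
  end
  local f = assert(io.open(log_path, "a"))
  f:write(string.format("%s\tDNS failure for %s: %s\n",
    os.date("!%Y-%m-%dT%H:%M:%SZ"), host, tostring(err)))
  f:close()
  return false, err
end
```

Called every few minutes from the monitoring loop, for example as `check_resolver(socket.dns.toip, "www.example.com", "dns-canary.log")` (host and file name are placeholders), this would leave a record of when failures began and what the resolver reported, which could then be lined up with nslookup results and packet captures.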