Flaky DNS resolution diagnosis
I suspect intermittent DNS resolution problems on my Linode. I have Webmin monitoring a couple of external web sites such as Google, with one check going to a direct IP address and another using a DNS name. The check by DNS name sometimes fails, while the check by direct IP never does.
Does anyone have an idea how I could diagnose this?
I ran mtr for a few hours and saw no issues. I know that doesn't test DNS, but it at least rules out intermittent network problems.
6 Replies
@jonny5alive:
Does anyone have an idea how I could diagnose this?
What sort of errors do you get when the Webmin check by name fails (e.g., does the error point to the name not resolving)?
When I was having some issues in the past with the local resolvers in one of the DCs, I used a script to poll the local resolvers (with dig) for a known name. After a day or so it was easy enough to see periods of time when neither resolver was answering the request, and being able to quote those logs was helpful in submitting the trouble ticket.
Alternatively, you could switch your Linode to use a public DNS resolver for a period of time (say Google's) and see if the behavior changes and then just open a more general ticket asking for any known issues with the local resolvers.
Or, if your current testing is clearly showing name resolution errors, you could just open a ticket anyway. Best case it's something Linode is already aware of, or worst case you just fall back on one of the prior approaches to gather more data.
– David
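For the public-resolver comparison suggested above, one option that avoids editing resolv.conf is to ask each resolver directly with dig and compare the results. This is only a rough sketch: 192.0.2.53 is a placeholder for your local resolver, 8.8.8.8 is Google Public DNS, and the function names check_resolver and compare_resolvers are made up for illustration.

```shell
#!/bin/bash
# Sketch: query the same name through several resolvers and report which answer.

check_resolver() {
    local resolver=$1 name=$2 answer
    # +short prints only the record data; +time/+tries keep a dead resolver
    # from stalling the check for long.
    answer=$(dig +short +time=2 +tries=1 "@$resolver" "$name" 2>/dev/null)
    case "$answer" in
        # Empty output or a ";;" diagnostic line means no usable answer.
        ""|";;"*) echo "$resolver FAIL" ;;
        *)        echo "$resolver OK" ;;
    esac
}

compare_resolvers() {
    local name=$1
    shift
    for resolver in "$@"; do
        check_resolver "$resolver" "$name"
    done
}

# Example call (addresses are placeholders, replace with your own):
# compare_resolvers example.com 192.0.2.53 8.8.8.8
```

If the local resolver shows FAIL at times when the public one shows OK, that is the kind of evidence worth quoting in a ticket.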
The Webmin check only reports UP or DOWN. The DNS-name check goes DOWN while the direct-IP check stays UP.
What script did you use with dig?
My tests aren't conclusive yet, so it's hard to open a ticket.
@jonny5alive:
What script did you use with dig?
Just a couple of lines of a bash script I wrote for the occasion. I doubt I kept it, but it would have just been an infinite while loop with two dig calls (one for each resolver) and a sleep. Oh, and probably a date in there somewhere for logging, so probably something like (untested):
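#!/bin/bash
# Poll each resolver for a known name, logging a timestamp on every pass.
while true; do
    date
    # One of the following for each resolver address.
    # The substitution is left unquoted so the answer collapses onto one line.
    echo "x.y.z.w:" $(dig +noall +answer @x.y.z.w domain)
    sleep 30
done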
Replace addresses, domain name and sleep delay as needed. In my case I used a domain of mine hosted on Linode's servers, to minimize the risk of introducing a remote DNS server issue into the testing. Then, nohup that into the background while redirecting the output to some file, e.g.:
nohup script >script.out 2>&1 &
wait a bit and then review the output looking for dig errors.
– David
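Reviewing the output by eye gets tedious after a day of polling, so here is a hedged sketch of a filter for it. It assumes the log looks exactly like what the loop above produces (a date line followed by "address: answer" lines for IPv4 resolvers), and it treats an empty answer or a dig diagnostic starting with ";;" as a failure. The function name find_failures is hypothetical.

```shell
#!/bin/bash
# Sketch: list the polls in the log where a resolver gave no usable answer.

find_failures() {
    awk '
        # Resolver lines start with an IPv4 address and a colon.
        /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:/ {
            answer = $0
            sub(/^[^:]*:[ \t]*/, "", answer)   # drop the "address:" prefix
            if (answer == "" || answer ~ /^;;/) {
                reason = (answer == "" ? "(empty answer)" : answer)
                printf "%s | %s %s\n", stamp, $1, reason
            }
            next
        }
        # Anything else is taken to be the timestamp written by date.
        { stamp = $0 }
    ' "${1:-script.out}"
}

# Example call:
# find_failures script.out
```

Each line printed is a timestamp plus the resolver that failed at that time, which makes it easy to spot periods when neither resolver answered.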
A very simple cURL script in PHP fails to resolve a DNS name when served by Apache, yet the same script works fine when run from the CLI as the Apache user.
No problems with DNS resolution when using dig or ping.
Tried using Google's DNS servers in resolv.conf.
Disabled IPv6.
Disabled the firewall.
It only works when I hardwire the name-to-IP mapping in /etc/hosts.
I'm at a loss to understand what is wrong. It looks like I will have to build a new Linode and set everything up again.
Thanks everyone in IRC today.