Random DNS problem?
This has been working perfectly every night for over a year.
In the last couple of weeks, it has started to fail about half the time. The script reports "Net::FTP: Bad hostname '
It seems that in the middle of the night (1am PT, and more recently 3am PT after I tried moving the job), my other VPS sometimes can't resolve a DNS name that points to my Linode. I can't tell whether the remote DNS server is unresponsive, my Linode is down, or nsX.linode.com is just not responding at that time of night. The script only tries once, since this has never been an issue until now.
I could simply plug the Linode's static IP address into the script, but I kind of want to know why this is failing on principle. I'm also too old to stay up until 1am and do any "live" troubleshooting. Every time I run the script manually during the day, it works fine with no errors. I can't reproduce this from 7am to 11pm. I am running CSF/LFD firewall on both hosts, but that doesn't explain the random nature of this failure (and both IP addresses are whitelisted on each box anyway).
Any suggestions on where to start narrowing this down?
12 Replies
To see whether your linode DNS is causing the problems, you might run one of these before your script (they both bypass your recursive DNS server and talk directly to the authoritative nameservers):
````
dig +trace www.MYLINODE.com
````
````
dig +nssearch www.MYLINODE.com
````
(You might have to install dig)
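Since the failure only shows up overnight, one way to run these unattended is to have the cron job record each check with a timestamp, so the output can be read the next morning. The following is a minimal sketch under that assumption; `run_logged` and the log path in the comments are hypothetical names, not something from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and append its output (stdout and
# stderr) plus its exit status to a log file, with a UTC timestamp.
# Usage: run_logged <logfile> <command> [args...]

run_logged() {
    local logfile="$1"
    shift
    local status=0
    {
        printf '=== %s: %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$*"
        # Capture the exit status without aborting if the command fails.
        "$@" 2>&1 || status=$?
        printf '=== exit status: %s\n' "$status"
    } >> "$logfile"
    return "$status"
}

# Example calls to place just before the nightly script
# (the log path is an assumption):
#   run_logged /tmp/dns-check.log dig +trace www.MYLINODE.com
#   run_logged /tmp/dns-check.log dig +nssearch www.MYLINODE.com
```

The helper returns the exit status of the command it ran, so the calling job can still react to a failed lookup.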
I will give that a try. I think you meant 'nosearch' rather than 'nssearch'. I haven't studied Net::FTP much to see how long it waits for a reply, etc. I also thought about just adding a quick sleep-and-try-again routine if the first connection fails. I'm guessing any of these might help.
It's more about "why did it start failing and then only some of the time". I can't stand things like that.
@haus:
I think you meant 'nosearch' rather than 'nssearch'.
Nope, meant it exactly as it was typed. 'nssearch' hits ALL the authoritative nameservers and tells you how long they took to respond.
@man dig:
+[no]nssearch
When this option is set, dig attempts to find the authoritative name servers for the zone containing the name being looked up and display the SOA record that each name server has for the zone.
````
$ dig +nssearch linode.com
SOA ns1.linode.com. dns.linode.com. 2010122118 7200 3600 604800 86400 from server ns3.linode.com in 33 ms.
SOA ns1.linode.com. dns.linode.com. 2010122118 7200 3600 604800 86400 from server ns4.linode.com in 36 ms.
SOA ns1.linode.com. dns.linode.com. 2010122118 7200 3600 604800 86400 from server ns1.linode.com in 71 ms.
SOA ns1.linode.com. dns.linode.com. 2010122118 7200 3600 604800 86400 from server ns2.linode.com in 103 ms.
SOA ns1.linode.com. dns.linode.com. 2010122118 7200 3600 604800 86400 from server ns5.linode.com in 113 ms.
````
When I do subdomain.MYLINODE.com I get nothing back. When I do MYLINODE.com I get results like the ones you posted.
Obviously there's a chance this is a configuration issue specific to the domain in question, but since this is a new intermittent failure I'm guessing it relates more to a broader issue (something changed elsewhere beyond my immediate control). Particularly as I haven't made any DNS changes in over 6 months for any of the VPS' or domains in question.
Anyway, I "solved" the issue by adding a quick loop in my script that tries the FTP connection up to 5 times with a short break in between. Last night it failed on the first try and then succeeded on the second attempt. So thanks again to Stever for the suggestions and helping me learn something new.
If I manage to sort out the issue for real someday I'll post the solution, but for now this will work.
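The loop itself was not posted, and it lives inside the Perl script. As a rough illustration of the same retry idea, here is a shell sketch that wraps any command instead. `retry` and its arguments are hypothetical names, and it assumes the wrapped command exits non-zero when it fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to <max_attempts> times,
# sleeping <delay_seconds> between failed attempts.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]

retry() {
    local max_attempts="$1"
    local delay="$2"
    shift 2
    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            echo "succeeded on attempt $attempt" >&2
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed" >&2
        # No point sleeping after the final attempt.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

Called as `retry 5 30 some-command`, this tries up to 5 times with 30 seconds between attempts; the 30-second delay is an arbitrary choice, since the length of the "short break" was not stated. The attempt messages go to stderr, so a cron mail or log shows which attempt finally worked.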
@haus:
I'm not willing to provide my real domain name to an open troubleshooting forum.
Obviously there's a chance this is a configuration issue specific to the domain in question, but since this is a new intermittent failure I'm guessing it relates more to a broader issue (something changed elsewhere beyond my immediate control). Particularly as I haven't made any DNS changes in over 6 months for any of the VPS' or domains in question.
Yes, but it also means that nobody else can reproduce the problem on their own linode.
````
#!/usr/local/bin/perl
use strict;
use warnings;
use Net::FTP;

my $ftp_hostname = '';   # ftp host name
my $ftp_port     = '21'; # typical value
my $ftp_passive  = 0;    # change to 1 for passive mode

# new() returns undef on failure and leaves the reason in $@
my $ftp = Net::FTP->new($ftp_hostname, Port => $ftp_port, Passive => $ftp_passive);

print "Content-type: text/html\n\n";
if (!$ftp) {
    print "FTP connection failed: $@";
} else {
    print "FTP connection successful.";
}
exit;
````
That's just a snippet pulled from my original code, which would result in a "bad hostname" error about half the time when run in the wee hours of the morning. Again just for clarity, this script is running on a different host, trying to connect via FTP to my linode.
The script only runs once per day, so I suspect this relates to DNS caching. The first lookup of the day may find nothing in the cache and give up before the answer comes back, while every lookup I run by hand later in the day is answered from the cache. That would explain why it works all day long when I try it at the command line.