[SOLVED] Bind9: AXFR not coming through
I use one of my Linodes in Dallas as the primary DNS for my company, and we have the secondary DNS on our company's office network in Norway. When I attempt an AXFR query for a whole domain from the office network, the query fails, even though both DNS servers are authoritative for the domain in question and the IP segment of my office network is listed in the "allow-transfer" option. This used to work fine, but it stopped working recently.
The beginning of the Dallas Linode's /etc/bind/named.conf:
options {
directory "/var/bind";
auth-nxdomain yes;
listen-on { 127.0.0.1; 67.18.92.145; 70.85.129.159; };
allow-notify { 213.184.199.28; 127.0.0.1; 67.18.92.145; };
allow-transfer { 213.184.199.0/26; };
allow-query { any; };
allow-recursion { 213.184.199.0/26; 77.75.208.0/24; };
recursive-clients 5000;
};
And yes, this Linode has two IP addresses, and my company's office network is 213.184.199.0/26 (213.184.199.1 - 213.184.199.63).
From the server using the IP address 213.184.199.28:
$ dig @70.85.129.159 axfr by.com
; <<>> DiG 9.6-ESV-R4 <<>> @70.85.129.159 axfr by.com
; (1 server found)
;; global options: +cmd
; Transfer failed.
Querying other record types (A, MX, NS) works just fine, but the failing AXFR prevents zone updates from propagating to the secondary DNS server.
Any ideas what I'm missing?
6 Replies
Unless you have any of the above allow- options in your zone(s) overriding them, I don't see anything abnormal. Some debug logging might be helpful.
Something like this maybe:
logging {
channel "query_channel" {
file "/var/log/dns.log";
print-time yes;
print-category yes;
print-severity yes;
};
category "default" {
"query_channel";
};
category "queries" {
"query_channel";
};
};
–
Travis
logging {
channel "query_channel" {
file "/var/log/dns.log";
print-time yes;
print-category yes;
print-severity yes;
};
category "default" { "query_channel"; };
category "queries" { "query_channel"; };
category "security" { "query_channel"; };
category "unmatched" { "query_channel"; };
category "xfer-in" { "query_channel"; };
category "xfer-out" { "query_channel"; };
category "resolver" { "query_channel"; };
category "general" { "query_channel"; };
category "database" { "query_channel"; };
category "client" { "query_channel"; };
};
Searching the log for the IP address 213.184.199.28, which is where I'm testing my queries from, doesn't get me anywhere. Queries for "A" and "MX" records (i.e. queries that succeed) are listed in the log, but queries for "AXFR" are not.
Any suggestions for other log configurations I can try to get more information?
– David
@db3l:
Any chance that your office firewall changed recently? Maybe they're letting UDP DNS through (which most queries use) but not TCP (which is used for axfr, but can also be used if a truncated response to a regular query is received)?
No changes in my office's firewall that could explain this. And I should know, since I'm the one managing the office firewall.
I just tried something else: I added 127.0.0.1 to the allow-transfer list, restarted bind9, and tried the following:
$ dig @localhost axfr by.com
; <<>> DiG 9.7.3 <<>> @localhost axfr by.com
; (1 server found)
;; global options: +cmd
; Transfer failed.
Something mysterious is up, that's for sure. Bind version is 9.7.3, if that helps (latest from Debian repository).
@NeonNero:
No changes in my office's firewall that could explain this. And I should know, since I'm the one managing the office firewall.
Hmm, ok, just to be complete, any firewall setup on your Linode that might interfere?
Can you telnet (or nc, or whatever your favorite tcp connection test tool is) to port 53 on your Linode either locally or from your office? Can you see bind listening on that port for both udp and tcp?
I'm just wondering if the failure to see anything in the log is due to the traffic never reaching bind rather than something in bind's configuration itself.
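As a rough sketch of those checks (not a definitive procedure): the commented commands below force an ordinary query over TCP with dig's +tcp option and try a bare TCP connect with nc, using the addresses from this thread. The axfr_ok helper is a hypothetical name introduced here for illustration; it only looks for the "Transfer failed." marker shown in the dig output above and for at least one SOA record, so treat it as a quick sanity check.

```shell
# Hypothetical helper (not part of dig or BIND): reads dig output on stdin.
# Fails if the "Transfer failed." marker is present, otherwise succeeds
# only when at least one SOA record line appears in the output.
axfr_ok() {
    out=$(cat)
    case $out in
        *'Transfer failed.'*) return 1 ;;
    esac
    printf '%s\n' "$out" | grep -q '[[:space:]]SOA[[:space:]]'
}

# Usage, with the addresses from this thread:
#   dig +tcp @70.85.129.159 by.com SOA    # ordinary query, forced over TCP
#   nc -vz 70.85.129.159 53               # bare TCP connect to port 53
#   dig @70.85.129.159 axfr by.com | axfr_ok && echo "transfer OK"
```

If the TCP query fails while the same query over UDP works, the problem is more likely in front of bind (a firewall, or another process holding the port) than in allow-transfer.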
– David
@db3l:
Hmm, ok, just to be complete, any firewall setup on your Linode that might interfere? Can you telnet (or nc, or whatever your favorite tcp connection test tool is) to port 53 on your Linode either locally or from your office? Can you see bind listening on that port for both udp and tcp?
I'm just wondering if the failure to see anything in the log is due to the traffic never reaching bind rather than something in bind's configuration itself.
OK, now we're getting somewhere.
Without even trying to telnet in, I ran netstat, first with -an, and saw port 53 listed on both tcp and udp. I then shut down the bind9 service to check whether something else was listening, and even with bind stopped, something was still bound to 0.0.0.0:53 on both tcp and udp, as well as the equivalents on tcp6 and udp6, which I found rather strange. Adding -p to the netstat command revealed that dnsmasq was running, for some reason. Shutting down dnsmasq and restarting bind9 did the trick: the axfr query from the remote machine suddenly worked again.
It seems that I had installed dnsmasq when I attempted to install openvpn (as part of a tutorial) some time ago, and hadn't noticed that anything was wrong because bind9 hadn't complained during startup about the busy ports. In my case, I might as well just remove dnsmasq for now (since it's not really in use).
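For anyone hitting the same thing, here is a minimal sketch of the netstat check described above. The listeners_on_53 function is a hypothetical helper introduced here; it assumes the usual Linux net-tools column layout, where the local address is the fourth field and PID/program name is the last one.

```shell
# Hypothetical helper: reads `netstat -tulnp` output on stdin and prints
# "proto local-address pid/program" for every socket bound to port 53.
# Assumes the net-tools layout: field 4 is the local address and the
# last field is PID/Program name (shown only when run as root).
listeners_on_53() {
    awk '$4 ~ /:53$/ { print $1, $4, $NF }'
}

# Usage (run as root so the owning process is visible):
#   netstat -tulnp | listeners_on_53
# Run it once with bind9 stopped: anything still listed is another
# process holding the port.
```

In the situation above, the output with bind9 stopped would have named dnsmasq as the owner of 0.0.0.0:53.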
Thanks for nudging me in the right direction, guys!