DNS hangs 5 secs on VirtualBox with --natdnshostresolver1
When running an Alpine Linux guest under VirtualBox on OS X, DNS lookups hang for 5 seconds if the VirtualBox option --natdnshostresolver1 is enabled.
The --natdnshostresolver1 option makes the VirtualBox NAT engine intercept DNS requests and resolve them through the OS X host resolver instead of sending them over the normal network. This is useful in development environments where the host's /etc/hosts should be used to resolve some hosts. The problem also affects Alpine Docker containers running in a VirtualBox guest. The Vagrant setup for CoreOS (and possibly other distros) enables --natdnshostresolver1 by default.
Since getting VirtualBox to run Alpine directly as a guest is non-trivial to set up, I'll demonstrate the bug using Vagrant with the Vagrantfile shown here:
btalbot-lt:alpine$ cat Vagrantfile
Vagrant.configure("2") do |config|
  config.vm.box = "maier/alpine-3.4-x86_64"
  config.vm.synced_folder ".", "/vagrant", disabled: true
  config.vm.provider "virtualbox" do |vb|
    # Route guest DNS queries through the host's resolver
    vb.customize ['modifyvm', :id, '--natdnshostresolver1', 'on']
  end
end
btalbot-lt:alpine$ vagrant up
<... stuff to console elided ...>
--- www.google.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 6.700/6.700/6.700 ms
real 0m 5.01s
user 0m 0.00s
sys 0m 0.00s
A tcpdump of the DNS traffic from the Alpine guest shows that responses to both the A and AAAA queries are returned. The guest appears to ignore the AAAA response, waits 2.5 seconds, repeats the AAAA query (and again ignores the response), and then times out after another 2.5 seconds:
22:34:45.724471 IP 10.0.2.15.52190 > 10.0.2.3.53: 27848+ A? www.google.com. (32)
22:34:45.724542 IP 10.0.2.15.52190 > 10.0.2.3.53: 28141+ AAAA? www.google.com. (32)
22:34:45.812045 IP 10.0.2.3.53 > 10.0.2.15.52190: 27848 1/0/0 A 188.8.131.52 (48)
22:34:45.812068 IP 10.0.2.3.53 > 10.0.2.15.52190: 28141 NotImp 0/0/0 (32)
22:34:48.228641 IP 10.0.2.15.52190 > 10.0.2.3.53: 28141+ AAAA? www.google.com. (32)
22:34:48.228965 IP 10.0.2.3.53 > 10.0.2.15.52190: 28141 NotImp 0/0/0 (32)
There are no interesting logs in /var/log/messages. The DNS resolver clearly doesn't like the response to the AAAA query. Exactly what it objects to is not clear to me, but my speculation is that the response (NotImp, with no answer records) is not a valid AAAA response. Could the Alpine resolver handle this response without blocking resolution, and return the A record result right away instead of waiting 5 seconds?
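The 5-second figure can be reproduced with a small model of the behaviour seen in the trace. This is only a sketch in Ruby, not the resolver's actual code. The constants (a 5-second budget split into 2 attempts), the rule that only RCODE 0 and RCODE 3 count as answers, and the names lookup_duration and responder are all assumptions made for illustration.

```ruby
# Hypothetical model of the timing seen in the tcpdump trace.
# It is NOT the resolver's real implementation.
ACCEPTED_RCODES = [0, 3].freeze # assumption: only NoError and NXDomain count as answers
TIMEOUT  = 5.0                  # assumption: total time budget, in seconds
ATTEMPTS = 2                    # assumption: each query is sent at most twice

# responder maps a query type to [rcode, latency_in_seconds].
# Returns the simulated time at which the lookup completes.
def lookup_duration(responder, qtypes = %w[A AAAA])
  interval = TIMEOUT / ATTEMPTS # 2.5 s between attempts
  pending = qtypes.dup
  finish = 0.0
  ATTEMPTS.times do |attempt|
    sent_at = attempt * interval
    # Drop every query whose reply is accepted; keep the rest for a retry.
    pending = pending.reject do |qtype|
      rcode, latency = responder.fetch(qtype)
      arrival = sent_at + latency
      accepted = ACCEPTED_RCODES.include?(rcode) && arrival <= TIMEOUT
      finish = [finish, arrival].max if accepted
      accepted
    end
    return finish if pending.empty?
  end
  TIMEOUT # something was never answered, so the whole budget is used up
end

# Latency of roughly 88 ms, as in the trace; RCODE 4 is NotImp.
buggy = { "A" => [0, 0.088], "AAAA" => [4, 0.088] }
fixed = { "A" => [0, 0.088], "AAAA" => [0, 0.088] }
puts lookup_duration(buggy)
puts lookup_duration(fixed)
```

Under these assumptions the NotImp reply to the AAAA query never counts as an answer, so the lookup runs to the full timeout even though the A record arrived in under 0.1 seconds.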
#4 Updated by Natanael Copa 4 months ago
It turns out to be a bug in VirtualBox. RCODE 4 (NotImp) is not supposed to be used as a return code for missing AAAA records. It is meant to indicate that the OPCODE field is unsupported.
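To make the distinction concrete, here is a minimal Ruby sketch that extracts the OPCODE and RCODE from a DNS header, following the header layout in RFC 1035. The sample header is hypothetical: it is built from the ID and the NotImp status in the tcpdump output, with the remaining flag bits and counts assumed. decode_dns_header is an illustrative name, not part of any library.

```ruby
# RCODE values defined in RFC 1035, section 4.1.1.
RCODE_NAMES = {
  0 => "NoError", 1 => "FormErr", 2 => "ServFail",
  3 => "NXDomain", 4 => "NotImp", 5 => "Refused"
}.freeze

# Decode the fixed 12-byte DNS header: six big-endian 16-bit fields.
def decode_dns_header(packet)
  id, flags, _qdcount, ancount, _nscount, _arcount = packet.unpack("n6")
  {
    id: id,
    response: (flags >> 15) & 1 == 1,  # QR bit
    opcode: (flags >> 11) & 0xF,       # 0 = standard query
    rcode: RCODE_NAMES.fetch(flags & 0xF, "Unknown"),
    ancount: ancount                   # number of answer records
  }
end

# Hypothetical header: ID 28141, QR=1, OPCODE=0, RD=1, RCODE=4, one question, no answers.
sample = [28141, 0x8104, 1, 0, 0, 0].pack("n6")
puts decode_dns_header(sample).inspect
```

The OPCODE in such a reply is 0, a standard query, which the server obviously does implement, so NotImp is the wrong status. A name that exists but has no AAAA record is normally reported with RCODE 0 (NoError) and an empty answer section.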