Last Saturday morning my plans were interrupted when I noticed an email saying that a certain system was down. After some work, I was able to infer the cause of the failure and partly work around it. The name server used on our internal network was not accessible, nor were some other systems, which suggested the failure of a VMware host. I edited files on a few servers so that they resolved names locally rather than by querying the name server, and that got a couple of outward-facing systems working again. With the help of another techie, I identified the host that had failed and established that it was not connecting to the network. He restarted it, and presently we were back in business.
Today I heard from a co-worker that a system was down. I switched to that tab in my browser and confirmed that it was in fact running. Knowing that, I needed only a few minutes to infer that Cloudflare, which manages DNS for our organization as it does for much of the internet, was not answering queries. I told the co-worker that connecting to work over VPN would allow the use of our internal name server, and so access to the system. But within half an hour, Cloudflare was answering queries again.
DNS, the Domain Name System, is what turns symbolic names such as www.stanford.edu into the numeric addresses that computers use. Everyone on the internet depends on it; relatively few know of it; vanishingly few think of it when it is working properly. But when it does not work, many things fail. The best comparison I can think of for those not engaged with it is GPS: imagine what would happen to Uber drivers, or anyone else in an unfamiliar area, who supposed they could count on GPS and suddenly discovered they could not.
One must never suppose that one can count on anything man-made.
It's well to figure on intermittent failure? Probably so.