Let Bogons Be Bogons: A Nightmare from ISP Hell

By lamppter in Meta
Tue Jul 04, 2006 at 02:05:07 PM EST

I work for a tier 2 ISP as a WAN and Systems Administrator. We have approximately 160,000 users and provide Internet Services to 13 large organizations.

A couple of years ago our backbone was on a tier 2 provider with its hub located in Austin, Texas. We decided to give our customers greater bandwidth, lower cost and better service by changing to a major tier 1 backbone and getting two 45 Mbps (DS3) pipes to the Internet. We also had the ability to add more bandwidth later with a newly purchased second Cisco 7513 router.

We carefully planned the move for 6 months and brought our customers into the planning process. We sent out an RFP, awarded the bid and began the process. What followed became a disaster. It was the "perfect storm" for an ISP failure.

"What's the problem? Why can't I go to my favorite website?" my Pointy-Headed Boss asked.

"So, what's your favorite website?" I asked her. I knew how I was going to answer.

"AOL.COM," she told me, quite disturbed and very angry.

"We're on the bogons IP list," I told her. She was furious and didn't have the foggiest idea what I was talking about. To top it off, the Executive Director was also furious. It's his favorite website too.

The glitch was this: the problem was caused by a multi-million dollar tier 1 Internet Service Provider, and it ended up costing us a great deal of money in the short run.

Here's the story of what happened.

The Preparation
The reason it is such a concern to change ISPs is that your customers have to change their IP address space. For our largest customers that is a big issue. They all have to look at their infrastructure very carefully, even devices they have forgotten about. For example, every router, layer-3 switch, mail server, DNS server and file server needs to be looked at and reconfigured. They also all had to change their DNS entries at their Internet registrar, so that the world could find them.

So I prepared an extensive checklist for each customer to go through. Some customers had considerable technical expertise and didn't need this. They used it as a reference. Other customers really needed hand holding and relied on us for technical support to get them through it. Our preparation included a seminar for the less technical customers so that they would at least understand what was going on. Many of them had workstations with hard-coded DNS entries instead of using DHCP. We went so far as to show them how to set up a DHCP server and how to set up DHCP on the individual workstations.

We also provided primary and secondary DNS services for at least half of our customers. We requested and received a /20 CIDR block from our ISP. This provided us with essentially 16 Class C networks for our customers' needs. To make things easier we purchased two new blade servers for DNS and had them configured for the new CIDR block. For the smaller customers we sliced part of our /20 into /26s. It all seemed simple, well planned and prepared. I had documented everything necessary and provided this to our customers.

Classless Inter-Domain Routing (CIDR)
Why is CIDR important? Without it, the Internet would probably have run out of IP addresses long before now. CIDR allows ISPs to use the IP address space, which we are quickly using up, more efficiently. IP addresses are a finite resource, like oil. We sliced up the /20 that we received into /26 CIDR blocks to give to our customers. Each /26 gave a customer 62 usable IP addresses.
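
The arithmetic behind those numbers can be checked with a short Python sketch using the standard ipaddress module. The prefix 10.20.0.0/20 is a placeholder chosen only for illustration; it is not the block from this story.

```python
import ipaddress

# Placeholder /20 for illustration only (an assumption, not our real block).
block = ipaddress.ip_network("10.20.0.0/20")

# A /20 splits into 16 networks of /24 size ("Class C" sized).
class_c_slices = list(block.subnets(new_prefix=24))

# The same /20 splits into 64 networks of /26 size.
customer_slices = list(block.subnets(new_prefix=26))

# Each /26 holds 64 addresses; the network and broadcast
# addresses are not assignable to hosts, which leaves 62.
first = customer_slices[0]
usable_hosts = first.num_addresses - 2

print(len(class_c_slices))    # 16
print(len(customer_slices))   # 64
print(first, usable_hosts)    # 10.20.0.0/26 62
```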

Our customers, in turn, would put their customers on private networks and then use various forms of Network Address Translation (NAT). For example, most people reading this probably have a private IP address. Private IP addresses are in the following ranges (RFC 1918):

10.0.0.0 - 10.255.255.255 (10.0.0.0/8)
172.16.0.0 - 172.31.255.255 (172.16.0.0/12)
192.168.0.0 - 192.168.255.255 (192.168.0.0/16)

This is probably the case if you have a router/firewall at home. Your ISP might then assign your router/firewall another private IP address before it finally receives a public IP address. This brings us back to the public CIDR blocks of IP addresses.
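
If you want to check whether an address falls in one of the private ranges defined by RFC 1918 (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16), a minimal Python sketch looks like this. The function name is_rfc1918 is made up for the example.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(address: str) -> bool:
    """Return True if the IPv4 address is inside an RFC 1918 private range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)

print(is_rfc1918("192.168.1.10"))  # True
print(is_rfc1918("172.32.0.1"))    # False
```

An address that fails this check is not necessarily usable on the public Internet. It only means the address is outside these three private ranges.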

It was now early June. I had been working on this since the previous Christmas and was sure things would go smoothly. We had a date for the cut over from our old backbone provider and notified our customers well ahead of time of the date of the cut over to the new ISP.

Gathering Storm On the Horizon
The big day finally arrived. I had double checked procedures with our customers and they were all prepared. Part of our plan was to do one-a-day with our largest customers and then two-a-day with our smaller customers. We decided to do our two largest customers on the weekend. They both had thousands of their own customers. Downtime would be minimal, apart from DNS changes, which would take a while to propagate throughout the Internet.

A couple of weeks earlier our company switched over to the new backbone and we saw no problems. We are quite small but we tested connectivity and were quite pleased. We decided that our customers would be quite happy with the larger pipes, more bandwidth and lower costs.

The changeover for the first large customer went flawlessly. They had planned well and were ready to go. I had been running the other DNS servers for a couple of weeks, and some of our customers pointed to them for a while. The whole changeover took less than 30 minutes. Our small team hung around, helped them through a few glitches and called it a day. We told them they could call us anytime, day or night, and gave them my cell number in case they had any questions. Pleased and satisfied, we went for drinks at a nearby bar.

The next day was a Sunday and we switched over our next largest customer. They had some concerns about the changeover, so while we did it they sent some of their technical specialists over to take a look and monitor their new network. We had a switch available for them to plug into to monitor their nets. All went as it had with the previous customer and the changeover was flawless. They were pleased. It continued this way for the rest of the week with little hand holding on our part.

Bogon IPs
Lurking in the background all this time, but unknown to us, was the bogons IP listing. Our tier 1 provider assured us several times over the phone, via email and on their bulletin board that the /20 CIDR block was good, fresh and not previously used. The following week, as we were changing over our smaller customers, we began getting tech calls from our larger customers. The problem: there were certain web sites they couldn't reach. We were beginning to have sporadic website and email issues. I remember calling up the Provisioning Manager and asking if some other ISP had previously used the /20 we now had.

What are bogon IPs, you ask? Well, I didn't know either. Normally there is no reason to. Here is what they are.

Bogons are IP blocks that have not been officially allocated by IANA or the RIRs. Because of that, they should not be routable. This is no big deal unless you are dealing with an organization that puts these lists in its routers, so that if such addresses show up on its WAN circuit the bad packets are immediately dropped. This is a common practice with some organizations, and there is nothing wrong with that. However, there is a very big problem that happens very rarely, and it happened to us.

The /20 CIDR block that had been allocated to us by our tier 1 backbone provider was part of a /16 CIDR block listed as unallocated by IANA! The result was 1000s of ISPs dropping our packets to them. This was happening on DNS lookups, MX lookups and website traffic. For example, we were getting calls from our customers saying that they could not get to an airline's website or that some email was not getting to recipients.

"Have you double checked your firewall settings?" I would ask them scratching my head.

"Yeah we have double checked a hundred times! Something is not right." The network admin advised me.

"Will you check some DNSRBL lists?"

"ARRRGGHHH!" would come the response.

The first 10 of these I answered this way. I couldn't figure out what was going on. Finally, I called our upstream provider.

"I need to speak with our Provisioning Manager," I said after slamming through the lame voice menu crap.

"Our customers cannot get to certain websites, was the CIDR block you gave us clean and never used before?"

"Yes, they are clean and have never been used." That was the answer I would get in a somewhat condescending tone.

"Would you check for me please?"

"Let me put you on hold while I talk to the IP Address Team."

This happened a couple of times until I got disgusted and hung up. It was difficult to diagnose this problem from work because I could only see what I could see from there. So, on the second day of this nonsense I took a FreeBSD box home, put it on my network and started looking at our CIDR block from the outside.

First I tried to ping and traceroute to our router. What I saw were packets being dropped long before they even reached our backbone. I also tried doing DNS lookups against my DNS server and got nothing but non-existent domain responses. Email bounced as well ... not a good thing.

Figuring that something somewhere was dropping ICMP packets, I decided to traceroute using mtr (Matt's Traceroute). It is a nice tool that combines ping and traceroute. It also gives good statistics that you can copy and paste. I was consistently getting dropped packets to my network at work. I ran mtr from home against all the interfaces on our two Cisco 7513s. Then I used the online traceroute and ping at DNSStuff.com and started gathering data from there. None of it was looking good, but all I could conclude was that some unidentified firewall was blocking me from reaching various sites.

Then I decided to google the first Class C slice out of the /20 CIDR block. BINGO! A ton of information sprang up. At the top of the list was a bulletin board entry buried deep inside our tier 1 backbone's website. The problem first occurred in February and no one noticed for a month.

The list in question is known as the Bogon IP List. What some network administrators do is put this list in their routers' ACLs and in other infrastructure devices. It is a good idea unless you forget to update the list, which is exactly what we were now confronted with.

The /20 CIDR block was only on the list for a short time, about a month in fact, from February to the end of March. When I looked at the current bogons list and did not see the entries referenced on the tier 1 ISP's BBS, I figured out that they had been quickly removed by whoever made the mistake. But during that short period 1000s of network admins had stuffed the list into their routers and hadn't bothered updating it. Our /20 did not show up on the June and July lists.
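
To make the failure mode concrete, here is a small Python sketch of what a bogon filter does and why a stale copy hurts. It is only an illustration: the prefixes, the two list snapshots and the function names are all made up for the example, and a real router does this with ACLs or prefix lists, not Python.

```python
import ipaddress

def load_bogons(prefixes):
    """Parse a list of CIDR strings into network objects."""
    return [ipaddress.ip_network(prefix) for prefix in prefixes]

def is_dropped(source_address, bogons):
    """Return True if a packet from source_address matches the bogon filter."""
    ip = ipaddress.ip_address(source_address)
    return any(ip in network for network in bogons)

# Hypothetical snapshots. 10.20.0.0/16 stands in for the /16 in this story;
# it was on the list in February and gone from it by June.
february_list = load_bogons(["10.20.0.0/16", "192.0.2.0/24"])
june_list = load_bogons(["192.0.2.0/24"])

# Hypothetical customer address inside our /20.
customer_ip = "10.20.3.17"

# A router still carrying the February snapshot drops the packet.
print(is_dropped(customer_ip, february_list))  # True

# A router with the updated June list lets it through.
print(is_dropped(customer_ip, june_list))      # False
```

The filter logic is identical in both cases. The only difference is the age of the list, and that is something the admin on the far end controls, not you.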

Help Desk Nightmare
Our phones are now ringing off the hook with help desk calls, angry customers and I don't blame them. I go to work every morning and stay on the phone all day, literally.

I email all my evidence to the ISP's Project/Provisioning Manager. Then I call him up. He tells me they are working on the problem. My bosses are fuming.

"Did you read my email?"

"We are going over it now. Can I call you back?"

"NO, I want some answers now and so does my boss! I have a bunch of angry customers. Do you want me to transfer their calls to you?"

Once again I am put on hold and in anger I hang up and report the results to the boss. I call the Provisioning Manager back and he directs me immediately to a network specialist.

"Did you know about the complete /16 being on the bogons list in February?"

"Well um...yes but it is not on there now."

The fact is that once a CIDR block is on the list, it takes years for it to get clean. Thousands of router admins use the list, and busy admins do not update it very often.

We demanded a new /20, and it provided us with a workaround. The new /20 was clean and we started routing everyone through it. We entered the IPs that blocked us in the router. Eventually, this ate up all the router's resources.

As it turned out, the Executive Director knew the VP at the ISP. He called him up. Our ISP wanted us to call all the main admins together for a big egg-on-the-face meeting with lots of swag and a free breakfast. I sensed it would be fun to watch these fat cats weasel out of this. But I was worried that our customers would start leaving us.

Over the weekend, I had written a simple PHP weblog for our customers to log sites they could not reach. The list was beginning to number in the hundreds of sites and the MySQL database was growing.

So the morning of the breakfast came and the room was filled with bosses and network specialists. The tier 1 ISP had sent VPs, PR people and technical staff. They gave their canned speech, then the techs started in on them.

"How is this gonna be fixed?"

"Are you gonna charge us? Will we get rebates until this is fixed?"

This went on for about an hour. Finally, our Executive Director rose and said,

"There will be no charge and they will fix this today or tomorrow."

My boss, who at this point knew all about the Bogons list, said,

"And those that want to switch will receive new Class C IP addresses. Those that can't switch immediately will have their Class C addresses reserved until they are ready."

Over the next month everyone switched to the clean /20s and we lost no customers. Even with all that mess behind us, billing became the next problem. We eventually straightened out all the billing problems with the upstream provider.

Eventually the storm calmed. I returned to sleeping at night once again.

As suggested in the comments by nasty1, a better bogons listing: Team Cymru

Tier 1 ISP: A Tier 1 ISP is a telco or Internet service provider IP network which connects to the rest of the Internet only via a practice known as peering.

Tier 2 ISP: A Tier 2 carrier (or Tier 2 ISP) is an Internet service provider who peers with other networks, but still pays for IP transit to reach some portion of the Internet.

Peering: Peering is the practice of voluntarily interconnecting distinctly separate data networks on the Internet, for the purposes of exchanging traffic between the customers of the peered networks. Peering is also known as settlement-free interconnection, which indicates that neither party pays the other for the traffic being exchanged. It is my understanding that, depending on how Net Neutrality turns out, this may change the way you use the Internet, and peering as it exists now may no longer exist.


