Kuro5hin.org: technology and culture, from the trenches

Is crawl-66-249-71-82.googlebot.com Trying to Root Your PHP Board? (Abridged)

By MichaelCrotchford in Internet
Mon Jan 09, 2012 at 01:13:07 AM EST
Tags: crawl-66-249-71-82.googlebot.com, googlebot.com, crawl-66-249-71-82, Script Kiddies, The Russian Mob, Occasional Nigerian Sole Proprietor (all tags)

Hey guys,

I thought that elite hackers were using Google's servers to run a botnet. Get This: they figured out how to hack into Google's servers but they couldn't figure out how to spoof googlebot's user agent string!

Corey Haim pointed out that the URLs they were hitting were from the Domainit.com landing page, which Google had cached. He also pointed out that I'm a dipshit.

My bad!





Is crawl-66-249-71-82.googlebot.com Trying to Root Your PHP Board? (Abridged) | 9 comments (7 topical, 2 editorial, 0 hidden)
No. Corey had it all wrong (1.15 / 13) (#2)
by Zombie Jesus Christ on Sat Jan 07, 2012 at 03:58:20 PM EST

the php parking pages that Corey turned up at Google had different URLs than the message board page whose log entry I included in my story.

Mike Crawford for Clark County Commissioner
District 1 North County

Paid for by The Communard Party of Washington State

I had the gist of it right (3.00 / 11) (#4)
by Corey Haim on Sat Jan 07, 2012 at 05:45:14 PM EST

Namely, that it was an order of magnitude more likely that the problem was on your side rather than Google's;

that the Russian Mafia were almost certainly not involved;

that the use of Webmaster Tools would have quickly shown what Googlebot was trying to fetch (and why);

that Googlebot was still fetching deadlinks (the parking pages), despite your insistence otherwise ("It's not just following bad links. If that were the case, it would have given up by now");

that Googlebot does not always identify itself with a useragent string, since that string is widely abused by spammers.

Things I got dead wrong:

Assuming the world's greatest debugger wouldn't fuck up an Apache config, or would at least consider the possibility before fingering a cross-border underworld plot;

not having the necessary psychic skills to find out that you hadn't signed up to Webmaster Tools (the Googlebot-debugging equivalent of gdb, and something which you'd expect someone priding themselves on their SEO and webmaster skills to know about), information I should have picked up telepathically since you never once pointed it out, despite my repeating the suggestion to use WMT ad nauseam;

assuming that the problem was with softwareproblem.net, where you said it was, and not with some other domain.

If I was in your position, with a fucked up Apache config and wrongly thinking the problem was with softwareproblem.net, I would consider the most likely options first: perhaps the problem is on my side. Perhaps I should try this tool that shows me what the Googlebot is seeing. Perhaps I should do a few searches to find out why the world leaders in webcrawling might not be using a useragent string, even though I have this belief that they always should. Perhaps I should consider that I'm hosting a few domains on the one VPS, and think about problems I might have introduced there.

But no, that would take a little humility, and recognition of your own fallibility.

So instead, you act like a fucking idiot, and decide that Google has been infiltrated by Russian/Chinese/Eastern European mobsters.

Congratulations, dipshit.

[ Parent ]

I had not yet registered for Webmaster Tools (1.20 / 10) (#6)
by Zombie Jesus Christ on Sat Jan 07, 2012 at 06:58:19 PM EST

I had registered for Google Apps, which is an entirely different thing.  GoogleBot's onslaught started about forty-three minutes later.

I don't regard it as at all cool that GoogleBot doesn't always supply a User Agent, nor that many of the crawlers that visit my site don't have a reverse DNS name.

I'm not quite sure what I'll do about that, but it will be something along the lines of redirecting every single one of them to a certain, specific page, that points out how rude it is not to identify themselves when they come banging on my door.

Mike Crawford for Clark County Commissioner
District 1 North County

Paid for by The Communard Party of Washington State

[ Parent ]
why don't you send them a 10000 word essay (3.00 / 7) (#7)
by osm on Sat Jan 07, 2012 at 07:41:05 PM EST

about how their ignorant motherfucking user agent bullshit is contributing to the software problem. that will teach them.

[ Parent ]

I've explained why Googlebot doesn't always (3.00 / 6) (#8)
by Corey Haim on Sat Jan 07, 2012 at 08:33:03 PM EST


The useragent string was being massively abused by spammers and shady SEOs.

Anyway, it's obvious from the log entries that the bot was calling from Google's IP range. In 1996 it might have been useful to have a link to a readme, but nowadays, if you don't know what Googlebot is and can't find your way to its on-line help pages, you really shouldn't be in charge of a server.

And good luck redirecting crawlers: they'll immediately flag you as a spammer and send your pages to the bottom of the pile.
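
For anyone who would rather check than guess, the usual way to confirm that a log entry really came from Google is a reverse DNS lookup on the IP followed by a forward lookup on the returned hostname. Here is a minimal sketch of that idea in Python. The function name `is_google_crawler` and the injectable `reverse_lookup`/`forward_lookup` parameters are made up for illustration, and the suffix list is an assumption based on hostnames like the one in the story title, so check it against Google's current guidance before relying on it.

```python
import socket

# Assumed set of domains Google crawlers reverse-resolve to; verify before use.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google hostname that resolves back to ip.

    The two lookup parameters are hypothetical hooks so the check can be
    exercised without touching the network; by default they use the socket module.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    # Step 1: reverse DNS. No PTR record means we cannot confirm anything.
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False

    # Step 2: the hostname must end in one of the expected domains.
    # Matching on the suffix (with the leading dot) rejects names like
    # "googlebot.com.evil.example".
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    # Step 3: forward DNS. Anyone can set a PTR record to a Google-looking
    # name, so the name has to resolve back to the same IP.
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling `is_google_crawler("66.249.71.82")` with the defaults does live DNS queries, so the answer depends on whatever DNS says at the time. Note that the check never looks at the user agent string, which is the point: the string is trivial to fake and the DNS round trip is not.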

[ Parent ]

+FP (3.00 / 3) (#3)
by Del Griffith on Sat Jan 07, 2012 at 04:57:25 PM EST

to serve as a warning to others of the hysteria surrounding the debugmeister.

I...I like me. My wife likes me. My customers like me. Because I'm the real article. What you see is what you get. - Me

this diary is nearly a week old (none / 0) (#9)
by nateo on Sat Jan 14, 2012 at 06:37:11 PM EST

but still very funny.

"I'm so gonna travel the world, photographing my dick at every location."
  - Vampire Zombie Abu Musab al Zarqawi


All trademarks and copyrights on this page are owned by their respective companies. The Rest © 2000 - Present Kuro5hin.org Inc.
See our legalese page for copyright policies. Please also read our Privacy Policy.
Kuro5hin.org is powered by Free Software, including Apache, Perl, and Linux. The Scoop Engine that runs this site is freely available under the terms of the GPL.