WhizBang … as my jaw hits the floor ….

These blaggards have just ruined all my anticipated fun. There is no way in hell I ever anticipated a response from them. I was looking forward to emailing them daily, with the missives getting progressively more off-the-wall and tongue-in-cheek. I was going to eventually collect them all, create a book, sell it to a top-line publisher, stay on the NY Times best-seller list for 26 weeks, sell the movie rights to Peter Jackson, and then retire on my earnings. Then this idiot at WhizBang has to go and answer my email. Twice. I’d forgotten he had two accounts of mine to respond to, and only checked the usual account this morning.

Dear *********:

The company I work for, Whizbang Labs, does extract information from the web, as you already know. We crawl millions of web sites a week and extract different pieces of information for different customers. In your case, your site is in a list for company information extraction. This simply involves crawling a site and figuring out the name of the company owning the site, as well as contact information such as mail address or phone number. We often have to crawl many pages before finding this information, and so we take the shotgun approach of crawling more pages up front than we may need to. We also don’t know (without looking) what sites have actual companies behind them or not. In your case I would imagine that we extracted the name as deardiary.net and didn’t find any contact information other than an email address.

Our largest customer for this information is Dun & Bradstreet, who are a leader in the business of deciding who’s a company and collecting information about them.

I can assure you that we have not targeted any information that is particular to your site, other than who you are. Specifically, we have not logged and extracted any information from the diary posts of your customers. It is not in our best interest to target pieces of information that are that specific, we only go after information that can be found across thousands or millions of web sites.

I noticed that your site does not have a /robots.txt file. This is a simple text file that tells robots where they shouldn’t go. For example, if your /robots.txt file contained:

User-agent: *
Disallow: /show

This would tell all robots to not visit URLs that have ‘/show’ at the start of their path.

While we may have hit you quite a few times over the course of a few days, it is our intention to temper the crawl so that it doesn’t pound your site too quickly. Please let me know if that is not the case. If you have other questions or concerns, feel free to respond to me directly.

Matt Jacobsen
Software Engineer
Whizbang Labs
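
For the curious, the two-line robots.txt Matt describes can be tried out with Python's standard urllib.robotparser module. This is only a rough sketch of how a well-behaved robot might read the rule: the ExampleBot user-agent name and the sample paths are made up for illustration, and nothing here reflects what WhizBang's crawler actually runs.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from the email above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /show
"""


def is_allowed(path: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the rules above permit user_agent to fetch path.

    "ExampleBot" is a hypothetical crawler name; the "*" rule applies
    to every robot, so any name gives the same answer here.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical paths, chosen to show that the rule is a prefix match.
    for path in ("/show", "/show/12345", "/showcase", "/about", "/"):
        verdict = "allowed" if is_allowed(path) else "blocked"
        print(f"{path}: {verdict}")
```

Note that the rule is a plain prefix match, so a path like /showcase would be blocked along with /show itself. It also only works if the robot chooses to honor it.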


And:

Hello –

I responded to this message yesterday and sent it to your ‘*************’ account. If you didn’t receive the message please let me know. In short, we do extract information from the web, but our current projects involve extracting highly general information from millions of websites. Things like company name and company contact information. We did not target your site for anything specific to it (i.e. the journal entries). Rest assured that we have not logged or extracted any of that information, nor are we selling it to others. We visit millions of sites per week and sometimes crawl quite deep before stopping on a site. I hope that we did not hit the site too quickly.

To answer your question, we did not stop the hits due to your request, it was just a matter of circumstance. Did it appear that we visited the same pages over and over, or just a lot of pages on your site? If it is the former, it could be that your site is somehow in a test list and I will do what I can to remove it. Otherwise, you probably won’t be hit by our crawler for some time.

Let me know if there are other questions or comments.

Matt Jacobsen
Software Engineer
Whizbang Labs

Of course, I have no way of finding out what page(s) the spider hit, but I have to believe it was more than one. Which means I can’t even gripe about being on a test list and demand I be removed immediately. I am very curious why site meter doesn’t register these hits though. Maybe I’ll shoot them a quick email and ask.

In somewhat unrelated news, I took the Internet Junkie test. (Hey, it’s been a while since I did the useless quiz thing.)




Are you Addicted to the Internet?

62%: Hardcore Junkie (61% – 80%)

While you do get a bit of sleep every night and sometimes leave the house, you spend as much time as you can online. You usually have a browser, chat clients, server consoles, and your email on auto check open at all times. Phone? What’s that? You plan your social events by contacting your friends online. Just be careful you don’t get a repetitive wrist injury…

The Are you Addicted to the Internet? Quiz at Stvlive.com!





