(1st topical post! take that, PD)
I assist with several sites for organizations that at times take (in context) unpopular political positions. I noticed eWatch, oh, I guess over a year ago, as a rather impolite web spider that also had a completely deceptive User-Agent string. They don't do anything to mask their client's DNS name, so it wasn't too long before I started looking through their website.
They were reading the whole site, daily... I was developing web log summarizers at the time, and I noticed them through the spike in a particular user-agent string ("Mozilla/4.04 [en] (Win95; I)"; if that's all the user-agent you get, you know it's them. Oddly enough, it's now virtually unique). Though all the propaganda on their site is technically legitimate, the fact that they go to some lengths to disguise their activity says something about them.
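For anyone who wants to check their own logs, here's a rough sketch of the kind of tally that makes a spike like that stand out. It is not the summarizer I was writing at the time; it assumes your server writes the common "combined" log format, where the User-Agent is the last double-quoted field on each line, and the function names are just made up for illustration.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where the User-Agent is the
# last double-quoted field on the line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Tally requests per exact User-Agent string."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def hits_for_agent(lines, agent):
    """Count requests whose User-Agent is exactly `agent`, nothing more."""
    return count_user_agents(lines)[agent]
```

Feed it the lines of an access log and ask for `hits_for_agent(lines, "Mozilla/4.04 [en] (Win95; I)")`. The match is exact on purpose: real browsers of that vintage usually tack more onto the string, so a bare one is the tell.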
The primitive behavior of their software doesn't reflect well on them either. They've taken the aforementioned ineffective step of hiding the fact that they're watching a site, but apparently given little thought to watching politely. They load every page on my small, static site just under once a day on average, with never a query for robots.txt and never an "If-Modified-Since", which would spare us from sending them the couple of big files we host here that never change.
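To be concrete about what "watching politely" would look like, here's a minimal sketch using only the Python standard library. I have no idea how their spider is actually built; the bot name and the helper functions are hypothetical. The point is just the two courtesies: consult robots.txt first, and send "If-Modified-Since" so the server can answer 304 instead of shipping the file again.

```python
import urllib.error
import urllib.request
from email.utils import formatdate
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExamplePoliteBot/0.1"  # hypothetical, but at least honest


def allowed_by_robots(robots_txt, url, agent=USER_AGENT):
    """Check a URL against the text of a site's robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def conditional_headers(last_fetch_epoch=None, agent=USER_AGENT):
    """Build request headers; add If-Modified-Since if we fetched before."""
    headers = {"User-Agent": agent}
    if last_fetch_epoch is not None:
        # HTTP dates are always expressed in GMT.
        headers["If-Modified-Since"] = formatdate(last_fetch_epoch, usegmt=True)
    return headers


def fetch_if_changed(url, last_fetch_epoch=None):
    """Return the body, or None when the server says 304 Not Modified."""
    request = urllib.request.Request(
        url, headers=conditional_headers(last_fetch_epoch)
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # unchanged since last visit, nothing was sent
            return None
        raise
```

A daily run would fetch robots.txt once, skip anything `allowed_by_robots` rejects, and call `fetch_if_changed` with the timestamp of the previous visit. For files that never change, that turns a big download into a few hundred bytes of headers.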
At these prices, they can afford to waste money on bandwidth, but that server is on an ISDN line and we have trouble with that bill. If a one-time research project costs $5k, what does this kind of service cost? Wouldn't it be cheaper for them to arrange for me to send them a .tgz every night? I could set up a cron job. Should we, as a target of their service, have any right to know why they're monitoring us so closely, and to what use that data is put?
I'd like to know, but I don't care enough to justify devoting the time it'd take to make much noise about it. If they let their spider loose on the sites where I'd care about its uncivil attitude, I'll block their subnets at the router.
(OK, two more editorial notes: "topical" should be the default comment posting mode, and k5 still lacks one essential ingredient for the noise to drown the signal: karma.)