Kuro5hin.org: technology and culture, from the trenches

RIAA Pit of Confusion

By salimfadhley in Culture
Fri May 16, 2003 at 10:13:06 PM EST
Tags: Humour (all tags)

After reading about the RIAA threatening to sue yet another innocent archive operator, I decided to take some direct action: It occurred to me that the RIAA keep falsely accusing others of piracy because they put their faith in an unintelligent spider - a fact which can be simply exploited to make my servers into an RIAA no-go-zone...


Whilst spidering is nothing to worry about (and only to be expected on a public site), the way the association fires off legal threats based on the spider's results alone seems wrong. Since this spider does not actually look at the whole title of the file, or even its content, I figured I could have some fun at their expense:

What if I could write a `tarpit' script that creates a large number of interlinked, automatically generated web sites? If their spider tried to scan my server it would be fooled into thinking that it had found a treasure trove of MP3 sites. Anybody who took the time to look at the site could see that it contains no pirate content at all.

How might the RIAA react to such a thing?

  • They could upgrade their spider so that it only recognises valid track names that are in fact MP3s (e.g. it would know that `elephant_wiggle-Madonna.mp3' is not a real Madonna song). This would limit them to detecting only correctly named MP3 files, and force them to use their spider responsibly.
  • Every single suspect site would need to be hand-checked in order to verify that a genuine breach of copyright has taken place - this would substantially decrease the return on investment for their spidering project because it would be labour intensive, again forcing a more responsible approach to detecting offenders.
  • They could blacklist my server to prevent their spider from looking at it in future - that would be at least a small victory. If they blacklisted enough servers it would be the same as giving up!
  • They could send me a legal nastygram instructing me to disable my tarpit... Since I do not live in the USA, this might not be enforceable.

How it works

The Pit of Confusion is a pure PHP script that can automatically generate a very large number of web-sites with links to MP3s. It contains a settings file which contains lists of famous artist names and random words that can be used to make silly song titles. There is also a download manager component - designed to deliver MP3 files in the most inefficient possible way.

As with any web-site, the action starts with a URL. Normally, the first part of the URL just signifies the server on which the site runs; however, I have used a Dynamic DNS service to encode the two key site parameters into the hostname. I learnt that trick from this website. The first two parts of the domain name tell the script how to build the page. If you visit:

http://madonna.ricky.music.stodge.org

It will show you `Ricky's' Madonna page. The script does not know anything about Madonna or any of her songs - it just uses information provided at run-time to set up the basic variables. Anything in the form of a.b.music.stodge.org will get handled by the same server.
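
The hostname trick above could be sketched roughly as follows. This is not the actual Pit of Confusion code (which is PHP); it is a minimal Python illustration, and the function name `parse_pit_host` and its `base_domain` parameter are made up for the example.

```python
def parse_pit_host(host, base_domain="music.stodge.org"):
    """Split 'artist.owner.<base_domain>' into (artist, owner), or return None.

    Hypothetical helper: the real script's parsing rules are not published.
    """
    host = host.lower().split(":")[0].rstrip(".")  # drop any port and trailing dot
    suffix = "." + base_domain
    if not host.endswith(suffix):
        return None
    labels = host[: -len(suffix)].split(".")
    if len(labels) != 2 or not all(labels):
        return None  # only the a.b.<base_domain> form is handled
    artist, owner = labels
    return artist.capitalize(), owner.capitalize()


print(parse_pit_host("madonna.ricky.music.stodge.org"))
```

Because the two labels are the only input, a wildcard DNS entry pointing every a.b.music.stodge.org name at one server is enough to make each combination look like a separate site.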

Notice how slowly the page loads - that is because there is a configurable `annoying delay' built into each transaction. Assuming that the spider system has a fixed maximum number of threads, it makes sense to tie these up for as long as possible - but not so long as to deter a person wishing to verify that there are no pirated files on the site.

Next it builds up a list of randomly named MP3 links that include the chosen artist's name in the title. If you try to click on the link, instead of delivering a pirated file it sends a non-copyrighted music file via a download manager that ensures that the download will take a very long time. The idea is to tie up as many threads as possible on whatever system is doing the spidering.
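
As an illustration of the random link names, here is a small Python sketch (again, not the real PHP). The word list, the function name `silly_titles` and the exact name format are assumptions, loosely modelled on the `elephant_wiggle-Madonna.mp3' example earlier in the story.

```python
import random

# Hypothetical word list; the real settings file is not shown in the story.
WORDS = ["elephant", "wiggle", "llama", "lemon", "boogaloo", "zeitgeist"]


def silly_titles(artist, words, count=10, rng=None):
    """Return `count` nonsense file names that end in '-<artist>.mp3'."""
    rng = rng or random.Random()
    titles = []
    for _ in range(count):
        how_many = rng.randint(2, min(4, len(words)))  # needs at least 2 words
        picked = rng.sample(words, how_many)
        titles.append("_".join(picked) + "-" + artist + ".mp3")
    return titles


for name in silly_titles("Madonna", WORDS, count=3):
    print(name)
```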

Finally it makes some links to a selection of other random sites produced by the same system. The idea is to keep the spider in the tarpit for as long as possible.
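
The cross-links can be sketched in the same spirit. This is an assumed shape, not the script itself: `neighbour_links`, the artist list and the owner list are all invented for the example, and the only thing taken from the story is the a.b.music.stodge.org hostname pattern.

```python
import random


def neighbour_links(artists, owners, count=5, base_domain="music.stodge.org", rng=None):
    """Build URLs for other pit 'sites' by combining random artist and owner labels."""
    rng = rng or random.Random()
    links = []
    for _ in range(count):
        artist = rng.choice(artists)
        owner = rng.choice(owners)
        links.append("http://%s.%s.%s/" % (artist, owner, base_domain))
    return links


for url in neighbour_links(["madonna", "creed"], ["ricky", "sue", "bob"], count=3):
    print(url)
```

Every generated URL resolves back to the same server, so each page the spider follows simply produces another page of links.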

Notes

This is just my first attempt. No doubt, by now more talented scripters can see weaknesses in my plan - this is why I intend to share the source-code of my project with anybody who wants it. If you want to help out, please leave a message in this board and I will get back to ya!


Poll
What do you think of this idea?
o It will never work! 2%
o It's just plain stupid! 6%
o It's a million to one shot but it just might work. 10%
o I think you might have something there. 37%
o Quite good, but not good enough. 15%
o Inspired! 27%

Votes: 113



RIAA Pit of Confusion | 83 comments (67 topical, 16 editorial, 0 hidden)
This reminds me (4.57 / 7) (#1)
by ph317 on Fri May 16, 2003 at 04:56:34 PM EST


This reminds me of a trick I once saw implemented to stop spiders/crawlers in general dead in their tracks (and/or force spider admins to blacklist as you said).  Someone had modified either their webserver (or perhaps the scripts which dynamically generated every page on it... I never saw the code, just the site) such that every page had an innocent-looking link on it where the link name was XXXXXXXX (8 random alphanum characters), which linked to http://sitename.com/XXXXXXXX.  Once you followed that first random link, each of those generated pages had 5 links within them to other random strings, which led to pages with 5 links of other random strings, ad infinitum.

Obviously, the basic method was that none of his valid page URLs were 8-character alphanumeric strings, so any request for a name formatted like that was responded to with a dynamically generated page with 5 links to the same.  This could be combined of course with slow-loading and slow tcp response and whatnot as well.  It's an infinite trap.

Of course there's probably less-obvious ways to conduct this - like using words from a dictionary file instead of random strings, etc....  The key thing in this idea to take with you is the idea of auto-generated infinite levels of recursion for the spider.
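
For what it's worth, the trap described above might look something like this in Python. I never saw the original code either, so this is only a guess at the mechanism; `is_trap_path` and `trap_page` are invented names.

```python
import random
import re
import string

ALPHABET = string.ascii_letters + string.digits


def is_trap_path(path):
    """True for paths of the form /XXXXXXXX (exactly 8 alphanumeric characters)."""
    return re.fullmatch(r"/[A-Za-z0-9]{8}", path) is not None


def trap_page(rng=None, fanout=5):
    """Return an HTML page whose only content is `fanout` links to more trap paths."""
    rng = rng or random.Random()
    names = ["".join(rng.choice(ALPHABET) for _ in range(8)) for _ in range(fanout)]
    links = "\n".join('<a href="/%s">%s</a>' % (n, n) for n in names)
    return "<html><body>\n%s\n</body></html>" % links


if is_trap_path("/a1B2c3D4"):
    print(trap_page())
```

A request handler would call `is_trap_path` first and fall through to the normal site for everything else, which only works if no real page uses an 8-character alphanumeric name.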

How about this? (3.00 / 2) (#28)
by mumble on Sat May 17, 2003 at 08:52:58 AM EST

Is this similar to what you want?

Enjoy.

-----
stats for a better tomorrow
bitcoin: 1GsfkeggHSqbcVGS3GSJnwaCu6FYwF73fR
"They must know I'm here. The half and half jug is missing" - MDC.
"I've grown weary of googling the solutions to my many problems" - MDC.
[ Parent ]

Sounds like (none / 0) (#41)
by Quila on Sun May 18, 2003 at 09:39:30 AM EST

Porn rings. Is this where he got the idea?

[ Parent ]
Yeah (none / 0) (#42)
by ucblockhead on Sun May 18, 2003 at 02:34:55 PM EST

It's not that hard to do. I've got a page like that on my site. I'm not going to say where, because I log all the ips that hit it. The page is a cgi script that intentionally takes a long time to load, and then puts up links that look different, but actually link back to the same page.

See this code.
-----------------------
This is k5. We're all tools - duxup
[ Parent ]

Won't work as is. (4.00 / 2) (#3)
by i on Fri May 16, 2003 at 05:02:57 PM EST

They won't need to modify their spider to recognise valid mp3s. They will only need to weed out spoof pages like yours, which is a hell of a lot easier.

You need a smarter page generator that can fool a robot but not a human reader. You also need lots of (second level?) domains and IP addresses, lest they start filtering you out by these.

and we have a contradicton according to our assumptions and the factor theorem

Missing the point (none / 0) (#4)
by salimfadhley on Fri May 16, 2003 at 05:17:21 PM EST

Can you be clearer about what exactly will not work? The RIAA use the same DNS as we do, so it is unlikely that will be a problem - if they filter out my domain name or my IP address then it is fine by me. The whole point of this is to keep the RIAA out of my server.

The cool thing about the way this is programmed is that it is REALLY easy to re-skin. The templates are stored in plain HTML files that can be replaced.

If others start running their own tarpits, that's even better - but are the RIAA going to turn a blind eye to blacklisted sites? If so we win... if not then they spend expensive human effort on monitoring these sites.

[ Parent ]

Answars. (none / 0) (#6)
by i on Fri May 16, 2003 at 05:40:10 PM EST

  • If this is your goal, then yes, they won't be able to ignore your domain or IP. They will still be able to ignore just the pit pages, if they can recognise them.
  • You need to reskin faster than they can teach their bots to recognise your templates.
  • No, RIAA is going to harass the hell out of you.


and we have a contradicton according to our assumptions and the factor theorem

[ Parent ]
silly? (4.00 / 5) (#7)
by jt on Fri May 16, 2003 at 06:02:15 PM EST

random words that can be used to make silly song titles

Those song titles are great!

No kidding (5.00 / 1) (#24)
by Kaki Nix Sain on Sat May 17, 2003 at 01:28:22 AM EST

I've been hitting reload for a few minutes now. I get at least a small chuckle each time, thinking about what some of these songs would be about.

  • tarantula-swolen-tits-stodge-knife.mp3
  • woogie-retro-cumberland-dispair.mp3
  • zeitgeist-boogaloo.mp3
  • doobie-tits.mp3
  • park-brains-woman.mp3
  • queen-lemon-churchill.mp3
  • llama-jesus-clever-time-digger.mp3


[ Parent ]
+1 fp (4.66 / 6) (#8)
by circletimessquare on Fri May 16, 2003 at 06:24:25 PM EST

what the fuck do you think you're doing?

http://thesmokinggun.com/archive/madonnasplash1.html

The tigers of wrath are wiser than the horses of instruction.

I've always wondered (4.62 / 8) (#10)
by mwalker on Fri May 16, 2003 at 06:58:09 PM EST

Do the RIAA's little hell hounds obey robots.txt? What User-Agent do they present when they knock at your door?

This is an interesting little arena of non-legality. I do not believe in copying copyrighted materials - I don't listen to 'pirated' material. That said, I'm not sure I like the idea of corporate police forces scanning the Internet, pretending to be the law.

If the FBI had a web spider, well, that would be one thing. Its behaviour could be queried and regulated, to some extent. But the contractors employed by the RIAA... they make their own rules.

There's a long history of accidentally-accused and over-accused companies freaking out when the RIAA robot fires a letter at them. There are no consequences. Our company was accused of sharing files from an IP address which wasn't even routable; the mistake cost us hundreds of man hours of legal department encouraged employee shakedowns.

The good news is, it's not illegal in any way to fuck with corporate cyber-cop wannabees.

If I had a feature request (and I do) it would be to ensure that every page load contains different false mp3's, like a fingerprint, and to log this fingerprint. Then when you get the legal notice, you can read which mp3's they want removed, go back to your logs, and find their source IP. Then you can publish it, and others can firewall their IP block, or target it for misinformation scripts like yours.
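
A rough Python sketch of that feature request, under the assumption that a random token in each file name is an acceptable fingerprint. `FingerprintLog` and its methods are hypothetical names, and a real version would write to a log file or database rather than a dict.

```python
import random


class FingerprintLog:
    """Hand out unique fake file names and remember which client IP saw each one."""

    def __init__(self):
        self._ip_by_title = {}  # in-memory stand-in for a persistent log

    def serve(self, client_ip, artist, words, count=5, rng=None):
        rng = rng or random.Random()
        titles = []
        while len(titles) < count:
            token = "%08x" % rng.getrandbits(32)  # makes the name unique per page load
            title = "%s_%s-%s.mp3" % (rng.choice(words), token, artist)
            if title not in self._ip_by_title:
                self._ip_by_title[title] = client_ip
                titles.append(title)
        return titles

    def who_fetched(self, title):
        """Return the IP that was shown `title`, or None if it was never served."""
        return self._ip_by_title.get(title)


log = FingerprintLog()
shown = log.serve("192.0.2.7", "Madonna", ["llama", "lemon"])
print(shown[0], "->", log.who_fetched(shown[0]))
```

When a notice arrives naming specific files, looking each name up with `who_fetched` would point back at the address that was shown that exact list.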


You just (3.25 / 3) (#12)
by i on Fri May 16, 2003 at 07:36:10 PM EST

hide behind a flash intro, with no real links on the front page.

/me runs for the hills

and we have a contradicton according to our assumptions and the factor theorem

[ Parent ]

Or perhaps even (4.00 / 1) (#21)
by mwalker on Fri May 16, 2003 at 10:30:25 PM EST

Press the (random color here) button to receive a valid session cookie.

Interesting idea.

[ Parent ]

Forget the logs (none / 0) (#82)
by DJ Starlyte on Tue May 27, 2003 at 10:49:33 PM EST

"find their source IP. Then you can publish it, and others can firewall their IP block"

It's already been done. Download PeerGuardian.

[ Parent ]
That's kind of a cool idea (5.00 / 1) (#16)
by morkeleb on Fri May 16, 2003 at 09:35:14 PM EST

+1 section page - I'm really surprised I haven't heard of other hackers who have thought about throwing a wrench into the monkey works that way (or maybe they have - it just hasn't been covered).

Anyway....it's a nice thread of an idea that could lead to some new and creative ways to fuck their shit up. Start a project on Source Forge as soon as possible!

BTW - does anyone know what kind of spider program they are using to check links?
"If I read a book and it makes my whole body so cold no fire can ever warm me, I know that is poetry." - Emily Dickinson
Media Enforcer is one (none / 0) (#20)
by mwalker on Fri May 16, 2003 at 10:28:23 PM EST

http://www.mediaenforcer.com/

Owned by BayTSP, the long arm of... somebody. Google BayTSP for more.

[ Parent ]

The RIAA sucks (4.00 / 8) (#17)
by BankofNigeria ATM on Fri May 16, 2003 at 09:45:41 PM EST

     I am against piracy, but I am also against the RIAA for their continual production of shitty bands and stupidity.  That being said, anti-piracy technologies as you and many others have pointed out contain many flaws, and should not be used, due to the great potential that it could harm innocents.  I personally wouldn't mind if the RIAA were nuked off the face of the Earth, for the good of the culture.  Why?  To me, the majority of the music they sponsor is not art, it contributes nothing of value to culture, except maybe a warning of what to avoid.  If it were Ancient Greece, rest assured, if the RIAA were a controversial philosopher, he would be given hemlock for corruption of youth and speaking against the gods of all that is holy, that is, good music.

FOR A GOOD TIME, AIM ME AT: Nigerian ATM

Not purely a spider. (4.00 / 1) (#18)
by j1mmy on Fri May 16, 2003 at 10:04:37 PM EST

If I'm not mistaken, the RIAA uses the spider to find potential abusers, which are then checked out by humans. Sometimes, things go wrong, though.

Checked out by humans? (5.00 / 4) (#23)
by FlipFlop on Fri May 16, 2003 at 11:53:34 PM EST

I wonder if the RIAA even reads the notices before sending them out. This one says:

This site...offers approximately 0 sound files for download. Many of these files contain recordings owned by our member companies, including songs by such artists as Creed. We have a good faith belief that the above-described activity is not authorized

The standard for good faith must not be very high.

AdTI - The think tank that didn't
[ Parent ]

An interesting letter... (none / 0) (#70)
by djeaux on Tue May 20, 2003 at 03:36:01 PM EST

... requesting that a hosting service remove a website that may be infringing on copyrights.

I'll tell you what will happen the day a policeman attempts to write me a ticket because I may have run a stop sign: He's about to be a formerly employed policeman.

And folks, the RIAA is not law enforcement. They are a private organization which should have no more jack than any other private organization, say, the Boy Scouts. Take that back -- the Boy Scouts should have a LOT more clout than the RIAA.

djeaux

djeaux
"Obviously, I'm not an IBM computer any more than I'm an ashtray." (Bob Dylan)
[ Parent ]

WebPoison (4.50 / 2) (#19)
by IHCOYC on Fri May 16, 2003 at 10:15:47 PM EST

This script seems to be quite similar to the old WebPoison script, which was made to confound email harvesting spiders by serving up a constant stream of bogus or embarrassing links and email addresses.
 --
The color is black, the material is leather, the seduction is beauty, the justification is honesty, the aim is ecstasy, the fantasy is death.
I'm going to turn you in! (2.75 / 4) (#22)
by tiamat on Fri May 16, 2003 at 11:13:44 PM EST

Well, I keep trying to submit your site @ http://www.riaa.org/Protect-Report.cfm , but I can't even get the page to load right now.

First time that the RIAA's site being slow has ever pissed me off. Oh, the irony*.

*It's only ironic if you assume, as I do, that the site is slow because people are doing something nasty to it.

Turn me in too! (none / 0) (#34)
by mcgrew on Sat May 17, 2003 at 12:19:16 PM EST

I need the money, and the RIAA have DEEP pockets. Whoever rated that post down, dude, turning anyone in to the RIAA for a tarpit is doing them a favor. This tarpit is entirely legal.

"The entire neocon movement is dedicated to revoking mcgrew's posting priviliges. This is why we went to war with Iraq." -LilDebbie
[ Parent ]

I'd very much like to see a copy of the script... (4.00 / 1) (#25)
by alizard on Sat May 17, 2003 at 03:23:27 AM EST

Is this sufficient?

Disclaimer: I am not only not employed by or a contractor for any major record label or the RIAA / MPAA / any other content provider organizations, but would very much like to see the people running the labels and the content provider organizations behind bars. Or worse.
"The horse is dead. Fuck it or walk away, but stop beating it." Juan Rico

Apache is not a good choice for this (4.75 / 4) (#26)
by chushin on Sat May 17, 2003 at 07:43:34 AM EST

Your server will run out of RAM before the spider runs out of threads. Apache's design is reasonably efficient for sending out files as fast as possible, but horribly inefficient when connections are held open for a long time. In addition, Apache has a very low connection limit (256 if compiled with default settings).

A multi-threaded server would be better, but the ideal architecture for something like this is event-based IO (what are commonly called single-process servers), as that gives the lowest per-connection overhead. Unfortunately, PHP's design is incompatible with that kind of server. You'd have to program in another language. The key point is, you only win if your per-connection overhead is lower than that of the spider.
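
To make the event-based point concrete, here is a minimal sketch of a slow-drip responder using Python's asyncio. It is an illustration of the architecture only, not a drop-in replacement for the PHP download manager: the port, chunk size, delay and the zero-filled placeholder body are all assumptions.

```python
import asyncio


async def drip(writer, payload, chunk_size=16, delay=1.0):
    """Send `payload` a few bytes at a time, sleeping between chunks."""
    for start in range(0, len(payload), chunk_size):
        writer.write(payload[start:start + chunk_size])
        await writer.drain()
        await asyncio.sleep(delay)  # yields to other connections while waiting


async def handle(reader, writer):
    await reader.readline()  # read the request line; the sketch ignores the rest
    body = b"\0" * 4096      # placeholder bytes, not a real audio file
    head = ("HTTP/1.0 200 OK\r\n"
            "Content-Type: audio/mpeg\r\n"
            "Content-Length: %d\r\n\r\n" % len(body)).encode("ascii")
    try:
        await drip(writer, head + body)
    except ConnectionError:
        pass  # the client gave up, which is fine
    finally:
        writer.close()


async def main(host="127.0.0.1", port=8080):
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()

# To try it: asyncio.run(main())
```

Each stalled connection here costs one suspended coroutine rather than a whole process or thread, which is the per-connection overhead argument made above.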

Incidentally, there's a bug in your download manager. It returns headers:

Mime: audio/mpeg
Content-Type: text/html

That should be Content-Type: audio/mpeg.

While there are certainly plenty of stupid spiders out there, it's worth noting that any reasonably smart spider will use a breadth-first search, and limit how deeply it recurses, and hence will never be more than mildly inconvenienced by something like this.
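
To show why a depth limit defuses an infinite trap, here is a small breadth-first crawl in Python. It is a toy under stated assumptions: `crawl` and the `get_links` callback are invented for the example, and a real spider would fetch pages over HTTP instead of calling a function.

```python
from collections import deque


def crawl(start, get_links, max_depth=3, max_pages=1000):
    """Breadth-first crawl that never follows links more than `max_depth` hops deep."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not expand pages at the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


def endless(url):
    """A fake 'infinite' site: every page links to two brand-new pages."""
    return [url + "0", url + "1"]


print(len(crawl("/", endless, max_depth=3)))
```

With two links per page and a limit of three hops, the crawl stops after 1 + 2 + 4 + 8 pages no matter how deep the pit goes.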

Multithreading in Apache (5.00 / 3) (#29)
by greenrd on Sat May 17, 2003 at 10:39:05 AM EST

A multi-threaded server would be better

Apache 2.0 is a multi-threaded server. You're giving the impression that Apache still doesn't have threading support. This is simply not true.


"Capitalism is the absurd belief that the worst of men, for the worst of reasons, will somehow work for the benefit of us all." -- John Maynard Keynes
[ Parent ]

RIAA doing a good job (1.91 / 12) (#27)
by Ta bu shi da yu on Sat May 17, 2003 at 08:33:27 AM EST

We should congratulate the RIAA for their prompt action in finding and removing illegal copyright information.

For those who are using the illegal Kazaa and Grokster - shame on you! These networks are hotbeds of illegal activity, with up to 95% of users downloading music files which they do not own. Not only are you taking up valuable Internet bandwidth, but you are stopping artists from making their living. It's commonly known that illegal file swapping has reduced the growth of the recording industry's sales. When you reduce these sales, you reduce the number of artists in the market, because how would they receive any money? No money = no music artists. You do the math.

Instead of getting in the RIAA's way, we should be responsible citizens and write to our congressmen applauding the responsible behaviour of the RIAA. To those obstructing the RIAA in their necessary work, please try to remember that the DMCA and copyright laws were put there to protect you, because what you are doing now is un-American!

---
AdTI - "the think tank that didn't".

offtopic... (none / 0) (#30)
by theperfectelement on Sat May 17, 2003 at 10:45:57 AM EST

Is your username supposed to be "he is not a big fish"? It's hard to tell without the pinyin tone markers.

[ Parent ]
Dui (none / 0) (#31)
by bigchris on Sat May 17, 2003 at 10:48:28 AM EST



---
I Hate Jesus: -1: Bible thumper
kpaul: YAAT. YHL. HAND. btw, YAHWEH wins ;) [mt]
[ Parent ]
Very clever (none / 0) (#32)
by Ta bu shi da yu on Sat May 17, 2003 at 11:42:24 AM EST

Yes, bigchris is correct.

---
AdTI - "the think tank that didn't".
[ Parent ]
I rated this troll a zero (4.25 / 4) (#33)
by mcgrew on Sat May 17, 2003 at 12:16:07 PM EST

This is the most irresponsible post I have ever seen. There are trolls, and there are dangerous trolls. Kazaa is for PROMOTING INDIES. Like the band I saw last night, Lost Boys, who are working on their first CD and will have MP3s available for you to download and share.

I normally don't rate anyone down, just up, but in this case I felt I had to.

"The entire neocon movement is dedicated to revoking mcgrew's posting priviliges. This is why we went to war with Iraq." -LilDebbie
[ Parent ]

Shocked by your obvious censorship (3.75 / 4) (#39)
by Ta bu shi da yu on Sat May 17, 2003 at 10:47:22 PM EST

I cannot believe that you rated me a zero. We all know that the "Lost Boys", a band who make second rate music, are supporters of this nefarious Kazaa. While real artists like Michael Jackson and Britney Spears are out there, working hard on their music, bands like the Lost Boys are damaging the very fabric of the infrastructure that is keeping them going.

Your blatant censorship of my opinion shows that you don't understand the damage that is being done here. I must ask you to refrain from keeping my voice from being heard, because you are only delaying the inevitable!

Yours humbly,
Ta bù shì dà yú

---
AdTI - "the think tank that didn't".
[ Parent ]

Now THAT comment... (5.00 / 1) (#61)
by mcgrew on Mon May 19, 2003 at 08:40:27 PM EST

was a MUCH better troll. I actually laughed, and gave you a 5 for it. It had a 1...

"The entire neocon movement is dedicated to revoking mcgrew's posting priviliges. This is why we went to war with Iraq." -LilDebbie
[ Parent ]

I wish I'd seen this in voting (1.00 / 4) (#36)
by mcgrew on Sat May 17, 2003 at 12:43:48 PM EST

I would have given it +1FP.

"The entire neocon movement is dedicated to revoking mcgrew's posting priviliges. This is why we went to war with Iraq." -LilDebbie

So... if they sent you a notice... (3.00 / 1) (#38)
by Elkor on Sat May 17, 2003 at 04:34:17 PM EST

Could you counter notice them with an intent to file harassment charges in small claims court?

If I am not mistaken, small claims court doesn't allow lawyers, so whoever is president/CEO of the RIAA would have to show up themselves.

Not sure what retaliatory actions they could make against you, but it might be amusing to find out, if you aren't worried about litigation.

Regards,
Elkor


"I won't tell you how to love God if you don't tell me how to love myself."
-Margo Eve
That's not how it works. (4.00 / 2) (#44)
by vectro on Sun May 18, 2003 at 05:54:57 PM EST

When you sue a large company in small claims court, they will simply not show up. The judge decides in your favor. Then they appeal to the superior court, and break out the lawyers. It is at this point that you are approximately screwed.

“The problem with that definition is just that it's bullshit.” -- localroger
[ Parent ]
Except if you live somewhere sane.. (none / 0) (#52)
by Eivind on Mon May 19, 2003 at 07:45:05 AM EST

In some parts of the world you're not screwed in this scenario. For example in Scandinavia, you automatically get your legal expenses paid for by the state (and in most cases reclaimed from the losing "big firm") in cases where:
  • One of the sides is a human person, i.e. not a corporate entity.
  • This side won in small claims court.
  • The other side appealed.
This is to stop exactly the abuse you cite above from occurring.

[ Parent ]
Well, snickerfritz (none / 0) (#64)
by Elkor on Tue May 20, 2003 at 11:25:39 AM EST

So much for that idea.

I'm surprised that small claims can be appealed to anything other than small claims.

But then, this is America. Things don't have to make sense.

Regards,
Elkor


"I won't tell you how to love God if you don't tell me how to love myself."
-Margo Eve
[ Parent ]
Song titles are also copyrighted... (3.00 / 1) (#40)
by polyglot on Sun May 18, 2003 at 04:48:34 AM EST

But I think we're still a little way off from "RIAA sues salimfadhley for discussing music; writing down track titles no longer considered fair use".
--
"There is no God and Dirac is his prophet"
     -- Wolfgang Pauli
Are they? (3.00 / 1) (#47)
by godix on Sun May 18, 2003 at 09:59:19 PM EST

I was always under the impression that titles are NOT copyrighted, only the works themselves are. I know it works that way for books and I think it works that way for movies. I had just assumed that song titles aren't copyrighted either, but that the RIAA made sure popular titles don't get recycled to avoid confusion. If I'm right this is one of the very few advantages of the RIAA's near monopoly on music distribution.


"A disobedient dog is almost as bad as a disobedient girlfriend or wife."
- A Proud American
[ Parent ]
Artist names can be trademarked (none / 0) (#67)
by ebonkyre on Tue May 20, 2003 at 01:43:22 PM EST

Witness "Billy Joel™" - his name is trademarked, not once, but 3 times:

The truth hurts sometimes... Nothing beats a nice fat cock. ShiftyStoner
[ Parent ]
Trademark != Copyright (none / 0) (#75)
by igor on Wed May 21, 2003 at 04:28:19 PM EST

Song titles/artist names can certainly be trademarked, but not copyrighted.

Therefore unauthorized copying and distribution of the term "Madonna" is not illegal, but claiming that this post was originally written by Madonna would be.

-jeff


[ Parent ]

Tarpits (3.50 / 2) (#43)
by ucblockhead on Sun May 18, 2003 at 02:42:15 PM EST

"They could send me a legal nastygram instructing me to disable my tarpit..."
Having a tarpit is not illegal.
-----------------------
This is k5. We're all tools - duxup
True (none / 0) (#53)
by mwalker on Mon May 19, 2003 at 07:58:55 AM EST

And neither is sending out a legal nastygram claiming that your legal tarpit is infringing their copyright. Their lawyers are a sunk cost; it costs them nothing to harass you even if they have no case.

And sometimes, it works.

[ Parent ]

Ramp up the cost (5.00 / 2) (#63)
by squigly on Tue May 20, 2003 at 05:23:33 AM EST

Demand a signature

Say they're harassing you and demand an apology.

If you get a second letter, tell them they're committing perjury.

[ Parent ]

If you get a second letter... (none / 0) (#69)
by djeaux on Tue May 20, 2003 at 03:29:29 PM EST

... inform your state attorney general that they're harassing you. THEN it will begin to cost them money.

djeaux

djeaux
"Obviously, I'm not an IBM computer any more than I'm an ashtray." (Bob Dylan)
[ Parent ]

email traps (5.00 / 1) (#45)
by codemonkey_uk on Sun May 18, 2003 at 06:05:34 PM EST

So this is basically the same as the 'mailing list' that groups such as the ACCU have on their servers to trap the email-address harvesting robots spammers use, but with '.mp3' hrefs rather than mailtos.
---
Thad
"The most savage controversies are those about matters as to which there is no good evidence either way." - Bertrand Russell
Nice Idea! (4.66 / 3) (#46)
by S_hane on Sun May 18, 2003 at 09:22:05 PM EST

But it can be taken further. What would be really cool is a script that:

  • generates random pages (much as you currently have), AND
  • generates random links to other sites running the script.
The script could find these sites from a central repository; or some form of p2p network sharing active sites could be established. Random pages within these sites should be linked to, rather than the entrance to the tar pit.

Ideally, the script should be written to minimise resource consumption - the idea would be that _any_ server could download and run it without too much of a performance hit. Hopefully this would encourage a large number of legitimate server operators to download and run a copy.

Each site that runs the tar pit should give it a unique name so that the RIAA can't simply filter based on names. The entrance (and anything else that the site wants to hide from the RIAA) should be entered into robots.txt,so that it isn't indexed by webcrawlers that play by the rules.

The RIAA would then have one of 3 choices:

  • Leave things as they are, and waste an incredible amount of bandwidth on a very large number of interlinked sites;
  • Filter out anything in robots.txt (we win!); or
  • Hook into the p2p network / central repository, and blacklist anything on it (we win again!)
    -Shane


Multi-siting (3.00 / 1) (#49)
by Quila on Mon May 19, 2003 at 04:25:11 AM EST

"or some form of p2p network sharing active sites could be established. " That's the way to do it. Use some P2P method of finding at least one other node. Then for a list of sites, link out to a link in the other site, going a random number of site links deep. This would help randomize the IP addresses of the tarpit links, making it harder to block them.

[ Parent ]
Maybe I am a moron, but... (4.50 / 2) (#48)
by LuYu on Mon May 19, 2003 at 02:39:17 AM EST

Where is the script?

I love the idea of this. If you distribute this script to a lot of people, it would make mp3 bots totally useless for the RIAA. Since they do not really care who they accuse, anyway, this is the perfect tool to keep them from abusing the DMCA. You need to distribute this script as widely as possible.

Maybe you should post it on sf.net.

Anyway, please post the script somewhere so that I and other people can get it.



----------

"I will believe you are not an animal when you do not eat, sleep, urinate, or defecate for one month."

Refining the purpose (3.00 / 1) (#50)
by Quila on Mon May 19, 2003 at 04:28:36 AM EST

What exactly is the purpose here?
  • Slow down their crawlers for harassment so their search for illegal MP3s is hindered?
  • Get them to stop crawling your servers so that once blacklisted you could put up illegal MP3s?
  • Or just because you don't like the RIAA on your servers?
  • Teach them a lesson about making sure a human has proofed takedown notices?
Or is it a mix?

Google (none / 0) (#51)
by codemonkey_uk on Mon May 19, 2003 at 05:14:08 AM EST

So what's to stop this also having a detrimental effect on google, and the other search engines, and reducing the searchability of the internet for everyone?
---
Thad
"The most savage controversies are those about matters as to which there is no good evidence either way." - Bertrand Russell
robots.txt (5.00 / 4) (#54)
by nstenz on Mon May 19, 2003 at 09:41:40 AM EST

The idea is to put a list of the tarpit pages in your /robots.txt so proper spiders such as Google ignore them (as they're supposed to). Any spider that ignores robots.txt deserves to have its resources wasted on a futile search.
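
A quick way to check that idea is Python's standard `urllib.robotparser`. In the sketch below the `/pit/` prefix and the `robots_txt` helper are assumptions for illustration; the point is only that a well-behaved spider, asking the same question the parser answers, would stay out of the listed area.

```python
from urllib.robotparser import RobotFileParser


def robots_txt(pit_prefixes):
    """Build a robots.txt that tells every compliant spider to skip the pit."""
    lines = ["User-agent: *"]
    lines += ["Disallow: " + prefix for prefix in pit_prefixes]
    return "\n".join(lines) + "\n"


text = robots_txt(["/pit/"])  # "/pit/" is a made-up location for the tarpit pages
print(text)

parser = RobotFileParser()
parser.parse(text.splitlines())
print(parser.can_fetch("AnyBot", "http://example.org/pit/madonna.html"))
print(parser.can_fetch("AnyBot", "http://example.org/index.html"))
```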

[ Parent ]
Spammers (none / 0) (#58)
by ucblockhead on Mon May 19, 2003 at 12:15:49 PM EST

Some spammers use spiders to read robots.txt in order to deliberately find areas that spiders are not supposed to read.
-----------------------
This is k5. We're all tools - duxup
[ Parent ]
Tarpit the spammers, too (none / 0) (#68)
by djeaux on Tue May 20, 2003 at 03:26:10 PM EST

Add a little code so that there are large lists of bogus email addresses, ensuring the spammers get the maximum number of bounces.

djeaux

djeaux
"Obviously, I'm not an IBM computer any more than I'm an ashtray." (Bob Dylan)
[ Parent ]

your sig (none / 0) (#80)
by vmarks on Thu May 22, 2003 at 10:03:40 AM EST

your sig came from a Feb 1966 interview with Playboy.
Just sayin'.

[ Parent ]
Re: (none / 0) (#79)
by PoolSnoopy on Thu May 22, 2003 at 09:07:34 AM EST

As mentioned in another comment, robots.txt should do. By the way: the really time-consuming bit is the download of the MP3. The Google bot would never start this download.

[ Parent ]
gimmie code (none / 0) (#62)
by gnovos on Tue May 20, 2003 at 02:27:24 AM EST

Hand over the code here : salimfadhley (at) chipped (dot) net

A Haiku: "fuck you fuck you fuck/you fuck you fuck you fuck you/fuck you fuck you snow" - JChen
An idea (none / 0) (#71)
by grzebo on Tue May 20, 2003 at 04:17:53 PM EST

Why not use p2p or some p2p indexing system (like filedonkey.com) to get some real names of the songs and filesizes? And ID tag the fake files as well. Why not extend the whole thing to movies while you're at it?


"My God, shouts man to Himself,
have mercy on me, enlighten me"...
extension ideas (none / 0) (#72)
by alexr on Wed May 21, 2003 at 02:12:37 PM EST

I had considered a similar thing when I read Mark Pilgrim's accounts of how often the RIAA spider was hitting his site.

The primary extension idea I had was to feed the spider a click-through license that acknowledged that it was in violation of the posted terms of service for your site (by ignoring robots.txt) and agreed to pay a per-download fee for any downloads on the following pages.

The logs could then be correlated between the spider and the follow-up human and the impending lawsuit might have some teeth when you're asking them to pay their bill.

great idea - plz send script (none / 0) (#73)
by MrSpock on Wed May 21, 2003 at 02:48:54 PM EST

Hi - I think it's a great idea to annoy the f...ing MI. I'd be quite interested to have a look at the script. I'm currently setting up a new homepage for my lady using PHP which will soon go online. So maybe I could let it run there. Please send it to mrspock99@hotmail.com
People who are willing to sacrifice essential freedoms for security deserve neither freedom nor security. Benjamin Franklin
Summary of suggestions (5.00 / 1) (#74)
by pla on Wed May 21, 2003 at 04:04:50 PM EST

1) Include email addresses to trap two birds (spammers as well as RIAA) with one tarpit.

2) Include far more links per page (a few dozen, perhaps), as most spiders have a fairly shallow search depth limit.

3) Fix your mime headers.

4) Make the page more random... I would suggest starting with a sort of "style gallery" that included, at a minimum, a number of common playlist formats (most "real" MP3 sites on the web seem to just use something like a WinAmp playlist with links to the actual songs on the server). From those base styles, randomize just about every aspect of the page you can think of... HTML version, background colors and image names, font colors, names, and sizes, whitespace use, etc.

5) Include the ability to have cross-tarpit links, to allow others to participate, thus both making the tarpit harder to blacklist, as well as taking some of the load off your own server.

6) Include some web-scraping ability to keep the list of artists (and possibly song names) current and accurate (and not automatically filterable). If not outright web-scraping, perhaps it could get such info (as well as "peer" tarpits to generate links to) from some central (but easily changed) repository.

7) Share the script! I'd run this on at least four machines... Just with the participation of a fraction of the K5 community, we could probably make the RIAA's spiders completely useless.
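
Points 2, 4 and 5 could be sketched roughly as follows in Python. This is only an illustration, not the original script: the word lists, the `/tarpit` base path, the peer URL, and the function names are all invented for the example.

```python
import random

# Hypothetical vocabularies; a real deployment would use much larger lists.
ARTISTS = ["Example_Artist", "Placeholder_Band", "Sample_Singer"]
WORDS = ["elephant", "wiggle", "marmalade", "lantern", "static", "velvet"]
BACKGROUNDS = ["#ffffff", "#000000", "#ccccff", "#ffffcc"]
FONTS = ["Arial", "Times New Roman", "Courier New", "Verdana"]


def fake_track_name(rng):
    """Build a track-like filename from two random words and an artist."""
    title = "_".join(rng.sample(WORDS, 2))
    return f"{title}-{rng.choice(ARTISTS)}.mp3"


def tarpit_page(rng, links_per_page=36, peers=()):
    """Return one randomized HTML listing page full of generated links."""
    bases = ["/tarpit"] + list(peers)  # local tarpit plus any peer tarpits
    items = []
    for _ in range(links_per_page):
        base = rng.choice(bases)
        page_id = rng.randrange(10**9)
        name = fake_track_name(rng)
        items.append(f'<li><a href="{base}/{page_id}/{name}">{name}</a></li>')
    # Randomize the look of the page so it is harder to fingerprint.
    background = rng.choice(BACKGROUNDS)
    font = rng.choice(FONTS)
    size = rng.randint(2, 5)
    return (
        f'<html><body bgcolor="{background}">'
        f'<font face="{font}" size="{size}"><ul>\n'
        + "\n".join(items)
        + "\n</ul></font></body></html>\n"
    )


# The peer URL below is a placeholder for another participant's tarpit.
page = tarpit_page(random.Random(), peers=["http://peer.example/tarpit"])
print(page)
```

Passing a seeded `random.Random` makes a given page reproducible, so the same URL could always return the same listing if the seed is derived from the requested path.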


my precious *grin* (none / 0) (#76)
by censor on Wed May 21, 2003 at 05:12:45 PM EST

finally! could u pleaze send the script to you-AT-faked-us , or post it on sourceforge or freshmeat.net or somewhere else? that script is too cool to keep it secret...

Cool Idea (none / 0) (#77)
by silkhood on Thu May 22, 2003 at 03:24:04 AM EST

Please send me the script. I think I've got an idea to make the pages more dynamic and up to date.

Please send me the source (none / 0) (#78)
by PoolSnoopy on Thu May 22, 2003 at 09:05:46 AM EST

Please send me the source! I'd really like to install this on my server. My email is tlatzelsberger [-at-] gmx.at Thanx a lot for that great idea!

Please send source (none / 0) (#81)
by dpreviti on Fri May 23, 2003 at 10:11:13 AM EST

Great idea, I'd love to help. Can this be used on a Win 2000 box as well as a Unix box? If so, please send me the source; I'd love to put it on my server. dpreviti72-at-yahoo-dot-com DP

The Source (none / 0) (#83)
by Blackbird on Fri Jun 20, 2003 at 12:17:59 AM EST

Greetings. Your idea sounds like a great one. I really like it. Could I please get the source? My e-mail is blackbird0@yahoo.com

RIAA Pit of Confusion | 83 comments (67 topical, 16 editorial, 0 hidden)
All trademarks and copyrights on this page are owned by their respective companies. The Rest 2000 - Present Kuro5hin.org Inc.
See our legalese page for copyright policies. Please also read our Privacy Policy.
Kuro5hin.org is powered by Free Software, including Apache, Perl, and Linux, The Scoop Engine that runs this site is freely available, under the terms of the GPL.