Kuro5hin.org: technology and culture, from the trenches

Yahoo's "Anti-Scripting" Filters Examined

By paine in the ass in Internet
Tue Jul 16, 2002 at 07:42:47 AM EST
Tags: Technology

It's being reported that Yahoo's free email service is now changing certain words in email messages, supposedly to stop "cross-site scripting attacks". A good source of information is this article at Need To Know, linked to by (among others) Slashdot, where, predictably, discussion has sprung up about the ethical implications of unseen filtering of personal correspondence (a quick search through Yahoo's help pages turned up no information about this filtering that I could see). What's lacking in the articles I've seen, though, is an examination of just how this filtering is working, so I've spent a little bit of time fiddling with it to find out exactly what's going on.

First, I tried to figure out what sorts of messages will trigger the filter; I logged in to an old Yahoo account that I've had for a while, and began sending myself test messages. Each one went both to the Yahoo account and to an external account to see if Yahoo was filtering both incoming and outgoing messages, or only incoming.

The filter didn't require that HTML tags be present in the message; simply checking the "Allow HTML tags" box on the Yahoo mail composer page was enough to cause Yahoo to filter. A message containing the simple text "medieval expression mocha" was received as "medireview statement espresso". However, simply sending a message with HTML markup in it would not trigger the filter - Yahoo apparently only recognizes the HTML if the message is explicitly sent as such, and renders tags as plain text otherwise.

Next I looked at whether the filter is case-sensitive, and whether it changes strings regardless of where they appear. I obtained a list of filtered strings from this page and then put each through a "grep string /usr/share/dict/words" to get all possible "normal" occurrences. Note that the strings being blocked include both terms such as "eval" and "javascript" which might appear in the body of a malicious script, and tags like IFRAME and OBJECT which could be used to embed or access such scripts (obviously, the SCRIPT tag itself is filtered, too). Here's the text of the HTML message I used to test the word filtering:

<font face="Verdana" color="#336699" size="-1">






The message was, as previously mentioned, sent to both the Yahoo account and to an account in another domain, and the version sent to Yahoo was the only one altered. It appeared as follows (the altered words are medireview, primreview, retrireview, espresso, statement, substatement, and the five hyphenated "-script" words):
evaluate evaluated evaluates evaluating evaluation evaluations evaluative evaluator evaluators medireview prevalence prevalent prevalently primreview reevaluate reevaluated reevaluates reevaluating reevaluation retrireview retrievals unevaluated espresso statement expressions substatement subexpressions java-script java-script j-script vb-script live-script
As you can see, the filter isn't case-sensitive (it changed both "javascript" and "JavaScript"), but it only seems to change the filtered strings when they appear at the end of a word; hence "prevalently" is unaltered while "primeval" becomes "primreview". Note that this contradicts the NTK article, which claims that "evaluate" will be altered. This piqued my curiosity, so I checked whether the filter is fooled by alternative spacing, with the message
<font face="Verdana" color="#336699" size="-1">m e d i e v a l m o c h a e x p r e s s i o n</font>
The message was unaltered; thus, spacing changes can get around the filter if you really need to use a particular word in an HTML email. However, substituting HTML character entities (such as &#97; in place of the letter "a") for letters did not fool the filter; it still changed the strings, regardless of whether they contained letters or HTML entities.
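Taken together, these results suggest a simple model of the word filter. The Python sketch below is only that, a model under stated assumptions rather than Yahoo's code: it assumes entities are decoded before matching, and that a filtered string only counts when whitespace or the end of the text follows it. The function name `view_html_message` is made up for illustration; the replacement table is the one observed in the test above.

```python
import html
import re

# Hypothetical model of the word filter, inferred from the tests above.
# This is not Yahoo's code; only the replacement table comes from the results.
REPLACEMENTS = {
    "javascript": "java-script",
    "livescript": "live-script",
    "vbscript": "vb-script",
    "jscript": "j-script",
    "expression": "statement",
    "mocha": "espresso",
    "eval": "review",
}

# Try longer strings before shorter ones, then require that whitespace or the
# end of the text follows the match (the "end of a word" rule observed above).
_ALTERNATIVES = sorted(REPLACEMENTS, key=len, reverse=True)
WORD_FILTER = re.compile(
    "(" + "|".join(map(re.escape, _ALTERNATIVES)) + r")(?=\s|$)",
    re.IGNORECASE,
)


def view_html_message(body: str) -> str:
    """Return the text a Yahoo reader would see, under the assumptions above."""
    # Decode entities first, so "&#97;" is just an "a" by the time we match.
    # (Toy only: a real sanitizer must not hand decoded markup to a browser.)
    decoded = html.unescape(body)
    return WORD_FILTER.sub(lambda m: REPLACEMENTS[m.group(1).lower()], decoded)


print(view_html_message("medieval expression mocha"))  # medireview statement espresso
print(view_html_message("prevalently primeval"))        # prevalently primreview
print(view_html_message("medi&#101;v&#97;l"))            # medireview
print(view_html_message("m e d i e v a l m o c h a"))    # unchanged
```

Under this model the spacing trick works simply because no filtered string appears contiguously, while entity substitution fails because decoding happens before the match.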

I then tested the list of HTML tags supposedly filtered; I composed a message consisting of all of them (the list of tags was obtained from the same source as the list of filtered words):

<link rel="stylesheet" href="nonexistent.css">
<script>document.write("Will print text if the script works"); //Testing</script>
<object data="nonexistent.mov" type="video/quicktime" alt="Test" title="Test"><img src="nothere.gif"></object>
<embed src="nonexistent.wav" autostart="false" loop="false"></embed>
<body bgcolor="#ffffff">Test test test</body>
<iframe src="http://kuro5hin.org"></iframe>
<meta http-equiv="refresh" content="5">
<form method="post" action="nonexistent.cgi">
<select>
<option selected="selected">Option 1
<option>Option 2
</select>
</form>
Again, this message was sent unaltered to the external account (and in fact caused Mozilla's mail client to barf rather unpleasantly all over the place; I had to use Pine to check the integrity of the message), but was filtered when received by the Yahoo account. The altered message had all of the tags changed just as NTK predicted except for the image (located inside the "object" element) and the form, which, instead of changing to "xform" as the NTK article predicted, had the attribute target="_blank" added to it. The final message rendered in the Yahoo inbox with this text:
document.write("Will print text if the script works"); //Testing Test test test
Yahoo's filter changed all the tags except the two mentioned (the image, being nonexistent, showed up as a broken box), and rendered the text within the filtered elements. It also rendered the form properly, giving me a drop-down selection box with two options. Interestingly, though, a look at the source of the message as viewed on Yahoo revealed that only the opening tags were changed; closing tags were left alone.
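The tag filter can be modeled the same way. This is a hypothetical Python sketch, not Yahoo's code; renaming tags with an "x" prefix is an assumption borrowed from NTK's "xform" prediction, since the exact new names aren't recorded here, and `filter_tags` is an illustrative name. Because only "<" followed directly by a tag name is rewritten, closing tags (which start with "</") pass through untouched, and a browser that doesn't recognize a tag such as "<xscript>" just renders the text inside it, which fits what showed up in the inbox.

```python
import re

# Hypothetical model of the tag filter, based on the test message above.
# The "x" prefix is an assumption taken from NTK's "xform" prediction.
RENAMED_TAGS = ("script", "object", "embed", "iframe", "link", "meta", "body")

# "<" immediately followed by a tag name: matches opening tags only, because a
# closing tag starts with "</".
OPEN_TAG = re.compile(r"<(" + "|".join(RENAMED_TAGS) + r")\b", re.IGNORECASE)
FORM_TAG = re.compile(r"<form\b", re.IGNORECASE)


def filter_tags(body: str) -> str:
    # Rename filtered opening tags; a browser treats "<xscript>" as an unknown
    # element and just renders the text inside it.
    body = OPEN_TAG.sub(lambda m: "<x" + m.group(1), body)
    # Forms are left working but made to submit into a new window.
    return FORM_TAG.sub('<form target="_blank"', body)


print(filter_tags('<script>document.write("hi");</script>'))
# <xscript>document.write("hi");</script>
print(filter_tags('<form method="post" action="nonexistent.cgi">'))
# <form target="_blank" method="post" action="nonexistent.cgi">
```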

Finally, the filter is clearly applied when a received message is viewed and never at send time; the unaltered copies received by the alternate address prove this. It applies to messages viewed in the "Sent" folder and to viewed attachments (I tried each message both as an attachment and as the message itself), as well as to those in the Inbox and other folders, but it isn't applied to unfinished messages saved to the "Drafts" folder or to the preview of an HTML message before sending.

These tests covered every case I could imagine save one, which was beyond my ability; I wasn't able to download and view messages from a Yahoo account via POP3, as that is now a "premium" service and I haven't subscribed to it. If anyone who does have a premium subscription to Yahoo's mail service would like to try, I'll leave it to them to find out whether the filtering applies to messages downloaded via POP3.

So to summarize, Yahoo's filter operates as follows:

  • It only changes strings in messages explicitly marked as HTML; plain-text messages are unaffected and rendered normally.
  • It only applies when viewing the message and only after the message has been received by a Yahoo account; messages simply sent out from Yahoo are not subjected to the filter.
  • It applies to both message body and attachments.
  • The word filter is case-insensitive, but only changes words when one of the strings is found at the end of a word or in isolation.
  • The word filter can be evaded if necessary by the use of unconventional spacing.
  • The tag filter doesn't stop forms from rendering, but adds an attribute to them.
  • The tag filter only changes opening tags, leaving closing tags alone.
  • I don't know if filtering is applied to messages retrieved via POP3; someone with premium access can test if anyone's interested.

While the spacing trick will evade the word filter, I couldn't come up with a quick and easy way to evade the tag filter. So if you merely want to send pretty HTML-formatted emails, this shouldn't hamper you too much as long as you know what you're doing. If you're a h4x0r who wants to send scripts to people's inboxes for some reason, or a regular user who just feels a need to send Java applets or feature films embedded in emails, though, you're out of luck.




Should Yahoo be filtering like this?
o Yes. 4%
o No. 26%
o Maybe, if they allow you to opt out of it. 14%
o Maybe, if they make it "opt-in". 16%
o Maybe, if they notify users of the filters and how they work. 38%

Votes: 42


Yahoo's "Anti-Scripting" Filters Examined | 34 comments (31 topical, 3 editorial, 1 hidden)
Did you try using character entities? (4.66 / 6) (#2)
by Joe Groff on Tue Jul 16, 2002 at 12:47:04 AM EST

The usual way I get around filters like this is to throw &#nnn;-style character entities into the affected words; for example, "Javascript" could be written as "Jav&#97;script". While character entities aren't valid in tag names (though they're fine in attribute values), replacing select characters (such as vowels) in the text itself should keep it from being adulterated.

Just for reference, here are the ASCII values for all the English vowels:

A = &#65;
E = &#69;
I = &#73;
O = &#79;
U = &#85;

a = &#97;
e = &#101;
i = &#105;
o = &#111;
u = &#117;
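The codes in the table are just each character's code point, so they can be generated rather than looked up. A minimal Python sketch (the helper name `encode_vowels` is made up for illustration); keep in mind that the story author reports Yahoo still altered a word written this way.

```python
VOWELS = set("AEIOUaeiou")


def encode_vowels(word: str) -> str:
    # ord() returns the character's code point; for ASCII letters that is the
    # same number used in a decimal character reference such as &#97;.
    return "".join(f"&#{ord(c)};" if c in VOWELS else c for c in word)


print(encode_vowels("Javascript"))  # J&#97;v&#97;scr&#105;pt
print(encode_vowels("AEIOU"))       # &#65;&#69;&#73;&#79;&#85;
```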

How long must I travel on
to be just where you are?

I just tried it. (4.50 / 2) (#3)
by paine in the ass on Tue Jul 16, 2002 at 12:55:20 AM EST

It still changed the word - "medieval" became "medireview" anyway. Thanks for the suggestion, though; I'll add that to the story.

I will dress in bright and cheery colors, and so throw my enemies into confusion.

Zero Width Entities (4.00 / 2) (#23)
by Hai Etlik on Tue Jul 16, 2002 at 05:06:38 PM EST

What about a zero width entity embedded in the filtered word?

Zero Width Joiner &zwj; or Zero Width Non-Joiner &zwnj;
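Whether this works depends on how the filter treats those characters, which isn't tested here. Under the hypothetical assumption that the filter decodes entities and then looks for a contiguous "eval", the decoded joiner sits invisibly between the letters and breaks the match, as this Python sketch shows; a filter that strips zero-width characters before matching would defeat the trick. The function name `filtered` is illustrative only.

```python
import html
import re

# Hypothetical filter model: decode entities, then look for a contiguous "eval"
# followed by whitespace or the end of the text. Not Yahoo's actual code.
EVAL_AT_END = re.compile(r"eval(?=\s|$)", re.IGNORECASE)


def filtered(body: str) -> str:
    return EVAL_AT_END.sub("review", html.unescape(body))


print(filtered("medieval"))                            # medireview
print(filtered("medie&zwnj;val") == "medie\u200cval")  # True: no contiguous "eval"
print(filtered("medie&zwj;val") == "medie\u200dval")   # True
```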

Obviously won't work (3.00 / 1) (#15)
by Aquablue on Tue Jul 16, 2002 at 10:57:41 AM EST

You might have guessed from paine's review that this would not work. IMHO the whole message is run through something that decodes character entities when it is being displayed, similar in spirit to JavaScript's unescape() function (which decodes %xx escapes). From a programmer's perspective it's just much simpler to apply such a rule to the whole message than to test parts here and there or build logic into it.

They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -- Benjamin Fr
Automatic filters are stupid, film at 11 (3.00 / 1) (#5)
by andrewm on Tue Jul 16, 2002 at 01:14:02 AM EST

It seems odd that some people (even apparently technically literate people) are still only just discovering this - isn't that old news?

In any case, the 'real' solution is for javascript (etc) to not give up complete control of a user's machine. (Fortunately for us, advertisers are, er, making the most of various scripting languages to take over people's web browsers and email clients. I knew it was useful for something important, other than virus writing.)

Of course, I still don't understand people who need to write a program in order to make their email look neat. As much fun as coding is, I can't imagine why I need javascript for a simple email. (And, in case anyone still doesn't realise this: even if you must email a neat animation or something, why does it need unlimited access to my machine? Why do I want someone to email me a program that will format my hard drive as soon as I click on it? What moron thought that was a good idea?)

Why it's only just hitting the news (4.00 / 1) (#6)
by paine in the ass on Tue Jul 16, 2002 at 01:21:22 AM EST

I would guess that it's simply because there hasn't been much reason to notice it yet. Yahoo's been doing this for a while, but without really telling anyone about it, so you have to encounter it happening in the wild to notice it. The NTK article mentioned that it's getting to the point where people are wondering what "medireview" means, which may be why there's suddenly attention being paid to the issue.

I will dress in bright and cheery colors, and so throw my enemies into confusion.

Hahahaha (4.00 / 1) (#8)
by ariux on Tue Jul 16, 2002 at 02:08:47 AM EST

According to the linked article, the word "medireview" is now used by thousands of web pages, including book reviews, academic papers, newspaper columns, ... There's a half-facetious suggestion that it now be treated as a fait accompli and added to dictionaries!


Sandboxes (4.00 / 1) (#9)
by swr on Tue Jul 16, 2002 at 02:40:42 AM EST

In any case, the 'real' solution is for javascript (etc) to not give up complete control of a user's machine.

Javascript is not supposed to give up complete control of the machine. Javascript, Java and Flash all run in a sandbox.

Occasionally bugs are found that allow sandboxed code to do things it isn't supposed to do, but it is relatively rare that those bugs give up complete control of the machine.

Cross-site scripting is one example of being bad without totally busting out of the sandbox. If K5 weren't as strict as it is about HTML, it would be possible for people to post JavaScript in comments that would take the contents of your login cookie and send it to the comment author's site as a form submission. That would allow people to access your K5 account, but it's a far cry from being able to format your hard drive.

The other major program-in-a-browser technology is ActiveX. It's by Microsoft ('nuff said).

medireview (5.00 / 3) (#11)
by baronben on Tue Jul 16, 2002 at 08:24:45 AM EST

An interesting story that I heard is that medireview, Yahoo's replacement for medieval (they don't want the term eval getting through), is turning up everywhere as an accepted replacement for medieval among people who do not speak English fluently. A Google search turns up the Dominican University of California using it in a course description for medieval history, a study guide at Fordham, and the tourist page for a town called Staffordshire.

It's hard to say if these are just signs of bad editing, or whether something as widespread and as popular as Yahoo mail can make a new word.

Ben Spigel sic transit gloria

Freedom of statement (none / 0) (#28)
by FreeBarking on Wed Jul 17, 2002 at 10:50:31 AM EST

A similar search for the phrase "freedom of statement" yields similar results... (591 pages found, at last look.)

A little scary if you ask me...


It's been going on for a while... (none / 0) (#32)
by cpatrick on Wed Jul 31, 2002 at 02:17:55 PM EST

Take a look at this message to a Yahoo group on medieval leather, sent quite some time ago, expressing curiosity at the term medireview.

Why are Yahoo doing this? (3.00 / 1) (#12)
by rdskutter on Tue Jul 16, 2002 at 08:30:45 AM EST

What is the point in changing the content of messages?

If you're a jock, inflict some pain / If you're a nerd then use your brain - DAPHNE AND CELESTE

I don't get it (2.00 / 1) (#13)
by FredBloggs on Tue Jul 16, 2002 at 09:39:48 AM EST

Is it a bug? What does Yahoo say about it? Anyone asked them?

as I see it (4.50 / 2) (#14)
by Fuzzwah on Tue Jul 16, 2002 at 10:08:32 AM EST

Yahoo are attempting to limit if not remove any chance of people sending malicious emails to Yahoo email users. Without the filtering it's quite simple to construct an HTML email which, when viewed by the Yahoo user, messes with their browser along with other things.

It's Yahoo trying to protect their users by creating a system which may well do it, but not without fiddling with real parts of messages.

If you're an archeologist and you're using a Yahoo mail account you'd better start evaluating reviewing your choice.

The best a human can do is to pick a delusion that helps him get through the day. - God's Debris

Further tests (5.00 / 1) (#16)
by bobpence on Tue Jul 16, 2002 at 11:06:43 AM EST

After reading this, I was curious about a couple of things. "Script" in isolation and words ending in "script" are not affected, only the valid targets like "vbscript." Prefixing a dash does not prevent changing "eval" to "review," "mocha" to "espresso," or "expression" to "statement." However, appending a "'" or "~" (or presumably other non-whitespace character) to the end of the three words or a targeted fooscript does work, though it muddles communication in its own way.

Yahoo's "HTML Preview" function does not alert you to how your message will be changed, since, as noted, the filter acts on incoming mail. While the email display may mask these words, when replying with Yahoo mail, the original appears unchanged, so as a side-effect it may not be obvious that the original message has been altered when the sender receives a reply.

This does give me pause. I primarily use Yahoo mail for mailing lists, but some of them are code-related, so confusion could arise. "They did not have freedom of statement in medireview times." Hmmm. Preventing script attacks is a fine goal, but perhaps it would be best to bracket the original, offending word instead of changing it entirely.
"Interesting. No wait, the other thing: tedious." - Bender

Unreproducible (3.66 / 3) (#17)
by Dolohov on Tue Jul 16, 2002 at 01:07:51 PM EST

I just followed the directions with my own Yahoo account, and did not get those results. Nothing in the test email was changed in the slightest. Could Yahoo have changed it as a result of criticism?

Worked yesterday (3.00 / 2) (#19)
by CaptainSuperBoy on Tue Jul 16, 2002 at 01:14:52 PM EST

As of yesterday, they were still replacing words. I tried it and it worked fine. Did you make sure you were sending an HTML mail? It doesn't filter plaintext e-mails. Maybe they fixed it today after all the press it's gotten recently from NTK and Slashdot.

jimmysquid.com - I take pictures.
Yeah (3.00 / 2) (#21)
by Dolohov on Tue Jul 16, 2002 at 02:21:29 PM EST

I made sure it was an HTML email. I wonder if the age of the account (I've had mine for three years now) has any bearing on whether they filter?

You might also try this (3.50 / 2) (#22)
by paine in the ass on Tue Jul 16, 2002 at 03:07:46 PM EST

I don't know if it's relevant, but it may also be part of their "new" interface. Somewhere there's a pref you can set to switch over to it; see if that does it. I'm also using an extremely old account, and I'm still seeing filtering.

Alternatively, you could let me know where to send a test message and we'll see what happens (I did enough of them while writing this that I think I can manage to set off the filter one more time).

I will dress in bright and cheery colors, and so throw my enemies into confusion.

New interface (none / 0) (#30)
by Dolohov on Wed Jul 17, 2002 at 03:19:28 PM EST

As a matter of fact, I am using the "beta" interface. I'd been using it so long, I had forgotten.

I just tried using the "classic" interface -- no dice. This doesn't mean anything, though, because Yahoo has by now gotten a lot of flak over it.


Yup (3.00 / 2) (#20)
by CaptainSuperBoy on Tue Jul 16, 2002 at 01:17:40 PM EST

I had to try it... still works.

Medieval Retrieval Mocha Javascript -> Medireview Retrireview espresso java-script/FONT>

Geez.. they can't even parse the FONT tag properly.

jimmysquid.com - I take pictures.

yahoo.co.uk (none / 0) (#33)
by sgp on Fri Aug 02, 2002 at 09:13:13 PM EST

Just sent "Javascript script foo form expresso java mocha tomcat img" from myself @yahoo.co.uk to myself @yahoo.co.uk, and got "_Javascript script foo form expresso java _mocha tomcat img" back.

In other words, it had - as suggested here by some - prefixed "javascript" and "mocha" with "_". The rest was unchanged.
So I sent myself:

<IMG SRC="http://steve-parker.org/foo.gif">
<FORM action="http://steve-parker.org/foo.cgi?foo">

and got it displayed exactly - not processed by my browser whatsoever. I.e., they'd changed "<" to "&lt;". Though the trailing "</SCRIPT>" became lowercase: "</script>".

So I tried:

Sample C code:

int bigger(int a, int b) {
  if (a>b) return a;
  else return b;
}

which was sent okay. The message:

I propose a new tag <SPIN> which would mark government spin. E.g.:
<SPIN>Blair said that all trains would be on-time by Monday</SPIN>

which was also displayed as entered.
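What is described here for yahoo.co.uk looks like plain HTML escaping: once "<" becomes "&lt;", the browser shows tags as text instead of interpreting them, which would also explain why the C fragment and the made-up <SPIN> tag came through as entered. A minimal Python sketch using the standard library's `html.escape`, assuming that is roughly what the service did; it does not account for the lowercased "</script>", and `show_as_text` is an illustrative name.

```python
import html


def show_as_text(body: str) -> str:
    # Escape "&", "<" and ">" so the browser displays markup literally instead
    # of interpreting it. quote=False leaves quotation marks alone.
    return html.escape(body, quote=False)


print(show_as_text("<SPIN>Blair said that all trains would be on-time by Monday</SPIN>"))
# &lt;SPIN&gt;Blair said that all trains would be on-time by Monday&lt;/SPIN&gt;
print(show_as_text("if (a>b) return a;"))
# if (a&gt;b) return a;
```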

There are 10 types of people in the world:
Those who understand binary, and those who don't.


They do it on POP3 too (3.50 / 2) (#18)
by CaptainSuperBoy on Tue Jul 16, 2002 at 01:12:21 PM EST

They are filtering messages downloaded through POP3 as well, according to posts on Slashdot. This suggests that the filtering happens when a mail is received, rather than when it is viewed. If they filtered upon viewing, why would they care about filtering messages sent to POP3 clients? Presumably they should already be running a protected mail client.

jimmysquid.com - I take pictures.
How could this be handled better? (4.00 / 1) (#24)
by bobpence on Tue Jul 16, 2002 at 06:13:26 PM EST

But it is disturbing that they actually change written words to other words, namely "eval" to "review," "expression" to "statement," and "mocha" to "espresso." It feels like - though it is not quite - an infringement on one's freedom of statement. I mean expression.

Nonetheless Yahoo!(R) apparently has a worthy goal, namely preventing malicious scripts executing while someone is reading their Yahoo or POP3 email within a Yahoo web page. So can it be done better?

Could merely hyphenating the fooscript words be enough to prevent scripts from executing? The mid-word hyphens do not interfere with readability, in my opinion. If the other words must be changed, perhaps they could be [bracketed], and a polite notice could be appended to the email, such as: "Yahoo has inserted hyphens into or surrounded with square brackets some words that are sometimes part of malicious automatic code."
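The proposal above is easy to sketch. This hypothetical Python version is not anything Yahoo is known to have shipped: it splits whole "fooscript" words with a hyphen (keeping the original case), brackets eval, expression and mocha only where they end a word (mirroring the matching rule observed in the story), and appends the suggested notice when anything changed. The name `defang` is made up for illustration.

```python
import re

# A sketch of the suggestion above, not anything Yahoo is known to have shipped.
NOTICE = ("Yahoo has inserted hyphens into or surrounded with square brackets "
          "some words that are sometimes part of malicious automatic code.")

# Whole "fooscript" words: split the prefix from "script", keeping the case.
SCRIPT_WORD = re.compile(r"\b(java|j|vb|live)(script)\b", re.IGNORECASE)
# Other flagged strings, bracketed only where they end a word.
OTHER_WORD = re.compile(r"(eval|expression|mocha)(?=\s|$)", re.IGNORECASE)


def defang(body: str) -> str:
    out = SCRIPT_WORD.sub(r"\1-\2", body)
    out = OTHER_WORD.sub(r"[\1]", out)
    if out != body:
        out += "\n\n" + NOTICE  # tell the reader something was changed
    return out


print(defang("They did not have JavaScript in medieval times."))
```

Bracketing keeps the original word recoverable, which addresses the "medireview" problem at the cost of slightly noisier text.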
"Interesting. No wait, the other thing: tedious." - Bender

Not really (none / 0) (#25)
by kurtmweber on Tue Jul 16, 2002 at 06:53:08 PM EST

It feels like - though it is not quite - an infringement on one's freedom of statement. I mean expression.

Not quite--they're not stopping you from expressing yourself as you please; they're just telling you that if you choose to do it and use their servers (their private property) at some point in the process, they may just alter your words somewhat. The only way a private entity (person or business) can infringe upon your freedom of expression is by threatening you with violent force--anything else that you may want to call such is merely an exercise of the owner's property rights.

Kurt Weber
Any field of study can be considered 'complex' when it starts using Hebrew letters for symbols.--me
If browsers were more flexible (none / 0) (#31)
by Sir Runcible Spoon on Thu Jul 25, 2002 at 04:19:58 AM EST

Mail clients now tend to have Javascript disabled to avoid the problems of malicious scripting. However, many sites don't work well if you have scripting disabled in your browser. So with a web-based mail client you are stuck with javascript enabled.

What the page designer needs is to be able to control the dynamic content within the page, even when that content comes from an unknown source like an email. Each frame has its own scripting context, so it would be useful to be able to disable javascript within specific frames.

<IFRAME SECURITY="scripting=no,images=yes" SRC="...">


Why mocha!?!?! (none / 0) (#26)
by kellan on Wed Jul 17, 2002 at 02:39:43 AM EST

I understand why you filter most of these words, or at least what the rationale was.  But what damage could the word "mocha" possibly do?  I'm totally stumped by it.

RE: Why mocha!?!?! (5.00 / 1) (#27)
by tgross on Wed Jul 17, 2002 at 05:07:27 AM EST

Mocha was (or is) used by Netscape as an alias for javascript. This is sort of an easter egg left by the programmers.

Filter seems to have changed... (none / 0) (#29)
by FreeBarking on Wed Jul 17, 2002 at 10:59:53 AM EST

I tried the filter this morning with my yahoo account.  There is still a filter there, but it's now a little less, umm, Orwellian....

My text included the words "expression", "medieval", and "mocha".  "medieval" was left untouched, while "expression" and "mocha" each had an underscore prepended: _expression, _mocha.

This likely achieves the same goals without distorting the content of the message...
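The underscore behavior also matches the yahoo.co.uk report in another comment, where "Javascript" and "mocha" came back as "_Javascript" and "_mocha". A minimal Python model, assuming whole-word matching (which would explain why "medieval" is no longer touched); including "eval" in the list is a guess, since none of the reports show a standalone "eval", and `prefix_flagged` is an illustrative name.

```python
import re

# Hypothetical model of the later behavior: whole words only, original case kept,
# an underscore prepended. "eval" in the list is a guess; no report shows it.
FLAGGED = re.compile(r"\b(javascript|expression|mocha|eval)\b", re.IGNORECASE)


def prefix_flagged(text: str) -> str:
    return FLAGGED.sub(r"_\1", text)


print(prefix_flagged("expression medieval mocha"))
# _expression medieval _mocha
print(prefix_flagged("Javascript script foo form expresso java mocha tomcat img"))
# _Javascript script foo form expresso java _mocha tomcat img
```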

so is the filter worth it? (none / 0) (#34)
by ACG on Tue Aug 20, 2002 at 08:35:01 PM EST

I don't understand most of this - I read the article because I use Yahoo for email. What I want to know is, is the filtering effective against spam (a lot still gets through), should I keep it on, should I click that HTML box, and what are the chances of an email that I'd actually like to receive ending up in the "Bulk" folder? Thanks.



All trademarks and copyrights on this page are owned by their respective companies. The Rest 2000 - Present Kuro5hin.org Inc.