Kuro5hin.org: technology and culture, from the trenches

Mining for terrorists

By imrdkl in Technology
Mon Feb 04, 2002 at 01:43:48 PM EST
Tags: Software

To paraphrase an old Tootsie Pop commercial:

How many CPU cycles does it take to get to the center of a terrorist ring?

And the answer from Mr. Owl:

Let's find out. One, two, three.

Three.

Several major airlines are already working with Accenture to implement what will probably be the largest and most expensive shared data mining application ever sold.


Now, I ask you for a bit of forbearance here. I work at a largish telecom and occasionally work with databases that contain more than a hundred million records, but that's peanuts compared to the effort described by Robert O'Harrow, the author of the Washington Post article.

Given my limited experience, then, I pose the question: is an undertaking of the scale described doable? O'Harrow notes that it would be years before such a system could be fully in place, of course, but does anyone care to discuss the theoretical deployment?

There are, of course, many other issues raised by the notion of nationwide correlation of data to look for terrorists on airplanes (or anywhere else, presumably). Perhaps Paul Werbos, a neural networks specialist at the NSF, said it best:

Such systems need to be used carefully. While there is no doubt that profiling can improve security we have to be very careful not to create punishments, disincentives, for being different from average.

But this is not an op-ed, nor is it a freedom/politics piece. I would like to discuss, primarily, the logistics and design of a system able to correlate data between and among all of the airlines, credit-card companies, ticket-payment records, local/state/federal government agencies, telephone records, and other datasets which would be required to obtain a usable and valuable passenger profile (or Risk Factor). The article gives a very simple example, wherein a common purchaser is found for a group of passengers' tickets, but unless this data were saved at ticket purchase time, making this determination would require an n^2 comparison algorithm over all passengers on the flight. (A "Cursor" in SQL terminology)
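The brute-force pairwise comparison described above can be sketched as follows. This is an illustration only, assuming each ticket is a hypothetical (passenger, purchaser) pair; every pair of tickets on the flight is compared, so the work grows as n(n-1)/2.

```python
from itertools import combinations


def shared_purchasers_bruteforce(tickets):
    """Compare every pair of tickets on the flight: n*(n-1)/2 comparisons, O(n^2).

    tickets: list of (passenger, purchaser) tuples -- a hypothetical layout.
    Returns {purchaser: set of passengers} for purchasers who bought 2+ tickets.
    """
    groups = {}
    for (p1, buyer1), (p2, buyer2) in combinations(tickets, 2):
        if buyer1 == buyer2:
            groups.setdefault(buyer1, set()).update((p1, p2))
    return groups


flight = [("alice", "card-17"), ("bob", "card-17"), ("carol", "card-42"), ("dave", "card-17")]
print(shared_purchasers_bruteforce(flight))
```

With four tickets this makes six comparisons; a full flight of 400 passengers would need 79,800.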

So, the way I see it, there are four primary requirements:

  • Shared data model - data-mapping to the common model from all carriers, with updates
  • Shared network - with plenty of fat pipe
  • Shared CPU - distributed processing (beowulf, anyone?)
  • Determining the "Risk Factor" for a passenger - using neural networks
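The first requirement, a shared data model, amounts to writing one mapper per carrier into a single common record. A minimal sketch, assuming two made-up carrier formats; every field name here (pax, fop, payer_account and so on) is hypothetical and not taken from any real airline system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CommonTicket:
    """Hypothetical shared record that every carrier's data would be mapped into."""
    carrier: str
    passenger_name: str
    purchaser_id: str
    flight_date: date
    paid_cash: bool


def from_carrier_a(rec: dict) -> CommonTicket:
    # Hypothetical carrier A: "LAST/FIRST" names, ISO dates, form-of-payment code.
    last, first = rec["pax"].split("/")
    return CommonTicket(
        carrier="A",
        passenger_name=f"{first} {last}".title(),
        purchaser_id=rec["buyer"],
        flight_date=date.fromisoformat(rec["date"]),
        paid_cash=rec["fop"] == "CASH",
    )


def from_carrier_b(rec: dict) -> CommonTicket:
    # Hypothetical carrier B: split name fields, MM/DD/YYYY dates, payment type.
    month, day, year = (int(x) for x in rec["travel_date"].split("/"))
    return CommonTicket(
        carrier="B",
        passenger_name=f"{rec['first']} {rec['last']}".title(),
        purchaser_id=rec["payer_account"],
        flight_date=date(year, month, day),
        paid_cash=rec["payment_type"].lower() == "cash",
    )


# Each new carrier adds one mapper; downstream queries only ever see CommonTicket.
MAPPERS = {"A": from_carrier_a, "B": from_carrier_b}


def normalize(carrier: str, rec: dict) -> CommonTicket:
    return MAPPERS[carrier](rec)
```

The hard part in practice is not the mapping code but agreeing on the common fields and keeping 20 or more mappers current as each carrier changes its own systems.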
According to O'Harrow, a growing number of people claim, and cautiously accept, that a passenger can be given a reasonable score. But this also implies that the shared common data are in place and up to date and, most importantly, are being properly queried to construct relevant datasets for the passengers, flight, airline, city of origin/destination, amount of fuel in the plane, nearby large buildings in the (early) flight path, or any grouping of the above, along with many other variables which I can't possibly begin to imagine.

Just think of the confusion that is already there, and then imagine trying to make a unified data model for 20 or more different airline companies to share customer profile data in real time, just to get you through the security gates. And I won't even start in on the network bandwidth requirement, except to say that it would probably be enough to flood most pipes, not to mention the shared topology between the carriers that would be required.

Anyone care to speculate how any one or more of the requirements could be met? There's a lot of good fodder here for techs and engineers/developers, of course. Not to mention the liberty/privacy issues.

Poll
A useful profile can be developed for every passenger?
o yup 7%
o nope 52%
o Only with Rusty's help 40%

Votes: 42

Mining for terrorists | 29 comments (25 topical, 4 editorial, 0 hidden)
Typo... (4.00 / 2) (#1)
by jkmiecik on Sun Feb 03, 2002 at 07:16:27 PM EST

Several major airlines are already working with Ande^H^H^Hccenture to implement what will probably be the largest and most expensive shared data mining application ever sold. Interesting company, Ande^H^H^Hccenture. DUH!

Re: Typo... (none / 0) (#3)
by protactin on Sun Feb 03, 2002 at 07:25:42 PM EST

I believe the author is referring to Arthur Andersen, who, for whatever reason, changed their name fairly recently to Accenture.

Re: Typo (5.00 / 1) (#4)
by danimal on Sun Feb 03, 2002 at 07:29:42 PM EST

Andersen Consulting changed their name to Accenture because they split off and went independent from Arthur Andersen the accounting firm and could no longer use the Andersen name.


--
<tin> we got hosed, tommy
<toy> clapclapclap
<tin> we got hosed

Preemptive Move (5.00 / 1) (#7)
by truth versus death on Sun Feb 03, 2002 at 08:01:31 PM EST

Isn't Arthur Andersen the company that helped Enron cheat its investors?

"any erection implies consent"-fae
[ Trim your Bush ]
oh yes (4.00 / 1) (#8)
by Arkady on Sun Feb 03, 2002 at 08:17:46 PM EST

Yes indeedy; you got that in one. ;-)

-robin

Turning and turning in the widening gyre
The falcon cannot hear the falconer;
Things fall apart; the centre cannot hold;
Mere Anarchy is loosed upon the world.


doh! (none / 0) (#29)
by danimal on Mon Feb 18, 2002 at 07:28:51 PM EST

I meant consulting, not accounting. Enron was too much on my mind :)
--
<tin> we got hosed, tommy
<toy> clapclapclap
<tin> we got hosed

Bet they're glad they lost (5.00 / 1) (#13)
by TON on Sun Feb 03, 2002 at 09:43:30 PM EST

The consulting division fought hard to keep the then valuable Andersen name. They only changed to the meaningless Accenture after losing in corporate divorce court. Sometimes it's good to lose.

"First, I am born. Then, the trouble begins." -- Schizopolis

Ted



IIRC... (none / 0) (#16)
by deefer on Mon Feb 04, 2002 at 08:01:26 AM EST

Their marketing spin was "we've changed our name because we wouldn't expect our clients to do anything we wouldn't"

I remember at the time thinking "eh?", but your comment explains the split and subsequent "rebranding".


Kill the baddies.
Get the girl.
And save the entire planet.


definition of "likely" terrorist (5.00 / 5) (#2)
by Arkady on Sun Feb 03, 2002 at 07:18:27 PM EST

The biggest problem (which, unlike processing power or data collection/storage, cannot be solved with just more money) is that no one has come up with a viable description of a likely terrorist which could then be translated into the sort of tracks such a person would leave across the data they're using.

For example, how likely are "terrorists" to use credit/debit cards versus cash or checks and on what sorts of transactions will they use each? Is a "terrorist" more or less likely to have a gold card than anyone else? As the planners of the World Trade Center attack demonstrated, it is possible to be both a terrorist _and_ a reasonably "normal" person as well.

An additional complication is the wide variety of potentially "terrorist" causes. Consider the S.F. Chronicle article on the Animal Liberation Front on their site today. The ALF, by the F.B.I.'s own admission, has never killed or injured a human in any of their actions, yet the F.B.I. considers them an extremely important terrorist organization. Now, try to come up with a model which will describe 1) a field operative for Al Qaeda, 2) a member of the I.R.A. _and_ 3) a member of the A.L.F. without dropping important attributes of one which do not fit the others. ;-)

-robin

Turning and turning in the widening gyre
The falcon cannot hear the falconer;
Things fall apart; the centre cannot hold;
Mere Anarchy is loosed upon the world.


As I understand it, they don't need completeness (5.00 / 1) (#12)
by imrdkl on Sun Feb 03, 2002 at 09:08:29 PM EST

The basis for a neural network implementation would not need to be a complete model. Of course, for the first few trillion cycles, it might make a lot of bad guesses...

Okay (5.00 / 1) (#23)
by ghjm on Mon Feb 04, 2002 at 04:07:55 PM EST

The idea with neural networks is that they approximate the pattern-matching capabilities of the human brain. However, unless I'm badly behind on the news, no neural network has yet matched the human brain in this regard.

So let's suppose we were able to use people instead of machines for the task. There's an obvious scaling issue here, but for the purposes of discussion, suppose you could spend an infinite amount of time working at the problem. Could you find a terrorist?

What you have are records of credit card transactions, airline ticket purchases, hotel accommodations and rental car contracts. You can see what was paid in cash, when, whether the individual is a frequent traveler or not, and maybe even look at a picture of the person.

How far could you take this? Maybe you could say "this person should be looked at more closely" but you would have absolutely no basis for calling them a terrorist. You certainly wouldn't have probable cause, for example.

If people can't do it, why do we think machines can do it?

But the idea is (none / 0) (#26)
by imrdkl on Mon Feb 04, 2002 at 06:40:04 PM EST

to get a "Risk Factor", which would then notify a human who does in fact lack infinite time. A Risk Factor does not a terrorist make, hopefully. But a bad guess will make a smarter neural network, yes?

Is this new? (1.00 / 1) (#5)
by kaemaril on Sun Feb 03, 2002 at 07:37:12 PM EST

Don't the guys in the black helicopters already have this? :)


Why, yes, I am being sarcastic. Why do you ask?


It will never work (5.00 / 3) (#6)
by theboz on Sun Feb 03, 2002 at 07:57:17 PM EST

Much like their counterpart Andersen, which has proven its worth with the morals and intelligence it has shown in the Enron case, Accenture is full of idiots.

I'd be surprised if they get Oracle installed without switching consultants about five times, much less actually developing the software to run on it. I think we should all be concerned that the airlines and the U.S. government are willing to waste our money on a database that will not work and will require years of maintenance. I'd rather my security money go to something useful, like putting security guards on airplanes, rather than this joke.

Stuff.

Luckily, for civil liberties, it won't work (none / 0) (#28)
by Johnny Mnemonic on Tue Feb 05, 2002 at 01:38:24 PM EST

Andersen aren't really Accenture's counterparts. Remember, until recently they were the same company. I don't have any figures to hand, but my guess is that Accenture billed Enron far more than Arthur Andersen did.

Not n^2, n log n (5.00 / 1) (#15)
by KWillets on Mon Feb 04, 2002 at 05:10:26 AM EST

The article gives a very simple example, wherein a common purchaser is found for a group of passengers' tickets, but unless this data were saved at ticket purchase time, making this determination would require an n^2 comparison algorithm over all passengers on the flight. (A "Cursor" in SQL terminology)

Thanks for making me feel needed. At worst this operation would require a sort (n log n) and a scan of the output for groups larger than a chosen size.
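The sort-then-scan approach can be sketched like this, again assuming hypothetical (passenger, purchaser) tuples. The sort costs O(n log n) and the scan is a single linear pass over the sorted list.

```python
from itertools import groupby
from operator import itemgetter


def shared_purchasers_sorted(tickets, min_group=2):
    """Sort by purchaser (O(n log n)), then scan once for runs of min_group or more.

    tickets: list of (passenger, purchaser) tuples -- a hypothetical layout.
    """
    by_purchaser = sorted(tickets, key=itemgetter(1))
    result = {}
    # groupby only merges *adjacent* equal keys, which is why the sort comes first.
    for purchaser, run in groupby(by_purchaser, key=itemgetter(1)):
        passengers = [passenger for passenger, _ in run]
        if len(passengers) >= min_group:
            result[purchaser] = passengers
    return result


flight = [("alice", "card-17"), ("bob", "card-17"), ("carol", "card-42"), ("dave", "card-17")]
print(shared_purchasers_sorted(flight))
```

Python's sort is stable, so passengers within each group keep their original order; raising min_group plays the role of the "or 2, etc." in the SQL below.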

In SQL terms, this would be written:

select purchaser, count(*) from tickets group by purchaser having count(*) > 1 -- or 2, etc.

Of course the data could be indexed, which would move the overhead to the time the data is inserted, but it's the same either way. Those terrorists will cut and run once they see the power of heapsort.
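The query above can be run as-is against a throwaway SQLite database using Python's standard sqlite3 module. The table and index names are assumptions for illustration. With the index in place the database can, if its planner chooses, answer the grouping from the already-ordered index instead of sorting at query time, which is the "move the overhead to insert time" trade-off.

```python
import sqlite3

# Hypothetical schema: one row per ticket, keyed by who paid for it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (passenger TEXT, purchaser TEXT)")
# The index is maintained on every INSERT; that is where the sorting cost moves.
conn.execute("CREATE INDEX idx_tickets_purchaser ON tickets (purchaser)")
conn.executemany(
    "INSERT INTO tickets (passenger, purchaser) VALUES (?, ?)",
    [("alice", "card-17"), ("bob", "card-17"), ("carol", "card-42"), ("dave", "card-17")],
)

# The query from the comment above, unchanged apart from capitalization.
rows = conn.execute(
    "SELECT purchaser, COUNT(*) FROM tickets GROUP BY purchaser HAVING COUNT(*) > 1"
).fetchall()
print(rows)
conn.close()
```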



Fantasy... (4.33 / 3) (#17)
by Znork on Mon Feb 04, 2002 at 11:29:40 AM EST

'"This is not fantasy stuff," said Joseph Del Balzo.'

No, it is not. It is stark raving lunacy and a complete fraud. Just like facial recognition technology.

Yes, data mining is useful, and yes, you can do some things with it, but you cannot use it in this way. Just like facial recognition, you will get a whole bunch of 'risky' people out of this system, none of whom will be terrorists.

The trouble lies in the problem space itself. These systems are not perfect, and they cannot be perfected. You have to balance them to avoid as many false positives and false negatives as you can, but you always get an error margin. When you are going through tens or hundreds of millions of passengers and trying to sort out half a dozen or so potential real threats, you will get thousands or tens of thousands of false alarms. The systems won't end up trusted, because they constantly cry wolf and violate innocent people's integrity through whatever actions are taken based on the false alarms. These systems are 'correct' once in a lifetime and 'wrong' several times each day.
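The arithmetic behind "thousands or tens of thousands of false alarms" is short. A worked sketch, where every input (100 million passengers, six real threats, a 0.01% false-positive rate, 90% detection) is an illustrative assumption rather than a figure from the article.

```python
def expected_alerts(passengers, real_threats, false_positive_rate, detection_rate):
    """Expected false alarms and expected real catches for one screening pass."""
    innocent = passengers - real_threats
    false_alarms = innocent * false_positive_rate
    real_catches = real_threats * detection_rate
    return false_alarms, real_catches


# Illustrative assumptions only: 100 million passengers, six real threats,
# a 0.01% false-positive rate and 90% detection.
false_alarms, real_catches = expected_alerts(100_000_000, 6, 0.0001, 0.9)
print(f"false alarms: {false_alarms:,.0f}, real catches: {real_catches:.1f}")
```

Even a system that is right about innocent passengers 99.99% of the time produces roughly 10,000 false alarms to catch about five real threats.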

Yes, systems like this can be useful when you want to either accept false positives (such as searching crime databases for possible suspects) or when you want to accept false negatives (such as ID checking, where you can try again when you're rejected). But they're not useful when you cannot accept either, and especially not when you have the kind of datasets you have in an airport.

Count Your False Positives (none / 0) (#24)
by cbbrowne on Mon Feb 04, 2002 at 05:07:58 PM EST

On the average day, how many terrorists do you have in the air, planning raids?

One? Ten? A thousand?

More likely there's something like a 1/100 probability of there being one.

Now, consider the number of passengers getting on aircraft, on a typical day. Is it ten? No. 10,000? Nope; that's a typical hour at O'Hare. The number is more likely on the order of several million.

If a system points at 10 would-be terrorists per day, that's likely to be inaccurate by a factor of thousands and is doing a whopping lot of false-positive reports.

They're trying to pick out something really, really, really rare. If the system's working, then on the average day, it should pick up ZERO people for special examination.

Have you any expectation that this is what is likely to fall out of this? I certainly don't; I think any such system that gets implemented will be error-prone to the point of reporting thousands of times as many would-be "terrorists" as actually exist.
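The same point can be put through Bayes' theorem to get the share of flagged passengers who are real terrorists. The inputs below are assumptions picked to match the comment's rough figures (several million passengers a day, a 1/100 chance of one terrorist) plus a deliberately optimistic one-in-a-million false-positive rate.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Bayes' theorem: P(actual terrorist | flagged by the system)."""
    flagged_true = prevalence * sensitivity
    flagged_false = (1 - prevalence) * false_positive_rate
    return flagged_true / (flagged_true + flagged_false)


# Assumed inputs, chosen only for illustration:
passengers_per_day = 3_000_000  # "on the order of several million"
terrorists_per_day = 0.01       # "a 1/100 probability of there being one"
false_positive_rate = 1e-6      # a very optimistic one-in-a-million
sensitivity = 1.0               # assume every real terrorist is flagged

prevalence = terrorists_per_day / passengers_per_day
ppv = positive_predictive_value(prevalence, sensitivity, false_positive_rate)
false_flags_per_day = (passengers_per_day - terrorists_per_day) * false_positive_rate

print(f"share of flags that are real: {ppv:.4f}")
print(f"innocent passengers flagged per day: {false_flags_per_day:.1f}")
```

Under these assumptions about three innocent people are flagged every day and only about one flag in 300 is real, so the system would almost never produce the "ZERO people" day that a working system should.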
"Microsoft OS's are good because they encourage Intel to produce faster CPUs for the rest of us to run Unix on." -- George Dau

unlikely to work at all. (none / 0) (#25)
by bobzibub on Mon Feb 04, 2002 at 05:28:04 PM EST

Trying to predict a statistically insignificant percentage of the population from the huge (and diverse) data set of the flying public sounds like barrels of pork to me.

For this to have any possibility of catching a terrorist the tests would have to be so weak that prohibitively large numbers of false positives would be generated. So the airport security lineups themselves might become a target.

It is one thing if you are searching for 5% or 10% of a large population. It is another when we are talking on the order of about 10 out of 40 million people! And it also assumes that the 10 do not take any countermeasures or simply avoid flying. Even if it does prevent a disaster in the air, does this mean that terrorists would be prevented from any terrorist activity?

So what exactly does this system buy you? Though I feel flying is quite safe, this kind of thing wouldn't make me feel safer.

Months prior to our last trip, my wife put a 12" long flat-head screwdriver in the bottom of her laptop case. She uses these tools for work, and I never checked the laptop case closely enough. I was borrowing the laptop to code and ease my boredom on an international flight and the thing made it through the security checks! I was horrified when I was rummaging in the bottom of the case and found this thing--while cruising at X thousand feet. I had visions of being arrested and two F-16s escorting the plane to the nearest strip of pavement. All the while me saying to the other passengers: "sorry! I'm so sorry!"

While departing, I took the laptop out as per standard procedure and the case went through the xray machine. This was about 3 weeks ago.

Perhaps there are better things to spend limited funds on than this boondoggle of a data mining system?

btw I have it in front of me: an R148 Xcelite screwdriver. Nice tool. Nasty weapon.
Last trip... (none / 0) (#27)
by lucidvein on Tue Feb 05, 2002 at 12:13:24 AM EST

My best friend works for a distributor. He flew to Cali for a convention without any problems, besides the normal security lines. On the way back, though, he was pulled to the metal table where they swab your bag and search the contents. The security agent directed him to place his bag on the table and 'do them both a favor, and just give up the box knife.' Not having packed a box knife, he asked why they thought he had. They said a thin metal object was seen in the x-ray and all they wanted was to confiscate the knife. Searching through his bag, he pulled out his work badge and showed it to them. With a quick dismissal he was then allowed to continue to the gate.

Unpacking at home he pulled out a pair of work pants he had packed but not worn. In the pocket was his box knife.

My travels have not been so eventful. I usually get the full body search and pat down along with sending my shoes through the x-ray machine.

On that note, why not use a karma-based system like Slash or Scoop to determine the trust of passengers? I'm half joking, but think about it. If you have no flight history you get searched regardless. As your flight history grows, agents can discern whether this is more or less a regular flight by browsing your profile. When you've posted, err, flown enough flights and maintained a good rating, you can begin rating/vouching for other flyers. This in no way stops someone from faking out the system, but it would be more interesting...

An openly discriminatory database is better than relinquishing control over to a private group of administrators who have unlimited or unknown control over the data. Isn't it?

No wonder ... (none / 0) (#18)
by streetlawyer on Mon Feb 04, 2002 at 11:49:58 AM EST

my fucking broadband doesn't work, if major telecoms are staffed by the likes of you.

The article gives a very simple example, wherein a common purchaser is found for a group of passengers' tickets, but unless this data were saved at ticket purchase time, making this determination would require an n^2 comparison algorithm over all passengers on the flight

This can't be right. It can't be. The n^2 comparison algorithm would be the most boneheaded, brute-force way possible to go about solving this problem. I can't quite work out how it could be bettered (I suspect that you could sort the records by purchaser in significantly less than N^2, leaving plenty of time to search the sorted list), but I simply can't bring myself to believe that the best computer science can do for this extremely common problem would be O(N^2).

--
Just because things have been nonergodic so far, doesn't mean that they'll be nonergodic forever

NLogN (none / 0) (#19)
by imrdkl on Mon Feb 04, 2002 at 12:18:51 PM EST

As already pointed out, as I hoped it would be. But thanks for your insightful feedback. I'll see what I can do about your broadband right away.

yeah sorry about that (none / 0) (#20)
by streetlawyer on Mon Feb 04, 2002 at 12:29:53 PM EST

I'm just a bit bitter about it at the moment. Why oh why, etc. My suggestion would be that the database ought to have more information about the problem than "THING NEEDS DONE!", to reduce the number of times some fucker shows up on my doorstep with the wrong toolkit. But there you go.

--
Just because things have been nonergodic so far, doesn't mean that they'll be nonergodic forever
The real problem (none / 0) (#22)
by trhurler on Mon Feb 04, 2002 at 12:58:15 PM EST

You can sort the data by purchaser, but there are several things to keep in mind.

First, practical problems. I can think of two. They will stop using one guy to buy multiple tickets as soon as you implement this, making it useless, and also, even with only one guy, if he has two credit cards in different names, you still get nothing. (Getting fake ID in the US that will stand up to a credit card company is not trivial, but it isn't impossible either, and it doesn't cost that much if you know where to look. The proposed national ID system wouldn't help.)

Second, computer problems. In order to sort the data by purchaser (actually, in a db, you'd just have an index keyed to passenger, but that's not relevant here), you have to maintain the sort (or index), which makes insertions at least O(n) and probably higher, because you aren't going to use hash tables for this quantity and variety of data (you'd need more memory than exists on the planet, for one thing). Counting in network delays (primarily latency and connection setup/teardown, but there are others) and the extreme load of having every airport everywhere connected to one database, you're talking about a delay probably measured in minutes per ticket sold. That's a disaster. Even using an O(n) or O(n log n) lookup doesn't help; insertion on O(n log n) (some sort of tree, probably) is at least O(n log n), which is not better than O(n), and going to an O(n) lookup requires an O(n) insert at minimum also.
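For comparison with the insertion costs discussed above, the original post's "saved at ticket purchase time" option can be sketched on a single machine with a hash map keyed by purchaser. This is a hypothetical in-memory sketch: within one process a Python dict plus list append gives expected amortized constant time per sale, but it says nothing about the memory, replication, and network-latency problems raised here, which are the hard part at nationwide scale.

```python
from collections import defaultdict


class PurchaserIndex:
    """Hypothetical in-memory index: purchaser -> passengers, updated at sale time."""

    def __init__(self):
        self._by_purchaser = defaultdict(list)

    def record_sale(self, passenger, purchaser):
        # Dict lookup plus list append: expected amortized O(1) per sale,
        # regardless of how many tickets are already indexed.
        bucket = self._by_purchaser[purchaser]
        bucket.append(passenger)
        return len(bucket)  # tickets this purchaser has bought so far

    def groups(self, min_group=2):
        """Purchasers who have bought at least min_group tickets."""
        return {buyer: list(pax) for buyer, pax in self._by_purchaser.items()
                if len(pax) >= min_group}


index = PurchaserIndex()
for passenger, purchaser in [("alice", "card-17"), ("bob", "card-17"),
                             ("carol", "card-42"), ("dave", "card-17")]:
    index.record_sale(passenger, purchaser)
print(index.groups())
```

Because record_sale returns the running count, a check such as "this card has now bought three tickets on one flight" can be made at the moment of sale rather than by a later batch query.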

I personally doubt this thing can be built from existing technology at all if it is to be actually practically usable nationwide, and in any case nobody is accounting for the fact that the people it is trying to spot will increasingly spend greater proportions of their resources to appear normal as their expertise in exploiting common things (i.e. things that don't have huge costs to them, like airliners) as weapons grows. The more normal they look, the less chance you have to catch them, and we don't even know that the next attack will have anything to do with an airplane of any kind whatsoever; they might come over on a ship and sink an oil tanker using a few pounds of commercially available Semtex for all we know. (Do that in a busy harbor, and you can outright shut it down for weeks or months. Think of the economic damage. This is just one of many possibilities; our infrastructure is not designed to withstand deliberate assault by people willing to die for their causes.)

--
'God dammit, your posts make me hard.' --LilDebbie

Sonic Foundry is also getting into the game. (4.00 / 1) (#21)
by bmasel on Mon Feb 04, 2002 at 12:47:04 PM EST

ADVISORY/Sonic Foundry to Hold Congressional Briefing Offering Alternative to National ID Card

"Designed to help government agencies capture and manage multiple types of human identification information, Sonic Foundry's Unified Security View(TM) (USV) utilizes continuous enrollment capabilities and a multi-biometric analysis engine to provide a more thorough, timely ID dossier that ensures greater accuracy in the identification process. The USV platform is derived from a seven-year, $20 million Carnegie Mellon University research effort funded by leading government agencies and private corporations, including NASA, DARPA, NSF, Bell Atlantic, Boeing, CNN, Intel and Microsoft."



I am not currently Licensed to Practice in this State.

kuro5hin.org

All trademarks and copyrights on this page are owned by their respective companies. The Rest © 2000 - Present Kuro5hin.org Inc.
Kuro5hin.org is powered by Free Software, including Apache, Perl, and Linux, The Scoop Engine that runs this site is freely available, under the terms of the GPL.