Mining for terrorists
By imrdkl in Technology
Mon Feb 04, 2002 at 01:43:48 PM EST
Tags: Software (all tags)
To paraphrase an old Tootsie Roll Pop commercial:
How many CPU cycles does it take to get to the center of a terrorist ring?
And the answer from Mr. Owl:
Let's find out.
Several major airlines are already working with Accenture to implement what will
probably be the largest and most expensive shared data mining application ever sold.
Now, I ask you for a bit of forbearance here. I work in a largish telecom, and
occasionally work with databases that contain more than a hundred million records, but that's peanuts compared to the effort described by Robert O'Harrow,
the author of the Washington Post article linked above.
Given my limited experience, then, I pose the question: is an undertaking of the
scale described doable? O'Harrow notes that it would be years before such
a system could be fully in place, of course, but does anyone care to discuss the technical feasibility in the meantime?
There are, of course, many other issues raised by the notion of nationwide correlation of data to look
for terrorists on airplanes (or anywhere else, presumably). Perhaps Paul Werbos, a neural-networks specialist from the NSF,
said it best:
Such systems need to be used carefully. While there is no doubt that profiling can
improve security, we have to be very careful not to create punishments, disincentives,
for being different from average.
But this is not an op-ed, nor is it about freedom/politics. I would like
to discuss, primarily, the logistics and design of a system which would be
able to correlate data among all of the airlines, credit-card companies,
ticket-payment records, local/state/federal government agencies, telephone records,
and other datasets which
would be required to obtain a usable and valuable passenger profile (or "Risk Factor").
The article gives a very simple example, wherein a common purchaser is found for a group
of passengers' tickets. But unless this data were saved at ticket-purchase time, making this
determination would require an n^2 comparison algorithm over all passengers on
the flight (a "cursor" loop, in SQL terminology).
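To make that concrete, here's a minimal sketch; the tickets table, its columns, and the sample rows are all invented for illustration. If the purchaser is captured at purchase time, the common-purchaser check collapses to a single GROUP BY, rather than a pairwise scan over passengers at the gate.

```python
# Hypothetical schema: tickets(passenger, flight, purchaser).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (passenger TEXT, flight TEXT, purchaser TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [
        ("Alice", "AA100", "acme-travel"),
        ("Bob",   "AA100", "acme-travel"),
        ("Carol", "AA100", "carol"),
    ],
)

# Purchasers who bought tickets for more than one passenger on the same flight.
rows = conn.execute("""
    SELECT flight, purchaser, COUNT(*) AS n
    FROM tickets
    WHERE flight = 'AA100'
    GROUP BY flight, purchaser
    HAVING COUNT(*) > 1
""").fetchall()

print(rows)   # [('AA100', 'acme-travel', 2)]
```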
So, the way I see it, there are four primary requirements:

- Shared data model - data-mapping to the common model from all carriers, with updates
- Shared network - with plenty of fat pipe
- Shared CPU - distributed processing (Beowulf, anyone?)
- Determining the "Risk Factor" for a passenger - using neural networks

According to O'Harrow, it is claimed, and cautiously accepted by an increasing number
of people, that a reasonable scoring of a passenger can be obtained.
But this also implies that the shared common data are in place and up to date
and, most importantly, are being properly queried to construct relevant datasets for
the passengers, flight, airline, city of origin/destination,
amount of fuel in the plane, nearby large buildings in the (early) flight path, or any
grouping of the above, along with many other variables which I can't possibly begin
to enumerate.
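For that last requirement, here is a toy sketch of what a scorer might look like: a single-layer logistic model over a handful of features. A real system would presumably be a trained neural network over the shared datasets above; the feature names and weights below are my own placeholders, not anything from the article.

```python
# Toy "Risk Factor" scorer: weighted sum of features squashed to (0, 1).
import math

def risk_factor(features, weights, bias):
    """Return a score in (0, 1) from a weighted sum of features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, normalised inputs: [cash purchase?, one-way ticket?,
# bought same day?, previously seen on this route?]
passenger = [1.0, 1.0, 1.0, 0.0]
weights   = [0.8, 0.5, 0.9, -0.7]   # placeholder weights, not learned
bias      = -1.5

print(round(risk_factor(passenger, weights, bias), 3))   # 0.668
```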
Just think of the
confusion that is already there, and then imagine trying to make a unified
data model for 20 or more different airline companies to share customer-profile
data in real time, just to get you through the security gates.
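As a small illustration of that mapping problem, here is a sketch with two imaginary carrier record layouts (all field names invented) normalised into one shared profile format. Multiply this by 20-plus carriers, add change control and real-time feeds, and the headache is obvious.

```python
# Two hypothetical carrier record formats mapped into one shared profile.
from datetime import date

def from_carrier_a(rec):
    # Carrier A: lower-case fields, ISO dates, free-text payment type.
    return {
        "name": rec["pax_name"],
        "dob": date.fromisoformat(rec["birth_date"]),
        "payment": rec["form_of_payment"],
    }

def from_carrier_b(rec):
    # Carrier B: LAST/FIRST names, coded form-of-payment.
    last, first = rec["NAME"].split("/")
    return {
        "name": f"{first} {last}",
        "dob": date(*map(int, rec["DOB"].split("-"))),
        "payment": "cash" if rec["FOP"] == "CA" else "card",
    }

shared = [
    from_carrier_a({"pax_name": "Alice Smith", "birth_date": "1970-01-02",
                    "form_of_payment": "card"}),
    from_carrier_b({"NAME": "SMITH/ALICE", "DOB": "1970-01-02", "FOP": "CA"}),
]
print(shared)   # note the near-duplicate: entity resolution is the next problem
```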
And I won't even start in on
the network bandwidth requirement, except to say that it would probably be enough
to flood most pipes, not to mention
the shared topology between the carriers that would be required.
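For what it's worth, a back-of-the-envelope guess (every number below is my own assumption, not a figure from the article) puts the steady-state load alone at more than a T1's worth of traffic, before peak hours or bulk data refreshes.

```python
# Back-of-the-envelope only; all inputs are assumptions.
passengers_per_day = 2_000_000   # rough US enplanements per day
sources_per_lookup = 5           # airlines, card issuers, agencies, ...
kbytes_per_source  = 4           # assumed record size pulled per source

bytes_per_day = passengers_per_day * sources_per_lookup * kbytes_per_source * 1024
mbits_per_sec = bytes_per_day * 8 / 86_400 / 1_000_000
print(f"~{mbits_per_sec:.1f} Mbit/s sustained average")   # ~3.8 Mbit/s, more at peak
```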
Anyone care to speculate how any one or more of the requirements could be met? There's
a lot of good fodder here for techs and engineers/developers, of course. Not to mention
the liberty/privacy issues.