Kuro5hin.org: technology and culture, from the trenches
Advanced Website Usage Reporting with Open Source Tools

By chroma in Internet
Thu Feb 23, 2006 at 10:18:59 AM EST
Tags: Technology (all tags)
Technology

O wad some Power the giftie gie us
To see oursels as ithers see us!

Those lines by Scottish poet Robert Burns reflect a desire to look through the eyes of another as a means of self-discovery. This same desire burns in the heart of every website creator: to find out exactly how people see and use their web pages. The most commonly used web log analyzers provide an incomplete picture at best. But with a little effort, I was able to create my own analysis tools, which have great potential for showing exactly how my web site is used. I'll describe my techniques for advertising and referral tracking, then describe how I employ a relational database to perform deep analysis of how my website is used.


The best laid schemes

I had just created a website for online machining. The site has gotten a decent buzz on a few weblogs, message boards, and the like. Also, in my other life as a consultant, one of my clients was inquiring about starting web advertising so I felt the time was right to try keyword targeted advertising through Google.

Now, Google has some great features in AdWords, its advertising software. For instance, AdWords can tell you how many people clicked on a particular ad. It can also tell you how many people saw your ad when searching. But it can’t reliably give you much information about how people who saw your ad behaved once they got to your site. It also doesn’t make it easy to tell which ads are the most effective; it only gives aggregate data for all ads grouped into a “Campaign”.

It seemed logical to use the Apache server logs to learn more about my users. I tried several open source and shareware packages for web logfile analysis. The shareware packages seemed to hold little advantage over their open source kin other than ease of setup. The open source packages vary in usefulness, with webalizer being my favorite for basic analysis. It gives information about the number of pages per day, the most popular referrers, and the like.

But nothing seemed to be able to give me the level of detail that I needed. I wanted to track users, categorize them, and analyze their behavior in detail. In particular, I wanted to find out:

  1. Which Google ads were the most effective at getting people to buy stuff from me.

  2. The differences among users who came to Big Blue Saw from the various Google ads, as well as from other linking websites.

A cup o’ kindness

Eventually, I did figure out a way to distinguish among visitors coming from various Google ads.

AdWords allows you to set a different target URL for each ad. For those of you who are unfamiliar with AdWords, you create an ad by filling out a form that looks something like this:

Headline:              _____________          Max 25 characters

Description line 1:    _____________          Max 35 characters

Description line 2:    _____________          Max 35 characters

Display URL:           http://_____________   Max 35 characters

Destination URL:       http://_____________   Max 1024 characters

Originally, I filled out this form to look something like this:

Headline:             Easy Waterjet Cutting               Max 25 characters

Description line 1:   Instant quote on waterjet cutting   Max 35 characters

Description line 2:   Save yourself time and trouble      Max 35 characters

Display URL:          http://bigbluesaw.com               Max 35 characters

Destination URL:      http://bigbluesaw.com               Max 1024 characters


A user's click on the ad would yield a line in my server’s access log that looked something like this:

65.182.228.185 - - [13/Jan/2006:15:57:06 -0500] "GET / HTTP/1.1" 200 6010 "http://www.google.com/search?hl=en&lr=&q=metal+cutting" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1)"

The part that says “GET /” indicates that the web browser was attempting to access the web page at http://www.bigbluesaw.com/. If I tweaked the destination URL thusly:

Destination URL:     http://bigbluesaw.com/saw/content/view/29?gg-k   Max 1024 characters


I would get lines in the access log like this:

65.182.228.185 - - [13/Jan/2006:15:57:06 -0500] "GET /saw/content/view/29?gg-k HTTP/1.1" 200 6010 "http://www.google.com/search?hl=en&lr=&q=water+jet+cutting" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1)"

If I customized the destination URL for each ad, I could now figure out who had clicked on which ad. So I was part of the way there. I was still wading through logfiles manually, though.
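As a minimal sketch of how the tag could be pulled out of a request programmatically rather than by eye, the following Java snippet takes the request field of a log line and returns whatever follows the "?". The class and method names are hypothetical and are not taken from the original program. It assumes the ad tag is the entire query string, as in the "?gg-k" example above.

```java
import java.util.Optional;

class AdTagExtractor {
    /**
     * Given the request field of an access-log line, for example
     * "GET /saw/content/view/29?gg-k HTTP/1.1", returns the text after
     * the '?' in the URL, or an empty Optional if there is none.
     */
    static Optional<String> adTag(String requestField) {
        String[] parts = requestField.trim().split("\\s+");
        if (parts.length < 2) {
            return Optional.empty();      // malformed request field
        }
        String url = parts[1];
        int q = url.indexOf('?');
        if (q < 0 || q == url.length() - 1) {
            return Optional.empty();      // no query string, so no ad tag
        }
        return Optional.of(url.substring(q + 1));
    }

    public static void main(String[] args) {
        System.out.println(adTag("GET /saw/content/view/29?gg-k HTTP/1.1"));
        System.out.println(adTag("GET / HTTP/1.1"));
    }
}
```

A request for the plain home page yields an empty result, so untagged traffic is easy to separate from ad traffic.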

Newly sprung

I decided that I needed to throw all of the logfile data into a relational database. This would provide maximum flexibility in performing queries. With creative use of SQL, I could get any data I needed and format it however I wanted.

A couple things had to come first, though. I wanted to use a cookie to uniquely identify each user of the site. This is easiest to achieve through the Apache web server’s mod_usertrack module. I added the following lines to my httpd.conf to enable user tracking cookies:

LoadModule usertrack_module modules/mod_usertrack.so
CookieTracking on
CookieName BigBlueSaw

Next, I wanted to make sure that I could uniquely identify each request in order to provide a key value for looking up the request in the database and to ensure that I wouldn’t run into problems if I accidentally loaded the same logfile data twice into the database. This meant enabling Apache’s mod_unique_id by adding the following line to httpd.conf:

LoadModule unique_id_module modules/mod_unique_id.so

The last change I made was to Apache’s logging, so that the user tracking cookie and the request ID were both logged. The logging configuration line

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

became

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}n\" %{UNIQUE_ID}e" combined

Now, when a user made a request for a web page, the log file would record a line like the following:

65.182.228.185 - - [13/Jan/2006:15:57:06 -0500] "GET /saw/content/view/29?gg0 HTTP/1.1" 200 6010 "http://www.google.com/search?hl=en&lr=&q=water+jet+cutting" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1)" "65.182.228.185.1137185826083340" tTTWfkU9SKYAADAJyWwAAAAP

Requests were now tied to one user profile on one computer. All requests coming from a particular user would have the same cookie value, so I could track him even if he changed IP addresses.

Like a red, red rose

The logfile data was now ready to go into the database. I made a table in MySQL that could store the values from the logfile, with a database field corresponding to each field from each log line. Here's the DDL to create the REQUEST table:


CREATE TABLE `request` (
`request_id` varchar(20) NOT NULL default '',
`remote_host` varchar(30) NOT NULL default '',
`ident` varchar(20) NOT NULL default '',
`req_time` datetime NOT NULL default '0000-00-00 00:00:00',
`response_code` int(11) NOT NULL default '0',
`response_size` int(11) NOT NULL default '0',
`referer` varchar(100) NOT NULL default '',
`user_agent` varchar(110) NOT NULL default '',
`tracking_cookie` varchar(30) NOT NULL default '',
`req_type` varchar(10) NOT NULL default '',
`req_url` varchar(130) NOT NULL default '',
`req_version` varchar(10) NOT NULL default '',
PRIMARY KEY (`request_id`),
KEY `tracking_cookie_index` (`tracking_cookie`)
)

I needed a way to get the data from the log files into the database, so I whipped up a simple Java program to do the job. It's pretty straightforward, using StringTokenizer to split up the line.

StringTokenizer matcher = new StringTokenizer(currentLogLine);
String remoteHost = matcher.nextToken();
...
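Because the request, referer and user agent fields contain spaces inside their quotes, splitting on whitespace alone needs extra handling. One alternative, shown here as a sketch and not as the original program, is a regular expression that matches the extended log format in one pass. The class name, the map keys (chosen to mirror the REQUEST table columns) and the decision to drop the time zone offset from the timestamp are all assumptions. The pattern also assumes no field contains an escaped double quote.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineParser {
    // host ident user [time] "request" status size "referer" "agent" "cookie" unique_id
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) "
        + "\"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" (\\S+)$");

    /** Parses one log line into named fields; returns null if the line does not match. */
    static Map<String, String> parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        // "GET /path HTTP/1.1" -> method, URL, protocol version
        String[] req = m.group(5).split(" ", 3);
        Map<String, String> f = new LinkedHashMap<>();
        f.put("request_id", m.group(11));
        f.put("remote_host", m.group(1));
        f.put("ident", m.group(2));
        f.put("req_time", m.group(4).split(" ")[0]);   // drop the "-0500" offset
        f.put("response_code", m.group(6));
        f.put("response_size", "-".equals(m.group(7)) ? "0" : m.group(7));
        f.put("referer", m.group(8));
        f.put("user_agent", m.group(9));
        f.put("tracking_cookie", m.group(10));
        f.put("req_type", req.length > 0 ? req[0] : "");
        f.put("req_url", req.length > 1 ? req[1] : "");
        f.put("req_version", req.length > 2 ? req[2] : "");
        return f;
    }

    public static void main(String[] args) {
        String sample = "65.182.228.185 - - [13/Jan/2006:15:57:06 -0500] "
            + "\"GET /saw/content/view/29?gg0 HTTP/1.1\" 200 6010 "
            + "\"http://www.google.com/search?hl=en&lr=&q=water+jet+cutting\" "
            + "\"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1)\" "
            + "\"65.182.228.185.1137185826083340\" tTTWfkU9SKYAADAJyWwAAAAP";
        parse(sample).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Lines that do not match return null, which lets a loader skip or report malformed entries without aborting the whole file.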

I use the following SQL statement to insert a new record into the REQUEST table:

REPLACE INTO request ( request_id , remote_host , ident ,
req_time , response_code , response_size ,
referer , user_agent , tracking_cookie ,
req_type , req_url , req_version )
VALUES (?,?,?, STR_TO_DATE(?,'%d/%b/%Y:%k:%i:%s'),?,?, ?,?,?, ?,?,?)

MySQL's SQL dialect provides REPLACE, which inserts a new record or, when a record with the same primary key already exists, replaces it with the new values. This technique, combined with the unique request identifier from Apache, overcomes a major hassle with other web analysis tools. Most analysis tools become confused if you have multiple log files and you parse them out of order. Not so here. You can load logfiles in any order, and loading the same file twice does no harm. You can even analyze multiple logfiles from different webservers in the same database table.
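For illustration, here is one way the REPLACE statement could be driven from Java through JDBC. This is a sketch under assumptions, not the original loader: the class and method names are hypothetical, each parsed log line is assumed to arrive as a map keyed by column name, and opening the Connection (which needs a MySQL JDBC driver) is left to the caller.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

class RequestLoader {
    // Column order must match the placeholder order in REPLACE_SQL.
    static final String[] COLUMNS = {
        "request_id", "remote_host", "ident",
        "req_time", "response_code", "response_size",
        "referer", "user_agent", "tracking_cookie",
        "req_type", "req_url", "req_version"
    };

    static final String REPLACE_SQL =
        "REPLACE INTO request (" + String.join(", ", COLUMNS) + ") "
        + "VALUES (?,?,?, STR_TO_DATE(?,'%d/%b/%Y:%k:%i:%s'),?,?, ?,?,?, ?,?,?)";

    /** Orders the parsed fields to match the placeholders; missing fields become "". */
    static String[] toParams(Map<String, String> fields) {
        String[] params = new String[COLUMNS.length];
        for (int i = 0; i < COLUMNS.length; i++) {
            String value = fields.get(COLUMNS[i]);
            params[i] = (value == null) ? "" : value;
        }
        return params;
    }

    /** Inserts the row, or replaces the existing row that has the same request_id. */
    static void store(Connection conn, Map<String, String> fields) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(REPLACE_SQL)) {
            String[] params = toParams(fields);
            for (int i = 0; i < params.length; i++) {
                ps.setString(i + 1, params[i]);   // JDBC parameters are 1-based
            }
            ps.executeUpdate();
        }
    }
}
```

Since request_id is the primary key, calling store twice with the same log line leaves a single row in the table.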

To see ourselves

At this point, I could start getting data out of the database. I can enter a simple SQL query into the database client:

mysql> SELECT count(request.req_url) FROM request
where request.req_url like concat('%','?gg-k','%');

This query simply asks the question “How many hits were on pages that contain the string ‘?gg-k’?”, where “?gg-k” is the special string I appended to the URLs from certain ads. The database gives me back the answer:

+------------------------+
| count(request.req_url) |
+------------------------+
| 950                    |
+------------------------+
1 row in set (1.94 sec)

So the Google ad in question has been clicked 950 times (roughly; I'm not taking into consideration things like page reloading and a few other factors here). A good start, but nothing that any decent log analyzer or even Google itself couldn't tell me.

Here's another query. This one says: show me the tracking cookies of all users who came in through the ad tagged “?gg-a”:

mysql> SELECT distinct request.tracking_cookie
FROM request
where request.req_url like concat('%','?gg-a','%');

+--------------------------------+
| tracking_cookie                |
+--------------------------------+
| 68.162.193.157.113528952417095 |
| 68.52.245.11.1135318360113805  |
| 68.82.175.165.1135644890112333 |
| 68.102.200.55.1135736458184226 |
| 64.212.135.225.113574146607752 |
...

The next query says “Show me the most popular files requested on the site and how many times they have been requested:”

mysql> SELECT req_url, count(request_id) as reqCount from request group by req_url order by reqCount desc;
+-------------------------------------------------------+----------+
| req_url                                               | reqCount |
+-------------------------------------------------------+----------+
| /saw/                                                 | 6691     |
| /                                                     | 5342     |
| /saw//templates/saw_shop/css/template.css             | 4816     |
| /saw//templates/saw_shop/css/template_css.css         | 4813     |
...

Neither of the previous two examples is all that interesting, as they're pretty ordinary web log analysis. But now, since we're operating in the realm of SQL, we can combine the above queries to get something far more interesting. This is the reason I went down the road of developing my own web log analysis tools and this is why I am writing this article now. If we wanted to know which files were most requested by the users who came in through a particular ad, that information is just a query away:

SELECT req_url, count(request_id) as reqCount
from request
where request.tracking_cookie in
( SELECT distinct request.tracking_cookie
FROM request
where request.req_url like concat('%','?gg-k','%')
)
group by req_url
order by reqCount desc;
+----------------------------------------------------------------------+----------+
| req_url                                                              | reqCount |
+----------------------------------------------------------------------+----------+
| /saw/templates/saw_shop/css/template.css                             | 645      |
| /saw/templates/saw_shop/css/template_css.css                         | 638      |
| /saw/templates/saw_shop/images/bluesaw-words-med.png                 | 638      |
| /saw/templates/saw_shop/images/bluesaw-med.png                       | 631      |
| /saw/images/stories/process-all-small.png                            | 545      |
| /saw/?gg-k                                                           | 449      |
| /saw/components/com_docman/cad_img.php?file=A0A0-A0A0-bfly-small.dxf | 370      |
| /saw/templates/saw_shop/images/bluesaw-icon.png                      | 245      |
| /saw/component/option,com_docman/task,upload/Itemid,33/step,1/       | 242      |
| /saw/content/category/3/15/32/                                       | 229      |
...

Now I'm getting information that was previously hidden from view. I can now say that my users who come in from Google typically end up on the first page of my part ordering process (that's the one that ends with the “step,1”) and on my parts FAQ page (the one with “3/15/32”).
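To make the logic of the nested query concrete, here is the same computation sketched over an in-memory list in Java. It is purely illustrative: the database does this work in the article, and the Request type, class name and method name are hypothetical stand-ins for rows of the REQUEST table.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class AdVisitorReport {
    /** Minimal stand-in for one row of the request table. */
    static final class Request {
        final String trackingCookie;
        final String reqUrl;

        Request(String trackingCookie, String reqUrl) {
            this.trackingCookie = trackingCookie;
            this.reqUrl = reqUrl;
        }
    }

    /** Counts requests per URL, restricted to visitors who arrived through the given ad tag. */
    static Map<String, Long> pagesForAdVisitors(List<Request> requests, String adTag) {
        // Inner query: cookies of every visitor who requested a URL containing the tag.
        Set<String> cookies = requests.stream()
            .filter(r -> r.reqUrl.contains(adTag))
            .map(r -> r.trackingCookie)
            .collect(Collectors.toSet());

        // Outer query: all requests by those visitors, grouped by URL.
        Map<String, Long> counts = requests.stream()
            .filter(r -> cookies.contains(r.trackingCookie))
            .collect(Collectors.groupingBy(r -> r.reqUrl, Collectors.counting()));

        // ORDER BY reqCount DESC, preserved by the LinkedHashMap.
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                                      (a, b) -> a, LinkedHashMap::new));
    }
}
```

The key point is that the tracking cookie joins a visitor's first, tagged request to every later request, which is exactly what the IN subquery expresses.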

I could use these techniques to compare these numbers to those from users who come in through blogs, find out which pages are viewed by people about to order, determine how often repeat visitors come back before buying, or compute any number of other interesting statistics. If you choose to go down this road, your capabilities will be limited only by your imagination. You will have the ability to see your website as others see it.

A Man's A Man For A' That

This article describes powerful techniques for analyzing human behavior and thought. We must keep in mind, however, that these and other related technologies have significant potential for abuse. I need not recite the list of freedom-draining techniques dreamt up by leaders of government and commerce. It is up to every one of us to keep a vigilant eye on those who wish to use information gathering techniques to gain dominion over our actions and thoughts, and, if necessary, act to stop them.

Advanced Website Usage Reporting with Open Source Tools | 83 comments (43 topical, 40 editorial, 0 hidden)
I still +1 FP this....so there /nt (2.00 / 2) (#1)
by FeatheredSerpent on Mon Feb 20, 2006 at 09:50:57 PM EST



-- THE GEORGE W. BUSH CONSPIRACY GENERATOR --
Website editing? (2.50 / 2) (#16)
by rpresser on Tue Feb 21, 2006 at 09:54:28 AM EST

(Couldn't decide if this was "editorial" since it deals with text nitpicking, or "topical" since it refers to the website, not the article)

Anyhow, I found two strangenesses/annoyances on your website, both in the FAQ section. One is that whenever you want to use some kind of punctuation - not sure exactly what you meant to use, but it appeared to be punctuation meant to signify a measurement in inches - it comes out very screwy in the browser. For example, on the page
...saw/content/view/19/32, we have this interesting sentence:

Parts will have a slight taper along the cut edge of the part, typically between 0.007�? and 0.0005�?.

I tried several browser encodings and they all came out worse than that (which was UTF-8).

The other nitpick was the misspelling of "dimension" on page saw/content/view/17/32.
------------
"In terms of both hyperbolic overreaching and eventual wrongness, the Permanent [Republican] Majority has set a new, and truly difficult to beat, standard." --rusty

"editorial" vs "topical" (none / 1) (#18)
by LodeRunner on Tue Feb 21, 2006 at 12:12:35 PM EST

perhaps "offtopic" since the article is not about the website itself? ;)

---
"dude, you can't even spell your own name" -- Lode Runner
[ Parent ]

maybe (none / 1) (#19)
by chroma on Tue Feb 21, 2006 at 01:33:05 PM EST

I should just do an article about the website.

[ Parent ]
Did you do any investigations... (1.00 / 2) (#21)
by CanSpice on Tue Feb 21, 2006 at 04:33:12 PM EST

...into the combination of AdSense and Google Analytics? I don't run AdSense on my website, but it looks like Analytics can give you all sorts of ad-related information that you're looking for.

Analytics (none / 1) (#23)
by chroma on Tue Feb 21, 2006 at 05:38:03 PM EST

I tried to sign up for Google Analytics, but they're not accepting new registrations. I think that it's just the old Urchin software though.

[ Parent ]
+1 FP ---excellent, (2.00 / 2) (#26)
by terryfunk on Tue Feb 21, 2006 at 09:57:35 PM EST

i can use this. Thanks

I like you, I'll kill you last. - Killer Clown
The ScuttledMonkey: A Story Collection

I'll never understand (1.00 / 2) (#27)
by Psychology Sucks on Tue Feb 21, 2006 at 10:25:11 PM EST

the fascination with anything even remotely resembling programming.  It strikes me as nothing more than incredibly tedious (and frustrating).  

When a program finally comes together, I really don't understand the whole "wow! look what I accomplished" feeling.  

Whatever floats your boat. (none / 1) (#53)
by mr strange on Wed Feb 22, 2006 at 10:39:50 AM EST

The same could be said of any endeavour that requires effort.

Try not to be so judgemental.

intrigued by your idea that fascism is feminine - livus
[ Parent ]

pot.kettle.black, motherfucker [nt] (1.33 / 3) (#57)
by Cyan Magenta Yellow Black on Wed Feb 22, 2006 at 12:32:31 PM EST



[ Parent ]
What I'm trying to say is... (none / 0) (#59)
by mr strange on Wed Feb 22, 2006 at 02:34:58 PM EST

Don't be judgemental unless you know how to do it properly.

If I was president I would just blow up their fucking shitty island [Aruba] and be done with it - Acidify

intrigued by your idea that fascism is feminine - livus
[ Parent ]

Dealing with the Website Usage Problem (1.60 / 10) (#28)
by Josh Ferien on Tue Feb 21, 2006 at 10:57:16 PM EST

Experts agree: Website usage is a problem in America today. At home and in our schools, children log into websites with lurid and inappropriate content. A recent study showed that 4 out of 5 middle school children with internet access at school knew how to construct a "bong," a device used to smoke marijuana (a.k.a. "weed"), out of an ordinary apple. Compare this to only 2 in 5 at traditional schools with no internet and 1 in 10 at Christian Academies with no internet access.

Add to this the well-publicized use of the web amongst Islamic extremists to organize terrorist activities, such as the recent attacks on the Danish embassy in Lebanon, and it is easy to see the need for powerful web site usage reporting applications.

While I agree that the website usage problem requires a great deal more attention, I am not so sure Open Source Software (OSS) is the right approach. Ultimately, the philosophy of OSS is that "information wants to be free," but if there's anything world events of the past few years have taught us, it's that you can't always get what you want. Indeed, if everyone got what they wanted, we'd all be dead.

In an era when "free information" could put the plans for an explosive device to level a federal building in the hands of an Islamic fanatic, we can't afford Free Software, especially when the new version of the GNU General Public License explicitly forbids use of DRM (Digital Rights Management) technologies essential to maintaining the security and integrity of such vital information.

What is really needed is a partnership between established professional software manufacturers and defense contractors to come up with effective website usage reporting methodologies and applications. Perhaps some kind of federal level internet licensing or a model tied to some personally identifiable information (like a credit card number, for example) would be the right final solution to the illicit website usage problem.

Cordially,

Josh Ferien

The J is for Justice!

excuse me, but (none / 0) (#29)
by zombie actmodern on Tue Feb 21, 2006 at 11:26:22 PM EST

who the fuck are you?

[ Parent ]
a bong out of an apple? (none / 1) (#30)
by Russell Dovey on Tue Feb 21, 2006 at 11:36:33 PM EST

How the hell does that work?

"Blessed are the cracked, for they let in the light." - Spike Milligan
[ Parent ]

core out center of apple (2.50 / 2) (#32)
by great blue heron on Tue Feb 21, 2006 at 11:40:34 PM EST

push pipe stem through side of apple

fill with water and weed

smoke

repeat with other fruit/vegetables as time or inclination permits.


it's brainfart art - CTS
[ Parent ]

Even I know... (none / 1) (#38)
by Russell Dovey on Wed Feb 22, 2006 at 12:09:04 AM EST

...you're not supposed to put the weed in with the water.

"Blessed are the cracked, for they let in the light." - Spike Milligan
[ Parent ]

Encourage (3) - reads like Adequacy [nt] (2.50 / 4) (#55)
by Aurochs on Wed Feb 22, 2006 at 10:51:01 AM EST


--
if you willingly engange in political discussions with people named "Hans Testikelgr
[ Parent ]
Two Thumbs Up. (none / 0) (#81)
by student on Thu Feb 23, 2006 at 07:23:33 PM EST

I agree.  Grandparent is a nice troll.


Simon's Rock College of Bard, a college for younger scholars.
[ Parent ]
-1, StringTokenizer is deprecated (1.33 / 3) (#33)
by build test release on Tue Feb 21, 2006 at 11:42:31 PM EST

real men use String.split(char)

Is it? (2.50 / 2) (#34)
by LodeRunner on Tue Feb 21, 2006 at 11:48:45 PM EST

I've been away from the Java world as of late. I can see uses for a tokenizer that does its work on demand. Tokenizing the entire string at once can be costly for arbitrarily large strings, especially when you may not need the last tokens.

---
"dude, you can't even spell your own name" -- Lode Runner
[ Parent ]

yup (2.50 / 2) (#36)
by build test release on Tue Feb 21, 2006 at 11:54:37 PM EST

"StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split  method of String or the java.util.regex package instead."

http://java.sun.com/j2se/1.4.2/docs/api/java/util/StringTokenizer.html

[ Parent ]

No, real men use assembly. [n/t] (3.00 / 2) (#50)
by sudog on Wed Feb 22, 2006 at 03:33:07 AM EST



[ Parent ]
Hell: *Java* is deprecated. (3.00 / 2) (#54)
by mr strange on Wed Feb 22, 2006 at 10:42:04 AM EST

I deprecate it, anyway.

intrigued by your idea that fascism is feminine - livus
[ Parent ]
like shootin fish in a barrel [nt] (2.50 / 2) (#56)
by Cyan Magenta Yellow Black on Wed Feb 22, 2006 at 12:31:50 PM EST



[ Parent ]
I'm so confused!!! (2.66 / 3) (#35)
by emo kid on Tue Feb 21, 2006 at 11:48:45 PM EST

what does "A Man's A Man For A' That" mean?

Adwords (none / 1) (#46)
by ccdotnet on Wed Feb 22, 2006 at 01:48:33 AM EST

It [Adwords] also doesn't make it easy to tell which ads are the most effective; it only gives aggregate data for all ads grouped into a "Campaign".

Have a closer look at Adwords. You can concurrently run multiple ad variations within a single group, reacting to the same set of keywords. Adwords will then show you separate Impression counts and CTRs for each variation. Allows you to play Darwin: mutation+natural selection, as applied to PPC advertising.

I think your last paragraph doesn't belong. It comes across as a knee-jerk attempt to make up for the fact you've devoted an entire article to teaching people how to invade user's privacy. I wonder if we'll reach the point one day where arriving at a website presents you with an up-front privacy warning: "this host employs advanced web log analysis, so we can work out your shoe size, click here to accept".

Adwords (none / 1) (#47)
by chroma on Wed Feb 22, 2006 at 01:56:12 AM EST

The Adwords tools don't let you figure out differences in behavior for users attracted by different ads once they reach your website. Impressions and click through ratios mean little to me; it's sales that I'm ultimately after.

[ Parent ]
sales (none / 1) (#52)
by ccdotnet on Wed Feb 22, 2006 at 04:29:31 AM EST

Impressions and click through ratios mean little to me; it's sales that I'm ultimately after.

Have you tried Adwords conversion tracking?

[ Parent ]

The final paragraph (none / 1) (#48)
by chroma on Wed Feb 22, 2006 at 02:02:48 AM EST

I'm using the web log data to make the website better. I'm not sharing details of the data with anyone (the results in the article are fudged). If you're using a website, you should expect that the creators know exactly what data you're sending and receiving. No privacy is invaded, as you've gone out of your way to visit my site, look at the pages, and occasionally fill out forms or upload files.

Tracking people across multiple sites is a different matter, as is sniffing requests that should simply be passing through your network.

[ Parent ]

Not earth-shattering, IMHO. (none / 1) (#51)
by sudog on Wed Feb 22, 2006 at 03:37:32 AM EST

You're being a little over-dramatic. You're not analyzing human thought by looking at patterns in an Apache log. Leave that to people like +fravia.


pglogd (none / 0) (#61)
by Vs on Wed Feb 22, 2006 at 04:00:55 PM EST

You could have saved some time by using pglogd:
http://www.digitalstratum.com/pglogd/

Hehe, it has some irony...you had to grow your own tool, they didn't consider using ODBC.
--
Where are the immoderate submissions?

pglogd (none / 1) (#66)
by chroma on Wed Feb 22, 2006 at 07:32:34 PM EST

It doesn't seem to include a place for the tracking cookie. Also, it requires me to run PostgreSQL on the web server, which is yet another thing to deal  with.

[ Parent ]
Re: pglogd (none / 0) (#75)
by Vs on Thu Feb 23, 2006 at 04:15:04 AM EST

Yeah, it seems such a waste to have to go through "the analog hole" by dumping a string into a pipe (or file) and having to parse it again.

Especially since pipes suck, as taking a closer look into pglogd will show you.

I think you could get famous by providing an ODBC-compliant solution, integrated with the server.
DB-error handling will be a problem, though...

Volker
--
Where are the immoderate submissions?
[ Parent ]

why? (none / 0) (#65)
by Cattle Rustler on Wed Feb 22, 2006 at 07:31:34 PM EST

Um, google analytics rendered pretty much all of this useless. That is, unless you like pain and wasting your time writing yet another reporting app.

Google Analytics (none / 0) (#67)
by chroma on Wed Feb 22, 2006 at 07:43:00 PM EST

Please elaborate. Also, where is my Google Analytics invitation?

[ Parent ]
Well... (none / 0) (#73)
by Cattle Rustler on Wed Feb 22, 2006 at 10:04:58 PM EST

Conversion goal tracking - check
Cross References with just about anything - check
eCommerce tracking - check
ad tracking - check
IP -> Geolocation - check

Even if the package was $500 (it is free), it would still be cheaper to purchase it than it would be to build it on your own.  Not only that, but it is shiny, polished, and just works out of the box, making a roll-your-own SQL solution a hard sell.

Not to beat up on you too much, but I really don't see how ANY of this has to do with privacy invasion.  All these things do is track anonymous visitors.  Really, who cares?

[ Parent ]

The truth is (none / 0) (#74)
by chroma on Thu Feb 23, 2006 at 01:20:38 AM EST

I did this while waiting for my Google Analytics invitation to arrive.

When you say "Cross References with just about anything", what does that mean, exactly? Can  you give an example? How does it track repeat visits by individual users when they change IP addresses? Does it let you make reports that make sense to business users (e.g. "Your most popular products viewed are...")?

I'm genuinely curious about what Google Analytics can do, and Google isn't telling.

[ Parent ]

Cookies baby (none / 0) (#79)
by Cattle Rustler on Thu Feb 23, 2006 at 11:32:09 AM EST

So yes, it will track you across IP changes if the user has the same session key. Of course, if you went the DB route you could track via logged-in users across cookies & IPs. Cross references mean you can go "I want to see the breakdown of Mac users who came in from myspace, and by the way, how many of them bought my hot dog making kit?". And yes, you can get top selling products. Of course all of this requires you to add little nuggets of HTML & javascript to plant the tracking cookie and to notify google when and how much each conversion was for.

[ Parent ]
Google Analytics hm. (none / 0) (#83)
by k1wi on Thu Jul 13, 2006 at 06:17:39 AM EST

I'm actively involved in working on a metrics monitoring solution at the moment for a large site and I can definitely tell that there is no "one size fits all solution". Sure, Google Analytics will give you a fair amount of information but it will not and cannot substitute for a custom-built metrics environment. One has to build his own tool to be able to define the level of granularity his data analysis will operate on and the different comparison scenarios he will be using.

[ Parent ]
+1SP (1.25 / 4) (#77)
by fleece on Thu Feb 23, 2006 at 07:09:13 AM EST

that would be the boring nerdy crap section



I feel like some drunken crazed lunatic trying to outguess a cat ~ Louis Winthorpe III
+1FP _technology_ and culture, from the trenches (none / 0) (#78)
by LodeRunner on Thu Feb 23, 2006 at 07:27:12 AM EST

[nt] did not fit, again.

---
"dude, you can't even spell your own name" -- Lode Runner
[ Parent ]

there is even a module for that (none / 1) (#80)
by garaged on Thu Feb 23, 2006 at 02:17:29 PM EST

http://www.outoforder.cc/projects/apache/mod_log_sql/ Save yourself some time :-)

Nice (none / 0) (#82)
by MissMatch on Sat May 13, 2006 at 03:25:52 PM EST

This is a nice walkthrough on tracking clicks. As others have pointed out, there are already tons of scripts that do this, but this shows us how it works.

All trademarks and copyrights on this page are owned by their respective companies. The Rest 2000 - Present Kuro5hin.org Inc.
See our legalese page for copyright policies. Please also read our Privacy Policy.
Kuro5hin.org is powered by Free Software, including Apache, Perl, and Linux, The Scoop Engine that runs this site is freely available, under the terms of the GPL.