Curated by: Luigi Canali De Rossi
 


Friday, August 31, 2001

Monitoring your web traffic online - Part II log analysis tools


In the last issue of MasterMind (Issue 9 - Summer Pack) I presented and reported on how to monitor web site traffic by utilizing a special set of services and tools called Live Online Trackers.

In this issue I invite the more adventurous readers to start testing and evaluating a professional "log analysis tool" to complement, and eventually partly replace, your "online live tracker" system.

If you are completely SOLD on the effectiveness of monitoring your web site traffic and of systematically using that information to improve your web site, you may want to seriously consider the use and purchase of a full-fledged log analysis tool.

Monitoring traffic on the Web means being able to analyze the "trails" that each online visitor leaves behind when she comes to see your web site.

All Internet web servers systematically save on their hard disks a so-called "log file" containing all of the visitors' trails. By utilizing a program that can churn through this large amount of data and convert it into readable statistics, you have the embryo of a so-called log analysis tool.

As I have explained in Issue 9 of MasterMind, the advantage of using live trackers versus log analysis tools is the ease of setup and use, as well as immediate access to traffic statistics.

So, you can either opt for ease of use and immediacy, or choose a delayed response with much greater depth and breadth of data.

A combination of the two would work best for most web sites, as the need for immediacy of reports can be easily covered by only a few trackers placed on key strategic pages.

The overall traffic analysis (if you are serious about getting anywhere on the Web) must be executed through a log analysis tool. My suggestion is to do a good log analysis every 2-3 months, so you can better identify the trends and the "bigger picture" of what is happening on your web site.

Use live trackers instead to monitor tactical promotional actions on your part: for example, to verify the peaks of traffic you may receive after a certain announcement, or after you have sent out your newsletter or ezine.

So, while the trackers extract this information "live" from the prospective customers coming to your web site, the "access logs" on your server safely store this information automatically for you to access afterwards. (After a set period of time, each month's log file is automatically deleted by your hosting provider. Make sure you contact your hosting provider and find out how to access and download your log files.)

Having first checked with your Internet 'hosting provider', you can directly access and download such "log" files.

To the human eye these files are pretty unreadable and would look something like this:

206.135.203.174 - - [19/Jul/1999:00:00:04 -0600] "GET /studio/drives.html HTTP/1.1" 200 20607 "http://www.marketing-of-training.com/prosmotemasters.htm" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"

If fed into log analysis software, all those unreadable lines can produce very interesting and useful statistics.
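Every one of the tools discussed below starts by splitting such a line into named fields. As a minimal sketch (in Python, purely my own illustration; it is not code from any of these products), the "combined" log format shown above can be decoded with a regular expression:

```python
import re

# Apache "combined" log format -- the same layout as the sample line above.
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of named fields, or None if the line is malformed."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

sample = ('206.135.203.174 - - [19/Jul/1999:00:00:04 -0600] '
          '"GET /studio/drives.html HTTP/1.1" 200 20607 '
          '"http://www.marketing-of-training.com/prosmotemasters.htm" '
          '"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"')

fields = parse_line(sample)
print(fields['host'])      # the visitor's IP address
print(fields['status'])    # the HTTP response code
print(fields['referrer'])  # the page that linked the visitor here
```

Once every line is reduced to fields like these, the statistics described below are just a matter of counting and grouping.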

The differences between a log analysis tool and a live online tracker are the following:

a) The analysis is performed only when you decide to use it

b) There is no performance penalty on any web page being monitored

c) The statistics and analysis produced can be far more detailed than the ones produced by live trackers

d) A great number of options and filters can be set to post-process and explore only the relevant data one is interested in uncovering

e) Log analysis tools work best when fed with large historical data loads, showing detailed comparison data between different years or months. Live trackers do not generally have this ability.

f) Traffic analysis can be run on any time period selected, while Live Trackers always report on the whole time period covered.

g) There are no recurring costs to use it.

 

If you go around most organizations and ask what they know about traffic monitoring, they will promptly show you a WebTrends report.

WebTrends has been one of the very first providers of log analysis tools since the early 90's, when the web was in its infancy. By being first on the market, WebTrends has been able to create tremendous strategic partnerships with hardware, software and Internet connectivity providers of all kinds. So it comes as no surprise that when you subscribe to any serious, qualified web hosting provider, they offer access to "WebTrends" statistics for your web site as part of their monthly fee. They run the WebTrends log analysis software at their end, and you can access the statistics whenever you want.

And so, for many hundreds of thousands of people, WebTrends has become synonymous with traffic analysis. While many were hung up on this dream, quite a few companies and alternative products have come to market, augmenting and improving on WebTrends' own limitations and quirks. Not that WebTrends is a bad product. My concern is that you may consider all Toyotas as Ferraris and vice versa. While there may be similarities between the two (both being cars), they are indeed worlds apart.

So, if you are serious about log analysis, be aware that beyond the nice, colored charts of WebTrends, there are indeed many other capable products with different talents and skills. These could provide information reports and views that you may not have imagined could be possible using WebTrends.

Among the best log analysis software tools, the ones I would personally recommend are the following:

1) Funnel Web Analyzer 4.0 (USD $445)
http://www.quest.com/funnel_web/analyzer/
Download 30-day trial version at:
http://www.quest.com/funnel_web/analyzer/download.asp

2) Sawmill 6 (USD $99-399)
http://www.flowerfire.com/sawmill/
Download 30-day trial version at:
http://www.flowerfire.com/sawmill/downloads.html

3) Summary 1.5 (USD $59-249)
http://www.summary.net/
Try a live demo of Summary.net at:
http://summary.net:7000/~demo/menu
or try the trial version for 30 days free at:

4) Coast WebMaster Pro 4.1 (US $995)
http://www.coast.ca/products/webmasterpro.html

5) NetTracker 5.0 Prof. (USD $495)
http://www.sane.com/products/NetTracker/

 

Low-cost solutions to get your feet wet include:

1) Openwebscope (USD $99)
http://openwebscope.com/
See this tool report demo at:
http://openwebscope.com/samples/montaukstats.html
Download trial at:
http://www.maximized.com/download/products/

2) Flashstats (USD $99)
http://www.maximized.com/products/flashstats/
Free 30-day download trial
http://www.maximized.com/download/products/

3) Analog FREE
http://www.statslab.cam.ac.uk/~sret1/analog/
Download Analog software at:
http://www.statslab.cam.ac.uk/~sret1/analog/download.html

 

or the industry's de facto standard solution:

WebTrends Log Analyzer 6.5 (USD $838)
http://www.webtrends.com/products/log/default.htm
Free trial version available for download at:
http://www.webtrends.com/register/trial.htm?regtype=Trial%20Install&prodtype=WebLog

 

If you think you would like to dig deeper to find out what other tools are available out there, check out my:
Power Lists of Log Analysis Tools

a) Yahoo's selection
http://dir.yahoo.com/Business_and_Economy/Business_to_Business/Communications_and_Networking/Internet_and_World_Wide_Web/Software/World_Wide_Web/Log_Analysis_Tools/

b) Cnet Web site statistics tools
http://download.com.com/3120-20-0.html?qt=Web+site+statistics&tg=dl-20&search=+Go%21+

c) List of immediately downloadable Freeware/Shareware/Demo tools for log analysis
http://download.cnet.com/downloads/1,10150,0-10001-103-0-1-7,00.html?tag=srch&qt=log+analysis&cn=&ca=10001

 

Read about the 10 key performance metrics other webmasters monitor in this free guide sponsored by WebTrends.
http://www.webtrends.com/products/enterprise/contact.htm

Among the questions answered in this report you will find:

a) what paths do visitors follow through my web site?

b) how many errors are visitors experiencing on my site?

c) what search engines and phrases are visitors using to get to my web site?

d) how many people fail to order products successfully on my web site?

e) how is my web site performing?

 

As I have said, a log file contains information about requests for files from your web site. Individual lines within your log file contain information relating to the specific web page requested, the elements contained in it (images, sounds, downloadable PDF files, etc.), and the date/time of that request. Also to be found in the log file is much information about the user's technographics (browser type and version, operating system, screen resolution and more).

Different web server software (e.g. Apache vs. Microsoft Internet Information Server) saves this information in the log file in different ways. In any case, log files all consist of tens of thousands of lines that the log analysis tool has to process and load into a custom database (usually an integral part of the log analysis tool itself). Once there, the data can be further processed to generate intelligent reports and statistics about web site traffic.
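As a rough sketch of that aggregation step (my own illustration in Python; the real tools use their own internal storage formats), counting requests per page from raw log lines might look like this:

```python
from collections import Counter

def page_hits(log_lines):
    """Count requests per URL path -- the core aggregation every
    log analysis tool performs before building its reports."""
    hits = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 2:
            continue  # skip malformed lines
        request = parts[1].split()  # e.g. ['GET', '/index.html', 'HTTP/1.1']
        if len(request) >= 2:
            hits[request[1]] += 1
    return hits

log = [
    '1.2.3.4 - - [19/Jul/1999:00:00:04 -0600] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla"',
    '1.2.3.4 - - [19/Jul/1999:00:00:09 -0600] "GET /about.html HTTP/1.1" 200 2048 "-" "Mozilla"',
    '5.6.7.8 - - [19/Jul/1999:00:01:00 -0600] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla"',
]
print(page_hits(log).most_common(1))  # the most requested page first
```

A commercial tool does essentially this over hundreds of thousands of lines, grouping not just by page but by date, referrer, browser and so on.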

The key differences between log analysis tools are not only in the number and type of reports that can be created, or in the quality and design of their charts and diagrams. Differences can also be found in their:

a) reporting depth

b) methods of analysis

c) capability to further process and manipulate data

Along these indicators, Funnel Web Analyzer is the best performer in terms of speed and in the quality of its reports, charts and diagrams.

Sawmill and Summary also offer many interesting options, including their types of reports and effective diagramming features. They are also compatible with a large number of web server software systems.

 

Read an interesting article clarifying the most common site traffic indicators.

Understanding Web Metrics
http://cnet.com/webbuilding/pages/Servers/Traffic/ss01.html

 

Seven unusual web traffic statistics I would want to get from my log analysis tool.

 

I have listed here a few traffic statistics that are uniquely important and which you should look for when you make your log analysis tool selection. Here I provide possible names for these statistical reports and a description of what they would be supposed to report on:

1) Search phrases - While most log analysis tools and online live trackers report on all the single keywords utilized by visitors arriving at your web site via a search engine query, it would be very useful to also see "search phrases" in this statistical report. These would be made up of the growing number of search engine queries made with 2 or more words.

2) Referrer Steps - How deep inside your web site do visitors coming from different referring web sites move? Which referrers are the most valuable?

3) Least Requested - Instead of showing only which pages get the highest number of visits, why not also list those that are basically useless?

4) Known Robots - Shows data for hosts that are known to be robots. There are many different kinds of robots: they may index your site for a search engine, extract e-mail addresses for spamming, check to see if your site is working, or do any of several other things. These hosts use 'agent' strings that are in Summary's internal database of known robots.

5) Transfer Rate in Bits/Sec - Shows the distribution of modem speeds observed for visitors accessing the site. The data is based on the measured transfer time for medium and larger files.

6) Domains Hijacking Graphics - Shows the portion of visits referred by a given domain that loaded only graphics. This is typically caused by sites using your graphics on their own pages.

7) Unique and repeat visitors - Let me tell you something that might surprise you: DO NOT GET EXCITED AT ALL ABOUT THE NUMBER OF VISITORS TO YOUR SITE. Why?

Read on.

// Myth breaking \\

The number of unique visitors to your site can be very misleading. It should not be treated per se as a success evaluation criterion, as it does not directly reflect the effective sustainability or profitability of any web site.

Why?

First, you and your organization's staff all affect the visitor data on the tracker by accessing the web site on a daily basis.

Some people have the home page of their organization set as the default page for their browser. So each time they fire up their navigation tool, you get another visitor to your site.

Even the webmaster who is working offline on her FrontPage or Dreamweaver application continuously affects the tracker reports every time she previews or tests a page change. Yes, even when she is working offline! As long as she has an open Internet connection in the background, and as long as the tracker code is on that page, each reload on her PC will be counted as a "real" visit.

Second, the number of visitors is really not a measure of success.

You can have the largest store on earth, and have even millions of people that go through it, but if they do not contribute directly to your welfare by purchasing, subscribing, exchanging, contributing, supporting or making some form of donations, your commercial life expectancy is not going to be very long.

So please, do not get excited yet. It is more important to make sure that whoever comes is in dire need of something which you can immediately offer them as a solution.

Therefore we are looking at driving highly targeted prospects to the site, and not simply traffic.

Traffic by itself is actually a burden because it:

A) makes your website slower

B) distracts you from your key goals

C) requires extra customer service

D) requires extra technical support

E) produces inquiries and exchanges frequently not relevant to the key company objectives

F) makes your website less reliable

G) increases your web hosting costs

 

// The end of Myth Breaking \\

 

So, you may ask, how accurate are web traffic statistics?

While most of the information collected in web server log files is positively unambiguous, there are a number of factors to take into consideration when making final judgements or evaluations based on log analysis tool reports.

For example, the mapping of individual IP addresses to specific users is far from an exact process. The only reliable way to calculate the exact number of individual users is to have users register to the site and access it through an authentication method.

The widespread use of "caching" technology by Internet providers like AOL and many others produces access numbers lower than reality. What happens is that major Internet providers, as well as large organizations, optimize content delivery by saving the most recent and most accessed web content on a local hard disk. In this way, users of that Internet service provider often access the local provider cache instead of the actual online content of that web site. Since log files only record files requested from the server, the numbers registered in the log file can be misleadingly low.

Another area in which log analysis tools may not be telling the complete truth is in evaluating user sessions, that is, the number of times a user visits your web site. Frequently this data may be flawed by wrong assumptions about IP addresses made by the log analysis tool. In other words, the way and the criteria by which each tool determines what a "session" is can create misleading numbers.

Normally visitor sessions are computed by looking at the same IP address requesting one or more pages from your web site. By looking at the date/time stamp of each request, one can estimate the overall time spent on the site. Further analysis could also show the individual times spent on each web page. These are considered sessions.

Criteria within log analysis tools can also help identify user behaviour on a web site more precisely. For example, if after 5 minutes no further activity is recorded for a specific IP address accessing one of your web pages, it is assumed that this user has ended her session. She may have gone to answer a phone call, only to return ten minutes later to the session. This criterion can generally be adjusted easily within the better performing log analysis tools. Thus it is easy to manipulate the statistics in favour of what one wants to show.
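The session criterion just described can be sketched in a few lines; this is my own hypothetical illustration (the function name and the five-minute timeout simply mirror the example in the text, they are not any tool's actual defaults):

```python
from datetime import datetime, timedelta

def sessions_for_ip(timestamps, timeout=timedelta(minutes=5)):
    """Group one IP address's request times (sorted) into sessions,
    closing a session whenever the idle gap exceeds the timeout.
    Returns a list of (start, end) pairs."""
    result = []
    start = end = timestamps[0]
    for t in timestamps[1:]:
        if t - end > timeout:
            result.append((start, end))  # gap too long: close the session
            start = t
        end = t
    result.append((start, end))
    return result

hits = [datetime(1999, 7, 19, 0, 0, 4),
        datetime(1999, 7, 19, 0, 2, 0),
        datetime(1999, 7, 19, 0, 20, 0)]  # an 18-minute gap => a new session
print(len(sessions_for_ip(hits)))
```

Note how simply changing the `timeout` parameter changes the number of sessions reported, which is exactly why the same log can yield different visit counts in different tools.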

It would be more correct to say that user sessions are an estimate and not an accurate account of real visits.

A well-known solution around this is the use of cookies. These are relatively inoffensive information "headers" which are sent together with each one of your web page requests to the web site server. A cookie assigns an "identifier" specific to you, and can therefore easily track you, and not somebody else from your same organization or company (who would normally appear under the same IP address).
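A made-up illustration of why this matters: the office IP address and the "visitor_id" cookie below are invented for the example, but the counting logic is the point. Two colleagues behind one IP are one "visitor" by address, yet two by cookie:

```python
# Three requests from one office IP address, two different people.
# Both the IP and the "visitor_id" cookie values are illustrative.
requests = [
    {"ip": "10.0.0.1", "cookie": "visitor_id=alice"},
    {"ip": "10.0.0.1", "cookie": "visitor_id=bob"},
    {"ip": "10.0.0.1", "cookie": "visitor_id=alice"},
]

by_ip = {r["ip"] for r in requests}          # unique visitors counted by IP
by_cookie = {r["cookie"] for r in requests}  # unique visitors counted by cookie

print(len(by_ip))      # 1 "visitor" by IP address
print(len(by_cookie))  # 2 visitors by cookie identifier
```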

The issue here is that many people fear cookies because of privacy issues. Therefore you cannot expect all of your web site visitors to have cookies enabled. According to my own online statistics, gathered through seven international web sites, more than 95% of the web users I am presently monitoring have cookies turned on. So draw your own conclusions.

The ability to track countries of origin may also be a source of imprecise information. While in many cases this is not an issue, my African friends, for example, who log onto the Internet through their organization's direct connection, are recorded in my web server log as originating from a generic .org domain. There has been no way for my log analysis tool to discern what part of the world they are accessing my web site from.

All of my Asian friends utilize a dial-up connection from home to an international service provider like Exodus.com. When they are logged onto the Internet, Exodus issues an IP address for them. That address carries either a .com domain or a numeric IP address that traces back to the US or the UK.

As you become more familiar with these tools, you can customize them to the point where the ambiguous figures and data do not significantly alter the statistics you need to look at. For example, some of the best tools in this roundup offer not only the ability to exclude your own IP address from the traffic reports, but also to exclude search engine spiders and crawlers. These freely navigate web sites to index content and pages for your Intranet search engine or for one of the major search engines online.
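That filtering step could be sketched like this; the office IP address and the robot 'agent' substrings here are illustrative placeholders, not a real robot database like the one Summary ships:

```python
# Illustrative exclusion lists -- substitute your own addresses and
# the robot signatures your tool maintains.
OWN_IPS = {"192.168.1.10"}                     # your own machine(s)
ROBOT_AGENTS = ("googlebot", "crawler", "spider")

def is_countable(ip, user_agent):
    """True only for hits that should count as real visitor traffic."""
    if ip in OWN_IPS:
        return False                           # exclude your own browsing
    agent = user_agent.lower()
    return not any(bot in agent for bot in ROBOT_AGENTS)

hits = [
    ("192.168.1.10", "Mozilla/4.0"),                        # the webmaster herself
    ("66.249.66.1", "Googlebot/2.1"),                       # a search engine spider
    ("206.135.203.174", "Mozilla/4.0 (compatible; MSIE 5.0)"),
]
real = [h for h in hits if is_countable(*h)]
print(len(real))  # only the genuine visitor remains
```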

 

Reference Readings

*If you want to see most of the tools I have recommended here reviewed against each other by a qualified independent magazine, go to:
www.zdnet.com/products/stories/reviews/0,4161,2570466,00.html

*An exhaustive set of reference articles on traffic monitoring and log analysis principles and tools can be found at the Cnet Builder web site. It is a bit dated (1998) but it is still recommended reading. Find it at:
http://cnet.com/webbuilding/pages/Servers/Traffic/index.html

*A more recent whitepaper from Sane, the NetTracker company, covers all the basics of traffic analysis for non-technical users.
This is a good read:
http://www.sane.com/products/NetTracker/whitepaper.pdf

 


 
 
 
 
posted by Robin Good on Friday, August 31 2001, updated on Tuesday, May 5 2015

