August 31, 2001



Monitoring your web traffic online - Part II log analysis tools

 

In the last issue of MasterMind (Issue 9 - Summer Pack) I reported on how to monitor a web site's traffic by using a special set of services and tools called Live Online Trackers.

In this issue I am inviting the more adventurous individuals to start testing and evaluating a professional "log analysis tool" to complement and eventually partly replace your "online live tracker" system.

If you are completely SOLD on the effectiveness of monitoring your website traffic and of systematically using that information to improve your web site, you may want to seriously begin considering the purchase and use of a full-fledged log analysis tool.

Monitoring traffic on the Web means the ability to analyze the "trails" that each online visitor leaves behind, when she comes to see your web site.

All Internet web servers systematically save on their hard disks a so-called "log file" containing all of the visitors' trails. By utilizing a program that can churn this large amount of data and convert it into readable statistics, you have the embryo of a so-called log-analysis tool.

As I have explained in Issue 9 of MasterMind, the advantage of using live trackers versus log analysis tools is the ease of setup and use, as well as immediate access to traffic statistics.

So, you can either opt for ease of use and immediacy, or choose a delayed response with much greater depth and breadth of data.

A combination of the two would work best for most web sites, as the need for immediacy of reports can be easily covered by only a few trackers placed on key strategic pages.

The overall traffic analysis (if you are serious about getting anywhere on the Web) must be executed through a log analysis tool. My suggestion is to do a good log analysis every 2-3 months, so you can better identify the trends and the "bigger picture" of what is happening on your web site.

Use live trackers instead to monitor your tactical promotional actions, for example to verify the peaks of traffic you may receive after a certain announcement, or after you have sent out your newsletter or ezine.

So, while the trackers extract this information "live" from the prospective customers coming to your web site, the "access logs" on your server safely store this information automatically for you to access afterwards. (After a set period of time, each month's log file is automatically deleted by your hosting provider. Make sure you contact your hosting provider and find out how to access and download your log files.)

Having first checked with your Internet 'hosting provider', you can directly access and download such "log" files.

To the human eye these files are pretty unreadable and would look something like this:

206.135.203.174 - - [19/Jul/1999:00:00:04 -0600] "GET /studio/drives.html HTTP/1.1" 200 20607 "http://www.marketing-of-training.com/prosmotemasters.htm" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"

If fed into "log analysis" software, all those unreadable lines can produce very interesting and useful statistics.
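To give an idea of what such software does first, here is a minimal Python sketch that splits one log line into named fields. It assumes the line follows the common "combined" log layout, as the sample above appears to; the names LOG_PATTERN and parse_line are made up for this illustration and do not come from any of the tools discussed here.

```python
import re
from datetime import datetime

# Assumed layout: the "combined" access log format
# (host, identity, user, time, request, status, size, referrer, agent).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Turn the text fields into proper types so they can be counted and sorted.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

SAMPLE = (
    '206.135.203.174 - - [19/Jul/1999:00:00:04 -0600] '
    '"GET /studio/drives.html HTTP/1.1" 200 20607 '
    '"http://www.marketing-of-training.com/prosmotemasters.htm" '
    '"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"'
)

record = parse_line(SAMPLE)
print(record["host"], record["path"], record["status"], record["size"])
```

Once every line is broken down this way, counting pages, referrers or browsers becomes a matter of simple tallies.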

The differences between a log analysis tool and a live online tracker are the following:

a) The analysis is performed only when you decide to use it

b) There is no performance penalty on any web page being monitored

c) The statistics and analysis produced can be far more detailed than those produced by live trackers

d) A great number of options and filters can be set to post-process and explore only the relevant data one is interested in uncovering


e) Log analysis tools work best when fed with large historical data loads, showing detailed comparison data between different years or months. Live trackers do not generally have this ability.

f) Traffic analysis can be run on any time period selected, while Live Trackers always report on the whole time period covered.

g) There are no recurring costs to use it.

 

If you go around most organizations and ask what they know about traffic monitoring, they will promptly show you a WebTrends report.

WebTrends was one of the very first providers of log analysis tools, back in the early 90's when the web was in its infancy. By being first on the market, WebTrends has been able to create tremendous strategic partnerships with hardware, software and Internet connectivity providers of all kinds. So it comes as no surprise that when you subscribe to any serious, qualified web hosting provider, they offer access to "WebTrends" statistics for your web site within their monthly fee. They run the WebTrends log analysis software at their end, and you can access the statistics when you want.

And so for many hundreds of thousands of people WebTrends has become synonymous with Traffic Analysis. While many were hung up on this dream, quite a few companies and alternative products have come to market, augmenting WebTrends and improving on its limitations and quirks. Not that WebTrends is a bad product. My concern is that you may consider all Toyotas as Ferraris and vice versa. While there may be similarities between the two (both being cars), they are indeed worlds apart.

So, if you are serious about log analysis, be aware that beyond the nice, colored charts of WebTrends, there are indeed many other capable products with different talents and skills. These could provide information reports and views that you may not have imagined could be possible using WebTrends.

The best log analysis software tools that I would personally recommend are as follows:

1) Funnel Web Analyzer 4.0 (USD $445)
http://www.quest.com/funnel_web/analyzer/
Download 30-day trial version at:
http://www.quest.com/funnel_web/analyzer/download.asp

2) Sawmill 6 (USD $99-399)
http://www.flowerfire.com/sawmill/
Download 30-day trial version at:
http://www.flowerfire.com/sawmill/downloads.html

3) Summary 1.5 (USD $59-249)
http://www.summary.net/
Try a live demo of Summary.net at:
http://summary.net:7000/~demo/menu
or try the trial version for 30 days free at:

4) Coast WebMaster Pro 4.1 (USD $995)
http://www.coast.ca/products/webmasterpro.html

5) NetTracker 5.0 Prof. (USD $495)
http://www.sane.com/products/NetTracker/

 

Low-cost solutions to get your feet wet include:

1) Openwebscope (USD $99)
http://openwebscope.com/
See this tool report demo at:
http://openwebscope.com/samples/montaukstats.html
Download trial at:
http://www.maximized.com/download/products/

2) Flashstats (USD $99)
http://www.maximized.com/products/flashstats/
Free 30-day download trial
http://www.maximized.com/download/products/

3) Analog (FREE)
http://www.statslab.cam.ac.uk/~sret1/analog/
Download Analog software at:
http://www.statslab.cam.ac.uk/~sret1/analog/download.html

 

or the de facto industry-standard solution:

WebTrends Log Analyzer 6.5 (USD $838)
http://www.webtrends.com/products/log/default.htm
Free trial version available for download at:
http://www.webtrends.com/register/trial.htm?regtype=Trial%20Install&prodtype=WebLog

 

If you would like to dig deeper and find out what other tools are available, check out my:
Power Lists of Log Analysis Tools

a) Yahoo's selection
http://dir.yahoo.com/Business_and_Economy/Business_to_Business/Communications_and_Networking/Internet_and_World_Wide_Web/Software/World_Wide_Web/Log_Analysis_Tools/

b) Cnet Web site statistics tools
http://download.com.com/3120-20-0.html?qt=Web+site+statistics&tg=dl-20&search=+Go%21+

c) List of immediately downloadable Freeware/Shareware/Demo tools for log analysis
http://download.cnet.com/downloads/1,10150,0-10001-103-0-1-7,00.html?tag=srch&qt=log+analysis&cn=&ca=10001

 

Read about the 10 key performance metrics other webmasters monitor in this free guide sponsored by WebTrends.
http://www.webtrends.com/products/enterprise/contact.htm

Among the questions answered in this report:

a) what paths do visitors follow through my web site?

b) how many errors are visitors experiencing on my site?

c) what search engines and phrases are visitors using to get to my web site?

d) how many people fail to order products successfully on my web site?

e) how is my web site performing?

 

As I have said, a log file contains information about requests for files from your web site. Individual lines within your log file contain information relating to the specific web page requested, the elements contained in it (images, sounds, downloadable PDF files, etc.), and the date/time of that request. The log file also holds a good deal of information about user technographics (browser type and version, operating system and more).

Different web server software packages (e.g. Apache vs. Microsoft Internet Information Server) save this information in the log file in different ways. In any case, log files all consist of tens of thousands of lines that the log analysis tool has to process and load into a custom database (usually an integral part of the log analysis tool itself). Once there, the data can be further processed to generate intelligent reports and statistics about web site traffic.
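As a rough illustration of the "load into a database, then report" idea, the Python sketch below puts a handful of already-parsed hits into an in-memory SQLite database and asks it for the most requested pages. The table layout, the function names and the sample hits are all invented for this example; commercial tools use their own internal databases, whose design is not described here.

```python
import sqlite3

# Hypothetical, already-parsed hits: (host, path, status, bytes_sent).
HITS = [
    ("206.135.203.174", "/studio/drives.html", 200, 20607),
    ("206.135.203.174", "/index.html", 200, 5120),
    ("192.0.2.7", "/index.html", 200, 5120),
    ("192.0.2.7", "/missing.html", 404, 310),
]

def build_database(hits):
    """Create an in-memory database holding one row per request."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits (host TEXT, path TEXT, status INTEGER, bytes_sent INTEGER)"
    )
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", hits)
    return conn

def top_pages(conn, limit=10):
    """Most requested pages, counting only successful (status 200) requests."""
    query = (
        "SELECT path, COUNT(*) AS requests FROM hits "
        "WHERE status = 200 GROUP BY path "
        "ORDER BY requests DESC, path ASC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()

conn = build_database(HITS)
report = top_pages(conn)
for path, requests in report:
    print(path, requests)
```

Reversing the sort order in the same query would give a "least requested" report instead.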

The key differences between log analysis tools are not only in the number and types of reports that can be created, or in the quality and design of their charts and diagrams. Differences can also be found in their:

a) reporting depth

b) methods of analysis

c) capability to further process and manipulate data

On these indicators, Funnel Web Analyzer is the best performer in terms of speed and quality of reports, charts and diagrams.

Sawmill and Summary also offer many interesting options, including their types of reports and effective diagramming features. They are also compatible with a large number of server software systems.

 

Read an interesting article clarifying the most common site traffic indicators.

Understanding Web Metrics
http://cnet.com/webbuilding/pages/Servers/Traffic/ss01.html

 

Seven unusual web traffic statistics I would want to get from my log analysis tool.

 

I have listed here a few traffic statistics that are uniquely important and that you should look for when you select your log analysis tool. For each I provide a possible name for the statistical report and a description of what it is supposed to report on:

1) Search phrases - While most log analysis tools and online live trackers report on all single keywords utilized by visitors arriving at your web site via a search engine query, it would be very useful to also see "search phrases" in this statistical report. These would be made up of the growing number of search engine queries made with 2 or more words.

2) Referrer Steps - How deep into your web site do visitors coming from different referring web sites move? Which referrers are the most valuable?

3) Least Requested - Instead of showing only which pages get the highest number of visits, why not list those that are basically useless?

4) Known Robots - Shows data for hosts that are known to be robots. There are many different kinds of robots: they may index your site for a search engine, extract e-mail addresses for spamming, check to see if your site is working, or any of several other things. These hosts use 'agent' strings that are in Summary's internal database of known robots.

5) Transfer Rate in Bits/Sec - Shows the distribution of modem speeds observed among visitors accessing this site. Data is based on the measured transfer time for medium and larger files.

6) Domains Hijacking Graphics - Shows the portion of visits referred by a given domain that loaded only graphics. This is typically caused by sites using your graphics on their pages.

7) Unique and repeat visitors - Let me tell you something that might surprise you: DO NOT GET EXCITED AT ALL ABOUT THE NUMBER OF VISITORS TO YOUR SITE. Why?

Read on.

// Myth breaking \\

The number of unique visitors to your site can be very misleading. It should not be treated per se as a criterion for evaluating success, as it does not directly reflect the sustainability or profitability of any web site.

Why?

First, you and your organization's staff all affect the visitor data on the tracker by accessing the web site on a daily basis.

Some people have the home page of their organization set as the default page for their browser. So each time they fire up their navigation tool, you get another visitor to your site.

Even the webmaster who is working offline in her FrontPage or Dreamweaver application continuously affects the tracker reports every time she previews or tests a page change. Yes, even when she is offline! As long as she has an open Internet connection in the background, and as long as the tracker code is on that page, each reload on her PC will be counted as a "real" visit.

Second, the number of visitors is really not a measure of success.

You can have the largest store on earth, and even have millions of people going through it, but if they do not contribute directly to your welfare by purchasing, subscribing, exchanging, contributing, supporting or making some form of donation, your commercial life expectancy is not going to be very long.

So please, do not get excited yet. It is more important to make sure that whoever comes is in dire need of something, which you can immediately offer them as a solution.

Therefore we are looking at driving highly targeted prospects to the site, and not simply traffic.

Traffic by itself is actually a burden because it:

A) makes your website slower

B) distracts you from your key goals

C) requires extra customer service

D) requires extra technical support

E) produces inquiries and exchanges frequently not relevant to the key company objectives

F) makes your website less reliable

G) increases your web hosting costs

 

// The end of Myth Breaking \\

 

So, you may ask, how accurate are web traffic statistics?

While most of the information collected in web server log files is positively unambiguous, there are a number of factors to take into consideration when making final judgements or evaluations based on log analysis tool reports.

For example, the mapping of individual IP addresses to specific users is far from an exact process. The only reliable way to calculate the exact number of individual users is to have users register with the site and access it through an authentication method.

The widespread use of "caching" technology by Internet providers like AOL and many others produces access numbers that are lower than reality. What happens is that major Internet providers, as well as large organizations, optimize content delivery by saving the most recent and most accessed web content on a local hard disk. In this way users of that Internet service provider are often accessing the local provider cache instead of the actual online content of that web site. Since log files only record files requested from the server, the number registered in the log file can be misleadingly low.

Another area in which log analysis tools may not be telling the complete truth is the evaluation of user sessions, that is, the number of times a user visits your web site. Frequently this data may be flawed by wrong assumptions about IP addresses made by the log analysis tool. In other words, the way and criteria by which each tool determines what a "session" is can create misleading numbers.

Normally visitor sessions are computed by looking at the same IP address requesting one or more pages from your web site. By looking at the date/time stamp of each request, one can estimate the overall time spent on the site. Further analysis can also show the individual times spent on each web page. These groups of requests are considered sessions.

Criteria within log analysis tools can also help identify user behaviour on a web site more precisely. For example, if no further activity is recorded for 5 minutes from a specific IP address accessing one of your web pages, it is assumed that this user has ended her session. She may have gone to answer a phone call, only to return to the session ten minutes later, in which case she is counted as a second visit. These criteria can generally be changed easily within the better performing log analysis tools. Thus it is easy to manipulate the statistics in favour of what one wants to show.

It would be more correct to say that user sessions are an estimate and not an accurate account of real visits.
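To see how much the session count depends on the chosen criteria, here is a small Python sketch. It assumes the simplest possible rule, namely that a request from the same IP address arriving after more than a given idle time starts a new session; the function name count_sessions and the sample hits are hypothetical, and real tools may apply more elaborate rules.

```python
from datetime import datetime, timedelta

def count_sessions(hits, timeout_minutes):
    """Count sessions from (ip, timestamp) pairs using an idle timeout."""
    timeout = timedelta(minutes=timeout_minutes)
    last_seen = {}  # most recent request time for each IP address
    sessions = 0
    for ip, when in sorted(hits, key=lambda hit: hit[1]):
        previous = last_seen.get(ip)
        # A first request, or one after a long silence, opens a new session.
        if previous is None or when - previous > timeout:
            sessions += 1
        last_seen[ip] = when
    return sessions

# Hypothetical hits: one visitor who pauses for ten minutes, plus a second visitor.
start = datetime(2001, 8, 31, 9, 0)
HITS = [
    ("192.0.2.10", start),
    ("192.0.2.10", start + timedelta(minutes=2)),
    ("192.0.2.10", start + timedelta(minutes=12)),  # back after the phone call
    ("192.0.2.20", start + timedelta(minutes=1)),
]

print(count_sessions(HITS, timeout_minutes=5))   # 3 sessions
print(count_sessions(HITS, timeout_minutes=30))  # 2 sessions
```

The same four requests yield three sessions with a 5-minute timeout and two with a 30-minute one, which is exactly why the figure should be read as an estimate.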

A well known way around this is the use of cookies. These are relatively inoffensive pieces of information sent as "headers" together with each one of your web page requests to the web site server. A cookie carries an "identifier" assigned specifically to you, so the site can track you, and not somebody else from your same organization or company (who would normally appear under the same IP address).

The issue here is that many people fear cookies because of privacy concerns. Therefore you cannot expect all of your web site visitors to have cookies enabled. According to my own online statistics, gathered through seven international web sites, more than 95% of the web users I am presently monitoring have cookies turned on. So draw your own conclusions.

The ability to track countries of origin may also be a source of imprecise information. While in many cases this is not an issue, my African friends, for example, who log onto the Internet through their organization's direct connection, are recorded in my web server log as originating from a generic .org domain. There has been no way for my log analysis tool to discern what part of the world they are accessing my web site from.

All of my Asian friends utilize a dial-up connection from home to an international service provider like Exodus.com. When they are logged onto the Internet, Exodus issues an IP address for them. That address carries either a .com domain or a numeric IP address that traces back to the US or UK.
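The limitation is easy to see in a few lines of Python. This sketch assumes a tool that guesses the country only from the last part of the visitor's host name, which is a simplification; the function guess_country, the list of generic suffixes and the example host names are all made up for illustration.

```python
# Generic suffixes say nothing about where the visitor is located.
GENERIC_SUFFIXES = {"com", "org", "net", "edu", "gov", "mil", "int"}

def guess_country(host):
    """Return a two-letter country suffix, or None when the host gives no clue."""
    last = host.rsplit(".", 1)[-1].lower()
    if last.isdigit():
        return None  # a bare numeric IP address, no domain name to go by
    if last in GENERIC_SUFFIXES:
        return None  # .com, .org and friends could be anywhere in the world
    if len(last) == 2 and last.isalpha():
        return last  # looks like a country-code suffix such as "it" or "uk"
    return None

print(guess_country("dialup-12.example.it"))
print(guess_country("gateway.example.org"))
print(guess_country("206.135.203.174"))
```

Only the first host produces a country; the other two end up in an "unknown" bucket, or worse, get attributed to wherever the provider happens to be registered.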

As you become more familiar with these tools, you can customize them to the point where the ambiguous figures and data do not significantly alter the statistics you need to look at. For example, some of the best tools in this roundup offer the ability to exclude not only your own IP address from the traffic reports, but also search engine spiders and crawlers. These freely navigate web sites to index content and pages for your Intranet search engine or for one of the major search engines online.
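A filter of this kind can be sketched in Python as follows. The address in OWN_ADDRESSES, the short list of robot markers and the sample hits are assumptions made for the example; real tools ship with much larger databases of known robots, and a simple substring match like this one will miss some robots.

```python
# Hypothetical settings: your own office address and a few robot name fragments.
OWN_ADDRESSES = {"192.0.2.1"}
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")  # illustrative, not exhaustive

def is_robot(agent):
    """Guess whether a browser 'agent' string belongs to a robot."""
    agent = agent.lower()
    return any(marker in agent for marker in ROBOT_MARKERS)

def filter_hits(hits):
    """Keep only (ip, agent) hits that are neither yours nor a robot's."""
    return [
        (ip, agent)
        for ip, agent in hits
        if ip not in OWN_ADDRESSES and not is_robot(agent)
    ]

HITS = [
    ("192.0.2.1", "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"),   # you
    ("198.51.100.5", "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"),
    ("203.0.113.9", "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"),  # a real visitor
]

visitors = filter_hits(HITS)
print(len(visitors))
```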

 

Reference Readings

*If you want to see most of the tools I have recommended here reviewed against each other by a qualified independent magazine go to:
http://www.zdnet.com/products/stories/reviews/0,4161,2570466,00.html

*An exhaustive set of reference articles on traffic monitoring and log analysis principles and tools can be found at the Cnet Builder web site. It is a bit dated (1998) but it is still recommended reading. Find it at:
http://cnet.com/webbuilding/pages/Servers/Traffic/index.html

*A more recent whitepaper from Sane's NetTracker company covers all the basics about traffic analysis for non-technical users.
This is a good read:
http://www.sane.com/products/NetTracker/whitepaper.pdf

 

posted by Robin Good on Friday, August 31 2001, updated on Saturday, January 21 2006


 

 

 

 


 

Creative Commons License
This work is licensed under a Creative Commons License.




