Server Log Files
Website statistics are based on server logs. A server log is a simple text file which records activity on the server. There are several types of server log — website owners are especially interested in access logs which record hits and related information.
Access logs come in several different formats but they all tend to look something like this:
151.44.15.252 - - [25/May/2004:00:17:20 +1200] "GET /cgi-bin/forum/commentary.pl/noframes/read/209 HTTP/1.1" 200 6863 "http://search.virgilio.it/search/cgi/search.cgi?qs=download+video+illegal+Berg&lr=&dom=s&offset=0&hits=10&switch=0&f=us" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Hotbar 4.4.7.0)"
151.44.15.252 - - [25/May/2004:00:17:21 +1200] "GET /js/common.js HTTP/1.1" 200 2263 "http://www.mediacollege.com/cgi-bin/forum/commentary.pl/noframes/read/209" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Hotbar 4.4.7.0)"
151.44.15.252 - - [25/May/2004:00:17:21 +1200] "GET /css/common.css HTTP/1.1" 200 6123 "http://www.mediacollege.com/cgi-bin/forum/commentary.pl/noframes/read/209" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Hotbar 4.4.7.0)"
151.44.15.252 - - [25/May/2004:00:17:21 +1200] "GET /images/navigation/home1.gif HTTP/1.1" 200 2735 "http://www.mediacollege.com/cgi-bin/forum/commentary.pl/noframes/read/209" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Hotbar 4.4.7.0)"
151.44.15.252 - - [25/May/2004:00:17:21 +1200] "GET /data/zookeeper/ico-100.gif HTTP/1.1" 200 196 "http://www.mediacollege.com/cgi-bin/forum/commentary.pl/noframes/read/209" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Hotbar 4.4.7.0)"
151.44.15.252 - - [25/May/2004:00:17:22 +1200] "GET /adsense-alternate.html HTTP/1.1" 200 887 "http://www.mediacollege.com/cgi-bin/forum/commentary.pl/noframes/read/209" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Hotbar 4.4.7.0)"
151.44.15.252 - - [25/May/2004:00:17:39 +1200] "GET /data/zookeeper/status.html HTTP/1.1" 200 4195 "http://www.mediacollege.com/cgi-bin/forum/commentary.pl/noframes/read/209" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Hotbar 4.4.7.0)"
Each line in the log file represents one request (hit). If a visitor requests an HTML page which contains two images, three lines will be added to the log (one for the page and two for the images).
Each line may include some or all of the following information:
- The IP address of the computer making the request (i.e. the visitor)
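- The identity of the computer making the request, as reported by an ident lookup (usually unavailable and shown as a hyphen)
- The login ID of the visitor (only present if the visitor logged in using HTTP authentication; otherwise a hyphen)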
- The date and time of the hit
- The request method
- The location and name of the requested file
- The HTTP status code (e.g. file sent successfully, file not found, etc.)
- The size of the response sent to the visitor, in bytes
- The web page which referred the hit (e.g. a web page containing a hyperlink which the visitor clicked to get here)
- The browser and operating system used by the visitor (the "user agent")
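As a rough illustration of how these fields can be pulled out of a line, here is a minimal Python sketch. It assumes the log uses the same layout as the sample lines above (often called the "combined" format); other formats would need a different pattern. The names LOG_PATTERN and parse_line are made up for this example and do not belong to any particular tool.

```python
import re
from datetime import datetime

# Assumed layout: the "combined" format shown in the sample lines above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the text fields that are really dates or numbers.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # A hyphen in the size field means no body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# One of the sample lines from the log excerpt above.
sample = (
    '151.44.15.252 - - [25/May/2004:00:17:21 +1200] '
    '"GET /js/common.js HTTP/1.1" 200 2263 '
    '"http://www.mediacollege.com/cgi-bin/forum/commentary.pl/noframes/read/209" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Hotbar 4.4.7.0)"'
)

hit = parse_line(sample)
print(hit["ip"], hit["method"], hit["path"], hit["status"], hit["size"])
```

Lines that do not fit the assumed layout (for example, malformed requests) simply return None, so a caller can skip or count them separately.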
Making it Readable
The example above is known as a "raw log file" — it lists the information exactly as recorded by the server. Obviously this format is not particularly easy to read, and is all but useless for most people. Although some server admins have a use for raw log files, most webmasters need something much more human-friendly.
This is where log analysers come in. A log analyser is a software application which runs on either the server or a personal computer. Its job is to interpret log files and present the information in easy-to-read lists, graphs, etc. For more information see web statistics software.
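To give a feel for what a log analyser does internally, the following Python sketch counts hits, page views and status codes from a handful of lines. It is a toy under stated assumptions, not how any real analyser works: the function name summarise, the list of asset extensions, and the rule that "anything not ending in one of those extensions is a page" are all choices made for this example. The sample lines are taken from the excerpt above with the referrer and user agent shortened.

```python
import re
from collections import Counter

# Pull only the request path and status code out of each line.
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

# Assumption for this sketch: these extensions are page assets, not pages.
ASSET_EXTENSIONS = (".gif", ".jpg", ".png", ".css", ".js", ".ico")


def summarise(lines):
    """Count total hits, page views per path, and hits per status code."""
    hits = 0
    pages = Counter()
    statuses = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match is None:
            continue  # skip lines that are not in the expected layout
        hits += 1
        statuses[match["status"]] += 1
        path = match["path"].split("?", 1)[0]  # ignore any query string
        if not path.lower().endswith(ASSET_EXTENSIONS):
            pages[path] += 1
    return {"hits": hits, "pages": pages, "statuses": statuses}


# Shortened versions of three sample lines: one page and two of its assets.
sample_lines = [
    '151.44.15.252 - - [25/May/2004:00:17:20 +1200] '
    '"GET /cgi-bin/forum/commentary.pl/noframes/read/209 HTTP/1.1" 200 6863 "-" "Mozilla/4.0"',
    '151.44.15.252 - - [25/May/2004:00:17:21 +1200] '
    '"GET /js/common.js HTTP/1.1" 200 2263 "-" "Mozilla/4.0"',
    '151.44.15.252 - - [25/May/2004:00:17:21 +1200] '
    '"GET /images/navigation/home1.gif HTTP/1.1" 200 2735 "-" "Mozilla/4.0"',
]

report = summarise(sample_lines)
print("Hits:", report["hits"])
print("Page views:", sum(report["pages"].values()))
for path, count in report["pages"].most_common(5):
    print(count, path)
```

Here three hits collapse into a single page view, which is the kind of distinction a raw log does not make obvious.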
Note that raw log files can become very large very quickly, since every request adds a line. Most servers have a system of automatically deleting old log files periodically. A good log analyser will be able to retain the basic statistical information in its own database after the raw log files have been deleted.
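One possible way to keep summary figures after the raw files are gone is to store daily totals in a small database. The Python sketch below uses the standard sqlite3 module; the table name daily_totals, its columns, and the function save_daily_totals are invented for this example, and real analysers may store their data quite differently.

```python
import sqlite3


def save_daily_totals(conn, day, hits, bytes_sent):
    """Add one batch of totals for a day, merging with any existing row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_totals ("
        "day TEXT PRIMARY KEY, hits INTEGER NOT NULL, bytes_sent INTEGER NOT NULL)"
    )
    cursor = conn.execute(
        "UPDATE daily_totals SET hits = hits + ?, bytes_sent = bytes_sent + ? "
        "WHERE day = ?",
        (hits, bytes_sent, day),
    )
    if cursor.rowcount == 0:
        # No row for this day yet, so create one.
        conn.execute(
            "INSERT INTO daily_totals (day, hits, bytes_sent) VALUES (?, ?, ?)",
            (day, hits, bytes_sent),
        )
    conn.commit()


# An in-memory database keeps the example self-contained;
# a real tool would use a file on disk so the totals survive.
conn = sqlite3.connect(":memory:")

# Totals for the seven sample lines shown earlier (7 hits, 23262 bytes).
save_daily_totals(conn, "2004-05-25", 7, 23262)
# A second, hypothetical batch for the same day is merged into the same row.
save_daily_totals(conn, "2004-05-25", 3, 1000)

row = conn.execute(
    "SELECT hits, bytes_sent FROM daily_totals WHERE day = ?", ("2004-05-25",)
).fetchone()
print(row)
```

Because only one row per day is kept, the database stays small no matter how large the original log files were.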