Script descriptions

These scripts let you monitor more information than the usual hits and pages statistics. The first three scripts scan their own log files (or the standard one if you are using an extended logfile format).

You'll be able to keep an eye on the browsers people use, the pages they come from, and the errors your server reports.

Two optional scripts (for registered users) will let you understand HOW people visit your Web site and HOW to optimize your HTML pages.

All the scripts use the configuration file config.pl to read the options you have chosen. You never have to modify the scripts themselves to run them, unless of course you find a bug or want to alter their behaviour.

Command-line options are supplied so you can run each script for specific needs. The '-x' flag displays the default values.
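
For instance, the top of a script could combine config.pl with command-line flags as in this minimal sketch (the variable names $lang and $toplist are illustrative only; see config.pl for the real ones):

    #!/usr/bin/perl
    # Minimal sketch, not one of the actual scripts: load the shared
    # options, then let command-line flags override them.
    require "config.pl";            # assumed to define the defaults

    use Getopt::Std;
    getopts('efxt:');               # sets $opt_e, $opt_f, $opt_x, $opt_t
    $lang    = 'en'   if $opt_e;    # -e : english output only
    $lang    = 'fr'   if $opt_f;    # -f : french output only
    $toplist = $opt_t if $opt_t;    # -t : size of the top list

    if ($opt_x) {                   # -x : show the default values
        print "language : $lang\n";
        print "toplist  : $toplist\n";
        exit 0;
    }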

List of scripts :


Cron-agent.pl
Aim Agent log stats
Compute browser statistics by scanning the agent log file.
Frequency None.
You can run the script when you want. I run it once every week or sometimes daily.
Time taken A few seconds to one or two minutes, depending on the size of the agent log file.
Options
-d <number> number of days to scan (Extended NCSA logfile only)
-e output in english only
-f output in french only
-h help
-i <file> input agent logfile
-b bar chart graphs (Extended NCSA logfile only)
-l line graphs (Extended NCSA logfile only)
-c filled line graphs (Extended NCSA logfile only)
-p scan only HTML files (Extended NCSA logfile only)
-x display default values for flag options
-t <toplist> display only the top <toplist> browsers
-v display version
How it works It scans the agent log file to extract the most commonly used browsers and operating systems.
Notes It now also works with the extended common logfile produced by NCSA server > 1.5. Graphs are produced for NCSA server > 1.5, showing which browsers are used over time. You could use this to serve new HTML features only when people are using a browser that supports them.
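
As an illustration, the heart of the browser count might look like the following sketch (assuming the plain NCSA agent_log format, one User-Agent string per line; handling of the extended logfile and of operating systems is left out):

    #!/usr/bin/perl
    # Sketch only: count browser families in an NCSA agent_log,
    # where each line is a string such as "Mozilla/2.0 (Win95; I)".
    open(LOG, "agent_log") || die "cannot open agent_log: $!";
    while (<LOG>) {
        # Keep the product token before the first '/' (the browser family).
        $count{$1}++ if m!^([^/ ]+)!;
    }
    close(LOG);

    # Print the browsers, most popular first.
    foreach $name (sort { $count{$b} <=> $count{$a} } keys %count) {
        printf "%-20s %6d\n", $name, $count{$name};
    }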


Cron-refer.pl
Aim Referer log stats
Compute the pages most of your visitors come from.
It can be very useful to know which sites have a link to your web site (if you have to move to another URL, for example).
Frequency None.
You can run the script when you want. I run it once every week.
Time taken A few seconds to one or two minutes, depending on the size of the referer log file.
Options
-e output in english only
-f output in french only
-h help
-i <file> input referer logfile
-l include local references
-p <page> referer for this page
-t <toplist> display only the top <toplist> files
-x display default values for flag options
-v display version
How it works It scans the referer log file to extract where the people accessing your pages come from. It outputs the most frequent sites and pages people come from, and where they arrive in your site.
Notes It now also works with the extended common logfile produced by NCSA server > 1.5.
If your host has several domain names, uncomment the marked code; it will be slower but more accurate!
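
A sketch of the idea (assuming the plain NCSA referer_log format, where each line reads "http://remote.site/page -> /local/page"; the extended logfile case is left out):

    #!/usr/bin/perl
    # Sketch only: tally referring sites and landing pages from
    # lines of the form "http://remote.site/page -> /local/page".
    open(LOG, "referer_log") || die "cannot open referer_log: $!";
    while (<LOG>) {
        next unless /^(\S+) -> (\S+)/;
        ($from, $to) = ($1, $2);
        $arrived{$to}++;
        # Reduce the full referring URL to its host part.
        $fromsite{$1}++ if $from =~ m!^\w+://([^/]+)!;
    }
    close(LOG);

    # Print the referring sites, most frequent first
    # (%arrived can be printed the same way).
    foreach $site (sort { $fromsite{$b} <=> $fromsite{$a} } keys %fromsite) {
        printf "%-30s %6d\n", $site, $fromsite{$site};
    }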


Cron-error.pl
Aim Error log stats
It displays the most common errors from your web server.
A list of errors due to files not found is also produced, so you can check whether the files are really missing.
Frequency None.
You can run the script when you want. I run it once every week.
Time taken A few seconds to one or two minutes, depending on the size of the error log file.
Options
-a <tildealias> substitute ~ with the path alias
-e output in english only
-f output in french only
-h help
-i <file> input error logfile
-b bar chart graphs
-l line graphs
-c filled line graphs
-d <number> number of days to scan
-j <date> stats for this date only
-p 'file does not exist', HTML files only
-q <tri> 'file does not exist', only entries matching <tri>
-r 'file does not exist', show referer page
-s <seuil> 'file not found' errors are shown only if they have at least <seuil> requests
-t <toplist> display only the <toplist> most frequent errors
-x display default values for flag options
-v display version
How it works It scans the error log file to extract the most common server errors. It also outputs the documents your server is unable to fulfill.
Notes You can add other error messages produced by your server in the code, but make sure that any message you add is not a substring of another one. Graphs are produced showing the errors over time. The aim is to keep the graphs as low as possible, with almost no 'file not found' errors.
The page the missing file is requested from is also printed.
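
The scanning step can be sketched like this (assuming NCSA-style error_log lines ending in ", reason: ..."; the exact line format may differ slightly on your server):

    #!/usr/bin/perl
    # Sketch only: tally error reasons from an NCSA-style error_log, e.g.
    # [Tue Mar 12 10:00:00 1996] access to /doc failed for host, reason: file does not exist
    open(LOG, "error_log") || die "cannot open error_log: $!";
    while (<LOG>) {
        next unless /reason: (.+)$/;
        $reason = $1;
        $errors{$reason}++;
        # Remember which documents are reported missing.
        if ($reason =~ /file does not exist/ && / access to (\S+) /) {
            $missing{$1}++;
        }
    }
    close(LOG);

    # Print the errors, most frequent first.
    foreach $r (sort { $errors{$b} <=> $errors{$a} } keys %errors) {
        printf "%6d  %s\n", $errors{$r}, $r;
    }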

You don't have to wait for errors to happen before fixing wrong links in your pages: cron-url.pl can scan your document tree and tell you about missing files in your links, keeping the error log from growing too big.


Cron-session.pl
Aim Session log stats
It computes how long people stay on your web site by scanning the log file.
Frequency None.
You can run the script when you want.
Time taken A few minutes, depending on the size of the log file.
Options
-e output in english only
-f output in french only
-h help
-i <file> input logfile
-t <tlim> maximum session length
-r <tleclim> maximum time to read a page
-v display version
How it works It's very hard to know how long people stay on your web site, as they can access a page, go to lunch, and make a second access two hours later. But people usually take a more or less long look at your site and only come back another day.
In the script, you have a maximum session length variable. If an access is made within this time, it still counts as the same session.
Another criterion is a time limit for reading a page. Usually, people don't need more than one hour to read an HTML page!
Accesses from network spiders (robots) are removed.
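
A minimal sketch of that session rule (hits are assumed already read from the logfile, sorted by time and converted to seconds, with robot accesses already filtered out; $tlim plays the role of the -t option):

    # Sketch only: split each host's hits into sessions. A hit more than
    # $tlim seconds after the previous one from the same host starts a
    # new session.
    $tlim = 3600;    # illustrative limit: one hour

    # @hits holds [host, time-in-seconds] pairs, sorted by time.
    foreach $hit (@hits) {
        ($host, $time) = @$hit;
        if (!defined($last{$host}) || $time - $last{$host} > $tlim) {
            $sessions{$host}++;                     # a new session begins
        } else {
            $spent{$host} += $time - $last{$host};  # time spent reading
        }
        $last{$host} = $time;
    }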
Notes If visitors have dynamic IP addresses (from an Internet provider, for example), different people can come from the same IP address and it becomes very hard to identify user sessions!
The script will also output the average requests per hour and per day of the week.


Cron-url.pl
Aim Documents stats
Compute what your web site looks like. Do you have a multimedia, graphical, heavy site?
It will also translate each URL into the TITLE of the file and show you the most recent HTML files on your site.
A detailed server tree is also output.
Frequency None.
You can run the script when you want. I run it once a week.
Time taken A few minutes, depending on the size of your site.
Options
-e output in english only
-f output in french only
-h help
-a <tildealias> server alias for ~
-d <nbdays> show files newer than <nbdays> days
-v display version
How it works It scans your web structure, counting files and opening each one. Histograms showing how many links and images each document contains are produced, along with a graph of the document size distribution.
A histogram shows the most recently updated files on your site.
A translation table is built between the URL of a document and its name (found inside the TITLE tag).
The structure (tree) of your site is also shown, with details about the HTML pages inside each part of the tree.
It also checks every link and reports missing files.
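
For example, the TITLE extraction and link check on a single document might be sketched as follows (tree walking and resolution of relative links are simplified away):

    # Sketch only: map one document to its <TITLE> and check its links.
    sub scan_doc {
        local($file) = @_;
        open(DOC, $file) || return;
        local($/);                   # slurp the whole file at once
        local($html) = <DOC>;
        close(DOC);

        # URL -> TITLE translation table.
        ($title{$file}) = $html =~ m!<TITLE>([^<]*)</TITLE>!i;

        # Check each local HREF target (external URLs are skipped,
        # and relative paths are taken as-is for simplicity).
        while ($html =~ m!<A\s+HREF="([^"#]+)"!ig) {
            $link = $1;
            next if $link =~ m!^\w+:!;    # skip http:, ftp:, mailto: ...
            print "missing: $link (linked from $file)\n" unless -e $link;
        }
    }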
Notes Could be useful if you want to check that every HTML document has a TITLE tag and that each title is unique. It can also show you if you have heavy pages!
People can go directly to new HTML documents from the server tree or from the 'new documents' pages.
If lots of sites run this script, it will become possible to compare web sites!