What is the difference between Successful Requests for Pages vs. Successful Requests, and why are the two totals so different?
Analog recognizes that certain file extensions (.html, .pdf) usually indicate a complete "page" on a site, whereas other file extensions (.gif, .jpeg, .wav) only indicate a "piece" of a page such as a graphic or a media file.
Analog counts the following file extensions as complete "pages":
.shtml, .html, .htm, .jsp, .cfm, .pl, .php, .pdf, .txt, .wp, .wpd, .exe, .doc, .zip, and /.
(Note: The "/" refers to requests made to a directory instead of a specific file, for example www.epa.gov/ instead of www.epa.gov/index.html.)
The number of requests for files with these "page" extensions will appear in the Successful Requests for Pages total. However, most "pages" consist of several non-page elements such as image files (.gif, .jpeg), media files (.wav, .mpeg), style sheets (.css), etc. Successful Requests counts all of these non-page elements, as well as the page files listed above.
For this reason, in almost all cases, the number of Successful Requests will be greater than the number of Successful Requests for Pages.
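As a rough illustration of how the two totals diverge, the sketch below classifies request paths using the "page" extension list given above. The function names (is_page, summarize) and the sample paths are made up for this example, and the case-insensitive matching is an assumption; this is not Analog's own code.

```python
# Extensions this FAQ lists as "pages"; a trailing "/" stands for a directory request.
PAGE_EXTENSIONS = (
    ".shtml", ".html", ".htm", ".jsp", ".cfm", ".pl", ".php",
    ".pdf", ".txt", ".wp", ".wpd", ".exe", ".doc", ".zip",
)


def is_page(path: str) -> bool:
    """Return True if the requested path counts as a page under the list above."""
    path = path.lower()  # assumption: extensions are matched case-insensitively
    return path.endswith("/") or path.endswith(PAGE_EXTENSIONS)


def summarize(paths):
    """Return (successful requests, successful requests for pages)."""
    paths = list(paths)
    return len(paths), sum(1 for p in paths if is_page(p))


# One HTML page plus four inline images (hypothetical paths).
sample_requests = ["/index.html", "/a.gif", "/b.gif", "/c.jpeg", "/d.gif"]
print(summarize(sample_requests))  # (5, 1)
```

Every entry adds to the first total, but only the HTML file adds to the second, which is why Successful Requests is usually the larger number.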
A note about .pdf files and how their retrieval can affect the number of requests:
Analog counts lines in web server log files. A single PDF file request can cause many lines in a Web server log file because these files can be delivered one page at a time, depending on the version of software used to create the PDF, and the version of the helper application used by the Web browser. Each page delivered is a log file entry and thus an additional "count" for that file.
What exactly is a request?
Each time a file is requested from your server, it counts as a request. A call to an inline image (.gif, .jpeg), which is loaded separately by people with graphical browsers, counts as a separate request.
Therefore, if you have an HTML page on your site with four images on it, a visit to that page will be counted as five requests (one request for the HTML page and four requests for the images).
Unfortunately, you cannot tell with complete certainty the number of times a given file has been read. The user may request the file from a proxy server which already has a copy, or retrieve it from a local cache. Since no connection was made to your server, no request is scored.
Distinct Hosts Served - what does this mean? Is this the number of visitors to my site?
A host is a computer which has made a request. The number of Distinct Hosts Served indicates how many separate and distinct IP addresses made requests to your site. However, it may not indicate exactly how many people visited your site. Many ISPs (Mindspring, AOL, etc.) have a "pool" of IP addresses. An IP address is assigned when a subscriber connects, and is then reassigned to another subscriber once the first subscriber logs off.
Therefore, if your site has two Distinct Hosts Served, this may mean that two people visited your site. Or it could mean that one person visited the site Tuesday, and then re-visited the site Wednesday with a different IP.
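The point can be made concrete with a small sketch. It assumes the host field of each log line has already been extracted into a list; the function name and the addresses (taken from the documentation-only 192.0.2.0/24 range) are illustrative.

```python
def distinct_hosts(hosts):
    """Count separate host addresses, regardless of how many requests each made."""
    return len(set(hosts))


# Hypothetical log: the same subscriber on Tuesday and Wednesday, with a
# different pooled address each day, shows up as two distinct hosts.
log_hosts = ["192.0.2.10", "192.0.2.10", "192.0.2.77"]
print(distinct_hosts(log_hosts))  # 2
```

The count says nothing about whether the two addresses belong to one person or two.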
It is impossible to accurately track "visits", "sessions", length of time spent on a site, repeat visits, etc.
What are Distinct Files Requested?
Distinct Files Requested indicates how many separate and distinct files were requested, as opposed to the total number of all requests.
For example, consider a report which shows 1000 Successful Requests, and five Distinct Files Requested. This means only five files on the site were requested. There were 1000 total requests for those five files combined.
In another example, consider a site with ten files. If the number of Successful Requests is ten, and Distinct Files Requested is also ten, this means that each file was requested once.
Distinct Files Requested includes all file types: .html, .htm, .jsp, .cfm, .pl, .php, .pdf, .txt, .wp, .wpd, .gif, etc.
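The two examples above can be reproduced with a short sketch. It assumes the requested paths of all successful requests are available as a list; file_totals and the file names are hypothetical.

```python
from collections import Counter


def file_totals(paths):
    """Return (successful requests, distinct files requested)."""
    counts = Counter(paths)
    return sum(counts.values()), len(counts)


# 1000 requests spread over five files.
busy = [f"/file{i % 5}.html" for i in range(1000)]
print(file_totals(busy))  # (1000, 5)

# Ten files, each requested once.
quiet = [f"/page{i}.html" for i in range(10)]
print(file_totals(quiet))  # (10, 10)
```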
What are Failed Requests?
Requests which return a webserver status code in the 400s (error in request) or 500s (server error). Failed requests come about for a variety of reasons, but the most common are when the requested file is not found or is read-protected.
What are Redirected Requests?
Requests which return a webserver status code in the 300s (except for 304). The most common cause of these requests is that the user has incorrectly requested a directory name without the trailing slash. The server replies with a redirection ("you probably mean the following") and the user then makes a second connection to get the correct document (although usually the browser does it automatically without the user's intervention or knowledge).
What does Total Data Transferred mean?
Total Data Transferred refers only to successful requests, and does not include the message header. Only actual data is included in the total data transferred.
What are Corrupt Logfile Lines? Does this mean my web pages are corrupted?
The number of Corrupt Logfile Lines indicates the number of web server log entries that Analog could not read or parse. It does not indicate corrupt databases or corrupt files on your site.
Web server log files can run into hundreds of thousands of lines, and there are many, many reasons why Analog may be unable to read a few of these lines. Unless the number of corrupt lines is exceptionally large, it should not be cause for concern.
Why does my Request Report not show every request for the month?
A file must be requested from the server at least 20 times in a particular month to be reported in that month's Request Report. This is called a "request floor."
The floor prevents the Analog reports from becoming extremely large. Reporting all requests would result in disk space and readability issues, particularly for sites with a large number of files/pages.
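A minimal sketch of how such a floor behaves, assuming the month's requested paths are available as a list. The names request_report and REQUEST_FLOOR are invented for this example; in Analog itself the floor is set in its configuration, not in code like this.

```python
from collections import Counter

REQUEST_FLOOR = 20  # the floor described above: at least 20 requests in the month


def request_report(paths, floor=REQUEST_FLOOR):
    """Return (files at or above the floor, requests for files below it)."""
    counts = Counter(paths)
    listed = {name: n for name, n in counts.items() if n >= floor}
    not_listed = sum(n for n in counts.values() if n < floor)
    return listed, not_listed


# Hypothetical month: /b.html falls one request short of the floor.
month = ["/a.html"] * 25 + ["/b.html"] * 19 + ["/c.gif"] * 20
listed, not_listed = request_report(month)
print(listed)      # {'/a.html': 25, '/c.gif': 20}
print(not_listed)  # 19
```

Requests for files below the floor are still counted in the overall totals; they are only left out of the per-file listing.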
This section describes how analog defines its terms, and exactly what is counted in each category. It gets a bit technical at times -- if you're just trying to understand the output, I recommend you read the section on Analog's reports first.
We start with some basic definitions. The host is the computer which has asked you for a file (often called the "client"). The file might be a page (i.e., an HTML document) or it might be something else, such as an image. By default filenames ending in (case insensitive) .html, .htm, or / count as pages, but you can tell analog to count any file as a page with the PAGEINCLUDE command.
The total requests counts all the files which have been requested, including pages, graphics, etc. (Some people call this the number of hits, but that word is also used in other ways by other people, so I avoid it). The requests for pages obviously only counts pages. One user can generate many requests by requesting lots of different files, or the same file many times.
The referrer for a request is the place that the user (or his computer) heard about your file from. If he followed a link to reach a page, it will be the previous page. In the case of a graphic on a page, the referrer will be the page containing the graphic.
Analog's kilobytes are 1024 bytes. (If you prefer to call these kibibytes, you can do so by editing your language file.)
Analog recognises four categories of request, based on the HTTP status code of the request. You can see the total number of requests for each status code, and what the codes mean, in the Status Code Report. (Or see the HTTP spec for a detailed description.)
First, successful requests are those with HTTP status codes in the 200's (where the document was returned) or with code 304 (where the document was requested but was not needed because it had not been recently modified and the user could use a cached copy). (Actually, you can configure code 304 to be a redirected request instead of a successful request with the 304ISSUCCESS command.) Successful requests for pages refers to those lines on which the file requested was named and was a page.
Redirected requests are those with other codes in the 300's, indicating that the user was directed to a different file instead. The most common cause of these requests is that the user has incorrectly requested a directory name without the trailing slash. The server replies with a redirection ("you probably mean the following") and the user then makes a second connection to get the correct document (although usually the browser does it automatically without the user's intervention or knowledge). The other common cause of redirected requests is their use as "click-thru" advertising banners.
Failed requests are those with codes in the 400's (error in request) or 500's (server error). They come about for a variety of reasons, but the most common are when the requested file is not found or is read-protected.
Finally, requests returning informational status code are those with status codes in the 100's. These are very rare at the moment.
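The four categories can be summarised in a short sketch. The function name and the count_304_as_success parameter are hypothetical; the parameter only mirrors the effect described for the 304ISSUCCESS command and is not how analog is actually configured.

```python
def classify_status(code: int, count_304_as_success: bool = True) -> str:
    """Map an HTTP status code to one of the four request categories."""
    if code == 304:
        # Not modified: a success by default, optionally a redirection.
        return "successful" if count_304_as_success else "redirected"
    if 200 <= code <= 299:
        return "successful"
    if 300 <= code <= 399:
        return "redirected"
    if 400 <= code <= 599:
        return "failed"
    if 100 <= code <= 199:
        return "informational"
    return "unrecognised"  # assumption: anything outside 100-599 is left uncategorised


for code in (200, 301, 304, 404, 500, 100):
    print(code, classify_status(code))
```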
There are a few other types of logfile lines listed in the General Summary. Lines without status code refers to those logfile lines without a status code, and the successful requests in the General Summary only counts the ones with a status code: except if the line contains the name of the file requested, and the filename is being counted (not starred in the LOGFORMAT), then it's counted as a success. Unwanted logfile entries are ones which you have explicitly excluded. Finally, corrupt logfile lines are those which analog didn't manage to parse. (The number given is the number of unparseable lines in the whole logfile, even if the rest of the analysis is restricted to a small part of the logfile, because analog doesn't know whether a line would have been wanted if it couldn't parse it! You can list all the corrupt lines by turning debugging on.)
Most reports only include successful requests in calculating the number of requests, requests for pages, bytes, and last date: unless, of course, the report is a redirection or failure report. There is a further restriction on the time reports, the Status Code Report, the Processing Time Report, the File Size Report, and the bytes lines in the General Summary: the logfile line must also contain the name of the file requested, and the filename must be being counted. This is necessary to stop double counting if the server uses separate logs.
The "not listed" line at the bottom of each of the non-time reports represents those items which were not listed because they were below the floor for the report. (It doesn't include items which you've explicitly excluded.)
The figures in parentheses in the General Summary are for the last seven days: either the seven days before the TO time, or if no TO time is given, the seven days before the time of the program start. (It would be nicer to use the seven days before the last time in the logfile, but we don't know when this is until we've read the whole logfile, and by then it's too late.) The figures for the last seven days are not included if all, or none, of the requests fall in the last seven days.
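A sketch of the seven-day rule, assuming request times have already been parsed into datetime objects. The function name is made up, and treating the window as open at the start and closed at the TO time is an assumption; the text above does not say how the boundaries are handled.

```python
from datetime import datetime, timedelta


def last_seven_days(timestamps, to_time):
    """Return the request count for the seven days before to_time,
    or None when the figure would be omitted (all or none in the window)."""
    start = to_time - timedelta(days=7)
    recent = sum(1 for t in timestamps if start < t <= to_time)
    if recent == 0 or recent == len(timestamps):
        return None
    return recent


# Hypothetical request times with a TO time of 15 March 2004.
to_time = datetime(2004, 3, 15)
stamps = [datetime(2004, 3, 1), datetime(2004, 3, 10), datetime(2004, 3, 14)]
print(last_seven_days(stamps, to_time))  # 2
```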
In the Domain Report, "domain not given" means that the hostname did not contain a dot. "Unknown domain" means that it did contain a dot, but that the domain name was not in the domains file (or that the domains file could not be read). The hosts and domains concerned can be listed by turning debugging on.
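The distinction between the two labels can be sketched as follows. KNOWN_DOMAINS is a hypothetical stand-in for the contents of the domains file, only the last component of the hostname is looked up, and numeric IP addresses get no special handling; all of these are simplifying assumptions.

```python
KNOWN_DOMAINS = {"com", "org", "gov", "uk"}  # hypothetical stand-in for the domains file


def domain_label(hostname: str, known=KNOWN_DOMAINS) -> str:
    """Return the domain a hostname is reported under, or one of the two special labels."""
    if "." not in hostname:
        return "domain not given"
    last_part = hostname.rsplit(".", 1)[-1].lower()
    return last_part if last_part in known else "unknown domain"


print(domain_label("localhost"))        # domain not given
print(domain_label("www.epa.gov"))      # gov
print(domain_label("host.example.zz"))  # unknown domain
```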
In the Operating System Report, which browsers count as robots is controlled by the ROBOTINCLUDE and ROBOTEXCLUDE commands.