The VTP project (vterrain.org) recently had an interesting experience where
bandwidth-monitoring (and billing) procedures reported an enormous bandwidth spike,
resulting in a big hosting bill.
The billing situation was resolved amicably, but in the spirit of understanding what
really happened, we've been investigating the cause and effect. It appears that the
traffic came from (legitimate, non-malicious) Far East users, probably on the other side
of a problematic Internet connection, most likely caused by the damage to an undersea
cable in December. Because of the unreliable network, they had difficulty
downloading large files intact from the VTP server, so they used a common
download accelerator that keeps retrying the server and resumes a file transfer
partway through. All of this is very normal and common.
From the logs, it would appear that the download accelerator requests the file with a
Range header (with a fairly large range). Apache then begins sending the data. Shortly
afterward, the connection fails (after the client has received only a little of the data)
and Apache logs the request. The process repeats. Eventually, the client does receive the
entire file, but not until many, many 206 (Partial Content) entries have been logged.
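For readers unfamiliar with how such a tool resumes, here is a rough sketch of the general idea: split the file into byte segments, request each one with an HTTP Range header ("bytes=first-last", inclusive on both ends), and after a dropped connection re-request only the unfinished part of each segment. This is a hypothetical illustration; we don't know the accelerator's actual strategy, and the function segment_ranges and its parameters are made up for this example. Each request the server answers this way gets its own 206 line in access_log.

```python
def segment_ranges(total_size, segments, already_received=None):
    """Build Range header values for a segmented, resumable download.

    Hypothetical sketch: already_received[i] is how many bytes of
    segment i the client already has from earlier, interrupted attempts.
    """
    if already_received is None:
        already_received = [0] * segments
    base = total_size // segments
    headers = []
    for i in range(segments):
        first = i * base
        # The last segment absorbs any remainder of the division.
        last = total_size - 1 if i == segments - 1 else (i + 1) * base - 1
        start = first + already_received[i]
        if start > last:
            continue  # segment already complete, nothing to re-request
        # HTTP byte ranges include both endpoints.
        headers.append(f"bytes={start}-{last}")
    return headers


print(segment_ranges(100, 4))
print(segment_ranges(100, 4, [10, 25, 0, 5]))
```

For example, segment_ranges(100, 4, [10, 25, 0, 5]) asks again for bytes 10-24, 50-74 and 80-99, skipping the second segment because it is already complete.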
According to the host (Hurricane Electric, HE.net):
http://www.he.net/faq/traffic_storage.html
"What Methods do you use to determine traffic usage?
We determine web traffic usage by extracting information from the access_log files
generated by the HTTP daemon."
Based on this, we come to some interesting contradictions:
From http://vterrain.org/.status/web.html
Time    Requests     Bytes Sent
-----   --------   ------------
00:00        893       18955972
01:00      22384   141295771628
02:00       4330     3881123738
03:00        626        7740512
Yup: in one hour, a naive count of Apache's "bytes sent" figures says
there were 141 GB of traffic. That's about a month's worth of normal
traffic, in an hour.
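We don't know what tool HE.net actually uses, but a naive per-hour tally like the table above can be reproduced in a few lines of Python: match each log line, take the hour from the timestamp, and add up the final byte-count field, trusting it exactly as logged. This is only a sketch; the function name bytes_per_hour is ours, and the regular expression assumes lines that begin with the Common Log Format fields, as in the log excerpt below.

```python
import re
from collections import defaultdict

# Leading fields of Apache's Common Log Format:
# host ident user [dd/Mon/yyyy:HH:MM:SS zone] "request" status bytes
CLF = re.compile(
    r'^\S+ \S+ \S+ \[\d{2}/\w{3}/\d{4}:(?P<hour>\d{2}):\d{2}:\d{2} [^\]]+\] '
    r'"[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def bytes_per_hour(lines):
    """Naively tally requests and logged bytes per hour of day.

    The byte field is trusted as-is, which is exactly the
    assumption being questioned in this message.
    """
    requests = defaultdict(int)
    sent = defaultdict(int)
    for line in lines:
        m = CLF.match(line)
        if m is None:
            continue  # skip lines that don't look like CLF
        hour = m.group("hour") + ":00"
        requests[hour] += 1
        if m.group("bytes") != "-":  # "-" means no body was logged
            sent[hour] += int(m.group("bytes"))
    return dict(requests), dict(sent)


log = [
    '211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 10359197',
    '211.162.235.226 - - [03/Feb/2007:01:08:13 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 15765857',
]
print(bytes_per_hour(log))  # ({'01:00': 2}, {'01:00': 26125054})
```

Run over a whole day's access_log, a tally like this would reproduce the 141 GB figure for the 01:00 hour, because every 206 entry contributes its full logged count.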
Apache's log has endless entries like these:
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 10359197
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 42725451
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 15765857
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 37503695
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 21176362
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 5127115
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 10359197
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 42725451
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 37503695
211.162.235.226 - - [03/Feb/2007:01:08:13 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 15765857
211.162.235.226 - - [03/Feb/2007:01:08:13 -0800] "GET /data/island_8k.zip HTTP/1.1" 206 21176362
If those were real bytes transferred, they would amount to something like 400
MB/second. These are distant users who are happy to get a few KB/second.
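The arithmetic behind that estimate is easy to check against the numbers already quoted in this message. The eleven entries above alone claim 260,188,239 bytes, all stamped 01:08:12 or 01:08:13, so less than two seconds apart; and the status page's 01:00 row works out to roughly 39 MB/s sustained for the entire hour. The snippet below only re-derives those two figures from the logged values (MB here means 10^6 bytes).

```python
# Byte counts from the eleven 206 entries quoted above.
logged = [10359197, 42725451, 15765857, 37503695, 21176362, 5127115,
          10359197, 42725451, 37503695, 15765857, 21176362]

total = sum(logged)    # bytes claimed by these eleven log lines
window_s = 2           # stamped 01:08:12 to 01:08:13, i.e. under 2 seconds apart
print(f"{total:,} bytes, at least {total / window_s / 1e6:.0f} MB/s")
# prints: 260,188,239 bytes, at least 130 MB/s

hour_total = 141295771628  # "Bytes Sent" for the 01:00 row of the status page
print(f"hourly average: {hour_total / 3600 / 1e6:.0f} MB/s")
# prints: hourly average: 39 MB/s
```

Even the lower bound from this small excerpt is far beyond what a link delivering a few KB/second could carry.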
So, this raises a couple of interesting questions.
1. Is HE.net making a mistake in using access_log to count bandwidth?
1a. What would be the right way to do it?
1b. Are they doing it this way because they're using some existing tool that does it
this way?
1c. Are other hosts doing it this way, and therefore are mistakenly over-measuring and
over-charging customers?
2. What exactly does the number after the 206 code in the access_log mean? Is it simply
the range the client _requested_ via the Range header? If so, doesn't that mean it has no
real relationship to how much data was _actually_ transferred?
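One way to test the guess in question 2 would be to log the Range header next to the byte count (Apache's mod_log_config can record any request header with a %{Range}i field in a LogFormat) and compare each logged number with the length of the range that was asked for. A hedged sketch of that comparison step follows; requested_length and looks_like_requested are hypothetical helpers, they handle only single-range requests, and they follow the HTTP byte-range rules where "first-last" includes both ends and "-N" means the last N bytes.

```python
import re


def requested_length(range_header, file_size):
    """Bytes asked for by a single-range header such as 'bytes=0-499'.

    Supports 'first-last', 'first-' (to end of file) and '-N' (last N bytes).
    Multi-range headers ('bytes=0-9,20-29') are rejected.
    """
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", range_header.strip())
    if m is None or m.group(1) == m.group(2) == "":
        raise ValueError(f"unsupported Range header: {range_header!r}")
    first, last = m.group(1), m.group(2)
    if first == "":
        # Suffix range: the final N bytes, capped at the file size.
        return min(int(last), file_size)
    start = int(first)
    # An open or overlong end is clamped to the last byte of the file.
    end = file_size - 1 if last == "" else min(int(last), file_size - 1)
    return max(0, end - start + 1)  # 0 if the range starts past the end


def looks_like_requested(logged_bytes, range_header, file_size):
    """True when the logged byte count equals the requested range length."""
    return logged_bytes == requested_length(range_header, file_size)


print(requested_length("bytes=80-", 100))
print(looks_like_requested(25, "bytes=0-24", 100))
```

If the logged count matched the requested length even for entries where the client clearly dropped the connection early, that would support the idea that the number reflects what was requested rather than what actually went over the wire.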
I'd appreciate any insight from anyone. As I said before, we resolved the billing issue
with HE.net without any trouble, and we have been very happy with their service. But if this
is a common mistake (bandwidth measurement via access_log) then the nature of the
misunderstanding should be made more public so that others can ensure they aren't burned
by it.
I apologize for any mistakes in terminology or assumptions in my message. I'm not an
Apache guru and I don't play one on TV. I'm a 3D graphics programmer.
--
Chris 'Xenon' Hanson | Xenon @ 3D Nature | http://www.3DNature.com/
"I set the wheels in motion, turn up all the machines, activate the programs,
and run behind the scenes. I set the clouds in motion, turn up light and sound,
activate the window, and watch the world go 'round." -Prime Mover, Rush.
---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.