On 05/01/11 06:38, Alex Rousskov wrote:
Hello,
By default, Squid logs request URIs without any escaping. This works
OK in most cases because uri_whitespace defaults to "strip". Even when
a URI contains a space, the logged value does not, which keeps log
parsing scripts happy.
However, when Squid detects a malformed request (e.g., the URI scheme is
"rtsp"), Squid may log what it thinks is the raw URI, including any
spaces. This results in malformed access log entries such as:
[02/Jan/2011:20:55:15 +0100] 10.3.75.185
xml:lang="*" version="1.0" xmlns:stream="http://jabber.org/streams"
NONE/400 HTTP/0.0 <stream:stream ...
[02/Jan/2011:21:03:54 +0100] 10.19.66.249
sip:10.38.26.67:80;transport=tcp SIP/2.0
NONE/400 HTTP/0.9 REGISTER ...
[02/Jan/2011:21:05:47 +0100] 10.228.123.186
rtsp://youtube.com/DjgMDA==video.3gp RTSP/1.0
NONE/400 HTTP/0.9 DESCRIBE ...
I split the logged lines above into three lines each for readability,
with the second line always being the request URI (%ru format code).
As you can see, such log entries are malformed and would be rather
difficult to interpret correctly due to spaces in URIs and field-looking
protocol versions that are actually a part of %ru output.
While the above real-world examples use a custom access log format, the
default behavior is the same.
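To illustrate why this breaks log parsing, here is a minimal Python sketch of a script that splits entries on whitespace. The seven-token layout is a simplified assumption modeled on the third example above (trailing fields omitted), and split_fields is a hypothetical helper, not part of Squid or any real log tool.

```python
# Assumed, simplified layout: timestamp (2 tokens), client IP, %ru,
# status, protocol version, method. Real formats carry more fields.
EXPECTED_FIELDS = 7

def split_fields(line):
    """Split a log line on whitespace, as a naive parsing script might."""
    return line.split()

# Entry as it would look if %ru contained no spaces.
GOOD = ("[02/Jan/2011:21:05:47 +0100] 10.228.123.186 "
        "rtsp://youtube.com/DjgMDA==video.3gp "
        "NONE/400 HTTP/0.9 DESCRIBE")

# Entry as actually logged: %ru includes " RTSP/1.0".
BAD = ("[02/Jan/2011:21:05:47 +0100] 10.228.123.186 "
       "rtsp://youtube.com/DjgMDA==video.3gp RTSP/1.0 "
       "NONE/400 HTTP/0.9 DESCRIBE")

for name, line in (("good", GOOD), ("bad", BAD)):
    fields = split_fields(line)
    ok = len(fields) == EXPECTED_FIELDS
    # In the bad entry, the token in the status position is "RTSP/1.0".
    print(name, len(fields), "status field:", fields[4], "ok" if ok else "MISALIGNED")
```

The extra token shifts every later field by one, so the script reads the protocol version from the URI where it expects the status code.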
We could (and possibly will) improve request parsing so that common
cases like RTSP and SIP requests do not get interpreted as malformed
HTTP/0.9 requests. However, that does not solve the more general case of
a truly malformed request like the very first example pasted above.
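As a rough Python sketch of that distinction (this is not Squid's parser): the request lines below are reconstructed from the log examples above and are an assumption, as are the function name and the three labels it returns.

```python
def classify_request_line(line):
    """Classify a request line as 'http', 'other-protocol', or 'malformed'.

    Illustrative only: a three-token line whose last token looks like
    NAME/VERSION is treated as a well-formed request for protocol NAME.
    A two-token line is treated as an HTTP/0.9 simple request.
    """
    parts = line.split()
    if len(parts) == 2:
        return "http"  # HTTP/0.9 style: METHOD URI
    if len(parts) != 3:
        return "malformed"
    name, sep, version = parts[2].partition("/")
    if not sep or not name or not version:
        return "malformed"
    return "http" if name.upper() == "HTTP" else "other-protocol"

samples = [
    "GET http://example.com/ HTTP/1.1",
    "DESCRIBE rtsp://youtube.com/DjgMDA==video.3gp RTSP/1.0",
    "REGISTER sip:10.38.26.67:80;transport=tcp SIP/2.0",
    '<stream:stream xml:lang="*" version="1.0" xmlns:stream="http://jabber.org/streams"',
]
for s in samples:
    print(classify_request_line(s), "<-", s)
```

The RTSP and SIP lines are recognizable as well-formed requests for another protocol, while the Jabber stream header has no such shape and still needs the logging fix discussed below.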
Our options include:
1) Apply uri_whitespace before logging malformed requests. This will
result in spaces stripped by default. The uri_whitespace option
description should probably be adjusted to recommend a different %ru
encoding for the values that do not remove spaces from URLs.
2a) Strip spaces when logging %ru unless an explicit encoding is
specified for that option. To implement this, we would add a
LOG_QUOTE_STRIP_SPACE log_quote value.
2b) Chop spaces when logging %ru unless an explicit encoding is
specified for that option. To implement this, we would add a
LOG_QUOTE_CHOP_SPACE log_quote value.
2c) Replace spaces with %20 when logging %ru unless an explicit encoding
is specified for that option. To implement this, we would add a
LOG_QUOTE_ENCODE_SPACE log_quote value. This is a little different from
encoding the entire URL because it would apply to spaces (and '%') only.
3) Add a new log_whitespace squid.conf option to allow the admin to
strip, chop, or encode spaces in all transaction log fields that do not
have an explicit setting. The default setting could be
LOG_QUOTE_ENCODE_SPACE, I guess. This will help avoid similar problems
in fields other than %ru.
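For comparison, here is a small Python sketch of what the strip, chop, and encode behaviors in (2a) through (2c) would do to a logged value. The function names are hypothetical, and this is not how the proposed LOG_QUOTE_* values would be implemented in Squid. It assumes "chop" means truncating at the first whitespace character.

```python
def strip_space(value):
    """2a: remove every whitespace character."""
    return "".join(ch for ch in value if not ch.isspace())

def chop_space(value):
    """2b: truncate the value at the first whitespace character."""
    for i, ch in enumerate(value):
        if ch.isspace():
            return value[:i]
    return value

def encode_space(value):
    """2c: percent-encode '%' and ' ' only, leaving everything else as-is.

    '%' is encoded first so that the '%20' sequences produced for spaces
    are not themselves re-encoded.
    """
    return value.replace("%", "%25").replace(" ", "%20")

uri = "rtsp://youtube.com/DjgMDA==video.3gp RTSP/1.0"
print(strip_space(uri))   # rtsp://youtube.com/DjgMDA==video.3gpRTSP/1.0
print(chop_space(uri))    # rtsp://youtube.com/DjgMDA==video.3gp
print(encode_space(uri))  # rtsp://youtube.com/DjgMDA==video.3gp%20RTSP/1.0
```

All three yield a single whitespace-free token, so field alignment is preserved. Only the encode variant is reversible, which is one argument for making it the default in (3).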
My preference is (1), followed by (3), but I am not sure and may have
missed better options. What do you think?
Definitely (1).
(3) seems like a good idea as a separate feature.
Along with (1), I think it would be a good idea to add rtsp: and sip: as
known protocols that get rejected nicely until they are handled. The 3.2
parser is now ready to handle unknown schemes as an error case.
Amos
--
Please be using
Current Stable Squid 2.7.STABLE9 or 3.1.10
Beta testers wanted for 3.2.0.4