On 01/04/2011 10:38 AM, Alex Rousskov wrote:
> Hello,
>
> By default, Squid logs request URIs without any escaping. This works
> OK in most cases because uri_whitespace defaults to "strip". Even when
> a URI has a space, the logged value does not have it, which keeps log
> parsing scripts happy.
>
> However, when Squid detects a malformed request (e.g., the URI scheme
> is "rtsp"), Squid may log what it thinks is the raw URI, including any
> spaces. This results in malformed access log entries such as:
>
>   [02/Jan/2011:20:55:15 +0100] 10.3.75.185
>   xml:lang="*" version="1.0" xmlns:stream="http://jabber.org/streams"
>   NONE/400 HTTP/0.0 <stream:stream ...
>
>   [02/Jan/2011:21:03:54 +0100] 10.19.66.249
>   sip:10.38.26.67:80;transport=tcp SIP/2.0
>   NONE/400 HTTP/0.9 REGISTER ...
>
>   [02/Jan/2011:21:05:47 +0100] 10.228.123.186
>   rtsp://youtube.com/DjgMDA==video.3gp RTSP/1.0
>   NONE/400 HTTP/0.9 DESCRIBE ...
>
> I split the logged lines above into three lines each for readability,
> with the second line always being the request URI (%ru format code).
>
> As you can see, such log entries are malformed and would be rather
> difficult to interpret correctly due to spaces in URIs and
> field-looking protocol versions that are actually a part of %ru output.
>
> While the above real-world examples use a custom access log format,
> the default behavior is the same.
>
>
> We could (and possibly will) improve request parsing so that common
> cases like RTSP and SIP requests do not get interpreted as malformed
> HTTP/0.9 requests. However, that does not solve the more general case
> of a truly malformed request like the very first example pasted above.
>
>
> Our options include:
>
> 1) Apply uri_whitespace before logging malformed requests. This will
> result in spaces being stripped by default. The uri_whitespace option
> description should probably be adjusted to recommend a different %ru
> encoding for those who do not want to remove spaces from logged URLs.
>
> 2a) Strip spaces when logging %ru unless an explicit encoding is
> specified for that option. To implement this, we would add a
> LOG_QUOTE_STRIP_SPACE log_quote value.
>
> 2b) Chop spaces when logging %ru unless an explicit encoding is
> specified for that option. To implement this, we would add a
> LOG_QUOTE_CHOP_SPACE log_quote value.
>
> 2c) Replace spaces with %20 when logging %ru unless an explicit
> encoding is specified for that option. To implement this, we would add
> a LOG_QUOTE_ENCODE_SPACE log_quote value. This is a little different
> from encoding the entire URL because it would apply to spaces (and
> '%') only.
>
> 3) Add a new log_whitespace squid.conf option to allow the admin to
> strip, chop, or encode spaces in all transaction log fields that do
> not have an explicit setting. Default setting could be
> LOG_QUOTE_ENCODE_SPACE, I guess. This will help avoid similar problems
> in fields other than %ru.
>
>
> My preference is (1), followed by (3), but I am not sure and may have
> missed better options. What do you think?
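
For concreteness, here is a rough sketch of how the three whitespace
treatments in options 2a, 2b, and 2c might behave. It is written in
Python for brevity and is not Squid code. Only the LOG_QUOTE_* names come
from the proposal above; the enum, the function name, and the choice to
treat all ASCII whitespace (not just the space character) the same way
are assumptions made for illustration.

```python
from enum import Enum

# ASCII whitespace only; treating tabs and newlines like spaces is an
# assumption, the proposal above only talks about "spaces".
WHITESPACE = " \t\r\n\v\f"


class SpaceQuoting(Enum):
    """Hypothetical stand-ins for the proposed log_quote values."""
    STRIP = "LOG_QUOTE_STRIP_SPACE"    # option 2a
    CHOP = "LOG_QUOTE_CHOP_SPACE"      # option 2b
    ENCODE = "LOG_QUOTE_ENCODE_SPACE"  # option 2c


def quote_for_log(uri: str, mode: SpaceQuoting) -> str:
    """Return a whitespace-free version of uri suitable for one log field."""
    if mode is SpaceQuoting.STRIP:
        # Drop every whitespace character, keep everything else.
        return "".join(ch for ch in uri if ch not in WHITESPACE)

    if mode is SpaceQuoting.CHOP:
        # Truncate at the first whitespace character, if there is one.
        for i, ch in enumerate(uri):
            if ch in WHITESPACE:
                return uri[:i]
        return uri

    # ENCODE: percent-encode whitespace and '%' only. Escaping '%' keeps
    # the result unambiguous when the URI already contains "%20".
    return "".join(
        "%%%02X" % ord(ch) if ch == "%" or ch in WHITESPACE else ch
        for ch in uri
    )


if __name__ == "__main__":
    raw = "sip:10.38.26.67:80;transport=tcp SIP/2.0"
    for mode in SpaceQuoting:
        print(mode.value, quote_for_log(raw, mode))
```

Note that STRIP and CHOP both lose information (the former glues
"SIP/2.0" onto the URI, the latter drops it), while ENCODE is reversible
as long as '%' is escaped too.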
Any objections to option #1 or better ideas?

Thank you,

Alex.

> P.S. One could argue that logging URIs with stripped or chopped spaces
> is wrong because it hides potentially critical information, but that is
> a different question that I do not want to discuss in this particular
> thread.
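
To illustrate why the raw spaces matter to log parsing scripts, the
following Python sketch splits the SIP entry quoted earlier on
whitespace, the way a simple script might. The field layout (timestamp,
client address, %ru, status, version, method) is inferred from the quoted
example, and the elided tail of the entry ("...") is left out; the helper
names are hypothetical.

```python
# Pieces of the SIP log entry from the original message. The trailing
# "..." in that entry is elided there and is deliberately not filled in.
PREFIX = "[02/Jan/2011:21:03:54 +0100] 10.19.66.249"
RAW_URI = "sip:10.38.26.67:80;transport=tcp SIP/2.0"
SUFFIX = "NONE/400 HTTP/0.9 REGISTER"

# Assumed token layout when %ru has no spaces:
# 0-1 timestamp, 2 client, 3 %ru, 4 status, 5 version, 6 method
STATUS_INDEX = 4


def build_line(uri: str) -> str:
    """Join the fields into one access log line (hypothetical helper)."""
    return f"{PREFIX} {uri} {SUFFIX}"


def split_fields(line: str) -> list:
    """Naive parser: split the line on runs of whitespace."""
    return line.split()


# Raw URI logged as-is: the embedded space shifts every later field.
broken = split_fields(build_line(RAW_URI))

# Same URI with the space encoded as %20 (as in option 2c).
fixed = split_fields(build_line(RAW_URI.replace(" ", "%20")))

if __name__ == "__main__":
    print(len(broken), broken[STATUS_INDEX])
    print(len(fixed), fixed[STATUS_INDEX])
```

With the raw URI, the token where the script expects the status is
"SIP/2.0", which looks like a plausible protocol field and so may not be
flagged as an error by the script.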
