> Otherwise, if the octets in s starting at pos match any of the sequences of
> octets in the first column of the following table, then the user agent MUST
> follow the steps given in the corresponding cell in the second column of the
> same row. |
What's the stray `|` character at the end of that doing?

The ToC feels double spaced, is that normal?

Would you mind quoting your attributes in source? Things like class=no-num or
href=#web-data scare me. It's easier if you just quote all attributes :)

Also, I generally recommend `<span ...>x</span> ` over `<span ...>x </span>`
<- i.e. trailing space outside of the span (see the ToC)

> <p>Many web servers supply incorrect Content-Type header fields with their
> HTTP
Can you mark up `Content-Type` in something which results in roughly
"typewriter" font?

s/user agents/User Agents/ as in:
> responses. In order to be compatible with these servers, user agents consider

> Without a clear specification of how to "sniff" the media type, each user
> agent implementor was forced to reverse engineer the behavior of the other
> user agents and to develop
s/the other/other/ -- there are some UAs that were ignored when the sniffing of
a given UA was developed :)

> their own algorithm
I'm not sure if `algorithm` here belongs in singular or plural, I got
distracted :)

> an HTTP response to be interpreted as one media type but some user agents
> interpret the responses as another media type.
s/responses/response/ (agreement with first part)

> However, if a user agent does interpret a low-privilege media type, such as
> image/gif, as a high-privilege media type, such as text/html, the user agent
> has created a privilege escalation vulnerability in the server.
s/, the user agent/, then the user agent/

I believe abarth has addressed the above.

> This document describes a content sniffing algorithm that carefully balances
> the compatibility needs of user agent implementors with the security
> constraints.
`the security constraints` is problematic. I don't think `the` references
anything, so either drop `the` or provide a reference :/

> and metrics collected from implementations deployed to a sizable number of
> users .
s/ ././

> (such as "strip any leading space characters" or "return false and abort
> these steps") are to be interpreted with the meaning of the key word ("MUST",
> "SHOULD", "MAY", etc)
s/etc/etc./g

"official-type" should probably be given some styling -- preferably not the
same styling as "Content-Type"

> (Such messages are invalid according to RFC2616.
s/./.)/

The RFCs should be href references of some sort :)

> For octets received via HTTP, the Content-Type HTTP header field, if present,
> indicates the media type. Let the official-type be the media type indicted by
> the HTTP Content-Type header field, if present. If the Content-Type header
> field is absent or if its value cannot be interpreted as a media type (e.g.
> because its value doesn't contain a U+002F SOLIDUS ('/') character), then
> there is no official-type. (Such messages are invalid according to RFC2616.
> If an HTTP response contains multiple Content-Type header fields, the User
> Agent MUST use the textually last Content-Type header field to the
> official-type. For example, if the last Content-Type header field contains
> the value "foo", then there is no official media type because "foo" cannot be
> interpreted as a media type (even if the HTTP response contains another
> Content-Type header field that could be interpreted as a media type).
The "For example" part here applies to the previous paragraph; the sentence
needs to be moved to the paragraph before the instruction for multiple header
fields.

> FTP RFC0959
Is there a reason for the leading 0?

> Comparisons between media types, as defined by MIME specifications, are done
> in an ASCII case-insensitive manner. [RFC2046]
You need to somehow note that this is merely a note about MIME equivalence and
doesn't relate to how the spec works.

> If the official-type ends in "+xml", or if it is either "text/xml" or
> "application/xml", then let the sniffed-type be the official-type and abort
> these steps.
Please mark up `sniffed-type` and `official-type`

> If the official-type is an image type supported by the User Agent (e.g.,
> "image/png", "image/gif", "image/jpeg", etc), then jump to the "images"
> section below.
s/etc//

> If none of the first n octets are binary data octets then let the
> sniffed-type be "text/plain" and abort these steps.

> Binary Data Byte Ranges
You don't actually define a `binary data octet` as any item within the ranges
defined in the `binary data byte ranges`.

> If the first octets match one of the octet sequences in the "pattern" column
> of the table in the "unknown type" section below, ignoring any rows whose
> cell in the "security" column says "scriptable" (or "n/a"), then let the
> sniffed-type be the type given in the corresponding cell in the "sniffed
> type" column on that row and abort these steps.
If you could make `"unknown type" section` a link to the section, that would be
helpful.

> For each row in the table below:
> If the row has no "WS" octets:
I know that "WS" appears in the table below, but it hasn't been defined yet,
and I don't want to guess what it means (whitespace?) -- I guessed wrong for
the other one.

> If the row has a "WS" octet or a "_>" octet:
> "WS" means "whitespace", and allows insignificant whitespace to be skipped
> when sniffing for a type signature.
Oh, so that's where you hid the definition -- way too late :)

> "_>" means "space-or-bracket", and allows HTML tag names to terminate with
> either a space or a greater than sign.
Oh, so _ doesn't mean underscore.

Please put those definitions before their use, not way below their use :(

> If the octets of the masked-data matches the given pattern octets exactly,
> then let the sniffed-type be the type given in the cell of the third column
> in that row and abort these steps.
s/matches/match/

> LOOP: If index-stream points beyond the end of the octet stream, then this
> row doesn't match and skip this row.
Please style `LOOP`

> If the index-pattern-th octet of the pattern is a normal hexadecimal octet
> and not a "WS" octet or a "_>" octet:
s/or a/nor a/
s/not/neither/

> If the index-stream-th octet of the stream is one of 0x09 (ASCII TAB), 0x0A
> (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space), then
> increment only the index-stream to the next octet in the octet stream.
If you could style the 0xXX items in something <tt>-ish, that'd be
appreciated. ... And if you could style the names (ASCII TAB, etc.) in
something, that'd also be appreciated.

> If the first n octets match the signature for MP4 (as define in ), then let
> the sniffed-type be video/mp4 and abort these steps.
s/define/defined/ -- The markup you're using failed to generate a visible
reference. Could you get the tool to generate an XXX when it fails? :)

> FF FF FF FF FF FF WS 3C 3f 78 6d 6c text/xml Scriptable <?xml (Note the case
> sensitivity and lack of trailing _>)
s/sensitivity/sensitivity [mask = FF instead of DF]/

> A JPEG SOI marker followed by a octet of another marker.
s/a octet/an octet/

The table doesn't currently handle .SWF; in the past, that has been a problem:
http://www.digitalpreservation.gov/formats/fdd/fdd000130.shtml

> If n is less than 4, then the sequence does not match the signature for MP4
> and abort these steps.
`and` doesn't work; s/ and/;/ ? In all previous cases, the form was `let foo
and abort these steps`; here it's `then <statement of truth> and`. The fix is
probably to move to "return TRUTH/FALSE value and abort these steps" (or let
state-determined-truth-value-be TRUTH/FALSE value and ...).

> For each I from 2 to box-size/4 - 1 (inclusive):
If you could put `box-size/4 - 1` into some markup to indicate that it's a
math section, that'd be helpful.

> If octets 4*i through 4*i + 2 (inclusive) of the sequence are 0x6D 0x70 0x34
> (the ASCII string "mp4"), then the sequence does match the signature for MP4
> and abort these steps.
And here for `4*i` and `4*i + 2`.

I think you need s/If octets/If any octets/; otherwise, it's ambiguous between
`any` and `all`.

> 7 Images
...
> Otherwise, let the sniffed-type be the official-type and abort these
> steps.
I'd rather "Otherwise" be step 3 instead of part of the bulleted list inside
step 2.

> If the octets with positions pos to pos+2 in s are exactly equal to 0x2D,
> 0x2D, 0x3E respectively (ASCII for "-->"), then increase pos by 3 and jump
> back to the previous step (the step labeled loop start) in the overall
> algorithm in this section.
`loop start` should be a link to the LOOP label and preferably have the same
case as the LOOP label.

> Return to step 2 in these substeps.
It'd be nice if this was a link to an anchor in the right part of the steps.

> If RDF-flag is 1 and RSS-flag is 1, then let the sniffed-type be
> "application/rss+xml" and abort these steps.
s/and/or/ ??
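
Coming back to the MP4 steps quoted earlier: to illustrate the "return a truth
value and abort" phrasing I suggested, here is a minimal Python sketch of just
the two steps quoted in this review (the `n < 4` check and the `4*i` loop).
The function name is made up, `box_size` is passed in as a parameter because
the step that derives it isn't quoted here, and returning False when the loop
finds nothing is my assumption, not something the quoted text states.

```python
def mp4_brand_loop_matches(sequence: bytes, box_size: int) -> bool:
    """Hypothetical sketch of the two quoted MP4 steps, returning True/False.

    `box_size` is an input here; how the draft derives it is not quoted
    in this review, so this sketch does not try to reconstruct that.
    """
    n = len(sequence)

    # Quoted step: if n is less than 4, the sequence does not match.
    if n < 4:
        return False

    # Quoted step: for each i from 2 to box-size/4 - 1 (inclusive).
    # range() excludes its upper bound, so box_size // 4 is the right stop.
    for i in range(2, box_size // 4):
        # Octets 4*i through 4*i + 2 (inclusive) are three octets.
        if sequence[4 * i : 4 * i + 3] == b"mp4":
            return True

    # Assumption: falling out of the loop means "does not match".
    return False
```

Written this way, each step either returns a value or falls through to the
next one, which avoids the "then <statement of truth> and abort" wording, and
it reads as "any" rather than "all" for the `4*i` test.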