I've taken all of your suggestions, except as noted below. Thanks for your detailed feedback.
Adam

On Mon, Sep 26, 2011 at 2:27 PM, timeless <timel...@gmail.com> wrote:
>> Otherwise, if the octets in s starting at pos match any of the sequences of
>> octets in the first column of the following table, then the user agent MUST
>> follow the steps given in the corresponding cell in the second column of the
>> same row. |
>
> What's the stray `|` character at the end of that doing?
>
> The ToC feels double spaced, is that normal?
>
> Would you mind quoting your attributes in source? Things like
> class=no-num or href=#web-data scare me. It's easier if you just quote
> all attributes :)
>
> Also, I generally recommend `<span ...>x</span> ` over `<span ...>x
> </span>` <- i.e. trailing space outside of span (see toc)
>
>> <p>Many web servers supply incorrect Content-Type header fields with their
>> HTTP
>
> Can you mark up `Content-Type` in something which results in roughly
> "typewriter" font?
>
> s/user agents/User Agents/ as in:
>> responses. In order to be compatible with these servers, user agents
>> consider
>
>> Without a clear specification of how to "sniff" the media type, each user
>> agent implementor was forced to reverse engineer the behavior of the other
>> user agents and to develop
>
> s/the other/other/ -- there are some UAs who were ignored when the
> sniffing of a given UA was developed :)
>
>> their own algorithm
>
> I'm not sure if `algorithm` here belongs in singular or plural, I got
> distracted :)
>
>> an HTTP response to be interpreted as one media type but some user agents
>> interpret the responses as another media type.
>
> s/responses/response/ (agreement with first part)
>
>> However, if a user agent does interpret a low-privilege media type, such as
>> image/gif, as a high-privilege media type, such as text/html, the user agent
>> has created a privilege escalation vulnerability in the server.
>
> s/, the user agent/, then the user agent/
>
>
>
> I believe abarth has addressed the above.
>
>> This document describes a content sniffing algorithm that carefully balances
>> the compatibility needs of user agent implementors with the security
>> constraints.
>
> `the security constraints` is problematic, I don't think `the`
> references anything
> so either drop `the`, or provide a reference :/
>
>> and metrics collected from implementations deployed to a sizable number of
>> users .
>
> s/ ././

There's actually a reference that goes there. I just haven't figured out
how to do references yet.

>> (such as "strip any leading space characters" or "return false and abort
>> these steps") are to be interpreted with the meaning of the key word
>> ("MUST", "SHOULD", "MAY", etc)
>
> s/etc/etc./g
>
> "official-type" should probably be given some styling -- preferably
> not the same styling as "Content-Type"
>
>> (Such messages are invalid according to RFC2616.
>
> s/./.)/
>
> The rfcs should be href references of some sort :)

Yeah, I need to crack the references problem at some point. :)

>> For octets received via HTTP, the Content-Type HTTP header field, if
>> present, indicates the media type. Let the official-type be the media type
>> indicted by the HTTP Content-Type header field, if present. If the
>> Content-Type header field is absent or if its value cannot be interpreted as
>> a media type (e.g. because its value doesn't contain a U+002F SOLIDUS ('/')
>> character), then there is no official-type. (Such messages are invalid
>> according to RFC2616.
>
>> If an HTTP response contains multiple Content-Type header fields, the User
>> Agent MUST use the textually last Content-Type header field to the
>> official-type. For example, if the last Content-Type header field contains
>> the value "foo", then there is no official media type because "foo" cannot
>> be interpreted as a media type (even if the HTTP response contains another
>> Content-Type header field that could be interpreted as a media type).
>
> The for example part here applies to the previous paragraph, the
> sentence needs to be moved to the paragraph before the instruction for
> multiple header fields.

It's an example that combines both rules.

>> FTP RFC0959
>
> Is there a reason for the leading 0?
>
>> Comparisons between media types, as defined by MIME specifications, are done
>> in an ASCII case-insensitive manner. [RFC2046]
>
> You need to somehow note that this is merely a note about mime
> equivalence and doesn't relate to how the spec works.

I'm not sure I understand. It's in green and labeled as a "note".

>> If the official-type ends in "+xml", or if it is either "text/xml" or
>> "application/xml", then let the sniffed-type be the official-type and abort
>> these steps.
>
> Please mark up `sniffed-type` and `official-type`
>
>> If the official-type is an image type supported by the User Agent (e.g.,
>> "image/png", "image/gif", "image/jpeg", etc), then jump to the "images"
>> section below.
>
> s/etc//
>
>> If none of the first n octets are binary data octets then let the
>> sniffed-type be "text/plain" and abort these steps.
>> Binary Data Byte Ranges
>
> You don't actually define a `binary data octet` as any item within the
> ranges defined in the `binary data byte ranges`.
>
>> If the first octets match one of the octet sequences in the "pattern" column
>> of the table in the "unknown type" section below, ignoring any rows whose
>> cell in the "security" column says "scriptable" (or "n/a"), then let the
>> sniffed-type be the type given in the corresponding cell in the "sniffed
>> type" column on that row and abort these steps.
>
> If you could make `"unknown type" section` a link to the section, that
> would be helpful.
>
>> For each row in the table below:
>> If the row has no "WS" octets:
>
> I know that "WS" appears in the table below, but it hasn't been
> defined yet, and I don't want to guess what it means (whitespace?) --
> I guessed wrong for the other one.
>
>> If the row has a "WS" octet or a "_>" octet:
>
>> "WS" means "whitespace", and allows insignificant whitespace to be skipped
>> when sniffing for a type signature.
>
> Oh, so that's where you hid the definition -- way too late :)
>
>> "_>" means "space-or-bracket", and allows HTML tag names to terminate with
>> either a space or a greater than sign.
>
> Oh _ doesn't mean underscore
>
> Please put those definitions before their use, not way below their use :(

I'm tempted to just rename them to be less semantic. They're just symbols
that don't mean anything, really.

>> If the octets of the masked-data matches the given pattern octets exactly,
>> then let the sniffed-type be the type given in the cell of the third column
>> in that row and abort these steps.
>
> s/matches/match/
>
>> LOOP: If index-stream points beyond the end of the octet stream, then this
>> row doesn't match and skip this row.
>
> Please style `LOOP`
>
>> If the index-pattern-th octet of the pattern is a normal hexadecimal octet
>> and not a "WS" octet or a "_>" octet:
>
> s/or a/nor a/
> s/not/neither/
>
>
>> If the index-stream-th octet of the stream is one of 0x09 (ASCII TAB), 0x0A
>> (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space), then
>> increment only the index-stream to the next octet in the octet stream.
>
> If you could style the 0xXX items in something <tt>-ish, that'd be
> appreciated.
> ... And if you could style the names (ASCII TAB, etc.) in something,
> that'd also be appreciated.

That's a lot of editing! I'm not sure that buys us much.

>> If the first n octets match the signature for MP4 (as define in ), then let
>> the sniffed-type be video/mp4 and abort these steps.
>
> s/define/defined/
>
> -- The markup you're using failed to generate a visible-reference,
> could you get the tool to generate an XXX when it fails?
> :)
>
>> FF FF FF FF FF FF WS 3C 3f 78 6d 6c text/xml Scriptable <?xml (Note the case
>> sensitivity and lack of trailing _>)
>
> s/sensitivity/sensitivity [mask = FF instead of DF]/
>
>> A JPEG SOI marker followed by a octet of another marker.
>
> s/a octet/an octet/
>
> -- the table doesn't currently handle .SWF; in the past, that has been a
> problem
> http://www.digitalpreservation.gov/formats/fdd/fdd000130.shtml

That is intentional. Sniffing SWF is bad times.

>> If n is less than 4, then the sequence does not match the signature for MP4
>> and abort these steps.
>
> `and` doesn't work; s/ and/;/ ?
>
> In all previous cases, the form was `let foo and abort these steps`;
> here it's `then <statement of truth> and`.
>
> The fix is probably to move to "return TRUTH/FALSE value and abort
> these steps" (or let state-determined-truth-value-be TRUTH/FASLE value
> and ...).

Hum... I see the problem.

>> For each I from 2 to box-size/4 - 1 (inclusive):
>
> If you could put `box-size/4 - 1` into some markup to indicate that
> it's a math section, that'd be helpful.

I put it in <code>. I'm not sure that's the prettiest, but we can iterate.

>> If octets 4*i through 4*i + 2 (inclusive) of the sequence are 0x6D 0x70 0x34
>> (the ASCII string "mp4"), then the sequence does match the signature for MP4
>> and abort these steps.
>
> And here for `4*i` and `4*i + 2`
>
> I think you need s/If octets/If any octets/, otherwise, it's ambiguous
> between `any` and `all`.
>
>> 7 Images
> ...
>> Otherwise, let the sniffed-type be the official-type and abort these
>> steps.
>
> I'd rather otherwise be step 3 instead of part of the bulleted list
> inside step 2 :)

>> If the octets with positions pos to pos+2 in s are exactly equal to 0x2D,
>> 0x2D, 0x3E respectively (ASCII for "-->"), then increase pos by 3 and jump
>> back to the previous step (the step labeled loop start) in the overall
>> algorithm in this section.
>
> `loop start` should be a link to the LOOP label and preferably have
> the same case as the LOOP label.
>
>> Return to step 2 in these substeps.
>
> It'd be nice if this was a link to an anchor in the right part of the steps.
>
>> If RDF-flag is 1 and RSS-flag is 1, then let the sniffed-type be
>> "application/rss+xml" and abort these steps.
>
> s/and/or/ ??

`and` is correct. I've made it strong. It's got to have both qualities
before we'll change the type.

Adam
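
For reference, the MP4 steps quoted above could be sketched in the "return
true/false" form suggested in the thread. This is only a rough illustration,
not the spec's text. The function name `matches_mp4_signature` is made up, and
two things are assumptions that the quoted fragments don't state: that
box-size is the first four octets read as a big-endian integer, and the guard
on length and divisibility by 4. Any other checks the spec performs are left
out.

```python
def matches_mp4_signature(sequence: bytes) -> bool:
    """Return True if the octets look like MP4, following the quoted steps."""
    n = len(sequence)

    # Quoted step: if n is less than 4, the sequence does not match.
    if n < 4:
        return False

    # ASSUMPTION: box-size is the first four octets, big-endian.
    box_size = int.from_bytes(sequence[0:4], "big")

    # ASSUMPTION: guard so the loop below never reads past the data.
    if n < box_size or box_size % 4 != 0:
        return False

    # Quoted step: for each i from 2 to box-size/4 - 1 (inclusive).
    for i in range(2, box_size // 4):
        # Octets 4*i through 4*i + 2 (inclusive) are the ASCII string "mp4".
        if sequence[4 * i : 4 * i + 3] == b"mp4":
            return True  # any single matching group is enough

    # No group matched.
    return False
```

Written this way, every exit is an explicit "return a value and stop", and
the loop makes it clear that any one matching group of octets is sufficient.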