I've taken all of your suggestions, except as noted below.  Thanks for
your detailed feedback.

Adam


On Mon, Sep 26, 2011 at 2:27 PM, timeless <timel...@gmail.com> wrote:
>> Otherwise, if the octets in s starting at pos match any of the sequences of 
>> octets in the first column of the following table, then the user agent MUST 
>> follow the steps given in the corresponding cell in the second column of the 
>> same row. |
>
> What's the stray `|` character at the end of that doing?
>
> The ToC feels double spaced, is that normal?
>
> Would you mind quoting your attributes in source? Things like
> class=no-num or href=#web-data scare me. It's easier if you just quote
> all attributes :)
>
> Also, I generally recommend `<span ...>x</span> ` over `<span ...>x
> </span>` <- i.e. trailing space outside of span (see toc)
>
>> <p>Many web servers supply incorrect Content-Type header fields with their 
>> HTTP
>
> Can you mark up `Content-Type` in something which results in roughly
> "typewriter" font?
>
> s/user agents/User Agents/ as in:
>> responses.  In order to be compatible with these servers, user agents 
>> consider
>
>> Without a clear specification of how to "sniff" the media type, each user 
>> agent implementor was forced to reverse engineer the behavior of the other 
>> user agents and to develop
>
> s/the other/other/ -- there are some UAs who were ignored when the
> sniffing of a given UA was developed :)
>
>> their own algorithm
>
> I'm not sure if `algorithm` here belongs in singular or plural, I got
> distracted :)
>
>> an HTTP response to be interpreted as one media type but some user agents 
>> interpret the responses as another media type.
>
> s/responses/response/ (agreement with first part)
>
>> However, if a user agent does interpret a low-privilege media type, such as 
>> image/gif, as a high-privilege media type, such as text/html, the user agent 
>> has created a privilege escalation vulnerability in the server.
>
> s/, the user agent/, then the user agent/
>
>
>
> I believe abarth has addressed the above.
>
>> This document describes a content sniffing algorithm that carefully balances 
>> the compatibility needs of user agent implementors with the security 
>> constraints.
>
> `the security constraints` is problematic, I don't think `the`
> references anything
> so either drop `the`, or provide a reference :/
>
>> and metrics collected from implementations deployed to a sizable number of 
>> users .
>
> s/ ././

There's actually a reference that goes there.  I just haven't figured
out how to do references yet.

>> (such as "strip any leading space characters" or "return false and abort 
>> these steps") are to be interpreted with the meaning of the key word 
>> ("MUST", "SHOULD", "MAY", etc)
>
> s/etc/etc./g
>
> "official-type" should probably be given some styling -- preferably
> not the same styling as "Content-Type"
>
>> (Such messages are invalid according to RFC2616.
>
> s/./.)/
>
> The rfcs should be href references of some sort :)

Yeah, I need to crack the references problem at some point.  :)

>> For octets received via HTTP, the Content-Type HTTP header field, if 
>> present, indicates the media type. Let the official-type be the media type 
>> indicated by the HTTP Content-Type header field, if present. If the 
>> Content-Type header field is absent or if its value cannot be interpreted as 
>> a media type (e.g. because its value doesn't contain a U+002F SOLIDUS ('/') 
>> character), then there is no official-type. (Such messages are invalid 
>> according to RFC2616.
>
>> If an HTTP response contains multiple Content-Type header fields, the User 
>> Agent MUST use the textually last Content-Type header field to the 
>> official-type. For example, if the last Content-Type header field contains 
>> the value "foo", then there is no official media type because "foo" cannot 
>> be interpreted as a media type (even if the HTTP response contains another 
>> Content-Type header field that could be interpreted as a media type).
>
> The for example part here applies to the previous paragraph, the
> sentence needs to be moved to the paragraph before the instruction for
> multiple header fields.

It's an example that combines both rules.
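
To make the combination concrete, here is a rough Python sketch of how the two rules interact. The function name, the (name, value) header representation, the stripping of parameters after ";", and the lowercasing of the result are all assumptions for illustration, not text from the draft.

```python
from typing import List, Optional, Tuple


def official_type(headers: List[Tuple[str, str]]) -> Optional[str]:
    """Return the official-type, or None when there is no official-type.

    `headers` is a hypothetical list of (name, value) pairs in the order
    they appeared in the HTTP response.
    """
    # Collect every Content-Type value, preserving textual order.
    values = [value for (name, value) in headers
              if name.lower() == "content-type"]
    if not values:
        return None  # Header field absent: no official-type.

    # Rule 2: the textually last Content-Type header field is the one used.
    last = values[-1]

    # ASSUMPTION: parameters such as "; charset=utf-8" are dropped here.
    media_type = last.split(";", 1)[0].strip()

    # Rule 1: a value without U+002F SOLIDUS cannot be a media type.
    if "/" not in media_type:
        return None

    # ASSUMPTION: normalize case, since comparisons are case-insensitive.
    return media_type.lower()
```

With this sketch, a response carrying "text/html" followed by "foo" yields no official-type, because only the last field is consulted and "foo" has no "/".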

>> FTP RFC0959
>
> Is there a reason for the leading 0?
>
>> Comparisons between media types, as defined by MIME specifications, are done 
>> in an ASCII case-insensitive manner. [RFC2046]
>
> You need to somehow note that this is merely a note about mime
> equivalence and doesn't relate to how the spec works.

I'm not sure I understand.  It's in green and labeled as a "note".

>> If the official-type ends in "+xml", or if it is either "text/xml" or 
>> "application/xml", then let the sniffed-type be the official-type and abort 
>> these steps.
>
> Please mark up `sniffed-type` and `official-type`
>
>> If the official-type is an image type supported by the User Agent (e.g., 
>> "image/png", "image/gif", "image/jpeg", etc), then jump to the "images" 
>> section below.
>
> s/etc//
>
>> If none of the first n octets are binary data octets then let the 
>> sniffed-type be "text/plain" and abort these steps.
>> Binary Data Byte Ranges
>
> You don't actually define a `binary data octet` as any item within the
> ranges defined in the `binary data byte ranges`.
>
>> If the first octets match one of the octet sequences in the "pattern" column 
>> of the table in the "unknown type" section below, ignoring any rows whose 
>> cell in the "security" column says "scriptable" (or "n/a"), then let the 
>> sniffed-type be the type given in the corresponding cell in the "sniffed 
>> type" column on that row and abort these steps.
>
> If you could make `"unknown type" section` a link to the section, that
> would be helpful.
>
>> For each row in the table below:
>> If the row has no "WS" octets:
>
> I know that "WS" appears in the table below, but it hasn't been
> defined yet, and I don't want to guess what it means (whitespace?) --
> I guessed wrong for the other one.
>
>> If the row has a "WS" octet or a "_>" octet:
>
>> "WS" means "whitespace", and allows insignificant whitespace to be skipped 
>> when sniffing for a type signature.
>
> Oh, so that's where you hid the definition -- way too late :)
>
>> "_>" means "space-or-bracket", and allows HTML tag names to terminate with 
>> either a space or a greater than sign.
>
> Oh _ doesn't mean underscore
>
> Please put those definitions before their use, not way below their use :(

I'm tempted to just rename them to be less semantic.  They're just
symbols that don't mean anything, really.

>> If the octets of the masked-data matches the given pattern octets exactly, 
>> then let the sniffed-type be the type given in the cell of the third column 
>> in that row and abort these steps.
>
> s/matches/match/
>
>> LOOP: If index-stream points beyond the end of the octet stream, then this 
>> row doesn't match and skip this row.
>
> Please style `LOOP`
>
>> If the index-pattern-th octet of the pattern is a normal hexadecimal octet 
>> and not a "WS" octet or a "_>" octet:
>
> s/or a/nor a/
> s/not/neither/
>
>
>> If the index-stream-th octet of the stream is one of 0x09 (ASCII TAB), 0x0A 
>> (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space), then 
>> increment only the index-stream to the next octet in the octet stream.
>
> If you could style the 0xXX items in something <tt>-ish, that'd be 
> appreciated.
> ... And if you could style the names (ASCII TAB, etc.) in something,
> that'd also be appreciated.

That's a lot of editing!  I'm not sure that buys us much.
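
For reference, the whitespace step quoted above boils down to something like the following Python sketch. The five octet values come from the quoted draft text; the helper name and the choice to fold the repeated "increment only the index-stream" step into a single loop are assumptions.

```python
# Octets the quoted step treats as whitespace:
# TAB, LF, FF, CR, and space.
WHITESPACE_OCTETS = frozenset({0x09, 0x0A, 0x0C, 0x0D, 0x20})


def skip_whitespace(stream: bytes, index_stream: int) -> int:
    """Advance index_stream past any run of whitespace octets.

    Returns the index of the first non-whitespace octet, or len(stream)
    if the rest of the stream is whitespace.
    """
    while (index_stream < len(stream)
           and stream[index_stream] in WHITESPACE_OCTETS):
        index_stream += 1  # Only the stream index moves, not the pattern index.
    return index_stream
```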

>> If the first n octets match the signature for MP4 (as define in ), then let 
>> the sniffed-type be video/mp4 and abort these steps.
>
> s/define/defined/
>
> -- The markup you're using failed to generate a visible-reference,
> could you get the tool to generate an XXX when it fails? :)
>
>> FF FF FF FF FF FF WS 3C 3f 78 6d 6c text/xml Scriptable <?xml (Note the case 
>> sensitivity and lack of trailing _>)
>
> s/sensitivity/sensitivity [mask = FF instead of DF]/
>
>> A JPEG SOI marker followed by a octet of another marker.
>
> s/a octet/an octet/
>
> -- the table doesn't currently handle .SWF; in the past, that has been
> a problem
> http://www.digitalpreservation.gov/formats/fdd/fdd000130.shtml

That is intentional.  Sniffing SWF is bad times.

>> If n is less than 4, then the sequence does not match the signature for MP4 
>> and abort these steps.
>
> `and` doesn't work; s/ and/;/ ?
>
> In all previous cases, the form was `let foo and abort these steps`;
> here it's `then <statement of truth> and`.
>
> The fix is probably to move to "return TRUTH/FALSE value and abort
> these steps" (or let state-determined-truth-value-be TRUTH/FALSE value
> and ...).

Hum...  I see the problem.
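
One way the "return a truth value" form could look, as a rough Python sketch. The n < 4 check and the search for "mp4" at octets 4*i through 4*i + 2 follow the draft text quoted in this thread. How box-size is derived (big-endian from the first four octets) and the check that the box fits and is a multiple of 4 are not quoted here, so treat those as assumptions; the sketch also leaves out any other checks the draft may make.

```python
def matches_mp4_signature(sequence: bytes) -> bool:
    """Return True if the sequence matches the MP4 signature, else False."""
    n = len(sequence)
    if n < 4:
        return False  # Too short to match: report False rather than "abort".

    # ASSUMPTION: box-size is the first four octets read as a big-endian
    # unsigned integer.
    box_size = int.from_bytes(sequence[0:4], "big")

    # ASSUMPTION: the whole box must be available and be a multiple of 4.
    if n < box_size or box_size % 4 != 0:
        return False

    # For each i from 2 to box-size/4 - 1 (inclusive).
    for i in range(2, box_size // 4):
        # Octets 4*i through 4*i + 2 (inclusive): 0x6D 0x70 0x34 is "mp4".
        if sequence[4 * i:4 * i + 3] == b"\x6D\x70\x34":
            return True

    return False
```

Every exit path returns an explicit value, which avoids the "then <statement of truth> and abort" wording.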

>> For each I from 2 to box-size/4 - 1 (inclusive):
>
> If you could put `box-size/4 - 1` into some markup to indicate that
> it's a math section, that'd be helpful.

I put it in <code>.  I'm not sure that's the prettiest, but we can iterate.

>> If octets 4*i through 4*i + 2 (inclusive) of the sequence are 0x6D 0x70 0x34 
>> (the ASCII string "mp4"), then the sequence does match the signature for MP4 
>> and abort these steps.
>
> And here for `4*i` and `4*i + 2`
>
> I think you need s/If octets/If any octets/, otherwise, it's ambiguous
> between `any` and `all`.
>
>> 7 Images
> ...
>>     Otherwise, let the sniffed-type be the official-type and abort these 
>> steps.
>
> I'd rather otherwise be step 3 instead of part of the bulleted list
> inside step 2

:)

>> If the octets with positions pos to pos+2 in s are exactly equal to 0x2D, 
>> 0x2D, 0x3E respectively (ASCII for "-->"), then increase pos by 3 and jump 
>> back to the previous step (the step labeled loop start) in the overall 
>> algorithm in this section.
>
> `loop start` should be a link to the LOOP label and preferably have
> the same case as the LOOP label.
>
>> Return to step 2 in these substeps.
>
> It'd be nice if this was a link to an anchor in the right part of the steps.
>
>> If RDF-flag is 1 and RSS-flag is 1, then let the sniffed-type be 
>> "application/rss+xml" and abort these steps.
>
> s/and/or/ ??

"and" is correct.  I've made it strong.  It has to have both qualities
before we'll change the type.

Adam
