[whatwg] [MIME Sniffing] Editorial feedback

timeless Mon, 26 Sep 2011 14:27:27 -0700

> Otherwise, if the octets in s starting at pos match any of the sequences of 
> octets in the first column of the following table, then the user agent MUST 
> follow the steps given in the corresponding cell in the second column of the 
> same row. |


What's the stray `|` character at the end of that doing?

The ToC feels double spaced, is that normal?

Would you mind quoting your attributes in source? Things like
class=no-num or href=#web-data scare me. It's easier if you just quote
all attributes :)

Also, I generally recommend `<span ...>x</span> ` over
`<span ...>x </span>` <- i.e. trailing space outside of span (see ToC)

> <p>Many web servers supply incorrect Content-Type header fields with their 
> HTTP

Can you mark up `Content-Type` in something which results in roughly
"typewriter" font?

s/user agents/User Agents/ as in:
> responses.  In order to be compatible with these servers, user agents consider

> Without a clear specification of how to "sniff" the media type, each user 
> agent implementor was forced to reverse engineer the behavior of the other 
> user agents and to develop

s/the other/other/ -- there are some UAs who were ignored when the
sniffing of a given UA was developed :)

> their own algorithm

I'm not sure if `algorithm` here belongs in singular or plural, I got
distracted :)

> an HTTP response to be interpreted as one media type but some user agents 
> interpret the responses as another media type.

s/responses/response/ (agreement with first part)

> However, if a user agent does interpret a low-privilege media type, such as 
> image/gif, as a high-privilege media type, such as text/html, the user agent 
> has created a privilege escalation vulnerability in the server.

s/, the user agent/, then the user agent/



I believe abarth has addressed the above.

> This document describes a content sniffing algorithm that carefully balances 
> the compatibility needs of user agent implementors with the security 
> constraints.

`the security constraints` is problematic; I don't think `the`
references anything, so either drop `the` or provide a reference :/

> and metrics collected from implementations deployed to a sizable number of 
> users .

s/ ././

> (such as "strip any leading space characters" or "return false and abort 
> these steps") are to be interpreted with the meaning of the key word ("MUST", 
> "SHOULD", "MAY", etc)

s/etc/etc./g

"official-type" should probably be given some styling -- preferably
not the same styling as "Content-Type"

> (Such messages are invalid according to RFC2616.

s/./.)/

The RFCs should be href references of some sort :)

> For octets received via HTTP, the Content-Type HTTP header field, if present, 
> indicates the media type. Let the official-type be the media type indicted by 
> the HTTP Content-Type header field, if present. If the Content-Type header 
> field is absent or if its value cannot be interpreted as a media type (e.g. 
> because its value doesn't contain a U+002F SOLIDUS ('/') character), then 
> there is no official-type. (Such messages are invalid according to RFC2616.

> If an HTTP response contains multiple Content-Type header fields, the User 
> Agent MUST use the textually last Content-Type header field to the 
> official-type. For example, if the last Content-Type header field contains 
> the value "foo", then there is no official media type because "foo" cannot be 
> interpreted as a media type (even if the HTTP response contains another 
> Content-Type header field that could be interpreted as a media type).

The "For example" part here applies to the previous paragraph; the
sentence needs to be moved to the paragraph before the instruction for
multiple header fields.
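
To make the two quoted paragraphs concrete, here is a rough sketch of how I read them: take the textually last Content-Type field, and if it can't be interpreted as a media type there is no official-type, even when an earlier field could be. The function name, the (name, value) list shape, and the stripping of parameters after `;` are my assumptions, not anything the draft says.

```python
from typing import Optional, Sequence, Tuple


def official_type(headers: Sequence[Tuple[str, str]]) -> Optional[str]:
    """Return the official-type, or None when there is no official-type.

    `headers` is a hypothetical list of (field-name, field-value) pairs in
    the order they appeared in the HTTP response.
    """
    # Field names are case-insensitive in HTTP.
    values = [value for name, value in headers if name.lower() == "content-type"]
    if not values:
        return None  # Content-Type header field is absent

    # Use the textually last field only; earlier fields are never consulted.
    last = values[-1]

    # Assumption: parameters such as "; charset=..." are not part of the type.
    media_type = last.split(";", 1)[0].strip()

    # The draft's own example of an uninterpretable value: no U+002F SOLIDUS.
    if "/" not in media_type:
        return None
    return media_type
```

With this reading, a response carrying `text/html` followed by `foo` has no official-type, which is exactly the case the "For example" sentence describes.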

> FTP RFC0959

Is there a reason for the leading 0?

> Comparisons between media types, as defined by MIME specifications, are done 
> in an ASCII case-insensitive manner. [RFC2046]

You need to somehow note that this is merely a note about MIME
equivalence and doesn't relate to how the spec works.

> If the official-type ends in "+xml", or if it is either "text/xml" or 
> "application/xml", then let the sniffed-type be the official-type and abort 
> these steps.

Please mark up `sniffed-type` and `official-type`

> If the official-type is an image type supported by the User Agent (e.g., 
> "image/png", "image/gif", "image/jpeg", etc), then jump to the "images" 
> section below.

s/etc//

> If none of the first n octets are binary data octets then let the 
> sniffed-type be "text/plain" and abort these steps.
> Binary Data Byte Ranges

You don't actually define a `binary data octet` as any item within the
ranges defined in the `binary data byte ranges`.
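
For illustration, this is the definition I think is missing, written as code. The ranges are the ones I know from the current WHATWG MIME Sniffing Standard (0x00-0x08, 0x0B, 0x0E-0x1A, 0x1C-0x1F); the table isn't quoted in this mail, so treat them as an assumption about what the draft's table contains. The function names are mine.

```python
# Assumed contents of the "Binary Data Byte Ranges" table.
BINARY_DATA_OCTETS = frozenset(
    list(range(0x00, 0x09))    # 0x00-0x08 (NUL to BS)
    + [0x0B]                   # 0x0B (VT)
    + list(range(0x0E, 0x1B))  # 0x0E-0x1A (SO to SUB)
    + list(range(0x1C, 0x20))  # 0x1C-0x1F (FS to US)
)


def is_binary_data_octet(octet: int) -> bool:
    """A binary data octet is any octet that falls inside the ranges above."""
    return octet in BINARY_DATA_OCTETS


def no_binary_data_octets(data: bytes, n: int) -> bool:
    """True if none of the first n octets are binary data octets."""
    return not any(is_binary_data_octet(octet) for octet in data[:n])
```

Note that TAB, LF, FF, CR and ESC (0x1B) are deliberately outside the ranges, so ordinary text passes the check.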

> If the first octets match one of the octet sequences in the "pattern" column 
> of the table in the "unknown type" section below, ignoring any rows whose 
> cell in the "security" column says "scriptable" (or "n/a"), then let the 
> sniffed-type be the type given in the corresponding cell in the "sniffed 
> type" column on that row and abort these steps.

If you could make `"unknown type" section` a link to the section, that
would be helpful.

> For each row in the table below:
> If the row has no "WS" octets:

I know that "WS" appears in the table below, but it hasn't been
defined yet, and I don't want to guess what it means (whitespace?) --
I guessed wrong for the other one.

> If the row has a "WS" octet or a "_>" octet:

> "WS" means "whitespace", and allows insignificant whitespace to be skipped 
> when sniffing for a type signature.

Oh, so that's where you hid the definition -- way too late :)

> "_>" means "space-or-bracket", and allows HTML tag names to terminate with 
> either a space or a greater than sign.

Oh, so `_` doesn't mean underscore

Please put those definitions before their use, not way below their use :(

> If the octets of the masked-data matches the given pattern octets exactly, 
> then let the sniffed-type be the type given in the cell of the third column 
> in that row and abort these steps.

s/matches/match/

> LOOP: If index-stream points beyond the end of the octet stream, then this 
> row doesn't match and skip this row.

Please style `LOOP`

> If the index-pattern-th octet of the pattern is a normal hexadecimal octet 
> and not a "WS" octet or a "_>" octet:

s/or a/nor a/
s/not/neither/


> If the index-stream-th octet of the stream is one of 0x09 (ASCII TAB), 0x0A 
> (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space), then 
> increment only the index-stream to the next octet in the octet stream.

If you could style the 0xXX items in something <tt>-ish, that'd be appreciated.
... And if you could style the names (ASCII TAB, etc.) in something,
that'd also be appreciated.
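
As a small sketch of what the quoted step does when it is applied repeatedly for a "WS" octet: only index-stream advances, and only over the five octets listed. The names below are hypothetical and this covers just the whitespace part, not the whole LOOP.

```python
# The five octets named in the quoted step.
WHITESPACE_OCTETS = frozenset({
    0x09,  # ASCII TAB
    0x0A,  # ASCII LF
    0x0C,  # ASCII FF
    0x0D,  # ASCII CR
    0x20,  # ASCII space
})


def skip_whitespace(stream: bytes, index_stream: int = 0) -> int:
    """Advance index_stream past insignificant whitespace and return it.

    The pattern index is untouched; only the stream position moves.
    """
    while index_stream < len(stream) and stream[index_stream] in WHITESPACE_OCTETS:
        index_stream += 1
    return index_stream
```

If the returned index equals the stream length, the stream ran out while skipping, which is the "points beyond the end of the octet stream" case for that row.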

> If the first n octets match the signature for MP4 (as define in ), then let 
> the sniffed-type be video/mp4 and abort these steps.

s/define/defined/

-- The markup you're using failed to generate a visible-reference,
could you get the tool to generate an XXX when it fails? :)

> FF FF FF FF FF FF WS 3C 3f 78 6d 6c text/xml Scriptable <?xml (Note the case 
> sensitivity and lack of trailing _>)

s/sensitivity/sensitivity [mask = FF instead of DF]/
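
Here is why I'd like the mask called out: a rough sketch of masked matching, where a mask octet of DF clears the ASCII case bit and FF keeps the octet as is. The `<?xml` pattern octets are from the quoted row; the `<HTML` row and all names are illustrative assumptions of mine, and WS and `_>` handling is left out.

```python
def masked_match(data: bytes, mask: bytes, pattern: bytes) -> bool:
    """True if the leading octets of data, ANDed with mask, equal pattern."""
    if len(mask) != len(pattern):
        raise ValueError("mask and pattern must have the same length")
    if len(data) < len(pattern):
        return False
    return all((d & m) == p for d, m, p in zip(data, mask, pattern))


# Illustrative case-insensitive row: DF folds a-z onto A-Z.
HTML_MASK = bytes([0xFF, 0xDF, 0xDF, 0xDF, 0xDF])
HTML_PATTERN = bytes([0x3C, 0x48, 0x54, 0x4D, 0x4C])  # "<HTML"

# The quoted row: every mask octet is FF, so the match is case-sensitive.
XML_MASK = bytes([0xFF, 0xFF, 0xFF, 0xFF, 0xFF])
XML_PATTERN = bytes([0x3C, 0x3F, 0x78, 0x6D, 0x6C])  # "<?xml"
```

So `<html` and `<HTML` both match the first row, while `<?XML` does not match the second.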

> A JPEG SOI marker followed by a octet of another marker.

s/a octet/an octet/

-- the table doesn't currently handle .SWF; in the past, that has been a problem
http://www.digitalpreservation.gov/formats/fdd/fdd000130.shtml


> If n is less than 4, then the sequence does not match the signature for MP4 
> and abort these steps.

`and` doesn't work; s/ and/;/ ?

In all previous cases, the form was `let foo and abort these steps`;
here it's `then <statement of truth> and`.

The fix is probably to move to "return TRUE/FALSE and abort these
steps" (or "let state-determined-truth-value be TRUE/FALSE and ...").

> For each I from 2 to box-size/4 - 1 (inclusive):

If you could put `box-size/4 - 1` into some markup to indicate that
it's a math section, that'd be helpful.

> If octets 4*i through 4*i + 2 (inclusive) of the sequence are 0x6D 0x70 0x34 
> (the ASCII string "mp4"), then the sequence does match the signature for MP4 
> and abort these steps.

And here for `4*i` and `4*i + 2`

I think you need s/If octets/If any octets/, otherwise, it's ambiguous
between `any` and `all`.
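
To show the "return a value and abort" form and the `any` reading together, here is a minimal sketch of the MP4 check. The n < 4 test and the loop bounds follow the quoted steps; how box-size is computed, the multiple-of-4 test, and the "ftyp" check are not quoted here, so I've filled them in from the current WHATWG standard as assumptions. The function name is mine.

```python
def matches_mp4_signature(sequence: bytes) -> bool:
    """Return True if the sequence matches the signature for MP4."""
    n = len(sequence)
    if n < 4:
        return False  # quoted step: n is less than 4

    # Assumption: box-size is the first four octets, big-endian.
    box_size = int.from_bytes(sequence[0:4], "big")
    # Assumption: the whole box must be present and be a multiple of 4.
    if n < box_size or box_size % 4 != 0:
        return False
    # Assumption: octets 4 through 7 must be the ASCII string "ftyp".
    if sequence[4:8] != b"ftyp":
        return False

    # Quoted step: for each i from 2 to box-size/4 - 1 (inclusive).
    for i in range(2, box_size // 4):
        # Octets 4*i through 4*i + 2 (inclusive) are "mp4" for ANY such i.
        if sequence[4 * i : 4 * i + 3] == b"mp4":
            return True
    return False
```

Every exit is a plain true/false return, so there is no `then <statement of truth> and abort` left to parse.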

> 7 Images
...
>     Otherwise, let the sniffed-type be the official-type and abort these 
> steps.

I'd rather the "Otherwise" be step 3 instead of part of the bulleted
list inside step 2

> If the octets with positions pos to pos+2 in s are exactly equal to 0x2D, 
> 0x2D, 0x3E respectively (ASCII for "-->"), then increase pos by 3 and jump 
> back to the previous step (the step labeled loop start) in the overall 
> algorithm in this section.

`loop start` should be a link to the LOOP label and preferably have
the same case as the LOOP label.

> Return to step 2 in these substeps.

It'd be nice if this was a link to an anchor in the right part of the steps.

> If RDF-flag is 1 and RSS-flag is 1, then let the sniffed-type be 
> "application/rss+xml" and abort these steps.

s/and/or/ ??
