I present this draft of the complete algorithm for parsing a MIME type.  I 
would appreciate comments.

--Peter

----------------------------------------------------

An ASCII alphanumeric is a byte or character in the ranges 0x41-0x5A, 
0x61-0x7A, and 0x30-0x39.
A MIME type byte is an ASCII alphanumeric or one of the following bytes: ! # $ 
& ^ _ . + -
A parameter value byte is a MIME type byte or one of the following bytes: % ' * 
` | ~

To parse a MIME type, run the following steps:

1. Let length be the length of the byte sequence of the MIME type.
2. If length is less than 1, return undefined.
3. Let pointer be 0.  Pointer is a zero-based index to the current byte in the 
byte sequence.
4. Advance pointer to the next byte other than 0x20 (SPACE) or 0x09 (TAB).
5. Let type be the byte string from the current byte up to but not including 
the next "/" byte. Advance pointer to the next "/" byte.
6. If the current byte isn't "/", return undefined.
7. Increment pointer by 1.
8. Let subtype be the byte string from the current byte up to but not including 
the next 0x20 (SPACE), 0x09 (TAB), or ";" byte.  Advance pointer to the next 
0x20 (SPACE), 0x09 (TAB), or ";" byte.
9. If type is empty, contains a byte that isn't a MIME type byte, or doesn't
begin with an ASCII alphanumeric, or is longer than 127 bytes, return undefined.
10. If subtype is empty, contains a byte that isn't a MIME type byte, or 
doesn't begin with an ASCII alphanumeric, or is longer than 127 bytes, return 
undefined.
11. Convert type and subtype to ASCII lowercase.
12. Let parameters be an empty dictionary.
13. Run the following substeps in a loop.
     1. Advance pointer to the next byte other than 0x20 (SPACE) or 0x09 (TAB).
     2. If pointer is equal to length, return type, subtype, and parameters.
     3. If the current byte isn't ";", return undefined.
     4. Increment pointer by 1.
     5. If pointer is equal to length, return type, subtype, and parameters.
     6. Let parameter be the byte string from the current byte up to but not 
including the next "=" byte. Advance pointer to the next "=" byte.
     7. If parameter is empty, contains a byte that isn't a MIME type byte, or 
doesn't begin with an ASCII alphanumeric, or is longer than 127 bytes, return 
undefined.
     8. If parameters contains a mapping for parameter, return undefined.
     9. Convert parameter to ASCII lowercase.
     10. If the current byte isn't "=", return undefined.
     11. Increment pointer by 1.
     12. If the current byte equals 0x22 (quotation mark), run the following 
substeps:
              1. Let value be an empty byte string.
              2. Increment pointer by 1.
              3. Run these substeps in a loop.
                      1. If pointer is equal to length, return type, subtype, 
and parameters.
                      2. If the current byte equals 0x7F or is less than 0x20, 
and the current byte isn't TAB (0x09), return type, subtype, and parameters.
                      3. If the current byte equals 0x22 (quotation mark), 
increment pointer by 1 and terminate this loop.
                      4. Otherwise, if the current byte is "\", increment 
pointer by 1. Then, if there is a current byte, append that byte to value.
                      5. Otherwise, append the current byte to value.
                      6. Increment pointer by 1.
              4. Add the mapping of parameter to value to the parameters 
dictionary.
     13. Otherwise, run these substeps:
              1. Let value be the byte string from the current byte up to but 
not including the next 0x20 (SPACE), 0x09 (TAB), or ";" byte.  Advance pointer 
to the next 0x20 (SPACE), 0x09 (TAB), or ";" byte.
              2. If value is empty or contains a byte that isn't a parameter 
value byte, return undefined.
              3. Add the mapping of parameter to value to the parameters 
dictionary.

-------------------


Reply via email to