On 1/7/11 12:27 PM, Aryeh Gregor wrote:
>> 1)  If the input string contains any 16-bit units whose value is greater
>> than 0xff, throw INVALID_CHARACTER_ERR.

> This seems redundant with step 4 below.

It's not, because after this step the input JS string is converted into a byte buffer by dropping the high byte of each 2-byte code unit. All the following steps operate on bytes.
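To make the point concrete, here is a minimal Python sketch of step 1. It assumes each character of a Python str stands for one 16-bit code unit (an astral character has a code point above 0xff anyway, so it is rejected just as its surrogate pair would be). InvalidCharacterError, to_byte_buffer and naive_truncate are hypothetical names. The second function shows why step 4 can't stand in for step 1: truncating U+0141 to its low byte yields 0x41, an ASCII "A", which step 4 would happily accept.

```python
class InvalidCharacterError(ValueError):
    """Hypothetical stand-in for the DOM's INVALID_CHARACTER_ERR."""


def to_byte_buffer(s: str) -> bytes:
    """Step 1: reject any unit above 0xff, then keep only the low byte."""
    for ch in s:
        if ord(ch) > 0xFF:
            raise InvalidCharacterError(f"U+{ord(ch):04X} is above 0xff")
    # Every unit now fits in one byte, so dropping the (zero) high byte is lossless.
    return bytes(ord(ch) for ch in s)


def naive_truncate(s: str) -> bytes:
    """What would happen without step 1: the high byte is silently discarded."""
    return bytes(ord(ch) & 0xFF for ch in s)
```

For example, `naive_truncate("\u0141\u0141\u0141\u0141")` produces `b"AAAA"`, a perfectly valid base64 string, while `to_byte_buffer` rejects the same input.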

>> 2)  If the input string's length is greater than 0xFFFFFFFF / 3, throw a
>> generic failure code (because otherwise a 32-bit computation of the output
>> string length will overflow; this could probably be changed to use 64-bit
>> arithmetic).

> This doesn't sound like it should be in the spec.  It can fall under
> the hardware limitations clause if it actually comes up.  I don't like
> the hardware limitations clause, but this case seems so unlikely to
> come up on the web that it's not worth caring about.  Passing around >1 GB
> strings in JavaScript is going to cause a lot of pain no matter what.  (But
> if I ran into this case somehow as a web developer, I'd definitely feel
> justified in considering it a bug in Firefox.)

You wouldn't run into this case as a web developer at the moment anyway, because JS strings in SpiderMonkey have 28-bit lengths. Any attempt to allocate a JS string long enough to trigger the above check would fail with an out-of-memory exception first.
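The numbers behind this are easy to check. A minimal Python sketch, assuming (the exact formula isn't spelled out above) that the output-length computation multiplies the input length by 3 in unsigned 32-bit arithmetic; UINT32_MAX, STEP2_LIMIT, SPIDERMONKEY_MAX_LEN, uint32_mul3 and fits are illustrative names, not Gecko code.

```python
UINT32_MAX = 0xFFFFFFFF
STEP2_LIMIT = UINT32_MAX // 3          # largest length step 2 lets through
SPIDERMONKEY_MAX_LEN = (1 << 28) - 1   # largest length representable in 28 bits


def uint32_mul3(n: int) -> int:
    """Multiply by 3 with C-style unsigned 32-bit wraparound (assumed computation)."""
    return (n * 3) & UINT32_MAX


def fits(n: int) -> bool:
    """True if n * 3 survives the 32-bit computation without wrapping."""
    return uint32_mul3(n) == n * 3
```

Since 0xFFFFFFFF is exactly 3 * 0x55555555, STEP2_LIMIT is 1,431,655,765, and one unit more wraps the product around to 2. By contrast (2^28 - 1) * 3 = 805,306,365, well under 2^32, so no string SpiderMonkey can allocate today gets anywhere near the step 2 limit.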

>> 3)  If the length of the source string is 0 mod 4 and the string ends in
>> either "=" or "==" then chop off the trailing equals signs from the string.
>> If after this step the length is 1 mod 4, throw INVALID_CHARACTER_ERR.
>>
>> 4)  If the string contains any characters other than those in [A-Za-z0-9+/]
>> then throw INVALID_CHARACTER_ERR.
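Steps 3 and 4 translate almost line for line into code. A minimal Python sketch, operating on the byte buffer left by step 1; strip_and_validate and InvalidCharacterError are hypothetical names, and decoding the surviving characters is left out.

```python
import string


class InvalidCharacterError(ValueError):
    """Hypothetical stand-in for INVALID_CHARACTER_ERR."""


# Step 4's allowed set, [A-Za-z0-9+/], as byte values.
ALLOWED = frozenset((string.ascii_letters + string.digits + "+/").encode("ascii"))


def strip_and_validate(buf: bytes) -> bytes:
    # Step 3: padding is only stripped when the length is 0 mod 4.
    if len(buf) % 4 == 0:
        if buf.endswith(b"=="):
            buf = buf[:-2]
        elif buf.endswith(b"="):
            buf = buf[:-1]
    # A single leftover character (6 bits) cannot encode a whole byte.
    if len(buf) % 4 == 1:
        raise InvalidCharacterError("length is 1 mod 4 after stripping padding")
    # Step 4: anything outside [A-Za-z0-9+/], including a stray "=", is an error.
    if any(b not in ALLOWED for b in buf):
        raise InvalidCharacterError("character outside [A-Za-z0-9+/]")
    return buf
```

Note that input like `b"QQ="` is rejected by step 4 rather than step 3: its length is 3 mod 4, so its "=" is never stripped.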

>> Step 2 is certainly missing from your spec (and as I said, may not be
>> desirable); I haven't verified whether your regexp ends up enforcing exactly
>> 3+4 above.

> It looks the same to me, although I haven't looked *that* carefully.
> Behavior matches in all the tests I could think up.

In that case, I would prefer that the character and length constraints just be explicitly specified. Specifying them via an unreadable regexp is hostile not just to implementors but to the users of the spec too.

If the regexp used the equivalent of Perl's /x modifier, with comments, I would be more OK with it, but then you might as well just write out the comments and leave off the regexp, unless you expect someone to actually use it to validate input to atob.
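For comparison, here is roughly what a /x-style regexp could look like, sketched in Python, whose re.VERBOSE flag plays the role of Perl's /x. This is a hypothetical pattern written from steps 3 and 4, not the regexp in the spec under discussion; BASE64_RE, accepts_by_steps and mismatches are illustrative names. The exhaustive check over short strings is one way to get something more systematic than hand-picked tests.

```python
import itertools
import re

# Hypothetical /x-style pattern intended to match steps 3 and 4 (not the spec's regexp).
BASE64_RE = re.compile(rb"""
    (?: [A-Za-z0-9+/]{4} )*          # any number of complete 4-character groups
    (?:                              # then, optionally, one partial group:
        [A-Za-z0-9+/]{2} (?: == )?   #   2 characters, with or without "=="
      | [A-Za-z0-9+/]{3} =?          #   3 characters, with or without "="
    )?
""", re.VERBOSE)

ALLOWED = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/")


def accepts_by_steps(buf: bytes) -> bool:
    """Steps 3 and 4 written out explicitly; True if the input would be accepted."""
    if len(buf) % 4 == 0:
        if buf.endswith(b"=="):
            buf = buf[:-2]
        elif buf.endswith(b"="):
            buf = buf[:-1]
    if len(buf) % 4 == 1:
        return False
    return all(b in ALLOWED for b in buf)


def mismatches(alphabet: bytes = b"A=!", max_len: int = 9) -> list:
    """Every string over alphabet, up to max_len long, where the two disagree."""
    found = []
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = bytes(chars)
            if (BASE64_RE.fullmatch(s) is not None) != accepts_by_steps(s):
                found.append(s)
    return found
```

The alphabet only needs one character from [A-Za-z0-9+/] because both definitions treat all of those characters identically; "=" and "!" cover padding and invalid input.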

-Boris
