On Sun, 21 Dec 2008, Edward Z. Yang wrote:
>
> I suppose the big pivot point is "as if". A byte-wise implementation
> would replace character globally with byte, and any U+ designation
> with the UTF-8 encoded byte version. HTML 5 dictates end behavior, not
> the actual algorithm implementa
On 21 Dec 2008, at 16:35, Edward Z. Yang wrote:
I suppose the big pivot point is "as if". A byte-wise implementation
would replace character globally with byte, and any U+ designation
with the UTF-8 encoded byte version. HTML 5 dictates end behavior, not
the actual algorithm implementation,
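
A minimal sketch of the "as if" idea, written in Python rather than PHP for brevity. It assumes the input is already known to be valid UTF-8, and the helper names (text_runs_bytewise, text_runs_charwise) are hypothetical, not taken from the spec. Each U+ designation becomes its UTF-8 byte sequence, and the comparison is then done on bytes only:

```python
from typing import List

# Each "U+" designation is replaced by its UTF-8 encoded bytes.
LESS_THAN = "\u003C".encode("utf-8")    # U+003C -> b"<"
REPLACEMENT = "\uFFFD".encode("utf-8")  # U+FFFD -> b"\xef\xbf\xbd"


def text_runs_bytewise(data: bytes) -> List[bytes]:
    """Split valid UTF-8 input on U+003C using byte comparisons only."""
    return data.split(LESS_THAN)


def text_runs_charwise(text: str) -> List[str]:
    """The same split performed on decoded characters, for comparison."""
    return text.split("\u003C")


# Sample input mixing ASCII, two-byte and three-byte characters.
sample = "caf\u00e9 <b>\u65e5\u672c\u8a9e</b> \ufffd"
byte_runs = text_runs_bytewise(sample.encode("utf-8"))
char_runs = text_runs_charwise(sample)

# The byte-wise result, once decoded, matches the character-wise result.
same = [run.decode("utf-8") for run in byte_runs] == char_runs
print(same)
```

This only illustrates a single delimiter; a real tokenizer would track states, but the same substitution of bytes for characters would apply to each comparison.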
On Sun, Dec 21, 2008 at 5:41 AM, Ian Hickson wrote:
> On Sat, 20 Dec 2008, Edward Z. Yang wrote:
>>
>> 1. Given an input stream that is known to be valid UTF-8, is it possible
>> to implement the tokenization algorithm with byte-wise operations only?
>> I think it's possible, since all of the char
Ian Hickson wrote:
> Yes. (At least, that's the intent; if you find anything that contradicts
> that, please let me know.)
Great. I'll be sure to ping you if I find out otherwise.
> Looking just at parsing, yes, probably...
I suppose the big pivot point is "as if". A byte-wise implementation
wo
On 21 Dec 2008, at 05:41, Ian Hickson wrote:
1. Given an input stream that is known to be valid UTF-8, is it possible
to implement the tokenization algorithm with byte-wise operations only?
I think it's possible, since all of the character matching parts of the
algorithm map to characters
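
One property of UTF-8 that bears on this question: every byte of a multi-byte sequence is 0x80 or above, so a byte in the ASCII range can only ever be a complete ASCII character. The following Python sketch checks that property over all encodable code points; the function name is hypothetical and the check is an illustration, not part of any tokenizer:

```python
def ascii_bytes_only_encode_ascii(code_point: int) -> bool:
    """Return True if the UTF-8 encoding of code_point is safe to match byte-wise.

    ASCII code points must encode to the single identical byte; all other
    code points must encode to bytes that are all >= 0x80.
    """
    encoded = chr(code_point).encode("utf-8")
    if code_point < 0x80:
        return encoded == bytes([code_point])
    return all(byte >= 0x80 for byte in encoded)


# Surrogates (U+D800..U+DFFF) cannot appear in valid UTF-8, so skip them.
checked = all(
    ascii_bytes_only_encode_ascii(cp)
    for cp in range(0x110000)
    if not 0xD800 <= cp <= 0xDFFF
)
print(checked)
```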
On Sat, 20 Dec 2008, Edward Z. Yang wrote:
>
> I am currently working on a PHP5 implementation of the HTML5
> specification. PHP has abysmal Unicode support, and implementing Unicode
> streams in userspace may be unacceptably slow. Thus, my questions:
>
> 1. Given an input stream that is known t
I am currently working on a PHP5 implementation of the HTML5
specification. PHP has abysmal Unicode support, and implementing Unicode
streams in userspace may be unacceptably slow. Thus, my questions:
1. Given an input stream that is known to be valid UTF-8, is it possible
to implement the tokeniz