Why do you decode ? (Seriously)

2012-08-02 Thread Dmitry Olshansky

Intrigued by a familiar topic in std.lexer, I've split it out. It's not as easy a question as it seems. Before you start with the usual "because a codepoint has semantic meaning and a codeunit is just bytes", ya-da, ya-da, let me explain something. A codepoint is indeed a complete piece of symbolic

Re: Why do you decode ? (Seriously)

2012-08-02 Thread Andrei Alexandrescu

On 8/2/12 12:47 PM, Dmitry Olshansky wrote:
> char[] input = ...;
> size_t idx = ...;
> size_t len = stride(input, idx);
> uint u8word = *cast(uint*)(input.ptr+idx);
> //u8word contains full UTF-8 sequence
> u8word &= (1<<(8*len)) -1; //mask out extra bytes
> //now u8word is a complete UTF-8 sequence in one uint
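The quoted snippet can be turned into a small function. What follows is only a sketch under stated assumptions, not the code from the original post: `utf8WordFast` is a hypothetical name, the target is assumed to be little-endian and to tolerate unaligned 4-byte loads, and the caller is assumed to guarantee four readable bytes at `idx`. It also builds the mask by shifting `uint.max` right, because shifting a 32-bit `1` left by 32 (the `len == 4` case) is not a valid shift.

```d
import std.utf : stride;

/// Hypothetical helper (not part of Phobos): returns the UTF-8 sequence
/// that starts at input[idx], packed into one uint, first byte lowest.
/// Assumes a little-endian target that permits unaligned 4-byte loads.
uint utf8WordFast(const(char)[] input, size_t idx)
{
    // stride gives the sequence length from the lead byte: 1 to 4.
    immutable len = cast(uint) stride(input, idx);

    // The load below always reads 4 bytes, so they must all be in bounds.
    assert(idx + 4 <= input.length, "need 4 readable bytes at idx");

    immutable word = *cast(const(uint)*)(input.ptr + idx);

    // Keep the low 8*len bits; for len == 4 the mask is all ones.
    return word & (uint.max >> (32 - 8 * len));
}

unittest
{
    // "\u00E9" is encoded as the bytes C3 A9; "z!" pads the 4-byte load.
    assert(utf8WordFast("a\u00E9z!", 1) == 0xA9C3);
}
```

Near the end of a buffer the four-byte load would run past the data, so real code would need padding or a slower fallback for the last few bytes.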

Re: Why do you decode ? (Seriously)

2012-08-02 Thread Walter Bright

On 8/2/2012 11:42 AM, Andrei Alexandrescu wrote:
> I really like this idea of a minimally decoded character that's isomorphic with UTF-32 but much cheaper to extract. (We could use ulong if they add 5- and 6-byte characters.) I wonder if people came up with this and gave it a name. If not, I'd say

Re: Why do you decode ? (Seriously)

2012-08-02 Thread Dmitry Olshansky

On 02-Aug-12 22:42, Andrei Alexandrescu wrote:
> On 8/2/12 12:47 PM, Dmitry Olshansky wrote:
>> char[] input = ...;
>> size_t idx = ...;
>> size_t len = stride(input, idx);
>> uint u8word = *cast(uint*)(input.ptr+idx);
>> //u8word contains full UTF-8 sequence
>> u8word &= (1<<(8*len)) -1; //mask out extra bytes
>> //now

Re: Why do you decode ? (Seriously)

2012-08-02 Thread Artur Skawina

On 08/02/12 18:47, Dmitry Olshansky wrote:
> char[] input = ...;
> size_t idx = ...;
> size_t len = stride(input, idx);
> uint u8word = *cast(uint*)(input.ptr+idx);
> So why do we use dchar and not UTF-8 word, as it's as good as dchar and faster to obtain?

Iff unaligned accesses happen to be legal
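Where unaligned loads are not legal, or the byte order is unknown, the same packed value can be assembled one byte at a time. This is a hedged sketch only: `utf8WordPortable` is a hypothetical name, and the layout (first byte of the sequence in the lowest 8 bits) is an assumption chosen to match what a little-endian 4-byte load would produce.

```d
import std.utf : stride;

/// Hypothetical helper (not part of Phobos): packs the UTF-8 sequence
/// starting at input[idx] into a uint without any pointer casts, so it
/// does not depend on alignment or on the target's byte order.
uint utf8WordPortable(const(char)[] input, size_t idx)
{
    // stride gives the sequence length from the lead byte: 1 to 4.
    immutable len = cast(uint) stride(input, idx);
    assert(idx + len <= input.length, "truncated UTF-8 sequence");

    uint word = 0;
    foreach (uint i; 0 .. len)
    {
        // Byte i of the sequence goes into bits 8*i .. 8*i+7.
        word |= cast(uint) input[idx + i] << (8 * i);
    }
    return word;
}

unittest
{
    // "\u20AC" (the euro sign) is encoded as the bytes E2 82 AC.
    assert(utf8WordPortable("\u20AC", 0) == 0xAC82E2);
}
```

This version reads only the bytes of the sequence itself, so it needs no padding at the end of the buffer, at the cost of up to four separate loads instead of one.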

Re: Why do you decode ? (Seriously)

2012-08-02 Thread Dmitry Olshansky

On 03-Aug-12 00:40, Artur Skawina wrote:
> On 08/02/12 18:47, Dmitry Olshansky wrote:
>> char[] input = ...;
>> size_t idx = ...;
>> size_t len = stride(input, idx);
>> uint u8word = *cast(uint*)(input.ptr+idx);
>> So why do we use dchar and not UTF-8 word, as it's as good as dchar and faster to obtain?
