Can you shrink it further?

2016-10-09 Thread Andrei Alexandrescu via Digitalmars-d
Had a bit of a good time with popFront for UTF, got to this: https://github.com/dlang/phobos/pull/4848. I suspect there are cleverer things that can be done. Can anyone make it tighter? Take a look at the disassembly referred to in the pull request. Thanks! -- Andrei

Re: Can you shrink it further?

2016-10-09 Thread Stefan Koch via Digitalmars-d
On Sunday, 9 October 2016 at 22:11:50 UTC, Andrei Alexandrescu wrote: Had a bit of a good time with popFront for UTF, got to this: https://github.com/dlang/phobos/pull/4848. I suspect there are cleverer things that can be done. Can anyone make it tighter? Take a look at the disassembly referred

Re: Can you shrink it further?

2016-10-09 Thread Andrei Alexandrescu via Digitalmars-d
On 10/09/2016 07:05 PM, Stefan Koch wrote: On Sunday, 9 October 2016 at 22:11:50 UTC, Andrei Alexandrescu wrote: Had a bit of a good time with popFront for UTF, got to this: https://github.com/dlang/phobos/pull/4848. I suspect there are cleverer things that can be done. Can anyone make it tighte

Re: Can you shrink it further?

2016-10-09 Thread safety0ff via Digitalmars-d
On Sunday, 9 October 2016 at 22:11:50 UTC, Andrei Alexandrescu wrote: I suspect there are cleverer things that can be done. Less clever seemed to work for me: void popFront1(ref char[] s) @trusted pure nothrow { immutable c = s[0]; if (c < 0x80 || c >= 0xFE) { s = s.ptr[1 .. s.length];

Re: Can you shrink it further?

2016-10-09 Thread Andrei Alexandrescu via Digitalmars-d
On 10/09/2016 10:55 PM, safety0ff wrote: On Sunday, 9 October 2016 at 22:11:50 UTC, Andrei Alexandrescu wrote: I suspect there are cleverer things that can be done. Less clever seemed to work for me: void popFront1(ref char[] s) @trusted pure nothrow { immutable c = s[0]; if (c < 0x80 ||

Re: Can you shrink it further?

2016-10-10 Thread Stefan Koch via Digitalmars-d
On Monday, 10 October 2016 at 03:55:17 UTC, Andrei Alexandrescu wrote: Oh, forgot to mention: the initial/short path should only check for ASCII, i.e. c < 0x80. -- Andrei void popFront1(ref char[] s) @trusted pure nothrow { immutable c = s[0]; size_t char_length = 1; if (!(c & 0x80)) {

Re: Can you shrink it further?

2016-10-10 Thread Stefan Koch via Digitalmars-d
On Monday, 10 October 2016 at 15:37:09 UTC, Stefan Koch wrote: Since in this case stability of min is concern, you can shave of another 2 instructions by writing the comparison by hand In this case the stability of min is +NO+ concern.

Re: Can you shrink it further?

2016-10-10 Thread Stefan Koch via Digitalmars-d
On Monday, 10 October 2016 at 15:17:05 UTC, Stefan Koch wrote: On Monday, 10 October 2016 at 03:55:17 UTC, Andrei Alexandrescu wrote: Oh, forgot to mention: the initial/short path should only check for ASCII, i.e. c < 0x80. -- Andrei Since in this case stability of min is concern, you can sh

Re: Can you shrink it further?

2016-10-10 Thread Stefan Koch via Digitalmars-d
This version has 24 instructions but these have a smaller encoding then and are generally inexpensive With inline asm and conditional moves it would be possible to reduce this even further to ~20 instructions. void popFront1(ref char[] s) @trusted pure nothrow { immutable c = s[0]; size_t c

Re: Can you shrink it further?

2016-10-10 Thread Andrei Alexandrescu via Digitalmars-d
That looks good. I'm just worried about the jump forward - ideally the case c < 127 would simply entail a quick return. I tried a fix, but it didn't do what I wanted in ldc. We shouldn't assert(0) if wrong - just skip one byte. Also, are we right to not worry about 5- and 6-byte sequences? The

Re: Can you shrink it further?

2016-10-10 Thread Stefan Koch via Digitalmars-d
On Tuesday, 11 October 2016 at 02:48:22 UTC, Andrei Alexandrescu wrote: That looks good. I'm just worried about the jump forward - ideally the case c < 127 would simply entail a quick return. I tried a fix, but it didn't do what I wanted in ldc. We shouldn't assert(0) if wrong - just skip one b

Re: Can you shrink it further?

2016-10-10 Thread Lurker via Digitalmars-d
On Tuesday, 11 October 2016 at 03:00:45 UTC, Stefan Koch wrote: On Tuesday, 11 October 2016 at 02:48:22 UTC, Andrei Alexandrescu wrote: That looks good. I'm just worried about the jump forward - ideally the case c < 127 would simply entail a quick return. I tried a fix, but it didn't do what I

Re: Can you shrink it further?

2016-10-10 Thread Stefan Koch via Digitalmars-d
On Tuesday, 11 October 2016 at 03:18:24 UTC, Lurker wrote: Pardon me asking, but why all these gotos instead of else ifs: if (c < 192) { char_length = 2; } else if (c < 240) { char_length = 3; } else if (...) { } Does it have any effect on generated code (I don't think it

Re: Can you shrink it further?

2016-10-10 Thread Andrei Alexandrescu via Digitalmars-d
On 10/10/16 11:00 PM, Stefan Koch wrote: On Tuesday, 11 October 2016 at 02:48:22 UTC, Andrei Alexandrescu wrote: That looks good. I'm just worried about the jump forward - ideally the case c < 127 would simply entail a quick return. I tried a fix, but it didn't do what I wanted in ldc. We should

Re: Can you shrink it further?

2016-10-10 Thread Stefan Koch via Digitalmars-d
On Tuesday, 11 October 2016 at 03:58:59 UTC, Andrei Alexandrescu wrote: On 10/10/16 11:00 PM, Stefan Koch wrote: On Tuesday, 11 October 2016 at 02:48:22 UTC, Andrei Alexandrescu wrote: That looks good. I'm just worried about the jump forward - ideally the case c < 127 would simply entail a quic

Re: Can you shrink it further?

2016-10-11 Thread Matthias Bentrup via Digitalmars-d
On Tuesday, 11 October 2016 at 04:05:47 UTC, Stefan Koch wrote: On Tuesday, 11 October 2016 at 03:58:59 UTC, Andrei Alexandrescu wrote: On 10/10/16 11:00 PM, Stefan Koch wrote: On Tuesday, 11 October 2016 at 02:48:22 UTC, Andrei Alexandrescu wrote: [...] If you want to skip a byte it's easy

Re: Can you shrink it further?

2016-10-11 Thread Stefan Koch via Digitalmars-d
On Tuesday, 11 October 2016 at 07:30:26 UTC, Matthias Bentrup wrote: A branch-free version: void popFront4(ref char[] s) @trusted pure nothrow { immutable c = s[0]; uint char_length = 1 + (c >= 192) + (c >= 240) + (c >= 248); s = s.ptr[char_length .. s.length]; } Theoretically the char_l

Re: Can you shrink it further?

2016-10-11 Thread Stefan Koch via Digitalmars-d
On Tuesday, 11 October 2016 at 08:03:40 UTC, Stefan Koch wrote: On Tuesday, 11 October 2016 at 07:30:26 UTC, Matthias Bentrup wrote: A branch-free version: void popFront4(ref char[] s) @trusted pure nothrow { immutable c = s[0]; uint char_length = 1 + (c >= 192) + (c >= 240) + (c >= 248);

Re: Can you shrink it further?

2016-10-11 Thread Temtaime via Digitalmars-d
On Tuesday, 11 October 2016 at 08:17:52 UTC, Stefan Koch wrote: On Tuesday, 11 October 2016 at 08:03:40 UTC, Stefan Koch wrote: On Tuesday, 11 October 2016 at 07:30:26 UTC, Matthias Bentrup wrote: A branch-free version: void popFront4(ref char[] s) @trusted pure nothrow { immutable c = s[0]

Re: Can you shrink it further?

2016-10-11 Thread Stefan Koch via Digitalmars-d
On Tuesday, 11 October 2016 at 08:17:52 UTC, Stefan Koch wrote: Also the code produces conditional set instructions which have a higher latency. And worse throughput. On my arguably a bit dated laptop: popFront3 performs 109 us best time and popFront4 performs with 265 us best time Testcode

Re: Can you shrink it further?

2016-10-11 Thread Stefan Koch via Digitalmars-d
On Tuesday, 11 October 2016 at 08:44:04 UTC, Temtaime wrote: void popFront1(ref char[] s) @trusted pure nothrow { import core.bitop, std.algorithm; auto v = bsr(~s[0] | 1); s = s[clamp(v, 1, v > 6 ? 1 : $)..$]; } Seems to be less if i'm not wrong. Yours runs with 790 us best time. bsr

Re: Can you shrink it further?

2016-10-11 Thread Stefan Koch via Digitalmars-d
On Tuesday, 11 October 2016 at 08:57:46 UTC, Stefan Koch wrote: On Tuesday, 11 October 2016 at 08:44:04 UTC, Temtaime wrote: void popFront1(ref char[] s) @trusted pure nothrow { import core.bitop, std.algorithm; auto v = bsr(~s[0] | 1); s = s[clamp(v, 1, v > 6 ? 1 : $)..$]; } Seems to be

Re: Can you shrink it further?

2016-10-11 Thread Temtaime via Digitalmars-d
On Tuesday, 11 October 2016 at 09:13:10 UTC, Stefan Koch wrote: On Tuesday, 11 October 2016 at 08:57:46 UTC, Stefan Koch wrote: On Tuesday, 11 October 2016 at 08:44:04 UTC, Temtaime wrote: void popFront1(ref char[] s) @trusted pure nothrow { import core.bitop, std.algorithm; auto v = bsr(~

Re: Can you shrink it further?

2016-10-11 Thread Stefan Koch via Digitalmars-d
On Tuesday, 11 October 2016 at 09:45:11 UTC, Temtaime wrote: Sorry this was also a type in the code. void popFront7(ref char[] s) @trusted pure nothrow { import core.bitop; auto v = 7 - bsr(~s[0] | 1); s = s[v > 6 ? 1 : (v ? (v > s.length ? s.length : v) : 1)..$]; } Please check this.

Re: Can you shrink it further?

2016-10-11 Thread Ethan Watson via Digitalmars-d
On Tuesday, 11 October 2016 at 10:01:41 UTC, Stefan Koch wrote: On Tuesday, 11 October 2016 at 09:45:11 UTC, Temtaime wrote: Sorry this was also a type in the code. void popFront7(ref char[] s) @trusted pure nothrow { import core.bitop; auto v = 7 - bsr(~s[0] | 1); s = s[v > 6 ? 1 : (v ?

Re: Can you shrink it further?

2016-10-11 Thread Andrei Alexandrescu via Digitalmars-d
On 10/11/2016 04:57 AM, Stefan Koch wrote: Yours runs with 790 us best time. bsr is a real timetaker :) What inputs did you test it on? Here's what I think would be a good set of requirements: * The ASCII case should be short and fast: a comparison and a branch, followed by return. This woul

Re: Can you shrink it further?

2016-10-11 Thread Stefan Koch via Digitalmars-d
On Tuesday, 11 October 2016 at 14:16:54 UTC, Andrei Alexandrescu wrote: On 10/11/2016 04:57 AM, Stefan Koch wrote: Yours runs with 790 us best time. bsr is a real timetaker :) What inputs did you test it on? https://github.com/minimaxir/big-list-of-naughty-strings/blob/master/blns.txt Here

Re: Can you shrink it further?

2016-10-11 Thread Andrei Alexandrescu via Digitalmars-d
On 10/11/2016 03:30 AM, Matthias Bentrup wrote: A branch-free version: void popFront4(ref char[] s) @trusted pure nothrow { immutable c = s[0]; uint char_length = 1 + (c >= 192) + (c >= 240) + (c >= 248); s = s.ptr[char_length .. s.length]; } Theoretically the char_length could be compute

Re: Can you shrink it further?

2016-10-11 Thread Stefan Koch via Digitalmars-d
On Tuesday, 11 October 2016 at 14:24:56 UTC, Andrei Alexandrescu wrote: On 10/11/2016 03:30 AM, Matthias Bentrup wrote: A branch-free version: void popFront4(ref char[] s) @trusted pure nothrow { immutable c = s[0]; uint char_length = 1 + (c >= 192) + (c >= 240) + (c >= 248); s = s.ptr[ch

Re: Can you shrink it further?

2016-10-11 Thread Matthias Bentrup via Digitalmars-d
On Tuesday, 11 October 2016 at 14:24:56 UTC, Andrei Alexandrescu wrote: On 10/11/2016 03:30 AM, Matthias Bentrup wrote: A branch-free version: void popFront4(ref char[] s) @trusted pure nothrow { immutable c = s[0]; uint char_length = 1 + (c >= 192) + (c >= 240) + (c >= 248); s = s.ptr[ch

Re: Can you shrink it further?

2016-10-11 Thread Andrei Alexandrescu via Digitalmars-d
On 10/11/2016 05:45 AM, Temtaime wrote: void popFront7(ref char[] s) @trusted pure nothrow { import core.bitop; auto v = 7 - bsr(~s[0] | 1); s = s[v > 6 ? 1 : (v ? (v > s.length ? s.length : v) : 1)..$]; } Please check this. Thanks. This does a lot of work on the frequent path c < 0x80:

Re: Can you shrink it further?

2016-10-11 Thread Stefan Koch via Digitalmars-d
On Tuesday, 11 October 2016 at 14:49:28 UTC, Matthias Bentrup wrote: This is the result I'd like to get, but I can't find a way to write it without inline assembly :( void popFrontAsmIntel(ref char[] s) @trusted pure nothrow { immutable c = s[0]; if (c < 0x80) { s = s[1 .. $]; } els

Re: Can you shrink it further?

2016-10-11 Thread Andrei Alexandrescu via Digitalmars-d
On 10/11/2016 10:49 AM, Matthias Bentrup wrote: void popFrontAsmIntel(ref char[] s) @trusted pure nothrow { immutable c = s[0]; if (c < 0x80) { s = s[1 .. $]; } else { uint l = void; asm pure nothrow @nogc { mov EAX, 1; mov BL, 0xf8-1; sub BL, c; cmp BL,

Re: Can you shrink it further?

2016-10-11 Thread Andrei Alexandrescu via Digitalmars-d
On 10/10/2016 11:00 PM, Stefan Koch wrote: void popFront3(ref char[] s) @trusted pure nothrow { immutable c = s[0]; uint char_length = 1; if (c < 127) { Lend : s = s.ptr[char_length .. s.length]; } else { if ((c & b01100_) == 0b1000_) { //just skip

Re: Can you shrink it further?

2016-10-11 Thread David Nadlinger via Digitalmars-d
On Tuesday, 11 October 2016 at 15:08:34 UTC, Andrei Alexandrescu wrote: Looked at this, still seems to generate a jump forward with ldc. ldc.intrinsics.llvm_expect might help to influence basic block layout. — David

Re: Can you shrink it further?

2016-10-11 Thread Stefan Koch via Digitalmars-d
On Tuesday, 11 October 2016 at 15:08:34 UTC, Andrei Alexandrescu wrote: On 10/10/2016 11:00 PM, Stefan Koch wrote: [...] Looked at this, still seems to generate a jump forward with ldc. Also, why do you leave a fallthrough path? Progress needs to be made on all paths, otherwise we have infi

Re: Can you shrink it further?

2016-10-11 Thread Andrei Alexandrescu via Digitalmars-d
On 10/11/16 11:15 AM, Stefan Koch wrote: I will now run this problem through STOKE. Let's see what it comes up with :) http://stoke.stanford.edu you mean? That would be cool. Keep us posted! -- Andrei

Re: Can you shrink it further?

2016-10-11 Thread Stefan Koch via Digitalmars-d
On Tuesday, 11 October 2016 at 16:13:45 UTC, Andrei Alexandrescu wrote: On 10/11/16 11:15 AM, Stefan Koch wrote: I will now run this problem through STOKE. Let's see what it comes up with :) http://stoke.stanford.edu you mean? That would be cool. Keep us posted! -- Andrei Yep I mean that on

Re: Can you shrink it further?

2016-10-12 Thread Matthias Bentrup via Digitalmars-d
On Tuesday, 11 October 2016 at 15:01:47 UTC, Andrei Alexandrescu wrote: On 10/11/2016 10:49 AM, Matthias Bentrup wrote: void popFrontAsmIntel(ref char[] s) @trusted pure nothrow { immutable c = s[0]; if (c < 0x80) { s = s[1 .. $]; } else { uint l = void; asm pure nothrow @nogc

Re: Can you shrink it further?

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 08:56:59 UTC, Matthias Bentrup wrote: Here are three branch-less variants that use the sign instead of the carry bit. The last one is the fastest on my machine, although it mixes the rare error case and the common 1-byte case into one branch. void popFront1

Re: Can you shrink it further?

2016-10-12 Thread Matthias Bentrup via Digitalmars-d
On Wednesday, 12 October 2016 at 09:23:53 UTC, Stefan Koch wrote: On Wednesday, 12 October 2016 at 08:56:59 UTC, Matthias Bentrup wrote: [...] All three are slower than baseline, for my test-case. What did you test it against. The blns.txt file mentioned upthread.

Re: Can you shrink it further?

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 10:15:17 UTC, Matthias Bentrup wrote: On Wednesday, 12 October 2016 at 09:23:53 UTC, Stefan Koch wrote: On Wednesday, 12 October 2016 at 08:56:59 UTC, Matthias Bentrup wrote: [...] All three are slower than baseline, for my test-case. What did you test it agai

Re: Can you shrink it further?

2016-10-12 Thread Stefan Koch via Digitalmars-d
I just confirmed that branching version is faster then table-lookup. please test it our for yourself http://paste.ofcode.org/3CpieAhkrTYEcSncbPKbrj The table-lookup does produce the smallest code though.

Re: Can you shrink it further?

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 04:56 AM, Matthias Bentrup wrote: void popFront1b(ref char[] s) @trusted pure nothrow { immutable c = cast(byte)s[0]; if (c >= -8) { s = s[1 .. $]; } else { uint i = 4 + (c + 64 >> 31) + (c + 32 >> 31) + (c + 16 >> 31); import std.algorithm; s = s[min(i, $) ..

Re: Can you shrink it further?

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 05:23 AM, Stefan Koch wrote: All three are slower than baseline, for my test-case. What did you test it against. I'd say: (a) test for speed of ASCII-only text; (b) make it small. That's all we need. Nobody worries about 10-20% in multibyte-heavy text. -- Andrei

Re: Can you shrink it further?

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 06:56 AM, Stefan Koch wrote: I just confirmed that branching version is faster then table-lookup. please test it our for yourself http://paste.ofcode.org/3CpieAhkrTYEcSncbPKbrj The table-lookup does produce the smallest code though. Nice. I like that the table is NOT looked up o

Re: Can you shrink it further?

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 13:32:45 UTC, Stefan Koch wrote: On Wednesday, 12 October 2016 at 12:46:50 UTC, Andrei Alexandrescu wrote: In the second case, the compiler generates an inc for bumping the pointer and a dec for decreasing the length (small instructions). If the variable char_

Re: Can you shrink it further?

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 12:46:50 UTC, Andrei Alexandrescu wrote: On 10/12/2016 06:56 AM, Stefan Koch wrote: I just confirmed that branching version is faster then table-lookup. please test it our for yourself http://paste.ofcode.org/3CpieAhkrTYEcSncbPKbrj The table-lookup does produc

Re: Can you shrink it further?

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 09:39 AM, Stefan Koch wrote: On Wednesday, 12 October 2016 at 13:32:45 UTC, Stefan Koch wrote: On Wednesday, 12 October 2016 at 12:46:50 UTC, Andrei Alexandrescu wrote: In the second case, the compiler generates an inc for bumping the pointer and a dec for decreasing the length (

Re: Can you shrink it further?

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 14:12:30 UTC, Andrei Alexandrescu wrote: On 10/12/2016 09:39 AM, Stefan Koch wrote: Thanks! I'd say make sure there is exactly 0% loss on performance compared to the popFront in the ASCII case, and if so make a PR with the table version. -- Andrei I measur

Re: Can you shrink it further?

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 10:39 AM, Stefan Koch wrote: On Wednesday, 12 October 2016 at 14:12:30 UTC, Andrei Alexandrescu wrote: On 10/12/2016 09:39 AM, Stefan Koch wrote: Thanks! I'd say make sure there is exactly 0% loss on performance compared to the popFront in the ASCII case, and if so make a PR wi

Re: Can you shrink it further?

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 14:46:32 UTC, Andrei Alexandrescu wrote: No need. 1% for dmd is negligible. 25% would raise an eyebrow. -- Andrei Alright then PR: https://github.com/dlang/phobos/pull/4849

Re: Can you shrink it further?

2016-10-12 Thread safety0ff via Digitalmars-d
My current favorites: void popFront(ref char[] s) @trusted pure nothrow { immutable byte c = s[0]; if (c >= -2) { s = s.ptr[1 .. s.length]; } else { import core.bitop; size_t i = 7u - bsr(~c); import std.algorithm; s = s.ptr[min(i, s.length) .. s.length]; } } I also e

Re: Can you shrink it further?

2016-10-12 Thread safety0ff via Digitalmars-d
On Wednesday, 12 October 2016 at 16:48:36 UTC, safety0ff wrote: [Snip] Didn't see the LUT implementation, nvm!

Re: Can you shrink it further?

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 01:05 PM, safety0ff wrote: On Wednesday, 12 October 2016 at 16:48:36 UTC, safety0ff wrote: [Snip] Didn't see the LUT implementation, nvm! Yah, that's pretty clever. Better yet, I suspect we can reuse the look-up table for front() as well. -- Andrei

Re: Can you shrink it further?

2016-10-13 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 17:59:51 UTC, Andrei Alexandrescu wrote: On 10/12/2016 01:05 PM, safety0ff wrote: On Wednesday, 12 October 2016 at 16:48:36 UTC, safety0ff wrote: [Snip] Didn't see the LUT implementation, nvm! Yah, that's pretty clever. Better yet, I suspect we can reuse th

Re: Can you shrink it further?

2016-10-14 Thread Stefan Koch via Digitalmars-d
On Friday, 14 October 2016 at 04:21:28 UTC, Stefan Koch wrote: On Wednesday, 12 October 2016 at 17:59:51 UTC, Andrei Alexandrescu wrote: On 10/12/2016 01:05 PM, safety0ff wrote: On Wednesday, 12 October 2016 at 16:48:36 UTC, safety0ff wrote: [Snip] Didn't see the LUT implementation, nvm! Y