On Tuesday, 11 October 2016 at 18:13:53 UTC, Andrei Alexandrescu
wrote:
http://indianautosblog.com/2016/10/most-powerful-suzuki-swift-produces-350-hp-25
-- Andrei
Good day,
Dear Mr. Alexandrescu, I have studied a little of the new language
you developed. I tried to run a few pr
On 10/12/2016 09:35 PM, Stefan Koch wrote:
On Thursday, 13 October 2016 at 01:27:35 UTC, Andrei Alexandrescu wrote:
On 10/12/2016 08:41 PM, safety0ff wrote:
On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote:
It made little difference: LDC compiled into AVX2 vectorized addition
(vp
On Thursday, 13 October 2016 at 01:27:35 UTC, Andrei Alexandrescu
wrote:
On 10/12/2016 08:41 PM, safety0ff wrote:
On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote:
It made little difference: LDC compiled into AVX2 vectorized
addition
(vpmovzxbq & vpaddq.)
Measurements without
On Thursday, 13 October 2016 at 01:26:17 UTC, Andrei Alexandrescu
wrote:
On 10/12/2016 08:11 PM, Stefan Koch wrote:
We should probably introduce a new module for stuff like this.
object.d is already filled with too many unrelated things.
Yah, shouldn't go in object.d as it's fairly niche. On t
On 10/12/2016 08:41 PM, safety0ff wrote:
On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote:
It made little difference: LDC compiled into AVX2 vectorized addition
(vpmovzxbq & vpaddq.)
Measurements without -mcpu=native:
overhead 0.336s
bytes 0.610s
without branch hints 0.852s
co
On 10/12/2016 08:11 PM, Stefan Koch wrote:
We should probably introduce a new module for stuff like this.
object.d is already filled with too many unrelated things.
Yah, shouldn't go in object.d as it's fairly niche. On the other hand
defining a new module for two functions seems excessive unl
On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote:
It made little difference: LDC compiled into AVX2 vectorized
addition (vpmovzxbq & vpaddq.)
Measurements without -mcpu=native:
overhead 0.336s
bytes 0.610s
without branch hints 0.852s
code pasted 0.766s
On Wednesday, 12 October 2016 at 23:47:45 UTC, Andrei
Alexandrescu wrote:
Wait, so going through the bytes made almost no difference? Or
did you subtract the overhead already?
It made little difference: LDC compiled into AVX2 vectorized
addition (vpmovzxbq & vpaddq.)
On Wednesday, 12 October 2016 at 23:59:15 UTC, Stefan Koch wrote:
On Wednesday, 12 October 2016 at 23:47:45 UTC, Andrei
Alexandrescu wrote:
I think we should define two aliases "likely" and "unlikely"
with default implementations:
bool likely(bool b) { return b; }
bool unlikely(bool b) { ret
On Wednesday, 12 October 2016 at 23:47:45 UTC, Andrei
Alexandrescu wrote:
I think we should define two aliases "likely" and "unlikely"
with default implementations:
bool likely(bool b) { return b; }
bool unlikely(bool b) { return b; }
They'd go in druntime. Then implementers can hook them in
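A minimal sketch of how the proposed defaults could be used. The two hint functions are the ones quoted above; `countAscii` is a hypothetical caller made up for illustration and is not from the thread. With these default bodies the hints are plain identity functions, so a compiler-specific runtime would be free to swap in something that actually informs the optimizer.

```d
// Default no-op hints, as proposed in the quoted message.
bool likely(bool b) pure nothrow @nogc @safe { return b; }
bool unlikely(bool b) pure nothrow @nogc @safe { return b; }

// Hypothetical caller (an assumption, not thread code): counts the
// code units below 0x80, marking the ASCII test as the expected path.
size_t countAscii(const(char)[] s) pure nothrow @nogc @safe
{
    size_t n = 0;
    foreach (char c; s)
    {
        if (likely(c < 0x80))
            ++n;
    }
    return n;
}

unittest
{
    assert(countAscii("abc") == 3);
    // "h" followed by the two-byte encoding of U+00E9
    assert(countAscii("h\xC3\xA9") == 1);
}
```

Because the defaults return their argument unchanged, wrapping a condition in `likely` or `unlikely` never changes program behavior; only the generated code layout could differ once an implementation hooks them.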
On 10/12/2016 04:02 PM, safety0ff wrote:
On Wednesday, 12 October 2016 at 16:24:19 UTC, Andrei Alexandrescu wrote:
Remember the ASCII part is the bothersome one. There's only two
comparisons, all with 100% predictability. We should be able to
arrange matters so the loss is negligible. -- Andrei
On Wednesday, 12 October 2016 at 22:38:33 UTC, Stefan Koch wrote:
On Wednesday, 12 October 2016 at 22:16:38 UTC, tsbockman wrote:
Yes. The path to fix 259 is clear, and Lionello Lunesu and
myself have already done most of the work.
14835 is a blocker due to the nature of the solution that
Wal
On Wednesday, 12 October 2016 at 22:16:38 UTC, tsbockman wrote:
On Wednesday, 12 October 2016 at 16:36:32 UTC, Andrei
Alexandrescu wrote:
On 10/12/2016 12:31 PM, Stefan Koch wrote:
I can take a look at 259.
14835 is nothing trivial though.
My understanding is Thomas has an attack on 259 once
Hi!
I've recently started a new job, and with that came new
colleagues and new language discussions/wars ;)
So there's this language Rust. And it provides some pretty
amazing safety guarantees when it comes to memory management, algorithm
correctness (preventing iterator invalidation) in both sin
On Wednesday, 12 October 2016 at 16:36:32 UTC, Andrei
Alexandrescu wrote:
On 10/12/2016 12:31 PM, Stefan Koch wrote:
I can take a look at 259.
14835 is nothing trivial though.
My understanding is Thomas has an attack on 259 once a solution
to 14835 is up. -- Andrei
Yes. The path to fix 259
On Wednesday, 12 October 2016 at 20:07:19 UTC, Stefan Koch wrote:
Where did you apply the branch hints?
Code: http://pastebin.com/CFCpUftW
On Wednesday, 12 October 2016 at 20:02:16 UTC, safety0ff wrote:
On Wednesday, 12 October 2016 at 16:24:19 UTC, Andrei
Alexandrescu wrote:
Remember the ASCII part is the bothersome one. There's only
two comparisons, all with 100% predictability. We should be
able to arrange matters so the loss
On Wednesday, 12 October 2016 at 16:24:19 UTC, Andrei
Alexandrescu wrote:
Remember the ASCII part is the bothersome one. There's only two
comparisons, all with 100% predictability. We should be able to
arrange matters so the loss is negligible. -- Andrei
My measurements:
ldc -O3 -boundschec
On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei
Alexandrescu wrote:
On my machine, with "ldc2 -release -O3 -enable-inlining"
"-O3 -enable-inlining" is synonymous with "-O3" :-)
With LDC 1.1.0-beta3, you can try with
"-enable-cross-module-inlining". It won't cross-module inline
everyt
On 10/12/2016 01:05 PM, safety0ff wrote:
On Wednesday, 12 October 2016 at 16:48:36 UTC, safety0ff wrote:
[Snip]
Didn't see the LUT implementation, nvm!
Yah, that's pretty clever. Better yet, I suspect we can reuse the
look-up table for front() as well. -- Andrei
On Wednesday, 12 October 2016 at 16:48:36 UTC, safety0ff wrote:
[Snip]
Didn't see the LUT implementation, nvm!
My current favorites:
void popFront(ref char[] s) @trusted pure nothrow {
    immutable byte c = s[0];
    if (c >= -2) {
        s = s.ptr[1 .. s.length];
    } else {
        import core.bitop;
        size_t i = 7u - bsr(~c);
        import std.algorithm;
        s = s.ptr[min(i, s.length) .. s.length];
    }
}
I also e
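For comparison with the `bsr` version above, here is a rough sketch of what a look-up-table (LUT) variant could look like. This is not the code from the thread or from the PR: `strideTable` and `popFrontTable` are hypothetical names, and the choice to skip exactly one byte for continuation bytes and invalid lead bytes is an assumption made for this example.

```d
// Stride table built at compile time (CTFE). Entry b is the number of
// code units to skip when b is the first byte of the string.
immutable ubyte[256] strideTable = () {
    ubyte[256] t;
    foreach (i; 0 .. 256)
    {
        if (i < 0x80)      t[i] = 1; // ASCII
        else if (i < 0xC0) t[i] = 1; // stray continuation byte (assumed: skip one)
        else if (i < 0xE0) t[i] = 2; // 110xxxxx
        else if (i < 0xF0) t[i] = 3; // 1110xxxx
        else if (i < 0xF8) t[i] = 4; // 11110xxx
        else               t[i] = 1; // invalid lead byte (assumed: skip one)
    }
    return t;
}();

// Hypothetical LUT-based popFront: one table load, then a clamp so a
// truncated sequence at the end of the string cannot overrun it.
void popFrontTable(ref const(char)[] s) pure nothrow @nogc @safe
{
    assert(s.length > 0);
    immutable n = strideTable[s[0]];
    s = s[(n < s.length ? n : s.length) .. $];
}

unittest
{
    // 'a', U+00E9 (2 bytes), U+20AC (3 bytes), 'z': 7 code units total
    const(char)[] s = "a\xC3\xA9\xE2\x82\xACz";
    popFrontTable(s); assert(s.length == 6);
    popFrontTable(s); assert(s.length == 4);
    popFrontTable(s); assert(s.length == 1);
    popFrontTable(s); assert(s.length == 0);
}
```

The table trades 256 bytes of read-only data for a branch-free stride computation; whether that beats the branching version depends on the input and the compiler, which is what the measurements in this thread are about.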
So it would be great to get the super annoying
https://issues.dlang.org/show_bug.cgi?id=259 to a conclusion, and it
seems the similarly annoying
https://issues.dlang.org/show_bug.cgi?id=14835 is in the way.
If anyone would like to look into the latter that would be great. Good
regression test
On 10/12/2016 12:31 PM, Stefan Koch wrote:
I can take a look at 259.
14835 is nothing trivial though.
My understanding is Thomas has an attack on 259 once a solution to 14835
is up. -- Andrei
On Wednesday, 12 October 2016 at 16:27:05 UTC, Andrei
Alexandrescu wrote:
So it would be great to get the super annoying
https://issues.dlang.org/show_bug.cgi?id=259 to a conclusion,
and it seems the similarly annoying
https://issues.dlang.org/show_bug.cgi?id=14835 is in the way.
If anyone wo
On Wednesday, 12 October 2016 at 14:46:32 UTC, Andrei
Alexandrescu wrote:
No need. 1% for dmd is negligible. 25% would raise an eyebrow.
-- Andrei
Alright then
PR: https://github.com/dlang/phobos/pull/4849
On 10/12/2016 12:03 PM, Stefan Koch wrote:
This will only work really efficiently with some state on the stack.
Remember the ASCII part is the bothersome one. There's only two
comparisons, all with 100% predictability. We should be able to arrange
matters so the loss is negligible. -- Andrei
On Wednesday, 12 October 2016 at 16:07:39 UTC, Ilya Yaroshenko
wrote:
On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei
Alexandrescu wrote:
So we've had a good run with making popFront smaller. In ASCII
microbenchmarks with ldc, the speed is indistinguishable from
s = s[1 .. $]. Smaller fun
On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei
Alexandrescu wrote:
So we've had a good run with making popFront smaller. In ASCII
microbenchmarks with ldc, the speed is indistinguishable from s
= s[1 .. $]. Smaller functions make sure that the impact on
instruction cache in larger applic
On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei
Alexandrescu wrote:
So we've had a good run with making popFront smaller. In ASCII
microbenchmarks with ldc, the speed is indistinguishable from s
= s[1 .. $]. Smaller functions make sure that the impact on
instruction cache in larger applic
On 10/12/2016 10:39 AM, Stefan Koch wrote:
On Wednesday, 12 October 2016 at 14:12:30 UTC, Andrei Alexandrescu wrote:
On 10/12/2016 09:39 AM, Stefan Koch wrote:
Thanks! I'd say make sure there is exactly 0% loss on performance
compared to the popFront in the ASCII case, and if so make a PR wi
On Wednesday, 12 October 2016 at 14:12:30 UTC, Andrei
Alexandrescu wrote:
On 10/12/2016 09:39 AM, Stefan Koch wrote:
Thanks! I'd say make sure there is exactly 0% loss on
performance compared to the popFront in the ASCII case, and if
so make a PR with the table version. -- Andrei
I measur
On 10/12/2016 09:39 AM, Stefan Koch wrote:
On Wednesday, 12 October 2016 at 13:32:45 UTC, Stefan Koch wrote:
On Wednesday, 12 October 2016 at 12:46:50 UTC, Andrei Alexandrescu wrote:
In the second case, the compiler generates an inc for bumping the
pointer and a dec for decreasing the length (
On Wednesday, 12 October 2016 at 12:46:50 UTC, Andrei
Alexandrescu wrote:
On 10/12/2016 06:56 AM, Stefan Koch wrote:
I just confirmed that the branching version is faster than
table-lookup.
Please test it out for yourself
http://paste.ofcode.org/3CpieAhkrTYEcSncbPKbrj
The table-lookup does produc
So we've had a good run with making popFront smaller. In ASCII
microbenchmarks with ldc, the speed is indistinguishable from s = s[1 ..
$]. Smaller functions make sure that the impact on instruction cache in
larger applications is not high.
Now it's time to look at the end-to-end cost of autod
On Wednesday, 12 October 2016 at 13:32:45 UTC, Stefan Koch wrote:
On Wednesday, 12 October 2016 at 12:46:50 UTC, Andrei
Alexandrescu wrote:
In the second case, the compiler generates an inc for bumping
the pointer and a dec for decreasing the length (small
instructions). If the variable char_
On 10/12/2016 06:56 AM, Stefan Koch wrote:
I just confirmed that the branching version is faster than table-lookup.
Please test it out for yourself
http://paste.ofcode.org/3CpieAhkrTYEcSncbPKbrj
The table-lookup does produce the smallest code though.
Nice. I like that the table is NOT looked up o
On 10/12/2016 05:23 AM, Stefan Koch wrote:
All three are slower than baseline, for my test-case.
What did you test it against?
I'd say: (a) test for speed of ASCII-only text; (b) make it small.
That's all we need. Nobody worries about 10-20% in multibyte-heavy text.
-- Andrei
On 10/12/2016 04:56 AM, Matthias Bentrup wrote:
void popFront1b(ref char[] s) @trusted pure nothrow {
    immutable c = cast(byte)s[0];
    if (c >= -8) {
        s = s[1 .. $];
    } else {
        uint i = 4 + (c + 64 >> 31) + (c + 32 >> 31) + (c + 16 >> 31);
        import std.algorithm;
        s = s[min(i, $) ..
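To see why the expression for `i` in `popFront1b` yields the sequence length, it may help to isolate it. The sketch below pulls the same arithmetic into a standalone function; `strideBySign` is a hypothetical name, and the function is only meaningful for first bytes in the range 0x80 to 0xF7, since the original handles everything else in the `c >= -8` branch. Each `>> 31` is an arithmetic shift on a promoted `int`, so it contributes -1 when the sum is negative and 0 otherwise.

```d
// Hypothetical helper isolating the sign-shift arithmetic.
// Valid only for first bytes 0x80 .. 0xF7 (as a signed byte: -128 .. -9).
uint strideBySign(char first) pure nothrow @nogc @safe
{
    immutable int c = cast(byte) first; // sign-extend the byte to int
    // Each term is -1 if the sum is negative, else 0:
    //   c < -64 (10xxxxxx): three terms fire, 4 - 3 = 1
    //   c < -32 (110xxxxx): two terms fire,   4 - 2 = 2
    //   c < -16 (1110xxxx): one term fires,   4 - 1 = 3
    //   otherwise (11110xxx): none fire,      4
    return 4 + ((c + 64) >> 31) + ((c + 32) >> 31) + ((c + 16) >> 31);
}

unittest
{
    assert(strideBySign(cast(char) 0x80) == 1); // continuation byte
    assert(strideBySign(cast(char) 0xC3) == 2); // two-byte lead
    assert(strideBySign(cast(char) 0xE2) == 3); // three-byte lead
    assert(strideBySign(cast(char) 0xF0) == 4); // four-byte lead
}
```

Note that an ASCII byte fed to this helper would come out as 4, which is why the original keeps the separate fast path for the common one-byte case.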
I just confirmed that the branching version is faster than
table-lookup.
Please test it out for yourself
http://paste.ofcode.org/3CpieAhkrTYEcSncbPKbrj
The table-lookup does produce the smallest code though.
Then maybe this isn't photoshopped:
https://twitter.com/stamcd/status/742563964656062464
Why would it be? It's a screenshot from the Forza Motorsport game.
On Sunday, 9 October 2016 at 20:33:29 UTC, Era Scarecrow wrote:
Something coming to mind is the idea of making a small
algorithm to be used with other already existing encryption
functions to extend the blocksize of encryption with minimal
complexity growth.
For fun I'm experimenting with t
On Wednesday, 12 October 2016 at 10:15:17 UTC, Matthias Bentrup
wrote:
On Wednesday, 12 October 2016 at 09:23:53 UTC, Stefan Koch
wrote:
On Wednesday, 12 October 2016 at 08:56:59 UTC, Matthias
Bentrup wrote:
[...]
All three are slower than baseline, for my test-case.
What did you test it agai
On Wednesday, 12 October 2016 at 09:23:53 UTC, Stefan Koch wrote:
On Wednesday, 12 October 2016 at 08:56:59 UTC, Matthias Bentrup
wrote:
[...]
All three are slower than baseline, for my test-case.
What did you test it against?
The blns.txt file mentioned upthread.
On Tuesday, 11 October 2016 at 18:13:53 UTC, Andrei Alexandrescu
wrote:
http://indianautosblog.com/2016/10/most-powerful-suzuki-swift-produces-350-hp-25
-- Andrei
Then maybe this isn't photoshopped:
https://twitter.com/stamcd/status/742563964656062464
On Wednesday, 12 October 2016 at 08:56:59 UTC, Matthias Bentrup
wrote:
Here are three branch-less variants that use the sign instead
of the carry bit.
The last one is the fastest on my machine, although it mixes
the rare error case and the common 1-byte case into one branch.
void popFront1
On Tuesday, 11 October 2016 at 15:01:47 UTC, Andrei Alexandrescu
wrote:
On 10/11/2016 10:49 AM, Matthias Bentrup wrote:
void popFrontAsmIntel(ref char[] s) @trusted pure nothrow {
    immutable c = s[0];
    if (c < 0x80) {
        s = s[1 .. $];
    } else {
        uint l = void;
        asm pure nothrow @nogc