On Monday, 2 April 2018 at 07:15:22 UTC, sarn wrote:
This feels like a long shot, but I figure it's worth a try.
I'll be at the Sydney C++ Meetup this Thursday (and other
Thursdays, no promises). If anyone lurking here is in Sydney
and wants to talk about D with someone, come say hi. (I'm
On Friday, 30 September 2016 at 03:42:27 UTC, deadalnix wrote:
GC is unacceptable !
Ho ! a deferred and unordered destruction library, really cool !
Is there intelligent life in the C++ world ?
I think the difference is that the solution is scoped to only
those objects that you chose to
On Tuesday, 13 September 2016 at 18:24:26 UTC, deadalnix wrote:
No you don't, as how often the GC kicks in depends on the rate
at which you produce garbage, which is going to be very low
with a hybrid approach.
This is simply not true.
Assume in a pure GC program the GC heap can grow up to X
On Wednesday, 7 September 2016 at 11:53:00 UTC, Marc Schütz wrote:
Would a `vectorize` range adapter be feasible that prepares the
input to make it SIMD compatible? That is, force alignment,
process left-over elements at the end, etc.? As far as I
understand, the problems with auto
On Wednesday, 7 September 2016 at 02:09:17 UTC, Manu wrote:
The lesson I learned from this is that you need the user code
to provide a lot of extra information about the algorithm at
compile time for the templates to work out a way to fuse
pipeline stages together efficiently.
I believe it
On Wednesday, 7 September 2016 at 01:38:47 UTC, Manu wrote:
On 7 September 2016 at 11:04, finalpatch via Digitalmars-d
<digitalmars-d@puremagic.com> wrote:
It shouldn't be hard to have the framework look at the buffer
size and choose the scalar version when number of elements are
On Wednesday, 7 September 2016 at 00:21:23 UTC, Manu wrote:
The end of a scan line is special-cased. If I need 12 pixels
for the last iteration but there are only 8 left, an instance
of Kernel::InputVector is allocated on the stack, the 8 remaining
pixels are memcpy'd into it and then sent to the kernel.
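The tail handling described above can be sketched in D roughly as follows. This is only an illustration under assumptions: `blockWidth`, `kernel` and `processScanline` are hypothetical names, and a fixed-size stack array stands in for the Kernel::InputVector of the original C++ framework.

```d
enum blockWidth = 12; // hypothetical width of the fused kernel, in pixels

// Hypothetical kernel: doubles every pixel of one full block.
void kernel(ref const float[blockWidth] input, ref float[blockWidth] output)
{
    foreach (i; 0 .. blockWidth)
        output[i] = input[i] * 2;
}

void processScanline(const(float)[] src, float[] dst)
{
    assert(src.length == dst.length);
    float[blockWidth] inBuf, outBuf; // both live on the stack
    size_t i = 0;

    // Main loop: consume whole blocks only.
    for (; i + blockWidth <= src.length; i += blockWidth)
    {
        inBuf[] = src[i .. i + blockWidth];
        kernel(inBuf, outBuf);
        dst[i .. i + blockWidth] = outBuf[];
    }

    // Special case for the end of the line: pad a stack buffer with
    // the leftover pixels and keep only the valid part of the result.
    immutable rest = src.length - i;
    if (rest > 0)
    {
        inBuf[] = 0;
        inBuf[0 .. rest] = src[i .. $];
        kernel(inBuf, outBuf);
        dst[i .. $] = outBuf[0 .. rest];
    }
}

void main()
{
    float[14] src, dst;
    foreach (i, ref v; src)
        v = i;
    processScanline(src[], dst[]);
    assert(dst[13] == 26);
}
```

The kernel itself never sees a short block, so it needs no bounds logic of its own.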
On Tuesday, 6 September 2016 at 14:47:21 UTC, Manu wrote:
with a main loop that reads the source buffer in *12*-pixel
steps, calls MySimpleKernel 3 times, then calls AnotherKernel
4 times.
Those are interesting thoughts. What did you do when buffers
weren't a multiple of the kernel width?
The end of a
On Tuesday, 6 September 2016 at 14:26:22 UTC, finalpatch wrote:
On Tuesday, 6 September 2016 at 14:21:01 UTC, finalpatch wrote:
Then some template magic will figure out the LCM of the 2
kernels' pixel width is 3*4=12 and therefore they are fused
together into a composite kernel of pixel width
On Tuesday, 6 September 2016 at 14:21:01 UTC, finalpatch wrote:
Then some template magic will figure out the LCM of the 2
kernels' pixel width is 3*4=12 and therefore they are fused
together into a composite kernel of pixel width 12. The above
line compiles down into a single function
On Tuesday, 6 September 2016 at 03:08:43 UTC, Manu wrote:
I still stand by this, and I listed some reasons above.
Auto-vectorisation is a nice opportunistic optimisation, but it
can't be relied on. The key reason is that scalar arithmetic
semantics are different than vector semantics, and
"500 - Internal Server Error
Internal Server Error"
Probably not good for publicity.
On Wednesday, 15 June 2016 at 10:58:04 UTC, Martin Nowak wrote:
It's a huge maintenance effort for us to produce the chm files.
We no longer generate documentation on Windows, but just for
the chm generation we have dedicated tools [¹] to create an
index (from a json generated via ddoc) and
On Monday, 6 June 2016 at 05:13:11 UTC, Daniel Kozak wrote:
You can still unregister your critical thread from GC.
Thanks, didn't know you could do that.
On Monday, 6 June 2016 at 04:17:40 UTC, Adam D. Ruppe wrote:
On Monday, 6 June 2016 at 02:30:55 UTC, Pie? wrote:
Duh! The claim is made that D can work without the GC... but
that's a red herring... If you take about the GC what do you
have?
Like 90% of the language, still generally nicer
On Monday, 23 May 2016 at 12:27:23 UTC, Seb wrote:
I don't use Windows || Visual Studio, but what happened to
VisualD?
http://rainers.github.io/visuald/visuald/StartPage.html
VisualD is fine, but some of us don't feel like installing a
20 GB IDE just to hack some D code.
On Sunday, 22 May 2016 at 15:35:23 UTC, poliklosio wrote:
The code-d plugin hasn't worked on Windows for a very long time
(months). There is even an issue on GitHub:
https://github.com/Pure-D/code-d/issues/38
Do you have any plans of fixing it or is Windows low priority?
Doesn't work on OSX as
On Monday, 20 April 2015 at 11:01:28 UTC, Panke wrote:
On Monday, 20 April 2015 at 09:41:09 UTC, bearophile wrote:
Utilizing the other 80% of your system's performance:
Starting with Vectorization by Ulrich Drepper:
https://www.youtube.com/watch?v=DXPfE2jGqg0
It shows two still missing parts
On Monday, 9 February 2015 at 17:45:09 UTC, Andrei Alexandrescu
wrote:
On 2/9/15 1:39 AM, Dicebot wrote:
I think this is crucial if we want to keep actual Phobos
sources easily review-able within your requirements. There is
a good value in having `core.stdc` to map C headers 1-to-1
though.
the article
accessible to people new to D as well.
I've got an AGG inspired 2D rasterizer on github.
https://github.com/finalpatch/dagger
It's not as template-heavy as the OP's, nor does it make as
extensive use of ranges.
On Wednesday, 8 January 2014 at 18:49:58 UTC, Adam Wilson wrote:
I think if you're willing to use version 2.4 then you get a
much more permissive license, no? That's how I read
http://www.antigrain.com/license/index.html anyway...
Right, it will just force us to become responsible for
On Wednesday, 8 January 2014 at 23:29:59 UTC, Adam Wilson wrote:
Even with a full port of 2.4 to D it would still fall under the
BSD 3-Clause license which is not Boost compliant IIRC. So it
will never end up in Phobos. If I am missing something let me
know, because a Phobos Software Renderer
The code in glfw3.lib uses user32/gdi32 (in order to create the
window). And because you use glfw3 as a static library, you have
to link in user32/gdi32. However if you use glfw3 as a dynamic
library (your program links with the glfw3 import library) then
you don't need user32/gdi32 because
I find it critical to ensure all loops are unrolled in basic
vector ops (copy/arithmetic/dot etc.)
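One way to guarantee that kind of unrolling in current D, rather than hoping the optimizer does it, is `static foreach`, which expands its body once per index at compile time. The sketch below is an assumption-laden illustration, not code from the thread: `dot` is a hypothetical helper and `static foreach` is a language feature that postdates these posts.

```d
// Hypothetical fixed-size dot product; the loop below is expanded at
// compile time into N separate statements, so no runtime loop remains.
float dot(size_t N)(ref const float[N] a, ref const float[N] b)
{
    float sum = 0;
    static foreach (i; 0 .. N)
        sum += a[i] * b[i];
    return sum;
}

void main()
{
    float[3] v1 = [1, 2, 3];
    float[3] v2 = [4, 5, 6];
    assert(dot(v1, v2) == 32);
}
```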
On Wednesday, 16 October 2013 at 12:02:15 UTC, Róbert László Páli
wrote:
Hello!
I am writing an unbiased raytrace renderer in D. I have good
progress, but I want to make it as fast as possible
Apparently the javascript that's responsible for creating
hyperlinks runs very slowly, usually several seconds or longer.
eg. http://dlang.org/phobos/core_memory.html is so slow it causes
Mozilla Firefox to pop up the page not responding box. I have
also tried Internet Explorer 10 on Windows
On Thursday, 1 August 2013 at 19:01:19 UTC, Russel Winder wrote:
On Thu, 2013-08-01 at 16:32 +0200, finalpatch wrote:
Hi Russel,
I'm not proposing my changes to be merged into master because
I know they are merely workarounds rather than proper fixes
(eg. font-lock-add-keywords
Hi Russel,
I'm not proposing my changes to be merged into master because I
know they are merely workarounds rather than proper fixes (eg.
font-lock-add-keywords).
On Thursday, 1 August 2013 at 08:07:28 UTC, Russel Winder wrote:
The way to progress this is for you to clone the GitHub
version in my site-lisp directory that fixes most of
these issues for me but because I'm not familiar with the CC-mode
codebase my solutions are very rough and hacky.
--
finalpatch
On Saturday, 6 July 2013 at 01:35:09 UTC, Manu wrote:
Okay, so I feel like this should be possible, but I can't make
it work...
I want to use template deduction to deduce the argument type,
but I want the function arg to be Unqual!T of the deduced type,
rather than the verbatim type of the
Is anyone aware of the new immutable containers in .Net framework?
http://blogs.msdn.com/b/bclteam/archive/2012/12/18/preview-of-immutable-collections-released-on-nuget.aspx.
While we can attach the immutable qualifier to any D containers,
they will not perform nearly as well because they are
On Tuesday, 25 June 2013 at 18:29:00 UTC, Walter Bright wrote:
Any projects using AddRef() and Release()?
I use it in a pet project. ComObject is needed to manage
callbacks. I was wondering how the COM object lifetime
management works, but now I realise this is actually broken in the
Joseph Rushton Wakeling joseph.wakel...@webdrake.net writes:
On 06/19/2013 04:49 AM, finalpatch wrote:
I was testing with DMD 2.063
Try the very latest, 2.063.2 ... ? 2.063 was released on 28 May, my fix got
committed on 7 June.
2.063.2 has the same bug. But good to know it's fixed in git
in the declaration of RandomSample it only checks for
isInputRange, but in the implementation it calls the .save()
method which is only available in forward ranges.
as a result this code does not compile:
void main()
{
auto a = File("test.d", "r").byLine();
auto s =
On Wednesday, 19 June 2013 at 02:32:46 UTC, Jonathan M Davis
wrote:
On Wednesday, June 19, 2013 03:56:43 finalpatch wrote:
in the declaration of RandomSample it only checks for
isInputRange, but in the implementation it calls the .save()
method which is only available in forward ranges
On Wednesday, 19 June 2013 at 03:46:11 UTC, Joseph Rushton
Wakeling wrote:
On 06/19/2013 02:56 AM, finalpatch wrote:
in the declaration of RandomSample it only checks for
isInputRange, but in the
implementation it calls the .save() method which is only
available in forward
ranges.
What
The following code compiles fine with the command dmd test.d
import std.array, std.algorithm, std.algorithm, std.string;
void main()
{
int[string] s;
auto a = join(map!(a=>format("%s %d", a, s[a]))(s.keys));
}
However, if I try to compile it with dmd -inline test.d, I get
this
On Thursday, 13 June 2013 at 06:06:10 UTC, finalpatch wrote:
The following code compiles fine with the command dmd test.d
import std.array, std.algorithm, std.algorithm, std.string;
void main()
{
int[string] s;
auto a = join(map!(a=>format("%s %d", a, s[a]))(s.keys));
}
However
in destructor, etc). It's not too hard,
as 'alias this' usage can wrap the pointer's methods easily enough.
--
finalpatch
Richard Webb we...@beardmouse.org.uk writes:
On Wednesday, 12 June 2013 at 14:41:05 UTC, finalpatch wrote:
This feels even more cumbersome than in C++ because in C++ we can
simply delete this in the Release() method, there's no need to
store a reference in a global place.
Juno does
A typical COM server would create a new object (derived from
IUnknown), return it to the caller (potentially written in other
languages). Because the object pointer now resides outside of D's
managed heap, does that mean the object will be destroyed when
the GC runs? A normal COM object
string mixins and template mixins don't work either.
On Friday, 7 June 2013 at 12:14:45 UTC, finalpatch wrote:
Hi folks,
I need to apply different calling conventions to the same
interfaces when compiling for different platforms. It's
something like this:
OSX:
interface InterfaceA
Hi Joseph,
The flags I used
OSX LDC: -O3 -release
WIN GDC: -O3 -fno-bounds-check -frelease
Joseph Rushton Wakeling joseph.wakel...@webdrake.net writes:
On 06/01/2013 04:58 PM, finalpatch wrote:
However I retested on a Windows 7 machine with the GDC compiler and the
results were very different
writes:
On 06/02/2013 11:32 AM, finalpatch wrote:
The flags I used
OSX LDC: -O3 -release
WIN GDC: -O3 -fno-bounds-check -frelease
Does adding -inline make a difference to initial performance (i.e. before your
manual interventions)? I guess it's already covered by -O3 in both cases
This form is nice:
int[3] x = [1,2,3];
But it is horribly inefficient because it
1. allocates a dynamic array from the GC
2. writes 1,2,3 to the dynamic array
3. copies the 1,2,3 back to the static array
Or one can write:
int[3] x;
x[0] = 1;
x[1] = 2;
x[2] = 3;
That is a lot of typing, but
Oh cool, good to know this has been fixed. You are right, I just
verified with DMD 2.063 and the generated code was good! Sorry
about the noise.
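For readers who want to check this on a current compiler without reading the generated code, one option is the `@nogc` attribute, which was added to D in a release later than the 2.063 mentioned here. A minimal sketch, with `sumLiteral` as a hypothetical function name: if the literal initialisation still allocated from the GC, the function would be rejected at compile time.

```d
// @nogc makes the compiler reject any GC allocation in this function,
// so the array-literal initialisation of a static array must not allocate.
@nogc int sumLiteral()
{
    int[3] x = [1, 2, 3];
    return x[0] + x[1] + x[2];
}

void main()
{
    assert(sumLiteral() == 6);
}
```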
On Saturday, 1 June 2013 at 06:28:45 UTC, Jonathan M Davis wrote:
Are you sure that that still allocates? I thought that that had
been fixed. If it
I actually don't feel final by default is a big deal at all. Of
all the factors that caused the poor performance that was
discussed in the original post, final is the least significant
one, causing only a 5% to 7% speed penalty in a heavily
recursive and looping program. Because of this I
of the changes and their
benefits.
Thanks finalpatch and everyone else for this work.
Andrei
You guys are awesome! I am happy to know that D can indeed offer
comparable speed to C++.
But it also shows there is room for the compiler to improve as
the C++ version also makes heavy use of loops (or STL algorithms)
but they get inlined or unrolled automatically.
On Friday, 31 May 2013
I actually have some experience with C++ template
meta-programming in HD video codecs. My experience is that it is
possible for generic code through TMP to match or even beat hand
written code. Modern C++ compilers are very good, able to
optimize away most of the temporary variables resulting
Just want to share a new way I just discovered to do loop
unrolling.
template Unroll(alias CODE, alias N)
{
static if (N == 1)
enum Unroll = format(CODE, 0);
else
enum Unroll = Unroll!(CODE, N-1)~format(CODE, N-1);
}
after that you can write stuff like
], 3, +));
which becomes: v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2]
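A self-contained sketch of the same idea is given below. It is not the exact code from the post: the separator parameter `sep`, the use of `string` and `size_t` template parameters instead of `alias`, and the `dot3` wrapper are assumptions made so the example compiles on its own. It relies on `std.format.format` being usable at compile time and on the positional `%1$d` specifier.

```d
import std.format : format;

// Recursively builds "code(0) sep code(1) sep ... code(n-1)" at compile time.
template Unroll(string code, size_t n, string sep = "")
{
    static if (n == 1)
        enum Unroll = format(code, 0);
    else
        enum Unroll = Unroll!(code, n - 1, sep) ~ sep ~ format(code, n - 1);
}

// The generated string is pasted in as an expression by mixin().
float dot3(ref const float[3] v1, ref const float[3] v2)
{
    return mixin(Unroll!("v1[%1$d]*v2[%1$d]", 3, "+"));
}

// The expansion can be checked at compile time.
static assert(Unroll!("v1[%1$d]*v2[%1$d]", 3, "+")
        == "v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2]");

void main()
{
    float[3] a = [1, 2, 3];
    float[3] b = [4, 5, 6];
    assert(dot3(a, b) == 32);
}
```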
On Friday, 31 May 2013 at 14:06:19 UTC, finalpatch wrote:
Just want to share a new way I just discovered to do loop
unrolling.
template Unroll(alias CODE, alias N)
{
static if (N == 1)
enum Unroll = format(CODE, 0
Wow! That's so very cool! We can make it even nicer with
template Unroll(alias CODE, alias N, alias SEP="")
{
enum t = replace(CODE, "%", "%1$d");
enum Unroll = iota(N).map!(i => format(t, i)).join(SEP);
}
And use % as the placeholder instead of the ugly %1$d:
mixin(Unroll!("v1[%]*v2[%]", 3,
Recently I ported a simple ray tracer I wrote in C++11 to D.
Thanks to the similarity between D and C++ it was almost a line
by line translation, in other words, very very close. However,
the D version runs much slower than the C++11 version. On Windows,
with MinGW GCC and GDC, the C++ version
Hi bearophile,
Thanks for the reply. I changed it to 0..height and it has no
measurable effect on the runtime.
The reason I used iota(height) was to test
std.parallelism.parallel. On Windows if I do foreach (y;
parallel(iota(height))) I do get almost 4x speed up on a quadcore
computer.
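The pattern mentioned here looks roughly like the sketch below. The names `renderRow` and `render` and the trivial per-pixel work are hypothetical stand-ins for the ray tracer; only `std.parallelism.parallel` and `std.range.iota` come from the post. Each scanline is written by exactly one iteration, so the rows can be processed independently.

```d
import std.parallelism : parallel;
import std.range : iota;

// Hypothetical stand-in for the per-scanline rendering work.
void renderRow(float[] row, size_t y)
{
    foreach (x, ref p; row)
        p = cast(float)(x + y);
}

float[] render(size_t width, size_t height)
{
    auto image = new float[width * height];
    // Rows are distributed over the default task pool's worker threads.
    foreach (y; parallel(iota(height)))
    {
        renderRow(image[y * width .. (y + 1) * width], y);
    }
    return image;
}

void main()
{
    auto img = render(4, 3);
    assert(img.length == 12);
}
```

The actual speed-up depends on the core count and on how evenly the work is spread across rows.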
Hi Rob,
I have tried putting GC.disable() and GC.enable() around the
rendering call and it made no difference.
On Friday, 31 May 2013 at 02:13:36 UTC, Rob T wrote:
I don't know if this is the case with the code in question (I
have not looked at it), but sometimes there will be a
significant
Hi Walter,
Thanks for the reply. I have already tried these flags. However,
DMD's codegen is lagging behind GCC and LLVM at the moment, so
even with these flags, the runtime is ~10x longer than the C++
version compiled with clang++ (2sec with DMD, 200ms with clang++
on a Core2 Mac Pro). I
, Andrei Alexandrescu wrote:
On 5/30/13 9:26 PM, finalpatch wrote:
https://dl.dropboxusercontent.com/u/974356/raytracer.d
https://dl.dropboxusercontent.com/u/974356/raytracer.cpp
Manu's gonna love this one: make all methods final.
Andrei
Hi FeepingCreature,
Thanks for the tip, getting rid of the array constructor helped a
lot. Runtime is down from 800+ms to 583ms (with LDC, still cannot
match C++ though). Maybe I should get rid of all arrays and use
hardcoded x,y,z member variables instead, or use tuples.
On Friday, 31 May
Hi,
I think you are using the version(D_SIMD) path, which is my (not
very successful) attempt at vectorizing the thing.
Change the version(D_SIMD) line to version(none) and it will use
the scalar path, which has exactly the same dot() function as
yours.
On Friday, 31 May 2013 at 04:29:19
Thanks Nazriel,
It is very cool you are able to narrow the gap to within 1.5x of
C++ with a few simple changes.
I checked your version, there are 3 changes (correct me if I
missed any):
* Change the (float) constructor from v = [x,x,x] to v[0] = x;
v[1] = x; v[2] = x;
* Get rid of the