Re: Lexer in D

2013-03-24 Thread Namespace
On Saturday, 23 March 2013 at 15:39:50 UTC, Namespace wrote: On Thursday, 21 March 2013 at 11:36:45 UTC, Namespace wrote: On Friday, 15 March 2013 at 08:20:18 UTC, Namespace wrote: So far, my lexer is pure exercise. But my goal is actually to filter variables and functions, to see if they are

Re: Lexer in D

2013-03-24 Thread Dmitry Olshansky
of fun. It's tested with 'simple.d', std.stdio and std.datetime. Have no time - looks interesting but I highly doubt it can do what is promised. Still take time to do a small formal announcement for beta. Reusing the old and fizzled out thread of a D lexer to discuss a tool created on top

Re: Lexer in D

2013-03-24 Thread Namespace
Have no time - looks interesting but I highly doubt it can do what is promised. Still take time to do a small formal announcement for beta. Yes you're right, it's going to take some time before all would work in general. In my small test runs, it has, however, been proven, but D is also

Re: Lexer in D

2013-03-24 Thread Brian Schott
On Sunday, 24 March 2013 at 16:54:22 UTC, Namespace wrote: Have no time - looks interesting but I highly doubt it can do what is promised. Still take time to do a small formal announcement for beta. Yes you're right, it's going to take some time before all would work in general. In my small

Re: Lexer in D

2013-03-24 Thread Namespace
On Sunday, 24 March 2013 at 22:16:45 UTC, Brian Schott wrote: On Sunday, 24 March 2013 at 16:54:22 UTC, Namespace wrote: Have no time - looks interesting but I highly doubt it can do what is promised. Still take time to do a small formal announcement for beta. Yes you're right, it's going to

Re: Lexer in D

2013-03-23 Thread Namespace
On Thursday, 21 March 2013 at 11:36:45 UTC, Namespace wrote: On Friday, 15 March 2013 at 08:20:18 UTC, Namespace wrote: So far, my lexer is pure exercise. But my goal is actually to filter variables and functions, to see if they are ever used in the code. I'm almost finished. In my few

Re: Lexer in D

2013-03-21 Thread Namespace
On Friday, 15 March 2013 at 08:20:18 UTC, Namespace wrote: So far, my lexer is pure exercise. But my goal is actually to filter variables and functions, to see if they are ever used in the code. I'm almost finished. In my few tests, my little parser detects function, struct and variable

Re: Lexer in D

2013-03-15 Thread Namespace
So far, my lexer is pure exercise. But my goal is actually to filter variables and functions, to see if they are ever used in the code.

Re: Lexer in D

2013-03-15 Thread Namespace
So you got rid of array creation. About time ;) Yes, that was the only way to get closer to your measured time. :D

Re: Lexer in D

2013-03-14 Thread Dmitry Olshansky
14-Mar-2013 01:19, Namespace пишет: Array has an horrible code... I decided to start from scratch. So I took a close look at the lexer of dmd and took over the basic functionality. Result for std.datetime (without profiling so far): It's getting better! Total time

Re: Lexer in D

2013-03-14 Thread Dmitry Olshansky
14-Mar-2013 21:20, Namespace пишет: Yes, and it's really fun. :) By the way, the code can be found here: https://github.com/Dgame/Puzzle/blob/master/DLexer.d Maybe some of you have an idea how I could improve anything. Last trace.log is also there:

Re: Lexer in D

2013-03-14 Thread Namespace
Yes, and it's really fun. :) By the way, the code can be found here: https://github.com/Dgame/Puzzle/blob/master/DLexer.d Maybe some of you have an idea how I could improve anything. Last trace.log is also there: https://github.com/Dgame/Puzzle/blob/master/trace.log My goal is about 20 msecs.

Re: Lexer in D

2013-03-14 Thread Dmitry Olshansky
14-Mar-2013 21:20, Namespace пишет: Yes, and it's really fun. :) By the way, the code can be found here: https://github.com/Dgame/Puzzle/blob/master/DLexer.d Maybe some of you have an idea how I could improve anything. Last trace.log is also there:

Re: Lexer in D

2013-03-14 Thread Artur Skawina
On 03/14/13 18:24, Dmitry Olshansky wrote: 14-Mar-2013 21:20, Namespace пишет: Yes, and it's really fun. :) By the way, the code can be found here: https://github.com/Dgame/Puzzle/blob/master/DLexer.d Maybe some of you have an idea how I could improve anything. Last trace.log is also there:

Re: Lexer in D

2013-03-14 Thread Dmitry Olshansky
15-Mar-2013 01:17, Artur Skawina пишет: On 03/14/13 18:24, Dmitry Olshansky wrote: 14-Mar-2013 21:20, Namespace пишет: Yes, and it's really fun. :) By the way, the code can be found here: https://github.com/Dgame/Puzzle/blob/master/DLexer.d Maybe some of you have an idea how I could improve

Re: Lexer in D

2013-03-13 Thread Namespace
Array has an horrible code... I decided to start from scratch. So I took a close look at the lexer of dmd and took over the basic functionality. Result for std.datetime (without profiling so far): Total time (ms): 5971.66 Repetitions: 200 Sample mode: 28 (101

Re: Lexer in D

2013-03-11 Thread Namespace
Total time (ms): 11374.4 Repetitions: 200 Sample mode: 56 (89 ocurrences) Median time: 56.312 Avg time : 56.872 Std dev. : 2.60698 Minimum: 53.978 Maximum: 81.602 95% conf.int. : [51.7624, 61.9816] e = 5.10959 99% conf.int. :

Re: Lexer in D

2013-03-11 Thread Namespace
Total time (ms): 10941.9 Repetitions: 200 Sample mode: 54 (118 ocurrences) Median time: 54.5855 Avg time : 54.7094 Std dev. : 1.33935 Minimum: 52.535 Maximum: 67.206 95% conf.int. : [52.0843, 57.3345] e = 2.62509 99% conf.int. :

Re: Lexer in D

2013-03-11 Thread FG
On 2013-03-11 16:11, Namespace wrote: I've also done some benchmarks. And I wonder why my appender is much faster if I enable memory reservation. Probably just because D's Appender has a better growth scheme than yours and you reallocate too often. Try to compare how often that happens.

Re: Lexer in D

2013-03-11 Thread Namespace
On Monday, 11 March 2013 at 15:21:31 UTC, FG wrote: On 2013-03-11 16:11, Namespace wrote: I've also done some benchmarks. And I wonder why my appender is much faster if I enable memory reservation. Probably just because D's Appender has a better growth scheme than yours and you reallocate

Re: Lexer in D

2013-03-11 Thread David
But I do not think you can extend a memory block with C functions. What about realloc?

Re: Lexer in D

2013-03-11 Thread Namespace
On Monday, 11 March 2013 at 15:51:58 UTC, David wrote: But I do not think you can extend a memory block with C functions. What about realloc? That's what I use. But it moves your old memory into the new allocated. Or proves it first if your current memory block could be extended?

Re: Lexer in D

2013-03-11 Thread monarch_dodra
On Monday, 11 March 2013 at 15:56:37 UTC, Namespace wrote: On Monday, 11 March 2013 at 15:51:58 UTC, David wrote: But I do not think you can extend a memory block with C functions. What about realloc? That's what I use. But it moves your old memory into the new allocated. Or proves it

Re: Lexer in D

2013-03-11 Thread Namespace
extends if possible. http://dpaste.dzfl.pl/f86e5db6 import core.stdc.stdlib; void main() { auto p1 = malloc(14); auto p2 = realloc(p1, 15); assert(p1 is p2); } Cool, nice to know. Currently I have this benchmark: Without memory reserve: Total time (ms):

Re: Lexer in D

2013-03-11 Thread monarch_dodra
On Monday, 11 March 2013 at 16:37:00 UTC, Namespace wrote: Seems to be the wrong point of optimizations. Profile? Also, note that appender is a tool for appending stuff *into an array*. If you don't actually need an array at the end, then you could give Array a roll? I don't think it will

Re: Lexer in D

2013-03-11 Thread Namespace
Profile? Also, note that appender is a tool for appending stuff *into an array*. If you don't actually need an array at the end, then you could give Array a roll? I don't think it will necessarily do better, but it doesn't hurt to give it a roll. My profile output looks like this:

Re: Lexer in D

2013-03-04 Thread Namespace
I've installed cygwin to use the time benchmark function and that are the results: $ time ./Lexer.exe real0m0.225s user0m0.015s sys 0m0.046s I also read in the std.d.lexer thread that the use of 'char' isn't that good for a lexer, because dmd wants to decode 'char' to 'dchar'.

Re: Lexer in D

2013-03-04 Thread Namespace
With reserving of 60% of the file size in memory I get a big speed boost. Total time (ms): 17544.6 Repetitions: 200 Sample mode: 87 (121 ocurrences) Median time: 87.3835 Avg time : 87.7232 Std dev. : 1.77905 Minimum: 84.238 Maximum

Re: Lexer in D

2013-03-04 Thread Dmitry Olshansky
04-Mar-2013 16:17, Namespace пишет: With reserving of 60% of the file size in memory I get a big speed boost. Total time (ms): 17544.6 Repetitions: 200 Sample mode: 87 (121 ocurrences) Median time: 87.3835 Avg time : 87.7232 Std dev. : 1.77905

Re: Lexer in D

2013-03-04 Thread Namespace
Yes, I know. But I do not think I have a chance to get even closer to it. At least I don't know, how. I'm on a AMD Athlon 64 X2 Dual Core Processor 5400+.

Re: Lexer in D

2013-03-04 Thread Dmitry Olshansky
04-Mar-2013 22:58, Namespace пишет: Yes, I know. But I do not think I have a chance to get even closer to it. At least I don't know, how. One thing that all of these lexers do (dmd's orignal and Dscanner) is get tokens as you need them - lazily. You would always be slower if taking time to

Re: Lexer in D

2013-03-04 Thread Namespace
Yes, but I'm not as familiar with ranges. But I want to learn more about them and try it then. What do you mean, how much would entail the use of ranges compared to my current method? 30%?

Re: Lexer in D

2013-03-04 Thread Dmitry Olshansky
04-Mar-2013 23:19, Namespace пишет: Yes, but I'm not as familiar with ranges. But I want to learn more about them and try it then. What do you mean, how much would entail the use of ranges compared to my current method? 30%? Basically you copy Token structs and sometimes reallocate the whole

Re: Lexer in D

2013-03-04 Thread Brian Schott
On Monday, 4 March 2013 at 18:10:48 UTC, Dmitry Olshansky wrote: Brain listed graphs showing under 20 ms on his CPU. for all and 10 for every module in std except std.datetime. https://github.com/Hackerpilot/Dscanner/blob/range-based-lexer/perftest.sh I checked in the script that I use to

Re: Lexer in D

2013-03-04 Thread Dmitry Olshansky
05-Mar-2013 01:35, Brian Schott пишет: On Monday, 4 March 2013 at 18:10:48 UTC, Dmitry Olshansky wrote: Brain listed graphs showing under 20 ms on his CPU. for all and 10 for every module in std except std.datetime. https://github.com/Hackerpilot/Dscanner/blob/range-based-lexer/perftest.sh I

Re: Lexer in D

2013-03-03 Thread Namespace
I'd repeat that I think it makes no sense to separately treat isType. In any case there is way more to types then built-in ones and is the job of parser (to assume types) and semantic step (to type-check and infer). Yes I've understood, but currently I want it so. Another thing is to run

Re: Lexer in D

2013-03-03 Thread Dmitry Olshansky
03-Mar-2013 18:28, Namespace пишет: I'd repeat that I think it makes no sense to separately treat isType. In any case there is way more to types then built-in ones and is the job of parser (to assume types) and semantic step (to type-check and infer). Yes I've understood, but currently I want

Re: Lexer in D

2013-03-03 Thread Namespace
Simple - don't use array append and don't produce and array. Just produce a lazy forward range that is iterated. At the very least use Appender!(Token[]) it ought to be much faster. But hasn't each range an array internally? Or how does that work? I try to use Appender, but this lazy forward

Re: Lexer in D

2013-03-03 Thread Namespace
On Sunday, 3 March 2013 at 16:19:27 UTC, Namespace wrote: Simple - don't use array append and don't produce and array. Just produce a lazy forward range that is iterated. At the very least use Appender!(Token[]) it ought to be much faster. But hasn't each range an array internally? Or how

Re: Lexer in D

2013-03-03 Thread Dmitry Olshansky
03-Mar-2013 20:19, Namespace пишет: Simple - don't use array append and don't produce and array. Just produce a lazy forward range that is iterated. At the very least use Appender!(Token[]) it ought to be much faster. But hasn't each range an array internally? Of course, not or the whole

Re: Lexer in D

2013-03-03 Thread Dmitry Olshansky
03-Mar-2013 20:36, Namespace пишет: On Sunday, 3 March 2013 at 16:19:27 UTC, Namespace wrote: Simple - don't use array append and don't produce and array. Just produce a lazy forward range that is iterated. At the very least use Appender!(Token[]) it ought to be much faster. But hasn't each

Re: Lexer in D

2013-03-03 Thread Namespace
Just rip off the const why would you need it anyway? The data in 'tokens' are final, so why they should not be const? What exactly is the problem if there are const member? Is this a known problem? Otherwise, I will go and investigate it.

Re: Lexer in D

2013-03-03 Thread Dmitry Olshansky
03-Mar-2013 20:45, Namespace пишет: Just rip off the const why would you need it anyway? The data in 'tokens' are final, so why they should not be const? Then just return const(Token) like simple full const. In general I'd avoid marking data as bitwise const unless that is actually true.

Re: Lexer in D

2013-03-03 Thread Jonathan M Davis
On Sunday, March 03, 2013 17:19:25 Namespace wrote: But hasn't each range an array internally? Many do, but some don't. Take std.algorithm.ioto, or the ranges in std.random for instance. They're generative. And if all ranges had arrays underneat the hood, infinite ranges wouldn't be possible

Re: Lexer in D

2013-03-03 Thread Namespace
It's quite amazing, with Appender I reach between 115 and 125 msecs. I adapted the Appender for myself and solve the problem with memcpy/memmove. That works fine. But is there any other I can do (besides the LazyForwardRange) to gain more performance?

Lexer in D

2013-03-02 Thread Namespace
For one of our courses this year we have a very cool topic: Lexer and Compiler. In the past I have already tried some experiments in this direction with Romulus and Remus, but I've never measured my Lexing time. That's why I wanted to start again to write a lexer (hitherto purely for the

Re: Lexer in D

2013-03-02 Thread Dmitry Olshansky
02-Mar-2013 13:33, Namespace пишет: For one of our courses this year we have a very cool topic: Lexer and Compiler. In the past I have already tried some experiments in this direction with Romulus and Remus, but I've never measured my Lexing time. That's why I wanted to start again to write a

Re: Lexer in D

2013-03-02 Thread David
Am 02.03.2013 10:33, schrieb Namespace: For one of our courses this year we have a very cool topic: Lexer and Compiler. In the past I have already tried some experiments in this direction with Romulus and Remus, but I've never measured my Lexing time. That's why I wanted to start again to

Re: Lexer in D

2013-03-02 Thread Namespace
I'm already using these flags. But thanks for your Link I will take a look at it. :)

Re: Lexer in D

2013-03-02 Thread Dmitry Olshansky
02-Mar-2013 15:04, David пишет: Am 02.03.2013 10:33, schrieb Namespace: For one of our courses this year we have a very cool topic: Lexer and Compiler. In the past I have already tried some experiments in this direction with Romulus and Remus, but I've never measured my Lexing time. That's why

Re: Lexer in D

2013-03-02 Thread Namespace
With canFind it takes ~300 msecs. :/ Any idea how I could improve it? Simpler way is use first letter and length to select a bucket of keywords. I don't understand how you could do this without AA's and how this could be more performant than canFind.

Re: Lexer in D

2013-03-02 Thread Namespace
I sorted both, types and keywords, alphabetical and wrote my own 'canFind': bool canFind(const size_t size)(ref const string[size] values, string value) pure nothrow { if (value[0] 'm' || value[0] == '_') { foreach (string _val; values) { if

Re: Lexer in D

2013-03-02 Thread Namespace
That would be slower as canFind is linear search. Simpler way is use first letter and length to select a bucket of keywords. I think I get it. Did you mean something like this? [code] immutable ubyte[char] typeMap; immutable ubyte[char] keywordMap; static this() { typeMap = [ 'd' :

Re: Lexer in D

2013-03-02 Thread Dmitry Olshansky
02-Mar-2013 18:53, Namespace пишет: That would be slower as canFind is linear search. Simpler way is use first letter and length to select a bucket of keywords. I think I get it. Did you mean something like this? No-no and no. Just translate list of keywords/types to a bunch of

Re: Lexer in D

2013-03-02 Thread Namespace
No-no and no. Just translate list of keywords/types to a bunch of switch-cases that will give you near optimal performance. Use CTFE to construct that string of D code. In fact I proposed to do just 2 levels of switches: first switch by length then by first char. I don't understand. You

Re: Lexer in D

2013-03-02 Thread Dmitry Olshansky
02-Mar-2013 22:09, Namespace пишет: No-no and no. Just translate list of keywords/types to a bunch of switch-cases that will give you near optimal performance. Use CTFE to construct that string of D code. In fact I proposed to do just 2 levels of switches: first switch by length then by first

Re: Lexer in D

2013-03-02 Thread Dmitry Olshansky
02-Mar-2013 22:15, Dmitry Olshansky пишет: 02-Mar-2013 22:09, Namespace пишет: No-no and no. Just translate list of keywords/types to a bunch of switch-cases that will give you near optimal performance. Use CTFE to construct that string of D code. In fact I proposed to do just 2 levels of

Re: Lexer in D

2013-03-02 Thread Dmitry Olshansky
02-Mar-2013 23:01, Namespace пишет: I hope I understand you right this time: http://dpaste.1azy.net/4c2e4428 Was this your idea? With this I reached between 215 and 230 msecs. This is a mess of conditionals is what a direct array of 256 entries would avoid: int idx; if

Re: Lexer in D

2013-03-02 Thread Zach the Mystic
On Saturday, 2 March 2013 at 11:59:39 UTC, Dmitry Olshansky wrote: 02-Mar-2013 15:04, David пишет: Am 02.03.2013 10:33, schrieb Namespace: For one of our courses this year we have a very cool topic: Lexer and Compiler. In the past I have already tried some experiments in this direction with

Re: Lexer in D

2013-03-02 Thread Namespace
This is a mess of conditionals is what a direct array of 256 entries would avoid: int idx; if (value[0] != '_') { idx = value[0] - 'a'; if (idx == 26) return false; } else { idx = 26; } if (idx 0

Re: Lexer in D

2013-03-02 Thread Namespace
It is sufficient if my array has only 26 units, because I check only lower cases.

Re: Lexer in D

2013-03-02 Thread Dmitry Olshansky
03-Mar-2013 00:03, Namespace пишет: This is a mess of conditionals is what a direct array of 256 entries would avoid: int idx; if (value[0] != '_') { idx = value[0] - 'a'; if (idx == 26) return false; } else { idx = 26; } if (idx 0 || idx 26) {

Re: Lexer in D

2013-03-02 Thread Dmitry Olshansky
02-Mar-2013 23:50, Zach the Mystic пишет: On Saturday, 2 March 2013 at 11:59:39 UTC, Dmitry Olshansky wrote: 02-Mar-2013 15:04, David пишет: Am 02.03.2013 10:33, schrieb Namespace: For one of our courses this year we have a very cool topic: Lexer and Compiler. In the past I have already tried

Re: Lexer in D

2013-03-02 Thread Namespace
I never used the -profile flag before (only -conv) so I want to ask if I understand it right: http://dpaste.1azy.net/4382097b It seems that the call which is listed on line 133 1589328 214949 214948 0 pure @trusted dchar

Re: Lexer in D

2013-03-02 Thread Namespace
I noted it to late: decode comes from readText: readText validates your text. I will now use std.file.read. The new trace output is here and it seems you're right, isKeyword and isType (yes I'd decided to separate it again and keep both in the lexer) take a lot of time:

Re: Lexer in D

2013-03-02 Thread Dmitry Olshansky
03-Mar-2013 03:52, Namespace пишет: I think that thanks to your suggestions isKeyword and isType are entirely inlined, because they aren't listet in the trace.log anymore (http://dpaste.1azy.net/b94b19ff). The only thing which makes trouble is the isNext function. Maybe I should also use a

Re: Lexer in D

2013-03-02 Thread Dmitry Olshansky
03-Mar-2013 03:12, Namespace пишет: I noted it to late: decode comes from readText: readText validates your text. I will now use std.file.read. That's why Walter promotes profilers. They help verify hypothesis on performance and constantly surprise people in what they actually find. The