What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-01 Thread Bernard Helyer
Okay, so I've seen several comments from several people regarding the need for a D lexer in Phobos. I figure I should contribute something to this NG other than misdirected anger, so here it is. SDC has a lexer, and it's pretty much complete. It handles Unicode and script lines, and #line and fri

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-01 Thread Bernard Helyer
I have been informed that deadalnix, that wily Frenchman, has already built a range abstraction on top of it. So that's a plus.

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-01 Thread Jakob Ovrum
On Wednesday, 1 August 2012 at 23:06:19 UTC, Bernard Helyer wrote: Okay, so I've seen several comments from several people regarding the need for a D lexer in Phobos. I figure I should contribute something to this NG other than misdirected anger, so here it is. SDC has a lexer, and it's pretty m

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-01 Thread deadalnix
On 02/08/2012 01:14, Bernard Helyer wrote: I have been informed that deadalnix, that wily Frenchman, has already built a range abstraction on top of it. So that's a plus. It shouldn't be included in Phobos, but it can be useful for testing things during development.

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-01 Thread Walter Bright
On 8/1/2012 4:18 PM, Jakob Ovrum wrote: * Currently files are read in their entirety first, then parsed. It is worth exploring the idea of reading it in chunks lazily. Using an input range will take care of that nicely. * The current result (TokenStream) is a wrapper over a GC-allocated a

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-01 Thread Jakob Ovrum
On Thursday, 2 August 2012 at 04:38:11 UTC, Walter Bright wrote: That's just not going to produce a high performance lexer. The way to do it is in the Lexer instance, have a value which is the current Token instance. That way, in the normal case, one NEVER has to allocate a token instance. O
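The approach described here, where the lexer instance holds the current token as a value so that the normal case never allocates, might look roughly like the sketch below. It is written in C++ rather than D, and the `Lexer`, `Token`, and `TokenKind` definitions are simplified assumptions for illustration, not SDC's or DMD's actual types.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

enum class TokenKind : unsigned char { End, Identifier, Number, Unknown };

struct Token {
    TokenKind kind = TokenKind::End;
    std::string_view text;  // slice of the source buffer, not a copy
};

class Lexer {
public:
    explicit Lexer(std::string_view source) : src_(source) { advance(); }

    // The current token is stored inside the Lexer; callers read it by reference.
    const Token& current() const { return tok_; }

    // Scan the next token and overwrite the stored one in place (no allocation).
    void advance() {
        while (pos_ < src_.size() && isSpace(src_[pos_])) ++pos_;
        if (pos_ == src_.size()) {
            tok_ = Token{TokenKind::End, std::string_view{}};
            return;
        }
        const std::size_t start = pos_;
        TokenKind kind;
        if (isIdentStart(src_[pos_])) {
            while (pos_ < src_.size() && isIdentPart(src_[pos_])) ++pos_;
            kind = TokenKind::Identifier;
        } else if (isDigit(src_[pos_])) {
            while (pos_ < src_.size() && isDigit(src_[pos_])) ++pos_;
            kind = TokenKind::Number;
        } else {
            ++pos_;  // any other single character
            kind = TokenKind::Unknown;
        }
        tok_ = Token{kind, src_.substr(start, pos_ - start)};
    }

private:
    static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
    static bool isIdentStart(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_'; }
    static bool isIdentPart(char c) { return isIdentStart(c) || isDigit(c); }

    std::string_view src_;
    std::size_t pos_ = 0;
    Token tok_;  // the single, reused token instance
};
```

Because `current()` returns a reference to a member, the parser reads the token without a heap object per token. A token that must outlive the next `advance()` call has to be copied by the caller.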

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-01 Thread Walter Bright
On 8/1/2012 10:31 PM, Jakob Ovrum wrote: On Thursday, 2 August 2012 at 04:38:11 UTC, Walter Bright wrote: That's just not going to produce a high performance lexer. The way to do it is in the Lexer instance, have a value which is the current Token instance. That way, in the normal case, one NEV

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-02 Thread David Nadlinger
On Thursday, 2 August 2012 at 05:36:37 UTC, Walter Bright wrote: Using a class implies an extra level of indirection, […] Use pass-by-ref for the Token. How is pass-by-ref not an extra level of indirection? David

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-02 Thread Jacob Carlborg
On 2012-08-02 07:31, Jakob Ovrum wrote: Which is exactly why I'm pointing out the current, poor approach. Having a single array with contiguous Tokens for lookahead is completely doable even when Token is a class with some simple GC.malloc and emplace composition. I think SDC's Token class is to

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-02 Thread Bernard Helyer
On Thursday, 2 August 2012 at 07:11:36 UTC, Jacob Carlborg wrote: If you change Token to a struct it takes 64 bytes on an LP64 platform. I don't know if that is too big to be passed around by value. That's why I moved Token to a class in the first place. It became far too big and you had to pass

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-02 Thread Jacob Carlborg
On 2012-08-02 09:11, Jacob Carlborg wrote: If you change Token to a struct it takes 64 bytes on an LP64 platform. I don't know if that is too big to be passed around by value. Just for comparison, the type used for tokens in Clang is only 24 bytes. The main reason is the small source location.

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-02 Thread Bernard Helyer
In my dev work I've shaved some bytes off of Token. I removed the filename from Location, as we don't assume the input is a file anymore, and I've changed to tracking line and column numbers as uint instead of size_t. I don't know what kind of number I _should_ be aiming for, but I'd imagine I'm

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-02 Thread Jacob Carlborg
On 2012-08-02 09:26, Bernard Helyer wrote: In my dev work I've shaved some bytes off of Token. I removed the filename from Location, as we don't assume the input is a file anymore, and I've changed to tracking line and column numbers as uint instead of size_t. I don't know what kind of number I

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-02 Thread Bernard Helyer
On Thursday, 2 August 2012 at 07:42:05 UTC, Jacob Carlborg wrote: You can probably shave off a couple of bytes by using a (u)short or (u)byte instead of TokenKind. The TokenKind takes 32 bits, that's way more than what's actually needed. Good point. I think there's 180 ish at the moment, so we
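The size reductions discussed in this part of the thread (32-bit line and column numbers instead of `size_t`, and a one-byte token kind, which is enough for roughly 180 kinds) can be compared with a small sketch. It is C++ rather than D, and all the type names below are hypothetical stand-ins, not SDC's real `Token` or `Location` layout.

```cpp
#include <cstddef>
#include <cstdint>

// "Before": pointer-sized line/column and a 32-bit token kind.
enum class WideKind : std::uint32_t { Identifier, Number /* ... */ };
struct WideLocation {
    std::size_t line;
    std::size_t column;
};
struct WideToken {
    WideKind kind;
    WideLocation loc;
};

// "After": 32-bit line/column and a one-byte token kind (room for 256 kinds).
enum class NarrowKind : std::uint8_t { Identifier, Number /* ... */ };
struct NarrowLocation {
    std::uint32_t line;
    std::uint32_t column;
};
struct NarrowToken {
    NarrowKind kind;
    NarrowLocation loc;
};

// The narrow layout is never larger than the wide one.
static_assert(sizeof(NarrowKind) == 1, "kind fits in one byte");
static_assert(sizeof(NarrowToken) <= sizeof(WideToken),
              "narrow token should not be larger than the wide one");
```

On a typical LP64 target most of the saving comes from the narrower location fields. The one-byte kind only helps when other small fields sit next to it, because alignment padding otherwise rounds it back up to the alignment of the following member.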

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-02 Thread Jakob Ovrum
On Thursday, 2 August 2012 at 05:36:37 UTC, Walter Bright wrote: Using a class implies an extra level of indirection, and the other issue is the only point to using a class is if you're going to derive from it and override its methods. I don't see that for a Token. Use pass-by-ref for the Tok

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-02 Thread deadalnix
On 02/08/2012 07:35, Walter Bright wrote: Using a class implies an extra level of indirection, and the other issue is the only point to using a class is if you're going to derive from it and override its methods. I don't see that for a Token. Use pass-by-ref for the Token. The fact that re

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-02 Thread Walter Bright
On 8/2/2012 12:04 AM, David Nadlinger wrote: On Thursday, 2 August 2012 at 05:36:37 UTC, Walter Bright wrote: Using a class implies an extra level of indirection, […] Use pass-by-ref for the Token. How is pass-by-ref not an extra level of indirection? If you have a "Lexer" instance that cont

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-02 Thread Walter Bright
On 8/2/2012 12:22 AM, Bernard Helyer wrote: Gonna spend some time massaging this into a Walter-Approved (tm) lexer. It's got some ways to go. Thank you. I've got a lot of experience writing lexers and heavily using them professionally, but I'm still finding ways to make them better - faster -

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-02 Thread Bernard Helyer
On Thursday, 2 August 2012 at 20:05:59 UTC, Walter Bright wrote: On 8/2/2012 12:22 AM, Bernard Helyer wrote: Gonna spend some time massaging this into a Walter-Approved (tm) lexer. It's got some ways to go. Thank you. I've got a lot of experience writing lexers and heavily using them profess

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-02 Thread Walter Bright
On 8/2/2012 7:21 PM, Bernard Helyer wrote: I will make it my mission to kick your (metaphorical performance based) ass, sir. I am looking forward to a good ass-kicking lexer!

Re: What would need to be done to get sdc.lexer to std.lexer quality?

2012-08-03 Thread Nathan M. Swan
On Thursday, 2 August 2012 at 07:22:57 UTC, Bernard Helyer wrote: Gonna spend some time massaging this into a Walter-Approved (tm) lexer. It's got some ways to go. Are there any specific ways I could help? NMS