tolf and detab

2010-08-06 Thread Walter Bright
I wrote these two trivial utilities for the purpose of canonicalizing source code before checkins and to deal with FreeBSD's inability to deal with CRLF line endings, and because I can never figure out the right settings for git to make it do the canonicalization. tolf - converts LF, CR, and C
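The preview above is cut off, so the exact behaviour of tolf is not shown here. As a rough sketch only, assuming the goal is to turn CRLF and lone CR line endings into LF, the core transformation could look like this in Python (the language the thread later uses for comparison). The name `to_lf` is hypothetical and is not taken from the original utility.

```python
def to_lf(data: bytes) -> bytes:
    """Normalize CRLF and lone CR line endings to LF (assumed behaviour)."""
    # Replace CRLF first so that its CR is not converted a second time.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

Working on bytes rather than text avoids any implicit newline translation by the I/O layer.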

Re: tolf and detab

2010-08-06 Thread Andrei Alexandrescu
On 08/06/2010 08:34 PM, Walter Bright wrote: I wrote these two trivial utilities for the purpose of canonicalizing source code before checkins and to deal with FreeBSD's inability to deal with CRLF line endings, and because I can never figure out the right settings for git to make it do the canon

Re: tolf and detab

2010-08-06 Thread Andrej Mitrovic
Or improve your google-fu by finding some existing tools that do the job right. :) I'm pretty sure Uncrustify is good at most of these issues, not to mention it's a very nice source-code "prettifier/indenter". There's a front-end called UniversalIndentGUI, which has about a dozen integrated versio

Re: tolf and detab

2010-08-06 Thread Walter Bright
Andrej Mitrovic wrote: Or improve your google-fu by finding some existing tools that do the job right. :) Sure, but I suspect it's faster to write the utility! After all, they are trivial.

Re: tolf and detab

2010-08-06 Thread Walter Bright
Andrei Alexandrescu wrote: A good exercise would be rewriting these tools in idiomatic D2 and assess the differences. Some D2-fu would be cool. Any takers?

Re: tolf and detab

2010-08-06 Thread Yao G.
What does idiomatic D means? On Fri, 06 Aug 2010 20:50:52 -0500, Andrei Alexandrescu wrote: On 08/06/2010 08:34 PM, Walter Bright wrote: I wrote these two trivial utilities for the purpose of canonicalizing source code before checkins and to deal with FreeBSD's inability to deal with CRLF

Re: tolf and detab

2010-08-06 Thread Andrei Alexandrescu
On 08/06/2010 09:33 PM, Yao G. wrote: What does idiomatic D means? At a quick glance - I'm thinking two elements would be using string and possibly byLine. Andrei

Re: tolf and detab

2010-08-06 Thread Nick Sabalausky
"Yao G." wrote in message news:op.vg1qpcjfxeu...@miroslava.gateway.2wire.net... > > What does idiomatic D means? > "idiomatic D" -> "In typical D style"

Re: tolf and detab

2010-08-07 Thread Jonathan M Davis
On Friday 06 August 2010 18:50:52 Andrei Alexandrescu wrote: > On 08/06/2010 08:34 PM, Walter Bright wrote: > > I wrote these two trivial utilities for the purpose of canonicalizing > > source code before checkins and to deal with FreeBSD's inability to deal > > with CRLF line endings, and because

Re: tolf and detab

2010-08-07 Thread bearophile
Jonathan M Davis: > I would have thought that being more idiomatic would have resulted in slower > code > than what Walter did, but interestingly enough, both programs are faster with > my > code. They might take more memory though. I'm not quite sure how to check > that. > In any case, you w

Re: tolf and detab

2010-08-07 Thread Jonathan M Davis
Jonathan M Davis wrote: > void removeTabs(int tabSize, string fileName) > { > auto file = File(fileName); > string[] output; > > foreach(line; file.byLine()) > { > int lastTab = 0; > > while(lastTab != -1) > { > const int tab = line.indexOf('\t
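The removeTabs snippet above is truncated, so what follows is not a reconstruction of it. It is only a minimal Python sketch of column-aware tab expansion, the technique a detab tool generally needs: each tab advances to the next multiple of the tab size rather than being replaced by a fixed number of spaces. The name `detab_line` is hypothetical, the default of 8 mirrors the `tabSize = 8` that appears later in the thread, and stripping trailing whitespace is an assumption about what detab does.

```python
def detab_line(line: str, tab_size: int = 8) -> str:
    """Expand tabs to the next tab stop and strip trailing whitespace.

    `line` is expected without its line terminator.
    """
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            # Distance from the current column to the next tab stop.
            pad = tab_size - (column % tab_size)
            out.append(" " * pad)
            column += pad
        else:
            out.append(ch)
            column += 1
    return "".join(out).rstrip()
```

Python's built-in `str.expandtabs` performs the same expansion; the loop is spelled out only to make the column arithmetic visible.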

Re: tolf and detab

2010-08-07 Thread Andrei Alexandrescu
On 08/07/2010 11:04 PM, Jonathan M Davis wrote: On Friday 06 August 2010 18:50:52 Andrei Alexandrescu wrote: A good exercise would be rewriting these tools in idiomatic D2 and assess the differences. Andrei I didn't try and worry about multiline string literals, but here are my more idiomati

Re: tolf and detab

2010-08-07 Thread Andrei Alexandrescu
On 08/07/2010 11:16 PM, bearophile wrote: Jonathan M Davis: I would have thought that being more idiomatic would have resulted in slower code than what Walter did, but interestingly enough, both programs are faster with my code. They might take more memory though. I'm not quite sure how to check

Re: tolf and detab

2010-08-07 Thread Jonathan M Davis
On Saturday 07 August 2010 21:59:50 Andrei Alexandrescu wrote: > Very nice. Here's how I'd improve removeTabs: > > #!/home/andrei/bin/rdmd > import std.conv; > import std.file; > import std.getopt; > import std.stdio; > import std.string; > > void main(string[] args) > { > uint tabSize = 8;

Re: tolf and detab

2010-08-07 Thread Walter Bright
Jonathan M Davis wrote: It would certainly be nice to have a way to reasonably process with ranges without having to load the whole thing into memory at once. Because of asynchronous I/O, being able to start processing and start writing the new file before the old one is finished reading shoul

Re: tolf and detab

2010-08-07 Thread Andrei Alexandrescu
On 08/07/2010 11:16 PM, bearophile wrote: In this case a Python version is more readable, shorter and probably faster too because reading the lines of a _normal_ text file is faster in Python compared to D (because Python is more optimized for such purposes. I can show benchmarks on request). T

Re: tolf and detab

2010-08-08 Thread Norbert Nemec
I usually do the same thing with a shell pipe expand | sed 's/ *$//;s/\r$//;s/\r/\n/' On 07/08/10 02:34, Walter Bright wrote: I wrote these two trivial utilities for the purpose of canonicalizing source code before checkins and to deal with FreeBSD's inability to deal with CRLF line end

Re: tolf and detab

2010-08-08 Thread Walter Bright
Norbert Nemec wrote: I usually do the same thing with a shell pipe expand | sed 's/ *$//;s/\r$//;s/\r/\n/'

Re: tolf and detab

2010-08-08 Thread bearophile
Andrei Alexandrescu: > This makes me think we should have a range that detects and replaces > patterns lazily and on the fly. In Python there is a helper module: http://docs.python.org/library/fileinput.html > I think it's worth targeting D2 to tasks that are usually handled by > scripting lang
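For readers unfamiliar with the `fileinput` module linked above, here is a small sketch of how it can filter files in place, one line at a time. The function name is hypothetical and the transformation (stripping trailing whitespace) is only an example, not something proposed in the message.

```python
import fileinput

def strip_trailing_whitespace_in_place(paths):
    """Rewrite each file in place, removing trailing whitespace per line."""
    # With inplace=True, fileinput redirects standard output into the file
    # currently being read, so print() writes the replacement line.
    with fileinput.input(files=paths, inplace=True) as lines:
        for line in lines:
            print(line.rstrip())
```

Lines are read and written one at a time, so the whole file never has to be held in memory.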

Re: tolf and detab

2010-08-08 Thread dsimcha
== Quote from bearophile (bearophileh...@lycos.com)'s article > Jonathan M Davis: > > I would have thought that being more idiomatic would have resulted in slower > > code > > than what Walter did, but interestingly enough, both programs are faster > > with my > > code. They might take more memory

Re: tolf and detab

2010-08-08 Thread Nick Sabalausky
"Andrei Alexandrescu" wrote in message news:i3ldk4$2ci...@digitalmars.com... > > Very nice! You may as well guard the write with an if (result != fileStr). > With control source etc. in the mix it's always polite to not touch files > unless you are actually modifying them. > I'm fairly sure SV

Re: tolf and detab

2010-08-08 Thread Nick Sabalausky
"Norbert Nemec" wrote in message news:i3lq17$99...@digitalmars.com... >I usually do the same thing with a shell pipe > expand | sed 's/ *$//;s/\r$//;s/\r/\n/' > Filed under "Why I don't like regex for non-trivial things" ;)

Re: tolf and detab

2010-08-08 Thread Walter Bright
bearophile wrote: In the D code I have added an idup to make the comparison more fair, because in the Python code the "line" is a true newly allocated line, you can safely use it as dictionary key. So it is with byLine, too. You've burdened D with double the amount of allocations. Also, I obj

Re: tolf and detab

2010-08-08 Thread Nick Sabalausky
"bearophile" wrote in message news:i3lb30$26v...@digitalmars.com... > Jonathan M Davis: >> I would have thought that being more idomatic would have resulted in >> slower code >> than what Walter did, but interestingly enough, both programs are faster >> with my >> code. They might take more mem

Re: tolf and detab

2010-08-08 Thread Nick Sabalausky
"Walter Bright" wrote in message news:i3mpnb$2hc...@digitalmars.com... > bearophile wrote: >> In the D code I have added an idup to make the comparison more fair, >> because >> in the Python code the "line" is a true newly allocated line, you can >> safely >> use it as dictionary key. > > So it

Re: tolf and detab

2010-08-08 Thread bearophile
Walter Bright: > bearophile wrote: > > In the D code I have added an idup to make the comparison more fair, because > > in the Python code the "line" is a true newly allocated line, you can safely > > use it as dictionary key. > > So it is with byLine, too. You've burdened D with double the amount

Re: tolf and detab

2010-08-08 Thread Andrei Alexandrescu
On 08/08/2010 12:28 PM, Nick Sabalausky wrote: "Andrei Alexandrescu" wrote in message news:i3ldk4$2ci...@digitalmars.com... Very nice! You may as well guard the write with an if (result != fileStr). With control source etc. in the mix it's always polite to not touch files unless you are actual
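The suggestion quoted above, guarding the write with a comparison against the original contents, can be sketched roughly as follows. This is a Python illustration under assumed names (`rewrite_if_changed` and `transform` are hypothetical), not the D code from the thread.

```python
from pathlib import Path

def rewrite_if_changed(path, transform) -> bool:
    """Apply `transform` to a file's bytes; write back only if they differ.

    Returns True when the file was rewritten, False when it was left alone.
    """
    p = Path(path)
    original = p.read_bytes()
    result = transform(original)
    if result == original:
        # Leave the file untouched so its timestamp does not change.
        return False
    p.write_bytes(result)
    return True
```

Skipping the write keeps the modification time stable, which matters to tools that decide what changed from timestamps.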

Re: tolf and detab

2010-08-08 Thread Walter Bright
bearophile wrote: Walter Bright: bearophile wrote: In the D code I have added an idup to make the comparison more fair, because in the Python code the "line" is a true newly allocated line, you can safely use it as dictionary key. So it is with byLine, too. You've burdened D with double the am

Re: tolf and detab

2010-08-08 Thread bearophile
Walter Bright: > If you want to conclude that Python is better at processing files, you need > to > show it using each language doing it a way well suited to that language, > rather > than burdening one so it uses the same method as the less powerful one. byLine() yields a char[], so if you wa
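Some background on the disagreement: Phobos' byLine reuses its internal buffer between iterations, so a line has to be copied before it is kept, for example as an associative-array key. The following is a loose Python analogy only, with hypothetical names. Python's own file iteration always returns fresh objects, so the buffer reuse is simulated here with a shared `bytearray`.

```python
def by_line_reused(stream):
    """Yield every line in one shared bytearray (simulated buffer reuse)."""
    buf = bytearray()
    for raw in stream:
        buf[:] = raw.rstrip(b"\r\n")  # overwrite the shared buffer in place
        yield buf                     # the same object on every iteration

def count_lines(stream):
    """Count identical lines, copying each one before using it as a key."""
    counts = {}
    for line in by_line_reused(stream):
        key = bytes(line)  # explicit copy, comparable in role to idup
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Collecting the yielded buffers without copying would leave every stored element showing the last line read, which is the hazard the copy avoids.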

Re: tolf and detab

2010-08-08 Thread Andrej Mitrovic
Andrei used to!string() in an early example in TDPL for some line-by-line processing. I'm not sure of the advantages/disadvantages of to!type vs .dup. On Sun, Aug 8, 2010 at 11:44 PM, bearophile wrote: > Walter Bright: > > If you want to conclude that Python is better at processing files, you > n

Re: tolf and detab

2010-08-08 Thread bearophile
Andrej Mitrovic: > Andrei used to!string() in an early example in TDPL for some line-by-line > processing. I'm not sure of the advantages/disadvantages of to!type vs .dup. I have modified the code: import std.stdio: File, writeln; import std.conv: to; int process(string fileName) { int tota

Re: tolf and detab

2010-08-08 Thread bearophile
> so it's not a limit of the language, it's Phobos that has a performance bug > that can be improved. I don't know where the performance bug is, maybe it's a matter of GC, not a Phobos performance bug. Bye, bearophile

Re: tolf and detab

2010-08-08 Thread Leandro Lucarella
Andrei Alexandrescu, el 8 de agosto a las 14:44 me escribiste: > On 08/08/2010 12:28 PM, Nick Sabalausky wrote: > >"Andrei Alexandrescu" wrote in message > >news:i3ldk4$2ci...@digitalmars.com... > >> > >>Very nice! You may as well guard the write with an if (result != fileStr). > >>With control s

Re: tolf and detab

2010-08-08 Thread Leandro Lucarella
Nick Sabalausky, el 8 de agosto a las 13:31 me escribiste: > "Norbert Nemec" wrote in message > news:i3lq17$99...@digitalmars.com... > >I usually do the same thing with a shell pipe > > expand | sed 's/ *$//;s/\r$//;s/\r/\n/' > > > > Filed under "Why I don't like regex for non-trivial things" ;

Re: tolf and detab

2010-08-08 Thread Yao G.
On Sun, 08 Aug 2010 16:44:09 -0500, bearophile wrote: Walter Bright: If you want to conclude that Python is better at processing files, you need to show it using each language doing it a way well suited to that language, rather than burdening one so it uses the same method as the less po

Re: tolf and detab

2010-08-08 Thread Andrej Mitrovic
What are you using to time the app? I'm using timeit (from the Windows Server 2003 Resource Kit). I'm getting similar results to yours. Btw, how do you use a warm disk cache? Is there a setting somewhere for that? On Sun, Aug 8, 2010 at 11:54 PM, bearophile wrote: > Andrej Mitrovic: > > > Andrei

Re: tolf and detab

2010-08-08 Thread bearophile
Yao G.: > What's next? Will you demand attribution like the time Andrei > presented the ranges design? Of course. In the end all D will be mine :-) Bye, bearophile

Re: tolf and detab

2010-08-08 Thread Yao G.
On Sun, 08 Aug 2010 17:27:04 -0500, bearophile wrote: Yao G.: What's next? Will you demand attribution like the time Andrei presented the ranges design? Of course. In the end all D will be mine :-) Bye, bearophile :D That was a good comeback.

Re: tolf and detab

2010-08-08 Thread bearophile
Andrej Mitrovic: > What are you using to time the app? A buggy utility that is the Windows port of the GNU time command. > Btw, how do you use a warm disk cache? Is there a setting somewhere for that? If you run the benchmarks two times, the second time if you have enough free RAM and your sy

Re: tolf and detab

2010-08-08 Thread Andrei Alexandrescu
On 08/08/2010 04:44 PM, bearophile wrote: Walter Bright: If you want to conclude that Python is better at processing files, you need to show it using each language doing it a way well suited to that language, rather than burdening one so it uses the same method as the less powerful one. byLine

Re: tolf and detab

2010-08-08 Thread Walter Bright
Andrej Mitrovic wrote: Btw, how do you use a warm disk cache? Is there a setting somewhere for that? Just run it several times until the times stop going down.
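A simple way to act on this advice is to time the same operation several times and keep the best result, since the first runs also pay for loading the file into the operating system's disk cache. The sketch below uses a fixed repeat count and takes the minimum, which is a simplification of "until the times stop going down". The helper name `best_time` is hypothetical.

```python
import time

def best_time(func, repeats: int = 5) -> float:
    """Run `func` several times and return the fastest wall-clock time."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    # Early runs include cold-cache I/O; the minimum approximates the
    # warm-cache cost.
    return min(times)
```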

Re: tolf and detab

2010-08-08 Thread Andrei Alexandrescu
On 08/08/2010 04:48 PM, Andrej Mitrovic wrote: Andrei used to!string() in an early example in TDPL for some line-by-line processing. I'm not sure of the advantages/disadvantages of to!type vs .dup. For example, to!string(someString) does not duplicate the string. Andrei

Re: tolf and detab

2010-08-08 Thread Andrei Alexandrescu
On 08/08/2010 05:17 PM, Yao G. wrote: On Sun, 08 Aug 2010 16:44:09 -0500, bearophile wrote: Walter Bright: If you want to conclude that Python is better at processing files, you need to show it using each language doing it a way well suited to that language, rather than burdening one so it us

Re: tolf and detab

2010-08-08 Thread bearophile
Andrei: >Where does xio derive its performance advantage from?< I'd like to give you a good answer, but I can't. dlibs1 (which you can still find online) has a Python Licence, so to create xio.xfile() I have just translated to D1 the C code of the CPython implementation code of the file object

Re: tolf and detab

2010-08-08 Thread Andrei Alexandrescu
On 08/08/2010 02:32 PM, bearophile wrote: Walter Bright: bearophile wrote: In the D code I have added an idup to make the comparison more fair, because in the Python code the "line" is a true newly allocated line, you can safely use it as dictionary key. So it is with byLine, too. You've burd

Re: tolf and detab

2010-08-08 Thread bearophile
Andrei Alexandrescu: > I think at the end of the day, regardless the relative possibilities of > file reading in the two languages, we should be faster than Python when > allocating one new string per line. For now I suggest you to aim to be just about as fast as Python in this task :-) Beating

Re: tolf and detab

2010-08-08 Thread Andrei Alexandrescu
On 08/08/2010 10:29 PM, bearophile wrote: Andrei Alexandrescu: I think at the end of the day, regardless the relative possibilities of file reading in the two languages, we should be faster than Python when allocating one new string per line. For now I suggest you to aim to be just about as fa

Re: tolf and detab

2010-08-08 Thread Nick Sabalausky
"Leandro Lucarella" wrote in message news:20100808212859.gl3...@llucax.com.ar... > Nick Sabalausky, el 8 de agosto a las 13:31 me escribiste: >> "Norbert Nemec" wrote in message >> news:i3lq17$99...@digitalmars.com... >> >I usually do the same thing with a shell pipe >> > expand | sed 's/ *$//;

Re: tolf and detab

2010-08-08 Thread Walter Bright
Nick Sabalausky wrote: (I'm not genuinely complaining about regexes. They can be very useful. They just tend to get real ugly real fast.) Regexes are like flying airplanes. You have to do them often or you get "rusty" real fast. (Flying is not a natural behavior, it's not like riding a bike.)

Re: tolf and detab

2010-08-08 Thread Kagamin
bearophile Wrote: > I think it minimizes heap allocations, the performance is tuned for a line > length found to be the "average one" for normal files. So I presume if your > text file has very short lines (like 5 chars each) or very long ones (like > 1000 chars each) it becomes less efficient.

Re: tolf and detab

2010-08-09 Thread bearophile
Kagamin: > Don't you minimize heap allocation etc by reading whole file in one io call? The whole thread was about lazy read of file lines. If the file is very large it's not wise to load it all in RAM at once. Bye, bearophile
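The trade-off under discussion can be illustrated with two small Python functions that compute the same result, the total number of bytes in a file. One iterates lazily line by line, the other reads everything in a single call. Both names are hypothetical and the task is chosen only for illustration.

```python
def total_length_lazy(path) -> int:
    """Sum line lengths while holding only one line in memory at a time."""
    total = 0
    with open(path, "rb") as f:
        for line in f:  # the file object yields lines lazily
            total += len(line)
    return total

def total_length_eager(path) -> int:
    """Read the whole file at once; simple, but memory grows with file size."""
    with open(path, "rb") as f:
        return len(f.read())
```

The eager version makes fewer calls, while the lazy one keeps memory use bounded by the longest line.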

Re: tolf and detab

2010-08-09 Thread bearophile
Andrei Alexandrescu: > > For now I suggest you to aim to be just about as fast as Python in > > this task :-) Beating Python significantly on this task is probably > > not easy. > > Why? Because it's a core functionality for Python so devs probably have optimized it well, it's written in C, and

Re: tolf and detab

2010-08-09 Thread Michel Fortin
On 2010-08-09 07:12:38 -0400, bearophile said: Kagamin: Don't you minimize heap allocation etc by reading whole file in one io call? The whole thread was about lazy read of file lines. If the file is very large it's not wise to load it all in RAM at once. For non-huge files that can fit

Re: tolf and detab

2010-08-09 Thread Jonathan M Davis
On Monday, August 09, 2010 05:30:33 Michel Fortin wrote: > On 2010-08-09 07:12:38 -0400, bearophile said: > > Kagamin: > >> Don't you minimize heap allocation etc by reading whole file in one io > >> call? > > > > The whole thread was about lazy read of file lines. If the file is very > > large i

Re: tolf and detab

2010-08-09 Thread Andrei Alexandrescu
bearophile wrote: Andrei Alexandrescu: For now I suggest you to aim to be just about as fast as Python in this task :-) Beating Python significantly on this task is probably not easy. Why? Because it's a core functionality for Python so devs probably have optimized it well, it's written in C

Re: tolf and detab

2010-09-30 Thread Bruno Medeiros
On 08/08/2010 14:31, dsimcha wrote: I disagree completely. D is clearly designed from the "simple things should be simple and complicated things should be possible" point of view. If it doesn't work well for these kinds of short scripts then we've failed at making simple things simple and we're

Re: tolf and detab

2010-09-30 Thread Adam D. Ruppe
On Thu, Sep 30, 2010 at 01:16:09PM +0100, Bruno Medeiros wrote: > dsimcha wrote: > "I hate Java and every programming language where a readable hello world > takes more than 3 SLOC" > > That may be your preference, but other people here in the community, me > at least, very much want D to be a "

Re: tolf and detab

2010-09-30 Thread bearophile
Bruno Medeiros: > I think that medium and large > scale projects are simply much more important and interesting than small > scale ones. > I am hoping this would become an *explicit* point of D design goals, if > it isn't already. > And I will campaign against (so to speak), people like you who t

Re: tolf and detab

2010-10-01 Thread Bruno Medeiros
On 30/09/2010 19:31, bearophile wrote: Bruno Medeiros: I think that medium and large scale projects are simply much more important and interesting than small scale ones. I am hoping this would become an *explicit* point of D design goals, if it isn't already. And I will campaign against (so

Re: tolf and detab

2010-10-01 Thread bearophile
Bruno Medeiros: > From my understanding, Scala is a "scalable language" in the sense > that it is easy to add new language features, or something similar to that. I see. You may be right. > But I'm missing your point here, what does Ada have to do with this? Ada has essentially died for several r

Re: tolf and detab

2010-10-01 Thread Adam D. Ruppe
On Fri, Oct 01, 2010 at 07:54:01AM -0400, bearophile wrote: > > barring crazy stuff like dynamic scoping) > > I don't know what dynamic scoping is, do you mean that crazy nice thing named > dynamic typing? :-) "Nice"? I hate dynamic typing with a passion! It offers very few benefits and only in

Re: tolf and detab

2010-10-01 Thread Pelle
On 10/01/2010 01:54 PM, bearophile wrote: Bruno Medeiros: From my understanding, Scala is a "scalable language" in the sense that it is easy to add new language features, or something similar to that. I see. You may be right. But I'm missing your point here, what does Ada have to do with thi

Re: tolf and detab (language succinctness)

2010-10-05 Thread Bruno Medeiros
On 01/10/2010 12:54, bearophile wrote: Bruno Medeiros: From my understanding, Scala is a "scalable language" in the sense that it is easy to add new language features, or something similar to that. I see. You may be right. But I'm missing your point here, what does Ada have to do with this?

Ada, SPARK [Was: Re: tolf and detab (language succinctness)]

2010-10-06 Thread bearophile
Bruno Medeiros: >[About ADA] That "begin" "end " syntax is awful. I already >think just "begin" "end" syntax is bad, but also having to repeat the name of >block/function/procedure/loop at the "end", that's awful.< If you take a look at my dlibs1, probably more than 60_000 lines of D1 code, yo

Re: Ada, SPARK [Was: Re: tolf and detab (language succinctness)]

2010-10-29 Thread Bruno Medeiros
On 06/10/2010 22:48, bearophile wrote: Bruno Medeiros: [About ADA] That "begin" "end" syntax is awful. I already think just "begin" "end" syntax is bad, but also having to repeat the name of block/function/procedure/loop at the "end", that's awful.< If you take a look at my dlibs1, probably

Re: Ada, SPARK [Was: Re: tolf and detab (language succinctness)]

2010-10-29 Thread bearophile
Bruno Medeiros: > I'm not an expert on high-reliability/critical systems, but I had the > impression that the majority of it was written in C (even if with > restricting code guidelines). Or that at least, much more critical > software is written in C than in Ada. Is that not the case? MISRA C