script too slow?

2002-07-13 Thread Paul Tremblay
I just finished my first version of a script that converts rtf to xml and was wondering if I went about writing it the wrong way. My method was to read in one line at a time and split the lines into tokens, and then to read one token at a time. I used this line to split up the text: @tokens =
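A minimal sketch of the approach Paul describes: read the input one line at a time, split each line into tokens, then handle the tokens one by one. His own split line is cut off in the archive, so this borrows the pattern John W. Krahn suggests later in the thread (with the literal braces escaped). The `handle_token` and `convert` subs and their rough token types are hypothetical, just there to show where per-token work would go.

```perl
use strict;
use warnings;

# Token pattern from John W. Krahn's reply later in the thread, braces escaped.
# The capturing group makes split return the matched control words and
# braces as tokens, alongside the text between them.
my $token_re = qr/(\{\\[^\s}{]+|\\[^\s\\}]+|\\[\\}]|\})/;

# Hypothetical handler: tags each token with a rough type.
sub handle_token {
    my ($token) = @_;
    return [ group_open  => $token ] if $token =~ /^\{/;
    return [ group_close => $token ] if $token eq '}';
    return [ control     => $token ] if $token =~ /^\\/;
    return [ text        => $token ];
}

# Hypothetical driver: read line by line, split, then process one token at a time.
sub convert {
    my ($fh) = @_;
    my @out;
    while (my $line = <$fh>) {
        chomp $line;
        for my $token (split $token_re, $line) {
            # split leaves empty fields between adjacent delimiters
            next if $token eq '';
            push @out, handle_token($token);
        }
    }
    return @out;
}

# Usage: an in-memory filehandle stands in for the .rtf file.
my $rtf = "{\\b Hello} world\\par\n";
open my $fh, '<', \$rtf or die "cannot open in-memory file: $!";
my @tokens = convert($fh);
```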

Re: script too slow?

2002-07-13 Thread Tanton Gibbs
Sent: Saturday, July 13, 2002 10:35 PM Subject: script too slow? > I just finished my first version of a script that converts rtf to > xml and was wondering if I went about writing it the wrong way. > > My method was to read in one line at a time and split the lines > into token

Re: script too slow?

2002-07-13 Thread Marco Antonio Valenzuela Escárcega
maybe you should check this out: http://search.cpan.org/search?dist=RTF-Tokenizer http://search.cpan.org/search?dist=RTF-Parser On Sat, 2002-07-13 at 19:35, Paul Tremblay wrote: > I just finished my first version of a script that converts rtf to > xml and was wondering if I went about writing it

Re: script too slow?

2002-07-13 Thread Paul Tremblay
On Sat, Jul 13, 2002 at 08:08:50PM -0700, Marco Antonio Valenzuela Escárcega wrote: > Subject: Re: script too slow? > > > maybe you should check this out: > http://search.cpan.org/search?dist=RTF-Tokenizer > http://search.cpan.org/search?dist=RTF-Parser > > These modul

Re: script too slow?

2002-07-13 Thread Paul Tremblay
On Sat, Jul 13, 2002 at 10:57:04PM -0400, Tanton Gibbs wrote: > > > I'm not exactly sure what the problems are; however, here are a couple of > things to try > 1.) If you don't need to save the value of each of the subexpressions, then > tell perl so by using ?: after each opening paren. Once I
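Tanton's first tip has an extra effect inside `split`: every capturing group in the pattern is also returned as a field, so nested groups add fields (and `undef` for optional groups that did not match). A small sketch of the difference; the control-word pattern and sample line here are assumptions, not Paul's truncated regex.

```perl
use strict;
use warnings;

# Hypothetical line with two control words; the second has no numeric parameter.
my $line = '\fs24 big\b x';

# Outer group plus two inner capturing groups: split returns all three
# captures for every match, and undef when the optional group did not match.
my @capturing = split /(\\([a-z]+)(-?\d+)?)/, $line;

# Inner groups dropped or made non-capturing with (?:...):
# only the whole control word comes back as a delimiter field.
my @non_capturing = split /(\\[a-z]+(?:-?\d+)?)/, $line;

printf "%d fields vs %d fields\n", scalar @capturing, scalar @non_capturing;
```

Besides trimming the output list, non-capturing groups spare the regex engine from saving the submatches, which may help a little on large files.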

Re: script too slow?

2002-07-14 Thread John W. Krahn
Paul Tremblay wrote: > > I just finished my first version of a script that converts rtf to > xml and was wondering if I went about writing it the wrong way. > > My method was to read in one line at a time and split the lines > into tokens, and then to read one token at a time. I used this > line

Re: script too slow?

2002-07-14 Thread Paul Tremblay
On Sun, Jul 14, 2002 at 04:45:19AM -0700, John W. Krahn wrote: So your split could be simplified to: > > my @tokens = split /({\\[^\s}{]+|\\[^\s\\}]+|\\[\\}]|})/, $line; > > Ah, that cuts the tokenize process in half. My entire script now takes 40 seconds to run instead of 50. Thanks! Paul
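To see what John's simplified split actually produces, here is a sketch that runs it on one made-up line exercising all four alternatives (a `{` followed by a control word, a plain control word, an escaped backslash or brace, and a bare `}`). The literal braces are escaped, which matches the same text and avoids the "unescaped left brace" warnings newer Perls can give.

```perl
use strict;
use warnings;

# John W. Krahn's split pattern, with the literal braces escaped.
my $token_re = qr/(\{\\[^\s}{]+|\\[^\s\\}]+|\\[\\}]|\})/;

# Sample line (an assumption): {\rtf1 Hi\par a\\b\}c}
my $line = "{\\rtf1 Hi\\par a\\\\b\\}c}";

# Drop the empty fields split returns between adjacent delimiters.
my @tokens = grep { length } split $token_re, $line;

print "[$_]\n" for @tokens;
```

This should print nine tokens in order: `{\rtf1`, ` Hi`, `\par`, ` a`, `\\`, `b`, `\}`, `c`, `}`. Note that the first alternative's class `[^\s}{]` does not exclude backslashes, so something like `{\rtf1\ansi` would come back as a single token.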

Re: script too slow?

2002-07-15 Thread intexo
I expect you've seen it, but Microsoft have the RTF spec on MSDN: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnrtfspec/html/RTFSpec_2.asp They've got a fairly detailed description of a "Sample RTF Reader Application" in there, which might be useful in determining the bes

Re: script too slow?

2002-07-15 Thread Janek Schleicher
John W. Krahn wrote at Sun, 14 Jul 2002 13:45:19 +0200: > my @tokens = split /({\\[^\s}{]+|\\[^\s\\}]+|\\[\\}]|})/, $line; > ^^ ^^ ^^ Perhaps we can simplify even this regex: my @tokens = split / ( \\ (?: [^\s{}]+ | [^\s\
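Janek's rewrite is cut off in the archive, so this is not his regex. It is a sketch of John's pattern laid out with the `/x` modifier, as Janek's spacing suggests: whitespace between tokens is ignored and `#` starts a comment (whitespace inside a `[...]` class still counts). The check at the end confirms both spellings split a sample line the same way.

```perl
use strict;
use warnings;

# John W. Krahn's compact pattern (braces escaped).
my $compact = qr/(\{\\[^\s}{]+|\\[^\s\\}]+|\\[\\}]|\})/;

# The same alternatives, one per line, using the x modifier.
my $commented = qr/
    (                     # one capture, so split also returns the delimiters
        \{ \\ [^\s}{]+    # opening brace, backslash, run up to a space or brace
      | \\ [^\s\\}]+      # backslash plus a run up to a space, backslash or close brace
      | \\ [\\}]          # escaped backslash or escaped closing brace
      | \}                # bare closing brace
    )
/x;

my $line = "{\\b bold} plain\\par";
my @a = split $compact,   $line;
my @b = split $commented, $line;
print join('|', @a) eq join('|', @b) ? "same tokens\n" : "different tokens\n";
```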

Re: script too slow?

2002-07-15 Thread John W. Krahn
Janek Schleicher wrote: > > John W. Krahn wrote at Sun, 14 Jul 2002 13:45:19 +0200: > > > > my @tokens = split /({\\[^\s}{]+|\\[^\s\\}]+|\\[\\}]|})/, $line; > > ^^ ^^ ^^ > > Perhaps we can simplify even this regex: > > my @tokens = split / ( \\ (?: [^\s{}]

Re: script too slow?

2002-07-15 Thread Paul Tremblay
On Mon, Jul 15, 2002 at 10:26:25AM +0200, Janek Schleicher wrote: > > To increase speed, we can make also a lookahead statement: > > my @tokens = split / ( \\ (?=\S)# there's never a whitespace > (?: [^\s{}]+ | > [^\s\\}]+ | >
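Janek's full lookahead version is truncated here, so as an illustration only: John's two backslash alternatives can be factored behind a single `\\`, with a `(?=\S)` lookahead that rejects a backslash followed by whitespace before either alternative is tried. Both alternatives already require a non-space character at that point, so the lookahead should not change any tokens; the sketch checks that on a few made-up lines. Whether it is actually faster is something to measure (for example with `cmpthese` from the core Benchmark module) rather than assume.

```perl
use strict;
use warnings;

# John W. Krahn's pattern (braces escaped).
my $original = qr/(\{\\[^\s}{]+|\\[^\s\\}]+|\\[\\}]|\})/;

# Backslash alternatives factored together; (?=\S) fails fast when the
# backslash is followed by whitespace.
my $factored = qr/(\{\\[^\s}{]+|\\(?=\S)(?:[^\s\\}]+|[\\}])|\})/;

# Hypothetical sample lines, including a stray backslash before a space.
my @lines = (
    "{\\rtf1 Hi\\par a\\\\b\\}c}",
    "text with a stray \\ backslash",
    "{\\i italic}{\\b bold}",
);

my $same = 1;
for my $line (@lines) {
    my $x = join '|', split $original, $line;
    my $y = join '|', split $factored, $line;
    $same = 0 unless $x eq $y;
}
print $same ? "identical tokens\n" : "tokens differ\n";
```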

Re: script too slow?

2002-07-15 Thread Janek Schleicher
John W. Krahn wrote at Mon, 15 Jul 2002 15:16:38 +0200: > Janek Schleicher wrote: >> >> John W. Krahn wrote at Sun, 14 Jul 2002 13:45:19 +0200: >> > >> > my @tokens = split /({\\[^\s}{]+|\\[^\s\\}]+|\\[\\}]|})/, $line; >> > ^^ ^^ ^^ >> >> Perhaps we can si

Re: script too slow?

2002-07-15 Thread Janek Schleicher
Paul Tremblay wrote at Mon, 15 Jul 2002 18:45:21 +0200: > On Mon, Jul 15, 2002 at 10:26:25AM +0200, Janek Schleicher wrote: > > >> To increase speed, we can make also a lookahead statement: >> >> my @tokens = split / ( \\ (?=\S)# there's never a whitespace >>