FW: HTML Strip and PRE tag?

2004-05-06 Thread Boris Shor
Hello, I am using the HTML::Strip module to strip the HTML tags off of source files, which I need to process. But it seems that anything after a PRE tag is ignored. For example, in the file http://www.legis.state.ia.us/GA/76GA/Session.2/SJournal/Day/0228.html the vast majority of the text is

RE: FW: HTML Strip and PRE tag?

2004-05-06 Thread Boris Shor
Wiggins, Thanks for writing back. -----Original Message----- From: Wiggins d'Anconia [mailto:[EMAIL PROTECTED] Sent: Thursday, May 06, 2004 2:48 PM To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: FW: HTML Strip and PRE tag? Hello, I am using the HTML::Strip module to strip

RE: Regular expression question: non-greedy matches

2004-04-21 Thread Boris Shor
Joseph, Thanks for writing and the advice. Here's another crack at the question. -----Original Message----- From: R. Joseph Newton [mailto:[EMAIL PROTECTED] Sent: Monday, April 05, 2004 5:39 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; 'Stuart V. Jordan' Subject:

RE: Regular expression question: non-greedy matches

2004-04-05 Thread Boris Shor
on the 'nays' or $2. -----Original Message----- From: Randy W. Sims [mailto:[EMAIL PROTECTED] Sent: Sunday, April 04, 2004 9:30 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: Regular expression question: non-greedy matches Boris Shor wrote: Hello, Perl beginner here. I am having

FW: Regular expression question: non-greedy matches

2004-04-05 Thread Boris Shor
Thanks for writing. I get no warnings when I use (ActiveState Perl on Windows): use Strict; use Warnings; $test = "Yea 123xrandomYea 456xdumdumNay 789xpop"; while ($test =~ /Yea (.*?)x.*?(Nay (.*?)x)?/g) { print "$1\n"; print "$2\n"; } What I am looking for are pairs: $1 will

Regular expression question: non-greedy matches

2004-04-04 Thread Boris Shor
Hello, Perl beginner here. I am having difficulty with a regular expression that uses non-greedy matches. Here is a sample code snippet: $test = "Yea 123xrandomYea 456xdumdumNay 789xpop"; while ($test =~ /Yea (.*?)x.*?(?:Nay (.*?)x)?/g) { print "$1\n"; print "$2\n"; } The idea
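The preview above is cut off before the goal is stated, so the following is only a sketch of what the posted pattern does on the posted string, not a proposed fix. It collects every match into the hypothetical array @pairs so the captures can be inspected. Because the lazy `.*?` is followed by an optional group that is allowed to match nothing, the engine is satisfied as soon as `.*?` has matched the empty string, and the `Nay` group never participates.

```perl
use strict;
use warnings;

my $test = "Yea 123xrandomYea 456xdumdumNay 789xpop";

# @pairs is a hypothetical helper: one [yea, nay] pair per match.
# The second element is undef when the optional Nay group did not take part.
my @pairs;
while ($test =~ /Yea (.*?)x.*?(?:Nay (.*?)x)?/g) {
    push @pairs, [$1, $2];
}

for my $pair (@pairs) {
    my ($yea, $nay) = @$pair;
    printf "yea=%s nay=%s\n", $yea, defined $nay ? $nay : '(undef)';
}
```

On this input the loop finds two matches, with $1 set to 123 and then 456, and $2 undefined both times.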

Regular expression with lookbehinds question

2004-02-26 Thread Boris Shor
Hello everyone, I'm trying to implement the following regular expression with a lookbehind: $e1 = ','; $aye =~ s/(?<!($e1))\s/\n/g; So this expression replaces spaces with newlines except when they are immediately preceded by a comma. But when I change $e1 = ',|R\.' (English: comma or R.), I
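The preview is truncated before the actual symptom, so this is a guess at the problem. Perl releases before 5.30 reject a lookbehind whose alternatives differ in length (here one character for the comma, two for "R."). If that is the issue, one common workaround is to write one fixed-length negative lookbehind per alternative. The sample string below is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; the original data is not shown in the thread preview.
my $aye = "Smith, R. Jones Brown";

# Two fixed-length negative lookbehinds, one per alternative:
# a whitespace character is replaced only when it is preceded
# by neither "," nor "R.".
(my $out = $aye) =~ s/(?<!,)(?<!R\.)\s/\n/g;

print $out, "\n";
```

With this input only the space after "Jones" becomes a newline; the spaces after the comma and after "R." are left alone.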

TokeParser and get_trimmed_text question

2004-01-29 Thread Boris Shor
Hello, New Perl programmer here. I am using HTML::TokeParser to parse HTML files. It is really very useful. In particular, I use the get_trimmed_text() function quite a bit to extract tag-free text from HTML files. I usually use the function in this fashion: $x = $p ->

Glob and space in directory name

2003-11-26 Thread Boris Shor
Why does the following work (e.g., give me an array filled with matching file names): @filelist = glob("w:/stleg/Colorado/House_98/*.htm"); And when I rename the directory to "House 98" (space instead of underscore), the following does not: @filelist = glob("w:/stleg/Colorado/House 98/*.htm"); Thanks!
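A likely explanation is that Perl's built-in glob() splits its argument on whitespace and treats each piece as a separate pattern, so a path containing a space is read as two patterns. The sketch below shows two common workarounds: bsd_glob() from the core File::Glob module, and wrapping the pattern in double quotes inside the string. It uses a temporary directory as a stand-in for the poster's w:/stleg/... path, and the file names a.htm and b.htm are invented for the example.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical stand-in for the "House 98" directory:
# a temporary directory whose last component contains a space.
my $base = tempdir(CLEANUP => 1);
my $dir  = "$base/House 98";
mkdir $dir or die "mkdir failed: $!";
for my $name (qw(a.htm b.htm)) {
    open my $fh, '>', "$dir/$name" or die "open failed: $!";
    close $fh;
}

# Built-in glob() splits on whitespace, so this is read as two patterns.
my @split = glob("$dir/*.htm");

# Option 1: bsd_glob() treats the whole string as a single pattern.
my @bsd = bsd_glob("$dir/*.htm");

# Option 2: embed double quotes so glob() keeps the pattern together.
my @quoted = glob(qq{"$dir/*.htm"});

printf "plain glob: %d, bsd_glob: %d, quoted glob: %d\n",
    scalar @split, scalar @bsd, scalar @quoted;
```

Both workarounds should find the two .htm files; the plain call does not, because neither half of the split pattern names them.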

TokeParser help

2003-11-19 Thread Boris Shor
Hello, I am a Perl newcomer, and I'm trying to use the TokeParser module to extract text from an HTML file. Here's the Perl code: use HTML::TokeParser; my $p = HTML::TokeParser->new("test.htm"); while ($p -> get_tag('b')) { print $p -> get_text(), "\n"; } This works only on bold tags that