Robert,
Thank you for pointing me to the relevant part of TFM that I needed to R.
The final regular expression that I have settled on that is reliably
producing expected results is:
#<li>(.*)<br>#isU
This finds all the text between an <li> and a <br> tag.
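A minimal sketch of that pattern in use, assuming the tags were written with angle brackets. The $html string is a made-up sample, not the page discussed in this thread:

```php
<?php
// Hypothetical sample markup; the real page is not shown in the thread.
$html = "<ul>\n<li>First item<br>\n<li>Second\nitem<br>\n</ul>";

// i = case-insensitive, s = dot also matches newlines,
// U = invert greediness so (.*) stops at the nearest <br>.
preg_match_all('#<li>(.*)<br>#isU', $html, $matches);

// $matches[1] holds the text captured by the first group,
// one entry per list item found.
print_r($matches[1]);
```

Without the s modifier the second item would be missed here, because its text spans two lines.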
I found that it was helpful for me to restrict
Robert,
Thank you for replying.
Check out the greediness modifier. Greediness determines whether it
extends the matching to the largest possible match or the smallest
possible match. By default regexes are greedy.
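To make the greedy/ungreedy difference concrete, here is a small sketch on a made-up one-line input (the $text value is an assumption for illustration only):

```php
<?php
// Hypothetical input with two list items on one line.
$text = '<li>one<br><li>two<br>';

// Greedy (default): .* runs to the last <br> it can reach.
preg_match('#<li>(.*)<br>#', $text, $greedy);

// Lazy: .*? stops at the first <br>.
preg_match('#<li>(.*?)<br>#', $text, $lazy);

// The U modifier flips the default for the whole pattern,
// which has the same effect as .*? in this case.
preg_match('#<li>(.*)<br>#U', $text, $ungreedy);

echo $greedy[1], "\n";   // one<br><li>two
echo $lazy[1], "\n";     // one
echo $ungreedy[1], "\n"; // one
```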
By greediness modifier, do you mean the preg_set_match, the
preg_set_order,
On Sun, 2006-08-06 at 10:39 +0900, Dave M G wrote:
Robert,
Thank you for replying.
Check out the greediness modifier. Greediness determines whether it
extends the matching to the largest possible match or the smallest
possible match. By default regexes are greedy.
By greediness
Dave M G wrote:
PHP List,
Recently I wrote a piece of code to scrape data from an HTML page.
Part of that code deleted all the unwanted text from the very top of the
page, where it says !DOCTYPE, all the way down to the first instance
of a ul tag.
That code looks like this:
On 04/08/06, Chris wrote:
Dave M G wrote:
PHP List,
Recently I wrote a piece of code to scrape data from an HTML page.
Part of that code deleted all the unwanted text from the very top of the
page, where it says !DOCTYPE, all the way down to the first instance
of a ul
Chris, Ligaya, Dave,
Thank you for responding. I understand the difference in principle
between ereg and preg much better now.
Chris wrote:
! in perl regular expressions means not so you need to escape it:
\!
Still, when including that escape character, the following preg
expression does
On 04 August 2006 10:52, Dave M G wrote:
Chris, Ligaya, Dave,
Thank you for responding. I understand the difference in principle
between ereg and preg much better now.
Chris wrote:
! in perl regular expressions means not so you need to escape it:
\!
AFAIR, that's only true in the (?!
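A quick way to check this, sketched with made-up subject strings: outside a lookahead the ! is an ordinary character, so escaping it changes nothing.

```php
<?php
// Hypothetical one-line document prolog.
$doc = '<!DOCTYPE html>';

// Outside the (?! ... ) construct, ! is an ordinary character,
// so both patterns match the same text.
$plain   = preg_match('#<!DOCTYPE#', $doc);
$escaped = preg_match('#<\!DOCTYPE#', $doc);

// Inside (?! ... ) it forms a negative lookahead:
// "foo" that is not followed by "bar".
$notBar = preg_match('#foo(?!bar)#', 'foobaz');
$isBar  = preg_match('#foo(?!bar)#', 'foobar');

var_dump($plain, $escaped, $notBar, $isBar); // int(1) int(1) int(1) int(0)
```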
Jochem,
Thank you for responding.
does this one work?:
preg_replace('#^<\!DOCTYPE(.*)<ul[^>]*>#is', '', $htmlPage);
Yes, that works. I don't think I would have ever figured that out on my
own - it's certainly much more complicated than the ereg equivalent.
If I may push for just one more
On 04/08/06, Dave M G wrote:
It seemed that the main difference was that preg_replace required
forward slashes around the regular expression, like so:
preg_replace('/<!DOCTYPE(.*)<ul>/', '', $htmlPage);
It requires delimiters - slashes are conventional, but other
characters can be
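As an illustration of the delimiter point, here is a sketch that runs the same pattern with / and with # as delimiters. The $htmlPage contents are a made-up sample, and the s modifier is added as an assumption so that the dot also crosses line breaks:

```php
<?php
// Hypothetical page; only the shape matters for the example.
$htmlPage = "<!DOCTYPE html>\n<html><body>\n<ul><li>item</li></ul>";

// Same pattern, two different delimiters. With / as the delimiter a
// literal slash inside the pattern would need escaping; with # it would not.
$a = preg_replace('/<!DOCTYPE(.*)<ul>/s', '', $htmlPage);
$b = preg_replace('#<!DOCTYPE(.*)<ul>#s', '', $htmlPage);

var_dump($a === $b); // bool(true)
echo $a, "\n";       // <li>item</li></ul>
```

Choosing # tends to be convenient for HTML, since closing tags contain slashes.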
Dave M G wrote:
Chris, Ligaya, Dave,
Thank you for responding. I understand the difference in principle
between ereg and preg much better now.
Chris wrote:
! in perl regular expressions means not so you need to escape it:
\!
Still, when including that escape character, the following
On 04 August 2006 11:30, Dave M G wrote:
Jochem,
Thank you for responding.
does this one work?:
preg_replace('#^<\!DOCTYPE(.*)<ul[^>]*>#is', '', $htmlPage);
Yes, that works. I don't think I would have ever figured
that out on my
own - it's certainly much more complicated than the
Dave M G wrote:
Jochem,
Thank you for responding.
does this one work?:
preg_replace('#^<\!DOCTYPE(.*)<ul[^>]*>#is', '', $htmlPage);
Yes, that works. I don't think I would have ever figured that out on my
own - it's certainly much more complicated than the ereg equivalent.
1. the '^' at
Ford, Mike wrote:
On 04 August 2006 11:30, Dave M G wrote:
Jochem,
...
That's where capturing expressions and backreferences come in handy:
preg_replace('/.*<li>(.*)<br>.*/', '$1', $htmlPage);
(add qualifiers and other options to taste, as before!)
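A runnable sketch of that capture-and-backreference idea, assuming angle-bracket tags and a made-up $htmlPage string; the s modifier is my addition so the dot would also span newlines on a real page:

```php
<?php
// Hypothetical page fragment with one item of interest.
$htmlPage = "<p>intro</p><li>wanted text<br><p>footer</p>";

// Everything outside the group is matched and thrown away;
// $1 in the replacement puts back only what the group captured.
$result = preg_replace('/.*<li>(.*)<br>.*/s', '$1', $htmlPage);

echo $result, "\n"; // wanted text
```

Because both .* parts are greedy, this keeps only one item per subject; pages with several items call for a different approach.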
ah yes, good point - you can do it
Jochem,
Thank you for responding, and for explaining more about regular expressions.
yes but you wouldn't use preg_replace() but rather preg_match() or
preg_match_all()
which gives you back an array (via 3rd/4th[?] reference argument) which contains
the texts that matched (and therefore want
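For reference, both functions take the result array as their third argument, passed by reference. A small sketch with a made-up input string:

```php
<?php
// Hypothetical input with two list items.
$html = '<li>alpha<br><li>beta<br>';

// preg_match() stops at the first match. $first[0] is the whole
// match and $first[1] is the first captured group.
preg_match('#<li>(.*?)<br>#', $html, $first);

// preg_match_all() collects every match. The optional fourth
// argument picks the layout; PREG_SET_ORDER groups results by match.
preg_match_all('#<li>(.*?)<br>#', $html, $all, PREG_SET_ORDER);

echo $first[1], "\n";  // alpha
echo $all[1][1], "\n"; // beta
```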
Dave M G wrote:
Jochem,
Thank you for responding, and for explaining more about regular
expressions.
yes but you wouldn't use preg_replace() but rather preg_match() or
preg_match_all()
which gives you back an array (via 3rd/4th[?] reference argument)
which contains
the texts that
Dave M G wrote:
PHP List,
Recently I wrote a piece of code to scrape data from an HTML page.
Part of that code deleted all the unwanted text from the very top of the
page, where it says !DOCTYPE, all the way down to the first instance
of a ul tag.
That code looks like this:
On Fri, 2006-08-04 at 13:03 -0400, John Nichel wrote:
Perl compatible regexs are faster* and more powerful. Course, writing a
good Perl regex is an art form in itself (probably why O'Reilly released
a book just on regexs), and takes some time (and headaches) to master
(if one ever does
Jochem
Thank you for your continued assistance.
^--- remove the caret, as you don't want to match only when the line
starts with <li> (the <li> can be anywhere on the line)
Ah, I get it now. I was confused about the meaning of the caret.
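The effect of the caret can be sketched like this, using a made-up line of input. Note that without the m modifier, ^ anchors to the start of the whole subject string rather than to each line:

```php
<?php
// Hypothetical line where the tag is not at the very start.
$line = 'some text <li>item<br>';

// ^ anchors the match to the start of the subject, so this fails.
$anchored = preg_match('#^<li>(.*)<br>#', $line, $m1);

// Without the caret the <li> can be anywhere in the line.
$free = preg_match('#<li>(.*)<br>#', $line, $m2);

var_dump($anchored); // int(0)
var_dump($free);     // int(1)
echo $m2[1], "\n";   // item
```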
I'll assume you also have the mb extension setup.
On Sat, 2006-08-05 at 10:50 +0900, Dave M G wrote:
Jochem
Thank you for your continued assistance.
^--- remove the caret, as you don't want to match only when the line
starts with <li> (the <li> can be anywhere on the line)
Ah, I get it now. I was confused about the meaning of the caret.
PHP List,
Recently I wrote a piece of code to scrape data from an HTML page.
Part of that code deleted all the unwanted text from the very top of the
page, where it says !DOCTYPE, all the way down to the first instance
of a ul tag.
That code looks like this:
ereg_replace('<!DOCTYPE(.*)<ul>', '',
Dave M G wrote:
PHP List,
Recently I wrote a piece of code to scrape data from an HTML page.
Part of that code deleted all the unwanted text from the very top of the
page, where it says !DOCTYPE, all the way down to the first instance
of a ul tag.
That code looks like this: