Re: [Tutor] Question regular expressions - the non-greedy pattern

Lie Ryan Mon, 21 Jan 2013 16:47:04 -0800

On 22/01/13 10:11, Marcin Mleczko wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Hello Hugo, hello Walter,

first thank you very much for the quick reply.

The functions used here i.e. re.match() are taken directly form the
example in the mentioned HowTo. I'd rather use re.findall() but I
think the general interpretetion of the given regexp sould be nearly
the same in both functions.

So I'd like to neglect the choise of a particular function for a
moment a concentrate on the pure theory.
What I got so far:
in theory form s = '<<html><head><title>Title</title>'
'<.*?>' would match '<html>' '<head>' '<title>' '</title>'
to achieve this the engine should:
1. walk forward along the text until it finds <
2. walk forward from that point until in finds >
3. walk backward form that point (the one of >) until it finds <
4. return the string between < from 3. and > from 2. as this gives the
least possible string between < and >

Did I get this right so far? Is this (=least possible string between <
and >), what non-greedy really translates to?

Nope, that's not how regex works. In particular, it does not do step 3,non-greedy regex do not walk backward to find the shortest string thatmatches.


What it does for the non-greedy pattern <.*?> is:

1. walk forward along the text until it finds <

2. if the next character matches . (any chars) then step one characterforward, otherwise backtrack from where we left off in the previous step3. if the next char is >, then we find a match, otherwise backtrack fromwhere we left off in the previous step

4. return the string between < from 1 and > from 3

or alternatively

1. walk forward along the text until it finds <
2. walk forward from that point until in finds >
3. return the string between < from 1 and > from 2

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] Question regular expressions - the non-greedy pattern

Reply via email to