A friend of mine got bitten by an expectations bug.  He was using 
re.findall to look for all occurrences of strings matching a pattern, and a 
substring he *knew* was in there did not pop out.

The bug was that it overlapped another matching substring, and findall 
only returns non-overlapping matches.  This is documented; he just missed 
it.

But he asked me: is there a standard method to get even overlapping
strings?

Cut to its basics, here's an artificial example:

>>> import re
>>> rexp=re.compile("B.B")
>>> sequence="BABBEBIB"
>>> rexp.findall(sequence)
['BAB', 'BEB']

What he would have wanted was the list ['BAB', 'BEB', 'BIB']; but since 
the last 'B' in "BEB" is also the first 'B' in "BIB", "BIB" is not picked 
up.
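
To make the skipped match visible, here is a small sketch that prints where each match starts and ends. It only uses finditer from the standard re module; the name `spans` is just for illustration.

```python
import re

rexp = re.compile("B.B")
sequence = "BABBEBIB"

# Record (start, end, text) for every match finditer reports.
# Each scan resumes at the end of the previous match, so the 'B'
# at index 5 is consumed by 'BEB' and cannot also start 'BIB'.
spans = [(m.start(), m.end(), m.group()) for m in rexp.finditer(sequence)]

for start, end, text in spans:
    print(start, end, text)
```

"BIB" would have to start at index 5, but the second match has already used that character, so the scanner moves straight on to index 6.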

After looking through the docs, I couldn't find a way to do this with the 
standard methods, so I gave him a quick roll-your-own solution:

>>> def myfindall(regex, seq):
...    resultlist=[]
...    pos=0
...
...    while True:
...       result = regex.search(seq, pos)
...       if result is None:
...          break
...       resultlist.append(seq[result.start():result.end()])
...       pos = result.start()+1
...    return resultlist
...
>>> myfindall(rexp,sequence)
['BAB', 'BEB', 'BIB']

But just curious; are we reinventing the wheel here?  Is there already a 
way to match even overlapping substrings?  I'm surprised I can't find one.
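
The nearest thing I can come up with is a workaround rather than a dedicated method: wrap the pattern in a zero-width lookahead containing a capturing group, so that no characters are consumed between matches. This is only a sketch. The helper name `overlapping_findall` is made up, and it assumes the pattern is passed as a plain string that does not use numbered backreferences (the extra group would shift their numbering).

```python
import re

def overlapping_findall(pattern, text):
    """Return matches of pattern in text, allowing overlaps.

    Hypothetical helper: wraps the pattern in (?=(...)) so each match
    is zero-width and the scanner advances one character at a time.
    """
    lookahead = re.compile("(?=(" + pattern + "))")
    # group(1) is the text captured inside the lookahead.
    return [m.group(1) for m in lookahead.finditer(text)]

result = overlapping_findall("B.B", "BABBEBIB")
print(result)
```

Like myfindall above, this reports at most one match per starting position.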

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
