A friend of mine got bitten by an expectations bug. He was using re.findall to look for all occurrences of strings matching a pattern, and a substring he *knew* was in there did not pop out.
The bug was that it overlapped another matching substring, and findall only returns non-overlapping matches. This is documented; he just missed it. But he asked me: is there a standard method to get even overlapping matches?

Cut to its basics, here's an artificial example:

>>> import re
>>> rexp = re.compile("B.B")
>>> sequence = "BABBEBIB"
>>> rexp.findall(sequence)
['BAB', 'BEB']

What he wanted was the list ['BAB', 'BEB', 'BIB']; but since the last 'B' in "BEB" is also the first 'B' in "BIB", "BIB" is not picked up.

After looking through the docs, I couldn't find a way to do this with the standard methods, so I gave him a quick RYO solution:

>>> def myfindall(regex, seq):
...     resultlist = []
...     pos = 0
...     while True:
...         # search from pos onward; stop when nothing more matches
...         result = regex.search(seq, pos)
...         if result is None:
...             break
...         resultlist.append(seq[result.start():result.end()])
...         # restart one character past the start of this match,
...         # so the next match may overlap it
...         pos = result.start() + 1
...     return resultlist
...
>>> myfindall(rexp, sequence)
['BAB', 'BEB', 'BIB']

But just curious: are we reinventing the wheel here? Is there already a way to match even overlapping substrings? I'm surprised I can't find one.

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
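
A possible follow-up, offered as a sketch rather than a definitive answer: wrapping the pattern in a capturing group inside a zero-width lookahead seems to give the overlapping matches with the standard re module alone. Because the lookahead consumes no characters, the scan moves forward one position at a time and each captured group is free to overlap the previous one. The helper name find_overlapping is made up for illustration. It assumes the pattern is passed as a string and does not use numbered backreferences, since the extra outer group would shift their numbers.

```python
import re

def find_overlapping(pattern, seq):
    """Hypothetical helper: list matches of pattern in seq, overlaps included."""
    # (?=( ... )) is a lookahead containing one capturing group. The overall
    # match is empty, so finditer tries every start position in turn, and
    # group(1) holds the text the original pattern would have matched there.
    lookahead = re.compile("(?=(" + pattern + "))")
    return [m.group(1) for m in lookahead.finditer(seq)]

print(find_overlapping("B.B", "BABBEBIB"))
```

Like the search-and-restart loop, this reports at most one match per starting position, so it is a matter of taste which of the two reads better.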