On Sun, Aug 23, 2009 at 09:02:53AM -0700, James Thiele wrote:
> If I'm reading this correctly, you want to verify that the full string
> matches "(AB+)+" and then print it followed by the submatches of "AB+".
> Combining your code with Bryan's suggestion:
> #!/usr/bin/env python
> import re
> ptn = re.compile("^((AB+)+)$")
> str = "ABABBABBBABBBBABBBBBABBBBBB"
> if ptn.match(str):
>     print str, re.findall('(AB+)', str)
Thanks for your help. I had simplified my example, but this solves the
core problem. Here's an extract from the actual data and my application
of your suggestions:
#!/usr/bin/env python
import re
# Raw strings keep the regex backslashes away from Python's string escapes.
wpx = re.compile(r"WPX/(\d+)(,([-+]?\d+\.\d*e[-+]\d+))+")
floats = re.compile(r"[-+]?\d+\.\d*e[-+]\d+")
#
lines = [
    "WPX/1,8.2954231790e+006,1.0133209480e+005,1.7395780740e-004",
    "WPX/2,2.739e+06,3.301e+04,-8.822e+00,-4.688e+00,-1.443e-01,-6.109e-02",
    "WPX/3,1.3e+5,6.2e+2,-1.7e-1,-1.8e+1,-4.3e-3,-2.1e-5,-7.4e-2,-2.6-5,7.2e-7,1.0e-6",
    "Other stuff", ]
info = {"WPX":[], }
#
for line in lines:
    mo = wpx.search(line)
    if mo:
        # Record number first, then every float found anywhere on the line.
        info["WPX"].append([int(mo.group(1))]+list(map(float,floats.findall(line))))
        continue
#
# ... much later ...
#
for value in info["WPX"]:
    print(value)
The technique works for this case, but it seems a bit fragile: the pattern
that validates the line and the pattern that extracts the values have to be
kept in step by hand. I still wonder if there isn't a more robust method
which would work for a messier collection of nested groups. Perhaps I'll
have to revert to traditional parsing when that case appears.
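
One possible shape for that "traditional parsing", offered only as a rough
sketch: match just the record header with a regex, split the rest on commas,
and let float() validate each field. The names parse_wpx and WPX_HEADER are
made up for illustration, and the sample lines are shortened. Note that
float() accepts more spellings than the pattern above (for example "inf" or
"1e5" without a decimal point), so this assumes that looser check is
acceptable. The point is that a malformed field such as "-2.6-5" raises an
error, where the findall approach skips it silently.

```python
import re

# Hypothetical helper names; only the record header is matched by regex.
WPX_HEADER = re.compile(r"WPX/(\d+),(.+)$")

def parse_wpx(line):
    """Return [index, float, float, ...], or None if line is not a WPX record.

    Raises ValueError if any field after the index is not a valid float.
    """
    mo = WPX_HEADER.match(line)
    if mo is None:
        return None
    fields = mo.group(2).split(",")
    return [int(mo.group(1))] + [float(field) for field in fields]

lines = [
    "WPX/1,8.2954231790e+006,1.0133209480e+005,1.7395780740e-004",
    "WPX/3,1.3e+5,-2.6-5,7.2e-7",   # second field is malformed
    "Other stuff",
]

records, rejected = [], []
for line in lines:
    try:
        record = parse_wpx(line)
    except ValueError:
        # Keep the bad line for inspection instead of dropping a field.
        rejected.append(line)
        continue
    if record is not None:
        records.append(record)
```

With this arrangement the malformed WPX/3 line ends up in rejected rather
than being stored with a value missing, and "Other stuff" is simply ignored.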
--
Randolph Bentson