On Sun, Aug 23, 2009 at 09:02:53AM -0700, James Thiele wrote:
> If I'm reading this correctly, you want to verify that the full string
> matches "(AB+)+" and then print it followed by the submatches of "AB+".
> Combining your code with Bryan's suggestion:
>    #!/usr/bin/env python
>    import re
>    ptn = re.compile("^((AB+)+)$")
>    s = "ABABBABBBABBBBABBBBBABBBBBB"
>    if ptn.match(s):
>        print(s, re.findall('(AB+)', s))

Thanks for your help.  I had simplified my example, but this solves the
core problem. Here's an extract from the actual data and my application
of your suggestions:
    #!/usr/bin/env python
    import re
    wpx = re.compile(r"WPX/(\d+)(,([-+]?\d+\.\d*e[-+]\d+))+")
    floats = re.compile(r"[-+]?\d+\.\d*e[-+]\d+")
    #
    lines = [
        "WPX/1,8.2954231790e+006,1.0133209480e+005,1.7395780740e-004",
        "WPX/2,2.739e+06,3.301e+04,-8.822e+00,-4.688e+00,-1.443e-01,-6.109e-02",
        "WPX/3,1.3e+5,6.2e+2,-1.7e-1,-1.8e+1,-4.3e-3,-2.1e-5,-7.4e-2,-2.6-5,7.2e-7,1.0e-6",
        "Other stuff", ]
    info = {"WPX":[], }
    #
    for line in lines:
        mo = wpx.search(line)
        if mo:
            info["WPX"].append([int(mo.group(1))] + list(map(float, floats.findall(line))))
            continue
    #
    # ... much later ...
    #
    for value in info["WPX"]:
        print(value)
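
Because wpx.search is not anchored, the third sample line still matches even though its -2.6-5 field has no "e". floats.findall then silently skips that field. Below is a minimal sketch of a stricter variant that borrows the ^...$ idea from the quoted suggestion. It assumes Python 3.4+ for re.fullmatch. The names FLOAT, WPX_LINE, FLOATS and parse_wpx are made up for illustration.

```python
import re

# Float syntax from the post, written once as a raw string so the two
# compiled patterns below cannot drift apart.
FLOAT = r"[-+]?\d+\.\d*e[-+]\d+"
# Group 1: record id; group 2: the whole ",float,float,..." tail.
# The inner repetition is non-capturing because re would only keep
# its last repetition anyway.
WPX_LINE = re.compile(r"WPX/(\d+)((?:," + FLOAT + r")+)")
FLOATS = re.compile(FLOAT)


def parse_wpx(line):
    """Return [id, float, ...] for a well-formed WPX line, else None."""
    mo = WPX_LINE.fullmatch(line)  # the whole line must match, like ^...$
    if mo is None:
        return None
    # Scan only the tail that fullmatch has already validated.
    return [int(mo.group(1))] + [float(x) for x in FLOATS.findall(mo.group(2))]


print(parse_wpx("WPX/2,2.739e+06,3.301e+04,-8.822e+00"))  # [2, 2739000.0, 33010.0, -8.822]
print(parse_wpx("WPX/3,1.3e+5,-2.6-5,7.2e-7"))            # None: malformed field
print(parse_wpx("Other stuff"))                           # None
```

With this version, a bad line is reported as None rather than quietly losing a value. The caller can then decide whether to log it or stop.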

The technique works for this case, but it seems a bit fragile.  I still
wonder whether there is a more robust method that would work for a messier
collection of nested groups. Perhaps I'll have to fall back on traditional
parsing when that case appears.
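
Below is a sketch of what that fallback might look like, assuming Python 3. Python's re keeps only the last capture of a repeated group, which is why a single (...)+ pattern cannot return every float. This sketch instead matches one repetition at a time with Pattern.match(line, pos), which anchors each attempt at pos. Each repetition keeps its own nested groups, here a mantissa and an exponent. HEAD, FIELD and parse_fields are hypothetical names.

```python
import re

# Hypothetical sketch: consume a repeated group one repetition at a time,
# so every repetition's captures are kept and nothing is skipped silently.
HEAD = re.compile(r"WPX/(\d+)")
# One repetition, with its own nested groups: mantissa and exponent.
FIELD = re.compile(r",([-+]?\d+\.\d*)e([-+]\d+)")


def parse_fields(line):
    """Return (record_id, [(mantissa, exponent), ...]); raise ValueError
    if any part of the line does not fit the expected shape."""
    mo = HEAD.match(line)
    if mo is None:
        raise ValueError("not a WPX record: %r" % line)
    pos = mo.end()
    fields = []
    while pos < len(line):
        fm = FIELD.match(line, pos)  # anchored at pos, unlike search()
        if fm is None:
            raise ValueError("bad field at offset %d: %r" % (pos, line[pos:]))
        fields.append(fm.groups())   # this repetition's nested groups
        pos = fm.end()
    if not fields:                   # mirror the "+" in the original pattern
        raise ValueError("WPX record with no values: %r" % line)
    return int(mo.group(1)), fields


rid, fields = parse_fields("WPX/2,2.739e+06,3.301e+04")
print(rid, fields)                              # 2 [('2.739', '+06'), ('3.301', '+04')]
print([float(m + "e" + e) for m, e in fields])  # [2739000.0, 33010.0]
```

Every character of the line must be consumed by HEAD or FIELD, so a field like -2.6-5 raises ValueError instead of disappearing. The same loop extends to deeper nesting by giving FIELD more groups. Another option is the third-party regex module on PyPI, whose match objects have a captures() method that returns every capture of a repeated group.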

-- 
Randolph Bentson
