> > Hi Denis,
> >
> > Thanks for your input. So I decided I should use pyparsing and try it
> > (I'm a relative Python noob though!)
Hi Everyone!

I have made some progress, although I believe it's mainly due to luck and not a lot of understanding (a vague understanding, maybe). Hopefully this can help someone else out...

> This is due to Combine(), which glues (back) together matched string bits. To
> work safely, it disables the default separator-skipping behaviour of
> pyparsing. So that
>     real = Combine(integral+fractional)
> would correctly not match "1 .2". Right?
> See a recent reply by Paul McGuire about this topic on the pyparsing list
> http://sourceforge.net/mailarchive/forum.php?thread_name=FE0E2B47198D4F73B01E263034BDCE3C%40AWA2&forum_name=pyparsing-users
> and the pointer he gives there.
> There are several ways to correctly cope with that.

^ That was a useful link. I still sometimes struggle with whitespace and Combine / Group...

Below is my code, which works as I expect (I think...):

#!/usr/bin/python
from pyparsing import alphas, nums, ZeroOrMore, Word, Group, Suppress, Combine, Literal, OneOrMore, SkipTo, printables, White

text='''
[04 Jun 2009] DSA-1812-1 apr-util - several vulnerabilities
{CVE-2009-0023 CVE-2009-1955 CVE-2009-1243}
[etch] - apr-util 1.2.7+dfsg-2+etch2
[lenny] - apr-util 1.2.12+dfsg-8+lenny2
[01 Jun 2009] DSA-1808-1 drupal6 - insufficient input sanitising
{CVE-2009-1844}
[lenny] - drupal6 6.6-3lenny2
[01 Jun 2009] DSA-1807-1 cyrus-sasl2 cyrus-sasl2-heimdal - arbitrary code execution
{CVE-2009-0688}
[lenny] - cyrus-sasl2-heimdal 2.1.22.dfsg1-23+lenny1
[lenny] - cyrus-sasl2 2.1.22.dfsg1-23+lenny1
[etch] - cyrus-sasl2 2.1.22.dfsg1-8+etch1
'''

lsquare = Literal('[')
rsquare = Literal(']')
lbrace = Literal('{')
rbrace = Literal('}')
dash = Literal('-')
space = White('\x20')
newline = White('\n')
spaceapp = White('\x20') + Literal('-') + White('\x20')   # the ' - ' separator
spaceseries = White('\t')

# '[04 Jun 2009]' -> '04-Jun-2009': adjacent=False keeps whitespace skipping,
# joinString glues the parts together with '-'
date = Combine(lsquare.suppress() + Word(nums, exact=2) + Word(alphas)
               + Word(nums, exact=4) + rsquare.suppress(),
               adjacent=False, joinString='-')
# 'DSA-1812-1'
dsa = Combine(Literal('DSA') + dash +
              Word(nums, exact=4) + dash + Word(nums, exact=1))
# package name(s) up to the ' - ' separator
app = Combine(Word(printables) + SkipTo(spaceapp))
# description after ' - ', up to the end of the line
desc = Combine(spaceapp.suppress() + ZeroOrMore(Word(alphas)) + SkipTo(newline))
# CVE ids between the braces
cve = Combine(lbrace.suppress()
              + OneOrMore(Literal('CVE') + dash + Word(nums, exact=4) + dash + Word(nums, exact=4)
                          + SkipTo(rbrace) + Suppress(rbrace) + SkipTo(newline)))
# one '[release] - package version' group per line
series = OneOrMore(Group(lsquare.suppress()
                         + OneOrMore(Literal('lenny') ^ Literal('etch') ^ Literal('sarge'))
                         + rsquare.suppress() + spaceapp.suppress()
                         + Word(printables) + SkipTo(newline)))

record = date + dsa + app + desc + cve + series

def parse(text):
    # scanString yields (tokens, start, end) for every match found in text
    for data, dataStart, dataEnd in record.scanString(text):
        yield data

for i in parse(text):
    print(i)

My output is as follows:

['04-Jun-2009', 'DSA-1812-1', 'apr-util', 'several vulnerabilities', 'CVE-2009-0023 CVE-2009-1955 CVE-2009-1243', ['etch', 'apr-util', '1.2.7+dfsg-2+etch2'], ['lenny', 'apr-util', '1.2.12+dfsg-8+lenny2']]
['01-Jun-2009', 'DSA-1808-1', 'drupal6', 'insufficient input sanitising', 'CVE-2009-1844', ['lenny', 'drupal6', '6.6-3lenny2']]
['01-Jun-2009', 'DSA-1807-1', 'cyrus-sasl2 cyrus-sasl2-heimdal', 'arbitrary code execution', 'CVE-2009-0688', ['lenny', 'cyrus-sasl2-heimdal', '2.1.22.dfsg1-23+lenny1'], ['lenny', 'cyrus-sasl2', '2.1.22.dfsg1-23+lenny1'], ['etch', 'cyrus-sasl2', '2.1.22.dfsg1-8+etch1']]

Thanks to everyone who offered assistance and prodding in the right directions.

Stefan
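Denis's point about Combine() can be checked in isolation. What follows is a minimal sketch, assuming pyparsing 3.x (it uses the snake_case names parse_string, parse_all, as_list and join_string); integral, fractional, loose, real and try_parse are hypothetical names chosen for illustration. It contrasts a plain sequence, which skips whitespace between tokens, with Combine(), which requires the pieces to be adjacent, and then rebuilds the date expression from the script with adjacent=False.

```python
from pyparsing import Combine, Literal, ParseException, Word, alphas, nums

# Hypothetical definitions for Denis's example: an integer part,
# then a "." followed by digits.
integral = Word(nums)
fractional = Literal(".") + Word(nums)

loose = integral + fractional          # default: whitespace between tokens is skipped
real = Combine(integral + fractional)  # pieces must be adjacent, then glued into one string


def try_parse(expr, text):
    """Return the tokens as a list, or None if expr does not match all of text."""
    try:
        return expr.parse_string(text, parse_all=True).as_list()
    except ParseException:
        return None


# Same shape as the date expression in the script: adjacent=False keeps
# whitespace skipping inside Combine, join_string="-" joins the pieces.
date = Combine(
    Literal("[").suppress() + Word(nums, exact=2) + Word(alphas)
    + Word(nums, exact=4) + Literal("]").suppress(),
    adjacent=False,
    join_string="-",
)

if __name__ == "__main__":
    print(try_parse(loose, "1.2"))           # ['1', '.', '2']
    print(try_parse(loose, "1 .2"))          # ['1', '.', '2']
    print(try_parse(real, "1.2"))            # ['1.2']
    print(try_parse(real, "1 .2"))           # None
    print(try_parse(date, "[04 Jun 2009]"))  # ['04-Jun-2009']
```

With adjacent=False, Combine() keeps the default whitespace skipping and only does the joining, which is why the date expression accepts the spaces inside "[04 Jun 2009]".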
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor