[Hope you don't mind that I copy this to the list. Not only can it help others, but 
pyparsing users also read tutor, including Paul McGuire (the author).]

On Thu, 11 Jun 2009 11:53:31 +0200,
Stefan Lesicnik <ste...@lsd.co.za> wrote:

[...]

I cannot really answer precisely, since I haven't used pyparsing for a while (*).

So, below are only some hints.

> Hi Denis,
> 
> Thanks for your input. So i decided i should use a pyparser and try it (im a
> relative python noob though!)
> 
> This is what i have so far...
> 
> import sys
> from pyparsing import alphas, nums, ZeroOrMore, Word, Group, Suppress,
> Combine, Literal, alphanums, Optional, OneOrMore, SkipTo, printables
> 
> text='''
> [04 Jun 2009] DSA-1812-1 apr-util - several vulnerabilities
>         {CVE-2009-0023 CVE-2009-1955}
>         [etch] - apr-util 1.2.7+dfsg-2+etch2
>         [lenny] - apr-util 1.2.12+dfsg-8+lenny2
> '''
> 
> date = Combine(Literal('[') + Word(nums, exact=2) + Word(alphas) +
> Word(nums, exact=4) + Literal(']'),adjacent=False)
> dsa = Combine(Word(alphanums) + Literal('-') + Word(nums, exact=4) +
> Literal('-') + Word(nums, exact=1),adjacent=False)
> app = Combine(OneOrMore(Word(printables)) + SkipTo(Literal('-')))
> desc = Combine(Literal('-') + ZeroOrMore(Word(alphas)) +
> SkipTo(Literal('\n')))
> cve = Combine(Literal('{') + OneOrMore(Literal('CVE') + Literal('-') +
> Word(nums, exact=4) + Literal('-') + Word(nums, exact=4)) )
> 
> record = date + dsa + app + desc + cve
> 
> fields = record.parseString(text)
> #fields = dsa.parseString(text)
> print fields
> 
> 
> What i get out of this is
> 
> ['[04Jun2009]', 'DSA-1812-1', 'apr-util ', '- several vulnerabilities',
> '{CVE-2009-0023']
> 
> Which i guess it heading towards the right track...

For sure! Rather impressive that you could write this so fast. I hope my little PEG 
grammar helped.
There seem to be some issues of detail; for instance, in the app pattern I would write
   ...+ SkipTo(Literal(' - '))
Also, you could directly Suppress() delimiters that are probably useless, such as [...] 
in date.
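
To illustrate the Suppress() hint, here is a minimal sketch (untested against your full data, and assuming the date always looks like "[04 Jun 2009]"). The brackets still have to match, but they no longer show up in the result:

```python
from pyparsing import Suppress, Word, alphas, nums

# The brackets are required in the input but dropped from the output.
date = (Suppress("[") + Word(nums, exact=2) + Word(alphas)
        + Word(nums, exact=4) + Suppress("]"))

date_tokens = date.parseString("[04 Jun 2009]").asList()
print(date_tokens)  # ['04', 'Jun', '2009']
```

The three parts come back as separate tokens, which is handy if you want to post-process them later.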

Think about post-parse functions to transform and/or reformat nodes: search for 
setParseAction() and addParseAction() in the docs.
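
As a sketch of what such a post-parse function could look like: the helper name to_date and the MONTHS table below are my own inventions for illustration, not part of pyparsing. The parse action receives the matched tokens and returns whatever should replace them, here a datetime.date:

```python
import datetime
from pyparsing import Suppress, Word, alphas, nums

# Hypothetical lookup table; assumes English three-letter month names.
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

def to_date(tokens):
    # tokens arrive as e.g. ['04', 'Jun', '2009']
    day, month, year = tokens
    return datetime.date(int(year), MONTHS[month], int(day))

date = (Suppress("[") + Word(nums, exact=2) + Word(alphas)
        + Word(nums, exact=4) + Suppress("]"))
date.setParseAction(to_date)

parsed = date.parseString("[04 Jun 2009]")
print(parsed[0])  # 2009-06-04
```

An unknown month name would raise a KeyError here; a real parser would probably want to handle that more gracefully.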

> I am unsure why I am not getting more than 1 CVE... I have the OneOrMore
> match for the CVE stuff...

This is due to Combine(), which glues matched string bits (back) together. To 
work safely, it disables pyparsing's default separator-skipping behaviour 
inside the combined expression, so that
   real = Combine(integral+fractional)
correctly refuses to match "1 .2". In your cve pattern, this means the space 
between the two CVE ids stops OneOrMore after the first one.
See a recent reply by Paul McGuire on this topic on the pyparsing list 
http://sourceforge.net/mailarchive/forum.php?thread_name=FE0E2B47198D4F73B01E263034BDCE3C%40AWA2&forum_name=pyparsing-users
 and the pointer he gives there.
There are several ways to cope with that correctly.
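
One possible way, as a minimal sketch: apply Combine() to each single CVE id only, and keep OneOrMore and the braces outside of it, so that the spaces between ids are skipped as usual. The name cve_id is mine, and I relaxed the last number to any run of digits, which is an assumption about the format:

```python
from pyparsing import Combine, OneOrMore, Suppress, Word, nums

# Combine() glues the parts of ONE identifier; no whitespace allowed inside it.
cve_id = Combine("CVE-" + Word(nums, exact=4) + "-" + Word(nums))

# The repetition sits outside Combine(), so separators between ids are skipped.
cve = Suppress("{") + OneOrMore(cve_id) + Suppress("}")

cve_tokens = cve.parseString("{CVE-2009-0023 CVE-2009-1955}").asList()
print(cve_tokens)  # ['CVE-2009-0023', 'CVE-2009-1955']
```

Passing adjacent=False to Combine(), as you already do for date, is another option, but then the ids would be glued into one string.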

> That being said, how does the parser scale across multiple lines and how
> will it know that its finished?

Basically, you should probably express line breaks explicitly, especially because 
they seem to be part of the source format.
For that, there is a method to define which chars should be skipped as 
separators, ParserElement.setDefaultWhitespaceChars(). By default the skipped 
set includes space, tab and newline, so line breaks are silently ignored 
unless you change it.
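
A minimal sketch of explicit line breaks, assuming one version entry per line and that package name and version never contain spaces. The names EOL, release and version_line are only illustrative. Note that setDefaultWhitespaceChars() changes a global setting for all elements created afterwards, so the sketch restores the default at the end:

```python
from pyparsing import (Group, LineEnd, OneOrMore, ParserElement, Suppress,
                       Word, alphas, printables)

# Skip only spaces and tabs; newlines must now be matched explicitly.
ParserElement.setDefaultWhitespaceChars(" \t")

EOL = Suppress(LineEnd())
release = Suppress("[") + Word(alphas) + Suppress("]")
# One entry: "[release] - package version", terminated by a line break.
version_line = Group(release + Suppress("-") + Word(printables)
                     + Word(printables) + EOL)

sample = ("[etch] - apr-util 1.2.7+dfsg-2+etch2\n"
          "[lenny] - apr-util 1.2.12+dfsg-8+lenny2\n")
versions = OneOrMore(version_line).parseString(sample).asList()
print(versions)

# Restore pyparsing's usual whitespace set for any grammar defined later.
ParserElement.setDefaultWhitespaceChars(" \n\t\r")
```

Each line ends up as its own sub-list, and an entry split over two lines would no longer match.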

> Should i maybe look at getting the list first into one entry per line? (must
> be easier to parse then?)

What makes sense, I guess, is Group()-ing items that *conceptually* form a list. 
In your case, I see:
* CVE items inside {...}
* version entry lines ("[etch]...", "[lenny]...", ...)
* whole records at a higher level
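
Put together, the three levels of grouping could look like the sketch below. It is only one possible layout, with several assumptions of mine: the package name is a single word, the description runs to the end of the line (matched with a Regex), and newlines are left to the default separator skipping:

```python
from pyparsing import (Combine, Group, OneOrMore, Regex, Suppress, Word,
                       alphas, nums, printables)

text = """
[04 Jun 2009] DSA-1812-1 apr-util - several vulnerabilities
        {CVE-2009-0023 CVE-2009-1955}
        [etch] - apr-util 1.2.7+dfsg-2+etch2
        [lenny] - apr-util 1.2.12+dfsg-8+lenny2
"""

LBRACK, RBRACK, LBRACE, RBRACE, DASH = map(Suppress, "[]{}-")

date = Group(LBRACK + Word(nums, exact=2) + Word(alphas)
             + Word(nums, exact=4) + RBRACK)
dsa = Combine("DSA-" + Word(nums) + "-" + Word(nums))
app = Word(printables)            # assumption: one-word package name
desc = Regex(r"[^\n]+")           # assumption: rest of the header line

cve_id = Combine("CVE-" + Word(nums, exact=4) + "-" + Word(nums))
cves = Group(LBRACE + OneOrMore(cve_id) + RBRACE)       # list of CVE ids

version = Group(LBRACK + Word(alphas) + RBRACK + DASH
                + Word(printables) + Word(printables))
versions = Group(OneOrMore(version))                    # list of version lines

record = Group(date + dsa + app + DASH + desc + cves + versions)
records = OneOrMore(record)                             # list of whole records

parsed = records.parseString(text).asList()
first = parsed[0]
print(first[1], first[4], first[5])
```

A new record starts with "[" followed by digits, while a version line starts with "[" followed by letters, which is what lets OneOrMore(version) stop at the right place.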

> This parsing is a mini language in itself!

Sure! A rather big & complex parsing language. It's hard to know it all well 
(and I'm not even speaking of all the builtin helpers, let alone everything you 
can do by mixing ordinary python code into the grammar/parser: a whole new 
field in parsing/processing).

> Thanks for your input :)

My pleasure...

> Stefan

Denis

(*) The reason is that I'm developing my own parsing tool; see 
http://spir.wikidot.com/pijnu.
The guide is also intended as a parsing tutorial; it may help, but it is not 
exactly up to date.
------
la vita e estrany
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor