Re: [Tutor] Parsing problem

Liam Clarke Wed, 20 Jul 2005 05:05:36 -0700

Well, I've been poking around and... well.. this is way better than writing complex regexes.

To suit my needs, I need something that can handle -

foo = bar
foo = 20
foo = { bar 20 }
foo = { bar = 20 baz}
foo = {bar = 20 baz { dave henry}}

OK, so the last one's extreme. So far, I can handle down to foo = { bar 20 }, but it looks ugly, so some feedback on my very rough usage of pyparsing would be great.

>>> from pyparsing import Word, Suppress, ZeroOrMore, alphas, nums
>>> q = (Word(alphas) + Suppress("=") + ( ( Word(nums) | Word(alphas) ) | ( Suppress("{") + ZeroOrMore( Word(alphas) | Word(nums) ) + Suppress("}") ) ) )
>>> q.parseString("foo = bar").asList()
['foo', 'bar']
>>> q.parseString("a = 23").asList()
['a', '23']
>>> q.parseString(" foo = { bar baz 23 }").asList()
['foo', 'bar', 'baz', '23']

Yeech.

I'm sure I can shorten that a whole lot ( I just found alphanums in the manual, d'oh. ), but it works pretty good out of the box. Thanks for the heads up.
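
For what it's worth, here is a rough sketch of how the expression above might shrink once alphanums is used. It assumes pyparsing is installed, and the names token, braced and assignment are only illustrative; they are not from the manual.

```python
from pyparsing import Word, Suppress, ZeroOrMore, alphas, alphanums

# A token is any run of letters and digits, so "bar" and "20" both match.
token = Word(alphanums)

# A braced list of zero or more tokens; the braces are dropped from the result.
braced = Suppress("{") + ZeroOrMore(token) + Suppress("}")

# Either "name = token" or "name = { token ... }".
assignment = Word(alphas) + Suppress("=") + (braced | token)

print(assignment.parseString("foo = bar").asList())             # ['foo', 'bar']
print(assignment.parseString("a = 23").asList())                # ['a', '23']
print(assignment.parseString(" foo = { bar baz 23 }").asList())  # ['foo', 'bar', 'baz', '23']
```

This still does not handle the nested cases such as foo = { bar = 20 baz }; it only tidies up what already worked.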

Couple of queries -

I think I understand Danny's example of circular references.

------
Value << (Symbol | Sequence)
Sequence << (pyparsing.Suppress("{") +
pyparsing.Group(pyparsing.ZeroOrMore(Value)) +
pyparsing.Suppress("}"))
------

Sequence depends on Value for its *ahem* value, but Value depends on Sequence for its value, so I'll play with that.

Is anyone able to post an example of returning dictionaries from ParseResults? If so, it would be brilliant.

The documentation states -
"the Dict class generates dictionary entries using the data of the input text - in addition to ParseResults listed as [ [ a1, b1, c1, ...], [ a2, b2, c2, ...] ] it also acts as a dictionary with entries defined as { a1 : [ b1, c1, ... ] }, { a2 : [ b2, c2, ... ] };"

Problem is, I haven't figured out how to use it yet, though I know I could use pyparsing.Group(stuff) to ensure proper key:value pairings.
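
As a starting point, a minimal sketch of Dict wrapped around grouped key = value pairs might look like the following. It assumes a reasonably recent pyparsing release (asDict() may not be available in older ones), and the names key, value, pair and pairs are made up for the example.

```python
from pyparsing import Word, Suppress, Group, Dict, ZeroOrMore, alphas, alphanums

key = Word(alphas)
value = Word(alphanums)

# Group each "key = value" pair so that Dict can use the first token
# of every group as the dictionary key.
pair = Group(key + Suppress("=") + value)
pairs = Dict(ZeroOrMore(pair))

result = pairs.parseString("tag = ENG war = 60 ferocity = no")

print(result.asList())   # [['tag', 'ENG'], ['war', '60'], ['ferocity', 'no']]
print(result["war"])     # 60
print(result.asDict())   # {'tag': 'ENG', 'war': '60', 'ferocity': 'no'}
```

The values come back as strings; converting '60' to an integer would need a parse action or a later pass over the result.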

Thanks for the pointers so far. I'm feeling very chuffed with myself for managing to get this far; I had strayed into VBA territory, and it's nice to work with real objects again.

And of course, always open to being shown the simple, elegant way. ;)

Many thanks,

Liam Clarke

On 7/19/05, Liam Clarke <[EMAIL PROTECTED]> wrote:

Thanks guys, I daresay I will have a lot of questions regarding this, but at least I have a point to start digging and a better shovel!

Cheers,

Liam Clarke

On 7/19/05, Danny Yoo <[EMAIL PROTECTED]> wrote:

On Mon, 18 Jul 2005, Liam Clarke wrote:

> country = {
> tag = ENG
> ai = {
> flags = { }
> combat = { DAU FRA ORL PRO }
> continent = { }
> area = { }
> region = { "British Isles" "NorthSeaSea" "ECAtlanticSea" "NAtlanticSea"
> "TagoSea" "WCAtlanticSea" }
> war = 60
> ferocity = no
> }
> }

[Long message ahead; skip if you're not interested.]

Kent mentioned PyParsing,

     http://pyparsing.sourceforge.net/

which is a really excellent system.  Here's a demo of what it can do, just
so you have a better idea what pyparsing is capable of.

(For the purposes of this demo, I'm doing 'import pyparsing', but in real
usage, I'd probably use 'from pyparsing import ...' just to make things
less verbose.)

Let's say that we want to recognize a simpler subset of the data that you
have there, something like:

    { fee fie foo fum }

And let's imagine that we have a function parse() that can take a string
like:

######
>>> testString = """
... { fee fie foo fum }
... """
######

This imaginary parse() function could turn that into something that looks
like a Python value, like this:

######
>>> parse(testString)
(["fee", "fie", "foo", "fum"])
######

That's our goal; does this make sense so far?  So how do we start?

Instead of going at the big goal of doing:

    country = { fee fie foo fum }

let's start small by teaching our system how to recognize the innermost
parts, the small things like fee or foo.  Let's start there:

######
>>> Symbol = pyparsing.Word(pyparsing.alphas)
######

We want a Symbol to be able to recognize a "Word" made up of alphabetic
letters.  Does this work?

######
>>> Symbol.parseString("fee")
(['fee'], {})
######

Symbol is now a thing that can parse a string, and return a list of
results in a pyparsing.ParseResults object.

Ok, if we can recognize Symbols, let's go for the jugular:

    { fee fie foo fum }

Let's call this a Sequence.

######
>>> Sequence = "{" + pyparsing.ZeroOrMore(Symbol) + "}"
######

A Sequence is made up of zero or more Symbols.

Wait, let's change that, for a moment, to "A Sequence is made up of zero
or more Values."  (You'll see why in a moment.  *grin*)

If we turn toward this strange way, then we need a definition for a Value:

######
>>> Value = Symbol
######

and now we can say that a Sequence is a bunch of Values:

######
>>> Sequence = "{" + pyparsing.ZeroOrMore(Value) + "}"
######

Let's try this out:

######
>>> Sequence.parseString('{ fee fie    foo fum}')
(['{', 'fee', 'fie', 'foo', 'fum', '}'], {})
######

This is close, but it's not quite right: the problem is that we'd like to
somehow group the results all together in a list, and without the braces.
That is, we actually want to see:

    [['fee', 'fie', 'foo', 'fum']]

in some form.  (Remember, we want a list of a single result, and that
result should be our Sequence.)

How do we get this working?  We have to tell pyparsing to "Group" the
middle elements together in a collection, and to "suppress" the braces
from the result.

Here we go:

######
>>> Sequence = (pyparsing.Suppress("{") +
...             pyparsing.Group(pyparsing.ZeroOrMore(Value)) +
...             pyparsing.Suppress("}"))
######

Does this work?

######
>>> Sequence.parseString('{ fee fie    foo fum}')
([(['fee', 'fie', 'foo', 'fum'], {})], {})
######

That looks a little messy and more nested than expected.

Actually, what's happening is that we're looking at that
pyparsing.ParseResults object, so there's more nesting in the string
representation than what's really there.  We can use the ParseResults's
asList() method to make it a little easier to see what the real result
value looks like:

######
>>> Sequence.parseString('{ fee fie    foo fum}').asList()
[['fee', 'fie', 'foo', 'fum']]
######

That's better.

Out of curiosity, wouldn't it be neat if we could parse out something like
this?

     { fee fie {foo "fum"} }

*cough* *cough*

What we'd like to do is make Sequence itself a possible value.  The
problem is that then there's a little circularity involved:

### Illegal PyParsing pseudocode  ###
Value = Symbol | Sequence

Sequence = (pyparsing.Suppress("{") +
            pyparsing.Group(pyparsing.ZeroOrMore(Value)) +
            pyparsing.Suppress("}"))
######

The problem is that Value can't be defined before Sequence is, and
vice-versa.  We break this problem by telling PyParsing "ok, the following
rules will come up soon" and "forward" define them:

######
>>> Value = pyparsing.Forward()
>>> Sequence = pyparsing.Forward()
######

and once we have these forward declarations, we can then reconnect them to
their real definitions by using '<<'.  (This looks bizarre, but it applies
just to rules that are Forward()ed.)

######
Value    << (Symbol | Sequence)
Sequence << (pyparsing.Suppress("{") +
             pyparsing.Group(pyparsing.ZeroOrMore(Value)) +
             pyparsing.Suppress("}"))
######

Let's try it:

######
>>> Value.parseString(' { fee fie {foo fum} } ').asList()
[['fee', 'fie', ['foo', 'fum']]]
######
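
The earlier example, { fee fie {foo "fum"} }, also had a quoted string in it, which Symbol alone will not match. One possible way to cover that, sketched here on the assumption that quoted strings should come back without their quotes, is to add pyparsing's QuotedString as another kind of Value. The name Quoted is only illustrative.

```python
import pyparsing

Symbol = pyparsing.Word(pyparsing.alphas)
# Assumption: a double-quoted string is returned with the quotes stripped,
# which is QuotedString's default behaviour.
Quoted = pyparsing.QuotedString('"')

# Forward-declare the mutually recursive rules, then fill them in with '<<'.
Value = pyparsing.Forward()
Sequence = pyparsing.Forward()
Value << (Symbol | Quoted | Sequence)
Sequence << (pyparsing.Suppress("{") +
             pyparsing.Group(pyparsing.ZeroOrMore(Value)) +
             pyparsing.Suppress("}"))

print(Value.parseString('{ fee fie {foo "fum"} }').asList())
# [['fee', 'fie', ['foo', 'fum']]]
```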

Cool.

Ok, that was a little artificial, but oh well.  The idea is we now know
how to say:

    A Value is either a Symbol or Sequence

and

    A Sequence is a bunch of Values

without getting into trouble with pyparsing, and that's important whenever
we're dealing with things that have recursive structure... like:

    country = {
               tag = ENG
               ai = {
                     flags = { }
                     combat = { DAU FRA ORL PRO }
                     continent = { }
                     area = { }
                     region = { "British Isles"
                                "NorthSeaSea"
                                "ECAtlanticSea"
                                "NAtlanticSea"
                                "TagoSea"
                                "WCAtlanticSea" }
                     war = 60
                     ferocity = no }
               }
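
To give a feel for where this could lead, here is a rough sketch of a grammar for data shaped like the block above. It is only one possible approach, not a finished parser: the names name, atom, pair and block are invented for the example, and it assumes a braced block holds either key = value pairs or bare items, never a mix of both.

```python
from pyparsing import (Word, Suppress, Group, Forward, OneOrMore, ZeroOrMore,
                       QuotedString, alphas, alphanums)

LBRACE, RBRACE, EQ = Suppress("{"), Suppress("}"), Suppress("=")

name = Word(alphas)
# A bare item: a quoted string (quotes stripped) or a run of letters/digits.
atom = QuotedString('"') | Word(alphanums)

value = Forward()
pair = Group(name + EQ + value)

# Assumption: a block is all pairs or all atoms (possibly empty).
block = LBRACE + Group(OneOrMore(pair) | ZeroOrMore(atom)) + RBRACE
value << (block | atom)

text = '''
country = {
    tag = ENG
    ai = {
        flags = { }
        combat = { DAU FRA ORL PRO }
        region = { "British Isles" "NorthSeaSea" }
        war = 60
        ferocity = no
    }
}
'''

result = pair.parseString(text).asList()
print(result)
```

Each key = value pair comes back as a two-element list, with nested blocks as nested lists, so the top-level result is [['country', [...]]].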

Anyway, this is a really fast whirlwind tour of pyparsing, with some
intentional glossing-over of hard stuff, just so you get a better idea of
the core of parsing.  Sorry if it went fast.  *grin*

If you have questions, please feel free to ask!

--
'There is only one basic human right, and that is to do as you damn well please.
And with it comes the only basic human duty, to take the consequences.'


_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
