Re: [Tutor] Parsing problem

Paul McGuire Mon, 25 Jul 2005 14:09:27 -0700

Liam -

I just uploaded an update to pyparsing, version 1.3.2, that should fix the
problem with using nested Dicts.  Now you won't need to use [0] to
dereference the "0'th" element, just reference the nested elements as a.b.c,
or a["b"]["c"].
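
A minimal sketch of what that access looks like, assuming a pyparsing release that includes this fix (1.3.2 or later). The grammar and the sample text are illustrative assumptions, not the grammar from Liam's pastes, and the camelCase method names follow the ones used in this thread:

```python
import pyparsing as pp

# Illustrative grammar (an assumption, not the thread's exact code):
# key = value pairs, where a value is a word or a brace-delimited block.
LBRACE, RBRACE, EQUALS = map(pp.Suppress, "{}=")
key = pp.Word(pp.alphas)
value = pp.Forward()
entry = pp.Group(key + EQUALS + value)
block = pp.Group(LBRACE + pp.Dict(pp.ZeroOrMore(entry)) + RBRACE)
value << (pp.Word(pp.alphanums) | block)
entries = pp.Dict(pp.OneOrMore(entry))

parsed = entries.parseString("country = { tag = ENG ai = { war = 60 } }")

# Nested values are reachable by attribute or by key, with no [0] in between.
tag = parsed.country.tag
war = parsed["country"]["ai"]["war"]
print(tag, war)
```

Both access styles go through the same named results, so they can be mixed freely.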


-- Paul 


-----Original Message-----
From: Liam Clarke [mailto:[EMAIL PROTECTED]]
Sent: Sunday, July 24, 2005 10:21 AM
To: Paul McGuire
Cc: tutor@python.org
Subject: Re: [Tutor] Parsing problem

Hi Paul, 

That is fantastic. It works, and using pp.Group is the key with the
nested braces. 

I just ran this on the actual file after adding a few more possible values
inside the group, and it parsed the entire header structure rather nicely.

Now this will probably sound silly, but from the bit 

header = {...
...
}

it continues on with 

province = {...
} 

and so forth. 

Now, once it reads up to the closing bracket of the header section, it
returns that parsed nicely. 
Is there a way I can tell it to continue onwards? I can see that it's
stopping at one group.

Pyparsing is wonderful, but boy... as learning curves go, I'm somewhat over
my head.

I've tried this - 

Code http://www.rafb.net/paste/results/3Dm7FF35.html
Current data http://www.rafb.net/paste/results/3cWyt169.html

assignment << (pp.OneOrMore(pp.Group( LHS + EQUALS + RHS ))) 

to try and continue the parsing, but no luck.

I've been running into the 

 File "c:\python24\Lib\site-packages\pyparsing.py", line 1427, in parseImpl
    raise maxException
pyparsing.ParseException: Expected "}" (at char 742), (line:35, col:5) 

hassle again. From the CPU load, I'm worried I've got myself something
very badly recursive going on, but I'm unsure of how to use validate().
I've noticed that a few of the sections in between contain values like this
- 

foo = { BAR = { HUN = 10 SOB = 6 } oof = { HUN = { } SOB = 4 } }

and so I've stuck pp.empty into my RHS possible values. What unintended side
effects may I get from using pp.empty? From the docs, it sounds like a
wildcard token, rather than matching a null.

Using pp.empty has resolved my apparent problem with empty {}'s causing my
favourite exception, but I'm just worried that I'm casting my net too wide.
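
For what it's worth, pp.Empty (the class behind pp.empty) always matches and consumes no input, so it is a "match nothing" token, not a wildcard. One alternative, sketched here under the assumption that the braces may simply contain zero assignments, is to use ZeroOrMore for the body of the block so that "{ }" is matched directly. The grammar below is a simplified stand-in for the one in the pastes:

```python
import pyparsing as pp

EQUALS, LBRACE, RBRACE = map(pp.Suppress, "={}")
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums + "-+", pp.nums)
assignment = pp.Forward()
# ZeroOrMore lets "{ }" match as a block with no assignments inside.
block = pp.Group(LBRACE + pp.ZeroOrMore(assignment) + RBRACE)
assignment << pp.Group(identifier + EQUALS + (identifier ^ integer ^ block))

data = "foo = { BAR = { HUN = 10 SOB = 6 } oof = { HUN = { } SOB = 4 } }"
result = assignment.parseString(data).asList()
print(result)

# Empty matches anywhere, consumes nothing, and contributes no tokens.
nothing = pp.Empty().parseString("anything at all").asList()
print(nothing)
```

With this shape an empty pair of braces shows up as an empty list in the results, which keeps it distinguishable from a missing value.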

Oh, and, if there's a way to get a 'last line parsed' value so as to start
parsing onwards, it would ease my day, as the only way I've found to get the
whole thing parsed is to use another x = { ... } around the whole of the
data, and now, I'm only getting the 'x' returned, so if I could parse by
section, it would help my understanding of what's happening. 
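
A possible way to get past the first section, sketched with a simplified grammar and made-up sample data (both are assumptions): parseString stops after one match of the expression it is given, so either wrap the top-level expression in OneOrMore, or use scanString, which yields each match together with its start and end offsets.

```python
import pyparsing as pp

EQUALS, LBRACE, RBRACE = map(pp.Suppress, "={}")
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums)
assignment = pp.Forward()
block = pp.Group(LBRACE + pp.ZeroOrMore(assignment) + RBRACE)
assignment << pp.Group(identifier + EQUALS + (identifier | integer | block))

data = "header = { id = 1 }\nprovince = { id = 2 }\n"

# A single assignment stops after the first top-level section.
first_only = assignment.parseString(data).asList()

# OneOrMore keeps going until no further assignment matches.
every_section = pp.OneOrMore(assignment).parseString(data).asList()

# scanString reports where each match started and ended in the input.
sections = [(tokens.asList(), data[start:end].strip())
            for tokens, start, end in assignment.scanString(data)]
for parsed, text in sections:
    print(parsed, "<-", text)
```

The end offset of the last match is the place to resume from when parsing section by section.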

I'm still trial and error-ing a bit too much at the moment.

Regards, 

Liam Clarke





On 7/24/05, Paul McGuire <[EMAIL PROTECTED]> wrote:

        Liam -
        
        Glad you are sticking with pyparsing through some of these
        idiosyncrasies!
        
        One thing that might simplify your life is if you are a bit more
        strict on specifying your grammar, especially where you use
        pp.printables as the character set for your various words and
        values.  Is this statement really valid?
        
        Lw)r*)*dsflkj = sldjouwe)r#jdd
        
        According to your grammar, it is.  Also, by using printables, you
        force your user to insert whitespace between the assignment target
        and the equals sign.  I'm sure your users would like to enter a
        quick "a=1" once in a while, but since there is no whitespace, it
        will all be slurped into the left-hand side identifier.
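
A small sketch of that slurping effect. The names loose, strict and number are made up for this illustration:

```python
import pyparsing as pp

loose = pp.Word(pp.printables)                    # any run of non-blank characters
strict = pp.Word(pp.alphas, pp.alphanums + "_")   # identifier-like names only
EQUALS = pp.Suppress("=")
number = pp.Word(pp.nums)

# With printables, "a=1" is a single word, so no "=" is left to match.
slurped = loose.parseString("a=1").asList()
try:
    (loose + EQUALS + loose).parseString("a=1")
    loose_matched = True
except pp.ParseException:
    loose_matched = False

# A stricter left-hand side stops at "=", so whitespace becomes optional.
tight = (strict + EQUALS + number).parseString("a=1").asList()
spaced = (strict + EQUALS + number).parseString("a = 1").asList()
print(slurped, loose_matched, tight, spaced)
```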
        
        Let's create two expressions, LHS and RHS, to dictate what is valid
        on the left and right-hand side of the equals sign.  (Well, it turns
        out I create a bunch of expressions here, in the process of defining
        LHS and RHS, but hopefully, this will make some sense):
        
        import pyparsing as pp

        # punctuation is matched but left out of the results
        EQUALS = pp.Suppress("=")
        LBRACE = pp.Suppress("{")
        RBRACE = pp.Suppress("}")
        identifier = pp.Word(pp.alphas, pp.alphanums + "_")
        integer = pp.Word(pp.nums + "-+", pp.nums)
        # Forward placeholders, since assignment and RHS refer to each other
        assignment = pp.Forward()
        LHS = identifier
        RHS = pp.Forward().setName("RHS")
        RHS << ( pp.dblQuotedString ^ identifier ^ integer ^
                 pp.Group( LBRACE + pp.OneOrMore(assignment) + RBRACE ) )
        assignment << pp.Group( LHS + EQUALS + RHS )
        
        I leave it to you to flesh out what other possible value types can
        be included in RHS.
        
        Note also the use of the Group.  Try running this snippet with and
        without Group and see how the results change.  I think using Group
        will help you to build up a good parse tree for the matched tokens.
        
        Lastly, please note in the '<<' assignment to RHS that the
        expression is enclosed in parens.  I originally left this as
        
        RHS << pp.dblQuotedString ^ identifier ^ integer ^ pp.Group( LBRACE +
            pp.OneOrMore(assignment) + RBRACE )
        
        And it failed to match!  A bug! In my own code!  The shame...
        
        This fails because '<<' has a higher precedence than '^', so RHS
        only worked if it was handed a quoted string.  Probably good
        practice to always enclose in parentheses the expression being
        assigned to a Forward using '<<'.
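
The grouping can be checked without pyparsing at all. The sketch below uses a toy Node class (hypothetical, written only for this illustration) whose operators record how Python groups '<<' and '^':

```python
class Node:
    """Toy operand that records how Python groups << and ^ (not pyparsing)."""

    def __init__(self, text):
        self.text = text

    def __lshift__(self, other):
        return Node("(%s << %s)" % (self.text, other.text))

    def __xor__(self, other):
        return Node("(%s ^ %s)" % (self.text, other.text))


RHS, a, b, c = (Node(name) for name in ("RHS", "a", "b", "c"))

# Without parentheses, << binds first, so only 'a' is shifted into RHS.
unparenthesized = (RHS << a ^ b ^ c).text
# With parentheses, the whole alternation is shifted into RHS.
parenthesized = (RHS << (a ^ b ^ c)).text

print(unparenthesized)
print(parenthesized)
```

In the pyparsing case, 'a' plays the role of pp.dblQuotedString, which is why the unparenthesized RHS only accepted quoted strings.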
        
        -- Paul
        
        
        -----Original Message-----
        From: Liam Clarke [mailto:[EMAIL PROTECTED]]
        Sent: Saturday, July 23, 2005 9:03 AM
        To: Paul McGuire
        Cc: tutor@python.org
        Subject: Re: [Tutor] Parsing problem
        
        *sigh* I just read the documentation more carefully and found the
        difference between the | operator and the ^ operator.
        
        Input -
        
        j = { line = { foo = 10 bar = 20 } }
        
        New code
        
        sel = pp.Forward()
        values = ((pp.Word(pp.printables) + pp.Suppress("=") +
                   pp.Word(pp.printables)) ^ sel)
        sel << (pp.Word(pp.printables) + pp.Suppress("=") +
                pp.Suppress("{") + pp.OneOrMore(values) + pp.Suppress("}"))
        
        Output -
        
        (['j', 'line', 'foo', '10', 'bar', '20'], {})
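
The difference between the two operators, in a minimal sketch: '|' builds a MatchFirst, which takes the first alternative that matches, while '^' builds an Or, which tries every alternative and keeps the longest match. The literals below are made up for the illustration. In the earlier attempt with '|', the plain "word = word" alternative was presumably tried first and matched "line = {", which would explain the stray brace.

```python
import pyparsing as pp

short = pp.Literal("foo")
longer = pp.Literal("foobar")

# '|' stops at the first alternative that matches, even if a later one fits better.
first_match = (short | longer).parseString("foobar").asList()

# '^' evaluates all alternatives and keeps the longest match.
longest_match = (short ^ longer).parseString("foobar").asList()

print(first_match, longest_match)
```

So '|' is not an exclusive or; the order of the alternatives matters with '|' and does not with '^'.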
        
        My apologies for the deluge.
        
        Regards,
        
        Liam Clarke
        
        
        On 7/24/05, Liam Clarke <[EMAIL PROTECTED]> wrote:
        
                Hmmm... just a quick update, I've been poking around and I'm
                obviously making some error of logic.
        
                Given a line -
        
                 f = "j = { line = { foo = 10 bar = 20 } }" 
        
                And given the following code -
        
                select = pp.Forward()
                select << (
                    pp.Word(pp.printables) + pp.Suppress("=") + pp.Suppress("{") +
                    pp.OneOrMore( (pp.Word(pp.printables) + pp.Suppress("=") +
                                   pp.Word(pp.printables) ) | select ) +
                    pp.Suppress("}") )
        
                select.parseString(f) gives -
        
                (['j', 'line', '{', 'foo', '10', 'bar', '20'], {}) 
        
                So I've got a bracket sneaking through there. Argh. My brain
                hurts.
        
                Is the | operator an exclusive or?
        
                Befuddled,
        
                Liam Clarke
        
        
        
                On 7/23/05, Liam Clarke < [EMAIL PROTECTED] > wrote:
        
                        Howdy,
        
                        I've attempted to follow your lead and have started
                        from scratch. I could just copy and paste your
                        solution (which works pretty well), but I want to
                        understand what I'm doing *grin*
        
                        However, I've been hitting a couple of ruts in the
                        path to enlightenment. Is there a way to tell
                        pyparsing to treat specific escaped characters as
                        just a slash followed by a letter? For the time
                        being I've converted all backslashes to forward
                        slashes, as it was choking on \a in a file path.
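
One possible cause, assuming the test data was embedded in an ordinary Python string literal (as the pasted code, which "includes the data", suggests): Python itself turns \a into a BEL character before pyparsing ever sees it. A raw string keeps the backslashes. The path below is invented for the illustration:

```python
# In a normal literal, \t and \a are escape sequences (TAB and BEL).
plain = "c:\temp\a.txt"
# In a raw literal, every backslash is kept as a backslash.
raw = r"c:\temp\a.txt"

print(repr(plain))
print(repr(raw))
print(len(plain), len(raw))
```

Data read from a file with open() is not affected, since escape processing only applies to string literals in source code.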
        
                        But my latest hitch takes this form (apologies for
                        the large traceback)
        
                        Traceback (most recent call last):
                          File "<interactive input>", line 1, in ?
                          File "parse.py", line 336, in parse
                            parsedEntries = dicts.parseString(test_data)
                          File "c:\python24\Lib\site-packages\pyparsing.py", line 616, in parseString
                            loc, tokens = self.parse( instring.expandtabs(), 0 )
                          File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse
                            loc,tokens = self.parseImpl( instring, loc, doActions )
                          File "c:\python24\Lib\site-packages\pyparsing.py", line 1518, in parseImpl
                            return self.expr.parse( instring, loc, doActions )
                          File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse
                            loc,tokens = self.parseImpl( instring, loc, doActions )
                          File "c:\python24\Lib\site-packages\pyparsing.py", line 1367, in parseImpl
                            loc, exprtokens = e.parse( instring, loc, doActions )
                          File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse
                            loc,tokens = self.parseImpl( instring, loc, doActions )
                          File "c:\python24\Lib\site-packages\pyparsing.py", line 1518, in parseImpl
                            return self.expr.parse( instring, loc, doActions )
                          File "c:\python24\Lib\site-packages\pyparsing.py", line 560, in parse
                            raise ParseException, ( instring, len(instring), self.errmsg, self )
        
                        ParseException: Expected "}" (at char 9909), (line:325, col:5)
        
                        The offending code can be found here (includes the
                        data) - http://www.rafb.net/paste/results/L560wx80.html
        
                        It's like pyparsing isn't recognising a lot of my
                        "}"'s, as if I add another one, it throws the same
                        error, same for adding another two...
        
                        No doubt I've done something silly, but any help in
                        finding the tragic flaw would be much appreciated. I
                        need to get a ParseResults object out so I can learn
                        how to work with the basic structure!
        
                        Much regards, 
        
                        Liam Clarke
        
        
        
                        On 7/21/05, Paul McGuire <[EMAIL PROTECTED]> wrote:
        
                                Liam, Kent, and Danny -
        
                                It sure looks like pyparsing is taking on a life
                                of its own!  I can see I no longer am the only
                                one pitching pyparsing at some of these
                                applications!
        
                                Yes, Liam, it is possible to create
                                dictionary-like objects, that is, ParseResults
                                objects that have named values in them.  I
                                looked into your application, and the nested
                                assignments seem very similar to a ConfigParser
                                type of structure.  Here is a pyparsing version
                                that handles the test data in your original post
                                (I kept Danny Yoo's recursive list values, and
                                added recursive dictionary entries):
        
                                --------------------------
                                import pyparsing as pp
        
                                listValue = pp.Forward()
                                listSeq = ( pp.Suppress('{') +
                                            pp.Group(pp.ZeroOrMore(listValue)) +
                                            pp.Suppress('}') )
                                listValue << ( pp.dblQuotedString.setParseAction(pp.removeQuotes) |
                                               pp.Word(pp.alphanums) |
                                               listSeq )
        
                                keyName = pp.Word( pp.alphas )
        
                                entries = pp.Forward()
                                entrySeq = ( pp.Suppress('{') +
                                             pp.Group(pp.OneOrMore(entries)) +
                                             pp.Suppress('}') )
                                entries << pp.Dict(
                                            pp.OneOrMore(
                                                pp.Group( keyName + pp.Suppress('=') +
                                                          (entrySeq | listValue) ) ) )
                                --------------------------
        
        
                                Dict is one of the most confusing classes to
                                use, and there are some examples in the examples
                                directory that comes with pyparsing (see
                                dictExample2.py), but it is still tricky.  Here
                                is some code to access your input test data,
                                repeated here for easy reference:
        
                                --------------------------
                                testdata = """\
                                country = {
                                tag = ENG
                                ai = {
                                flags = { }
                                combat = { DAU FRA ORL PRO }
                                continent = { }
                                area = { }
                                region = { "British Isles" "NorthSeaSea" "ECAtlanticSea"
                                "NAtlanticSea" "TagoSea" "WCAtlanticSea" }
                                war = 60
                                ferocity = no
                                }
                                }
                                """
                                parsedEntries = entries.parseString(testdata)
        
                                def dumpEntries(dct,depth=0):
                                    keys = dct.keys()
                                    keys.sort()
                                    for k in keys:
                                        print ('  '*depth) + '- ' + k + ':',
                                        if isinstance(dct[k],pp.ParseResults):
                                            if dct[k][0].keys():
                                                print
                                                dumpEntries(dct[k][0],depth+1)
                                            else:
                                                print dct[k][0]
                                        else:
                                            print dct[k]
        
                                dumpEntries( parsedEntries )
        
                                print
                                print parsedEntries.country[0].tag
                                print parsedEntries.country[0].ai[0].war
                                print parsedEntries.country[0].ai[0].ferocity
                                --------------------------
        
                                This will print out:
        
                                --------------------------
                                - country:
                                  - ai:
                                    - area: []
                                    - combat: ['DAU', 'FRA', 'ORL', 'PRO']
                                    - continent: []
                                    - ferocity: no
                                    - flags: []
                                    - region: ['British Isles', 'NorthSeaSea', 'ECAtlanticSea', 'NAtlanticSea', 'TagoSea', 'WCAtlanticSea']
                                    - war: 60
                                  - tag: ENG
        
                                ENG
                                60
                                no
                                --------------------------
        
                                But I really dislike having to dereference those
                                nested values using the 0'th element.  So I'm
                                going to fix pyparsing so that in the next
                                release, you'll be able to reference the
                                sub-elements as:
        
                                print parsedEntries.country.tag
                                print parsedEntries.country.ai.war
                                print parsedEntries.country.ai.ferocity
        
                                This *may* break some existing code, but Dict is
                                not heavily used, based on feedback from users,
                                and this may make it more useful in general,
                                especially when data parses into nested Dicts.
        
                                Hope this sheds more light than confusion!
                                -- Paul McGuire
        
        
                                _______________________________________________
                                Tutor maillist  -  Tutor@python.org
                                http://mail.python.org/mailman/listinfo/tutor
        
        
        
        
        
                        --
                        'There is only one basic human right, and that is to
                        do as you damn well please.
                        And with it comes the only basic human duty, to take
                        the consequences.'
        
        
        
        