On 01/08/2012 04:53 AM, Alex Hall wrote:
> Hello all,
> I have a file with xml-ish code in it, the definitions for units in a
> real-time strategy game. I say xml-ish because the tags are like xml,
> but no quotes are used and most tags do not have to end. Also,
> comments in this file are prefaced by an apostrophe, and there is no
> multi-line commenting syntax. For example:
>
> <unit>
> <number=1>
> <name=my unit>
> <canMove=True>
> <canCarry=unit2, unit3, unit4>
> 'this line is a comment
> </unit>


The format is closer to SGML than to XML: like your file, SGML does not require every tag to be closed. The main difference is that your tags carry values. I'd say you have a better chance of transforming this into SGML than into XML.
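To see why XML is the harder target, here is a quick check with the standard library's xml.etree.ElementTree (my own illustration, not part of the original advice; the attribute name "value" below is hypothetical). The raw format is rejected, and even with the value moved into a quoted attribute the text is still not well-formed XML, because <number ...> is never closed.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return True if ElementTree can parse text as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The raw game format: '<number=1>' is not a legal XML start tag.
raw = "<unit>\n<number=1>\n</unit>"
# Value moved into a quoted attribute ("value" is a hypothetical name),
# but <number ...> is still left open, so XML rejects it too.
quoted = '<unit>\n<number value="1">\n</unit>'
# Only when every tag is explicitly closed does an XML parser accept it.
closed = '<unit>\n<number value="1"/>\n</unit>'

for label, text in [("raw", raw), ("quoted", quoted), ("closed", closed)]:
    print(label, is_well_formed_xml(text))
```

Getting to XML would therefore mean inserting end tags as well as quoting values, which is exactly the work an SGML-style parser lets you skip.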

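Try this regular expression, which rewrites each <tag=value> as <tag __attribute__="value">, i.e. a tag with a dummy attribute named __attribute__:

s = re.sub(r'<([a-zA-Z]+)=([^>]+)>', r'<\1 __attribute__="\2">', s)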

Then parse the result with an SGML parser. For this case I find Fredrik Lundh's sgmlop easier to use; install it with easy_install or pip.
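The rewrite step can be tried on its own before involving any parser. This is a Python 3 sketch; the helper name to_sgml is my own, and the pattern is the same one used in the re.sub line above.

```python
import re

# A tag name made of letters, then '=', then everything up to the next '>'.
TAG_WITH_VALUE = re.compile(r'<([a-zA-Z]+)=([^>]+)>')


def to_sgml(text):
    """Rewrite every <tag=value> as <tag __attribute__="value">."""
    return TAG_WITH_VALUE.sub(r'<\1 __attribute__="\2">', text)


sample = "<unit>\n<name=my unit>\n<canMove=True>\n</unit>"
print(to_sgml(sample))
```

Tags without '=' (such as <unit> and </unit>) are left untouched. A value that itself contains '>' or '"' would break this pattern, so it is worth checking the real unit files for those characters.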

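import re
import sgmlop

class Unit(object): pass

class handler:
    def __init__(self):
        self.units = {}   # unit name -> Unit
    # Called for each start tag. Tag names arrive in lowercase (hence
    # 'canmove', not 'canMove'), attrs as a list of (name, value) pairs.
    def finish_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'unit':
            self.current = Unit()
        elif tag == 'number':
            self.current.number = int(attrs['__attribute__'])
        elif tag == 'canmove':
            self.current.canmove = attrs['__attribute__'] == 'True'
        elif tag in ('name', 'cancarry'):
            setattr(self.current, tag, attrs['__attribute__'])
        else:
            print 'unknown tag', tag, attrs
    # Called for each end tag; in this format only </unit> is closed.
    def finish_endtag(self, tag):
        if tag == 'unit':
            # file the finished unit under its name
            self.units[self.current.name] = self.current
            del self.current
    # Called for text between tags, including the ' comment lines.
    def handle_data(self, data):
        if not data.isspace(): print data.strip()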

s = '''
<unit>
<number=1>
<name=my unit>
<canMove=True>
<canCarry=your unit, her unit, his unit>
'this line is a comment
</unit>
<unit>
<number=2>
<name=your unit>
<canMove=False>
<canCarry=her unit, his unit>
'this line is a comment
</unit>
<unit>
<number=3>
<name=her unit>
<canMove=True>
<canCarry=her unit>
'this line is a comment
</unit>
<unit>
<number=4>
<name=his unit>
<canMove=True>
<canCarry=his unit, her unit>
'this line is a comment
</unit>
'''
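# turn each <tag=value> into <tag __attribute__="value">
s = re.sub(r'<([a-zA-Z]+)=([^>]+)>', r'<\1 __attribute__="\2">', s)
parser = sgmlop.SGMLParser()
h = handler()
parser.register(h)   # route parser events to the handler's methods
parser.parse(s)
print h.units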
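If sgmlop is not available for your Python version, the same event-driven approach can be sketched in Python 3 with the standard library's html.parser.HTMLParser instead. This is a swapped-in parser, not a true SGML parser, but it is lenient about unclosed tags and lowercases tag and attribute names, which is all this format needs. The names UnitHandler and parse_units are my own.

```python
import re
from html.parser import HTMLParser

# Same rewrite as above: <tag=value> -> <tag __attribute__="value">
TAG_WITH_VALUE = re.compile(r'<([a-zA-Z]+)=([^>]+)>')


class Unit:
    pass


class UnitHandler(HTMLParser):
    """Collect Unit objects; end tags other than </unit> are never needed."""

    def __init__(self):
        super().__init__()
        self.units = {}      # unit name -> Unit
        self.current = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased.
        value = dict(attrs).get('__attribute__')
        if tag == 'unit':
            self.current = Unit()
        elif tag == 'number':
            self.current.number = int(value)
        elif tag == 'canmove':
            self.current.canmove = value == 'True'
        elif tag in ('name', 'cancarry'):
            setattr(self.current, tag, value)
        else:
            print('unknown tag', tag, attrs)

    def handle_endtag(self, tag):
        if tag == 'unit':
            # File the finished unit under its name.
            self.units[self.current.name] = self.current
            self.current = None

    # handle_data is not overridden, so text between tags (including
    # the ' comment lines) is silently ignored.


def parse_units(text):
    handler = UnitHandler()
    handler.feed(TAG_WITH_VALUE.sub(r'<\1 __attribute__="\2">', text))
    handler.close()
    return handler.units


sample = """
<unit>
<number=1>
<name=my unit>
<canMove=True>
<canCarry=your unit, her unit>
'this line is a comment
</unit>
"""
units = parse_units(sample)
print(sorted(units))
print(units['my unit'].number, units['my unit'].canmove)
```

canCarry is kept as a single string, as in the sgmlop version; splitting it on ', ' would give a list of unit names.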

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
