Re: [Tutor] module to parse XMLish text?

Terry Carroll Fri, 14 Jan 2011 14:44:54 -0800

On Fri, 14 Jan 2011, Stefan Behnel wrote:

Terry Carroll, 14.01.2011 03:55:

Does anyone know of a module that can parse out text with XML-like tags as
in the example below? I emphasize the "-like" in "XML-like". I don't think
I can parse this as XML (can I?).


Sample text between the dashed lines::

---------------------------------
Blah, blah, blah
<AAA>
<BING ZEBRA>
<BANG ROOSTER>
<BOOM GARBONZO BEAN>
<BLIP>SOMETHING ELSE</BLIP>
<BASH>SOMETHING DIFFERENT</BASH>
</AAA>
---------------------------------

You can't parse this as XML because it's not XML. The three initial child tags are not properly closed.


Yeah, that's what I figured.

If the format is really as you describe, i.e. one line per tag, regular expressions will work nicely.

Now there's an idea! I hadn't thought of using regexes, probably because I'm terrible at all but the simplest ones.


As it happens, I'm only interested in four of the tags' contents, so I could probably manage to write a series of regexes that even I could maintain, one for each of the pieces of data I want to extract; if I try to write a grand unified regex, I'm bound to shoot myself in the foot.
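A minimal sketch of that one-regex-per-field approach, assuming the one-tag-per-line layout of the sample above. Which four tags matter was never stated, so the choice of BING, BOOM, BLIP and BASH here is purely illustrative, and the name extract_fields is hypothetical:

```python
import re

# Sample text copied from the message above.
SAMPLE = """\
Blah, blah, blah
<AAA>
<BING ZEBRA>
<BANG ROOSTER>
<BOOM GARBONZO BEAN>
<BLIP>SOMETHING ELSE</BLIP>
<BASH>SOMETHING DIFFERENT</BASH>
</AAA>
"""

# One small pattern per field, so each can be read and fixed on its own.
# The four tag names chosen here are an assumption, not from the original post.
PATTERNS = {
    # Unclosed tags: the value follows the tag name inside the brackets.
    "BING": re.compile(r"^<BING\s+(.*?)>[ \t]*$", re.MULTILINE),
    "BOOM": re.compile(r"^<BOOM\s+(.*?)>[ \t]*$", re.MULTILINE),
    # Properly closed tags: the value sits between the open and close tags.
    "BLIP": re.compile(r"^<BLIP>(.*?)</BLIP>[ \t]*$", re.MULTILINE),
    "BASH": re.compile(r"^<BASH>(.*?)</BASH>[ \t]*$", re.MULTILINE),
}


def extract_fields(text):
    """Return a dict mapping each tag name to its first match, or None."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


fields = extract_fields(SAMPLE)
print(fields)
```

Keeping the patterns in a dictionary means a fifth field is one more entry rather than another alternation inside a single large expression.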


Thanks very much.

On Fri, 14 Jan 2011, Karim wrote:

from xml.etree.ElementTree import ElementTree

I don't think straight XML parsing will work on this, as it's not valid XML; it just looks XML-like enough to cause confusion.
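A quick way to confirm this, sketched with the standard library's xml.etree.ElementTree (Python 3). The helper name parses_as_xml is hypothetical, and the leading "Blah, blah, blah" line is left out so that only the tagged part is tested:

```python
import xml.etree.ElementTree as ET

# Tagged portion of the sample text from the original question.
SAMPLE = """\
<AAA>
<BING ZEBRA>
<BANG ROOSTER>
<BOOM GARBONZO BEAN>
<BLIP>SOMETHING ELSE</BLIP>
<BASH>SOMETHING DIFFERENT</BASH>
</AAA>
"""


def parses_as_xml(text):
    """Return True if ElementTree accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Raised for anything that is not well-formed, e.g. <BING ZEBRA>.
        return False
    return True


print(parses_as_xml(SAMPLE))
print(parses_as_xml("<AAA><BLIP>SOMETHING ELSE</BLIP></AAA>"))
```

The first call should report failure, since a tag such as `<BING ZEBRA>` is neither closed nor a valid attribute form; the second, which keeps only the properly closed child, should succeed.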

