On Fri, 14 Jan 2011, Stefan Behnel wrote:

Terry Carroll, 14.01.2011 03:55:
Does anyone know of a module that can parse out text with XML-like tags as
in the example below? I emphasize the "-like" in "XML-like". I don't think
I can parse this as XML (can I?).

Sample text between the dashed lines:

---------------------------------
Blah, blah, blah
<AAA>
<BING ZEBRA>
<BANG ROOSTER>
<BOOM GARBONZO BEAN>
<BLIP>SOMETHING ELSE</BLIP>
<BASH>SOMETHING DIFFERENT</BASH>
</AAA>
---------------------------------

You can't parse this as XML because it's not XML. The three initial child tags are not properly closed.

Yeah, that's what I figured.

If the format is really as you describe, i.e. one line per tag, regular expressions will work nicely.
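A minimal sketch of that line-by-line approach. It assumes every tag sits on its own line in one of the two shapes seen in the sample, `<TAG VALUE>` or `<TAG>VALUE</TAG>`. The names `PAIRED`, `INLINE` and `parse_tags` are hypothetical and chosen only for illustration.

```python
import re

# Matches a line like "<BLIP>SOMETHING ELSE</BLIP>": tag name, content,
# then a closing tag that must repeat the same name (\1).
PAIRED = re.compile(r"^<(\w+)>(.*)</\1>$")
# Matches a line like "<BOOM GARBONZO BEAN>": tag name, one space,
# then everything up to the closing ">".
INLINE = re.compile(r"^<(\w+) ([^<>]*)>$")

SAMPLE = """Blah, blah, blah
<AAA>
<BING ZEBRA>
<BANG ROOSTER>
<BOOM GARBONZO BEAN>
<BLIP>SOMETHING ELSE</BLIP>
<BASH>SOMETHING DIFFERENT</BASH>
</AAA>"""


def parse_tags(text):
    """Return {tag: value} for each line matching one of the two tag shapes.

    Lines that match neither (free text, <AAA>, </AAA>) are skipped.
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        match = PAIRED.match(line) or INLINE.match(line)
        if match:
            result[match.group(1)] = match.group(2)
    return result


print(parse_tags(SAMPLE))
```

Trying the stricter `PAIRED` pattern first keeps the two shapes from being confused; a line with no space after the tag name can never match `INLINE` anyway.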

Now there's an idea! I hadn't thought of using regexes, probably because I'm terrible at all but the simplest ones.

As it happens, I'm only interested in the contents of four of the tags, so I could probably manage to write a series of regexes that even I could maintain, one for each piece of data I want to extract. If I try to write a grand unified regex, I'm bound to shoot myself in the foot.
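The one-regex-per-field idea might look like the sketch below. The post doesn't say which four tags matter, so the choice of `BING`, `BOOM`, `BLIP` and `BASH` is purely hypothetical, as are the names `PATTERNS` and `extract`. `re.MULTILINE` makes `^` and `$` match at each line boundary, so every pattern only has to describe a single line.

```python
import re

# Hypothetical choice of the four wanted tags; each gets its own small regex.
PATTERNS = {
    "BING": re.compile(r"^<BING ([^>]*)>$", re.MULTILINE),
    "BOOM": re.compile(r"^<BOOM ([^>]*)>$", re.MULTILINE),
    "BLIP": re.compile(r"^<BLIP>(.*?)</BLIP>$", re.MULTILINE),
    "BASH": re.compile(r"^<BASH>(.*?)</BASH>$", re.MULTILINE),
}

SAMPLE = """Blah, blah, blah
<AAA>
<BING ZEBRA>
<BANG ROOSTER>
<BOOM GARBONZO BEAN>
<BLIP>SOMETHING ELSE</BLIP>
<BASH>SOMETHING DIFFERENT</BASH>
</AAA>"""


def extract(text):
    """Run each pattern independently; a missing tag yields None instead of an error."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        found[name] = match.group(1) if match else None
    return found


print(extract(SAMPLE))
```

Keeping the patterns separate means a change in one tag's format only touches one line, which is the maintainability argument made above.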

Thanks very much.

On Fri, 14 Jan 2011, Karim wrote:

from xml.etree.ElementTree import ElementTree

I don't think straight XML parsing will work on this, as it's not valid XML; it just looks XML-like enough to cause confusion.
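A quick way to confirm this with the standard library (Python 3), using the sample quoted earlier in the thread. `is_well_formed` is a hypothetical helper name. Even with the leading "Blah, blah, blah" line removed, a tag like `<BING ZEBRA>` has an attribute name with no `="value"`, so the XML parser rejects it. The exact error message comes from expat and may vary, so it is printed rather than asserted.

```python
import xml.etree.ElementTree as ET

SAMPLE = """Blah, blah, blah
<AAA>
<BING ZEBRA>
<BANG ROOSTER>
<BOOM GARBONZO BEAN>
<BLIP>SOMETHING ELSE</BLIP>
<BASH>SOMETHING DIFFERENT</BASH>
</AAA>"""


def is_well_formed(text):
    """Return True if ElementTree can parse text, else print the error and return False."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        print("not well-formed:", err)
        return False
    return True


# Fails: free text before the root element.
is_well_formed(SAMPLE)
# Still fails: <BING ZEBRA> is not a valid XML tag.
is_well_formed(SAMPLE.split("\n", 1)[1])
```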
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
