Terry Carroll, 14.01.2011 03:55:
Does anyone know of a module that can parse out text with XML-like tags as
in the example below? I emphasize the "-like" in "XML-like". I don't think
I can parse this as XML (can I?).

Sample text between the dashed lines:

---------------------------------
Blah, blah, blah
<AAA>
<BING ZEBRA>
<BANG ROOSTER>
<BOOM GARBONZO BEAN>
<BLIP>SOMETHING ELSE</BLIP>
<BASH>SOMETHING DIFFERENT</BASH>
</AAA>
---------------------------------

You can't parse this as XML because it's not XML. The three initial child tags are not properly closed.
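
A quick way to confirm this is to hand the tagged part to a real XML parser. This is a minimal sketch, assuming Python 2.7 or later (where ElementTree raises ParseError); the names SAMPLE and is_well_formed are made up for illustration:

```python
import xml.etree.ElementTree as ET

# The tagged part of the sample, without the leading "Blah, blah, blah" line.
SAMPLE = """<AAA>
<BING ZEBRA>
<BANG ROOSTER>
<BOOM GARBONZO BEAN>
<BLIP>SOMETHING ELSE</BLIP>
<BASH>SOMETHING DIFFERENT</BASH>
</AAA>"""

def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed(SAMPLE))                                    # False
print(is_well_formed("<AAA><BLIP>SOMETHING ELSE</BLIP></AAA>"))  # True
```

The parser already gives up at <BING ZEBRA>, since ZEBRA looks like an attribute without a value.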

If the format is really as you describe, i.e. one tag per line, regular expressions will work nicely. Something like this (untested):

  import re
  parse_tag_and_text = re.compile(
        # accept a tag name and then either space+text, '>'+text+'</...',
        # or just '>' (a bare tag such as <AAA>); closing tags don't match
        r'^<([^>/ ]+)(?: ([^>]+)>\s*|>([^<]+)</.*|>\s*)$').match

  special_tags = set(['AAA'])

  result = {}
  # the_file: any iterable of lines, e.g. an open file object
  for line in the_file:
      match = parse_tag_and_text(line)
      if match:
          if match.group(1) in special_tags:
              pass # do something special?
          else:
              # don't care which format, take whatever text group matched
              result[match.group(1)] = match.group(2) or match.group(3)
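
For a version you can run as-is, here is a self-contained sketch that feeds the sample text through a pattern along the lines of the one above. It is only a sketch under assumptions: the pattern includes an alternative for bare tags such as <AAA> and rejects closing tags, and the names TAG_LINE, SAMPLE and parse_lines are made up for illustration.

```python
import re

# Tag name, then one of: space + text + '>', '>' + text + '</...', or a bare '>'.
# The '/' in the first character class keeps closing tags from matching.
TAG_LINE = re.compile(r'^<([^>/ ]+)(?: ([^>]+)>\s*|>([^<]+)</.*|>\s*)$')

SAMPLE = """\
Blah, blah, blah
<AAA>
<BING ZEBRA>
<BANG ROOSTER>
<BOOM GARBONZO BEAN>
<BLIP>SOMETHING ELSE</BLIP>
<BASH>SOMETHING DIFFERENT</BASH>
</AAA>
"""

def parse_lines(lines, special_tags=frozenset(['AAA'])):
    """Map tag names to their text, skipping special tags and non-tag lines."""
    result = {}
    for line in lines:
        match = TAG_LINE.match(line)
        if match is None:
            continue  # plain text or a closing tag
        name = match.group(1)
        if name in special_tags:
            continue  # container tag, nothing to store
        # don't care which format, take whatever text group matched
        result[name] = match.group(2) or match.group(3)
    return result

parsed = parse_lines(SAMPLE.splitlines(True))
for name in sorted(parsed):
    print(name, '=', parsed[name])
```

On the sample this prints five entries: BANG = ROOSTER, BASH = SOMETHING DIFFERENT, BING = ZEBRA, BLIP = SOMETHING ELSE and BOOM = GARBONZO BEAN. Note that a dict keeps only the last value if a tag name repeats.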

Stefan

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
