Is this XML, or just something that looks a little like XML?
Unfortunately, it is something that only looks a little like XML, so I can't
use an XML parser. But the HTML parser does the job.
--
http://mail.python.org/mailman/listinfo/python-list
Hi there,
in a text with no carriage returns I need to look for all occurrences of
this string:
<source id="box"><parameter key="path">...</parameter></source>
The ... can contain different values. I need to extract the string
between <source id="box"><parameter key="path"> and </parameter></source>.
Example text:
At Thursday 31/8/2006 12:44, Nico Grubert wrote:
in a text with no carriage returns I need to look for all occurrences of
this string:
<source id="box"><parameter key="path">...</parameter></source>
Try Beautiful Soup, or if your input is simple enough, the re module.
Gabriel Genellina
Softlab SRL
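
For input as simple as the example, the re route could look roughly like the sketch below. It assumes the markup is literally <source id="box"><parameter key="path">...</parameter></source>, with double-quoted attributes and no extra whitespace between the tags. The names PATH_RE and extract_paths are made up for illustration and do not come from the thread.

```python
import re

# Assumption: attributes are double-quoted and the two opening tags are
# adjacent, exactly as in the example string from the question.
PATH_RE = re.compile(
    r'<source id="box"><parameter key="path">(.*?)</parameter></source>'
)

def extract_paths(text):
    """Return every string found between the opening and closing tags."""
    # The non-greedy (.*?) stops at the first closing pair, so several
    # occurrences in one long line are matched separately.
    return PATH_RE.findall(text)

sample = (
    'This is a test. '
    '<source id="box"><parameter key="path">/www/mydoc1</parameter></source> '
    'More text. '
    '<source id="box"><parameter key="path">/www/mydoc2</parameter></source>'
)
print(extract_paths(sample))  # ['/www/mydoc1', '/www/mydoc2']
```

If the real input varies in quoting or spacing, the pattern would need loosening, and at that point a parser is probably the safer choice.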
Try Beautiful Soup, or if your input is simple enough, the re module.
Hi Gabriel,
I first tried HTMLParser and wrote this short script:
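from HTMLParser import HTMLParser
from htmlentitydefs import entitydefs

class MyDocParser(HTMLParser):
    def __init__(self):
        # Initialise the base parser before adding our own state
        HTMLParser.__init__(self)
        # Collected paths
        self.paths = []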
This works as long as there are no other parameter tags in the content
that I parse.
Got it.
I forgot to handle the 'attrs' parameter in handle_starttag().
Changed it to:
def handle_starttag(self, tag, attrs):
    if tag == 'parameter':
        if attrs == [('key',
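
For reference, a complete parser along these lines might look like the sketch below. It is not the script from the thread: it uses the Python 3 module name html.parser, it assumes the attribute being checked is key="path", and the _in_path flag and the sample input are invented for illustration. It also does not check that the parameter tag sits inside <source id="box">.

```python
from html.parser import HTMLParser  # Python 3 home of HTMLParser


class MyDocParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.paths = []
        self._in_path = False  # True while inside <parameter key="path">

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; only react to the
        # parameter tag that carries key="path" (assumed attribute value).
        if tag == 'parameter' and ('key', 'path') in attrs:
            self._in_path = True
            self.paths.append('')

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_path:
            self.paths[-1] += data

    def handle_endtag(self, tag):
        if tag == 'parameter':
            self._in_path = False


parser = MyDocParser()
parser.feed(
    '<link url="/www/folder">A test.</link>'
    '<parameter key="other">ignore me</parameter>'
    '<source id="box"><parameter key="path">/www/mydoc1</parameter></source>'
)
parser.close()
print(parser.paths)  # ['/www/mydoc1']
```

Checking the attributes is what keeps other parameter tags in the content from being collected.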
Nico Grubert wrote:
in a text with no carriage returns I need to look for all occurrences of
this string:
<source id="box"><parameter key="path">...</parameter></source>
The ... can contain different values. I need to extract the string
between <source id="box"><parameter key="path"> and
Nico, perhaps this would be suitable:
s = '''Example text:
This is a test. link url=/www/folder target=_self title= A test.
<source id="box"><parameter key="path">/www/mydoc1</parameter></source>
And I need to extract /www/mydoc1 and /www/mydoc2 from this text.
source id=boxparameter