hello,
I have this code:

>>> import re
>>> import MySQLdb, csv, sys
>>> conn = MySQLdb.connect(host="localhost", user="usr", passwd="pass",
...                        db="databasename")
>>> c = conn.cursor()
>>> html_file = open('Data/asdsp-lao-farmers-et-batieng-products.html', 'r')
>>> data = html_file.read()
>>> html_file.close()
>>> get_records = re.compile(r"""<div id="flexicontent" class="flexicontent">(.*)</div>""",
...                          re.DOTALL).findall
>>> get_titles = re.compile(r"""<h3>(.*)</h3>""").findall
>>> get_description = re.compile(r"""<div class="description">(.*)</div>""",
...                              re.DOTALL).findall

>>> block_record = []
>>> block_url = []
>>> records = get_records(data)
>>> for record in records:
...     # findall() returns a list, so insert each description separately
...     for description in get_description(record):
...         print(description)  # see http://paste.lisp.org/+21XF for output
...         c.execute("INSERT INTO a (description) VALUES (%s)", (description,))
...
>>> conn.commit()  # commit() belongs to the connection, not the cursor
>>> c.close()
>>> conn.close()
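Part of the mess may come from the regexes themselves: with re.DOTALL, a greedy (.*) keeps going until the LAST </div> in the text, so one "description" can swallow several divs plus all the newlines and tabs between them. A minimal sketch of the difference, written for Python 3; SAMPLE is hypothetical markup standing in for the real page, and greedy_descriptions / lazy_descriptions are illustrative names, not part of the original code:

```python
import re

# Hypothetical markup standing in for the real page; the newlines and
# tabs mimic the \n\t\t seen in the output.
SAMPLE = (
    '<div class="description">First\n\t\titem</div>\n'
    '<div class="other">not wanted</div>\n'
    '<div class="description">Second item</div>'
)

# Greedy: with re.DOTALL, (.*) runs on to the LAST </div> in the text.
greedy_descriptions = re.compile(
    r'<div class="description">(.*)</div>', re.DOTALL).findall

# Non-greedy: (.*?) stops at the FIRST </div> after each opening tag.
lazy_descriptions = re.compile(
    r'<div class="description">(.*?)</div>', re.DOTALL).findall

if __name__ == "__main__":
    print(greedy_descriptions(SAMPLE))  # one match that swallows all three divs
    print(lazy_descriptions(SAMPLE))    # ['First\n\t\titem', 'Second item']
```

The non-greedy form only works if a description contains no nested <div>; if it does, neither regex is reliable and an HTML parser is the safer tool.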

The problem is that the extracted HTML comes out like this:

http://paste.lisp.org/+21XF

Is there a way to clean up the output so that it does not include the
\n\t\t characters and is stored with the correct encoding?
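One way to approach both parts is to decode the raw bytes explicitly, strip leftover tags and entities, and collapse every run of whitespace into a single space. A minimal sketch for Python 3.4+ (html.unescape does not exist in Python 2); clean_description is a hypothetical helper name, and the UTF-8 default is an assumption: check the page's <meta charset> or Content-Type header for the real encoding.

```python
import html
import re


def clean_description(raw, encoding="utf-8"):
    """Turn one scraped HTML fragment into tidy plain text.

    ASSUMPTION: the page is UTF-8 unless another encoding is passed in.
    """
    # Decode bytes (e.g. from open(path, "rb").read()) into str.
    if isinstance(raw, bytes):
        raw = raw.decode(encoding)
    # Crudely replace any leftover tags such as <p> or <br /> with a space.
    text = re.sub(r"<[^>]+>", " ", raw)
    # Turn entities like &amp; or &eacute; into real characters.
    text = html.unescape(text)
    # Collapse every run of whitespace (\n, \t, spaces) into one space.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_description("\n\t\t<p>Rice &amp; beans</p>\n\t\t"))
```

In the insert loop this would become c.execute("INSERT INTO a (description) VALUES (%s)", (clean_description(description),)). For the database side, MySQLdb.connect() also accepts charset="utf8" and use_unicode=True, which should keep the text intact on the way in, assuming the table's column is itself declared with a UTF-8 character set.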

thanks
norman
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
