Hello,

I have this code:

>>> import re
>>> import MySQLdb, csv, sys
>>> conn = MySQLdb.connect(host="localhost", user="usr",
...                        passwd="pass", db="databasename")
>>> c = conn.cursor()
>>> file = open('Data/asdsp-lao-farmers-et-batieng-products.html', 'r')
>>> data = file.read()
>>> get_records = re.compile(r"""<div id=\"flexicontent\" class=\"flexicontent\">(.*)<\/div>""", re.DOTALL).findall
>>> get_titles = re.compile(r"""<h3>(.*)<\/h3>""").findall
>>> get_description = re.compile(r"""<div class=\"description\">(.*)<\/div>""", re.DOTALL).findall
>>> block_record = []
>>> block_url = []
>>> records = get_records(data)
>>> for record in records:
...     description = get_description(record)
...     print description  # see http://paste.lisp.org/+21XF for output
...     c.execute("INSERT INTO a (description) VALUES (%s)", description)
...
>>> conn.commit()  # commit on the connection; the cursor has no commit()
>>> c.close()

The problem is that the 'html' comes out like this: http://paste.lisp.org/+21XF

Is there a way to format the output so that it does not include the \n\t\t and has the correct encoding?

Thanks,
Norman
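
A possible starting point, offered only as a minimal sketch: findall returns a list, and printing a list shows the repr of each string, which is why \n\t\t and other escape sequences appear literally. Printing (or inserting) each string on its own, after collapsing whitespace, may already give the expected output. The helper name clean_fragment and the sample data below are hypothetical and not part of the code above.

```python
import re


def clean_fragment(fragment):
    """Collapse runs of whitespace (newlines, tabs, spaces) into single spaces."""
    return re.sub(r"\s+", " ", fragment).strip()


# Hypothetical sample standing in for the list returned by get_description().
descriptions = ["\n\t\t<p>Coffee from Laos</p>\n\t\t<p>Bolaven  Plateau</p>\n\t"]

cleaned = [clean_fragment(d) for d in descriptions]

for text in cleaned:
    # Printing the string itself, not the list, avoids the repr() escapes.
    print(text)
```

Each cleaned string could then be passed to c.execute one at a time, for example as (text,), instead of handing over the whole list.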