Tommy Kaas wrote:
> Steven D'Aprano wrote:
>> But in your case, the best way is not to use print at all. You are
>> writing to a file -- write to the file directly, don't mess about with
>> print. Untested:
>>
>>
>> f = open('tabeltest.txt', 'w')
>> url = 'http://www.kaasogmulvad.dk/unv/python/tabeltest.htm'
>> soup = BeautifulSoup(urllib2.urlopen(url).read())
>> rows = soup.findAll('tr')
>> for tr in rows:
>>     cols = tr.findAll('td')
>>     output = "#".join(cols[i].string for i in (0, 1, 2, 3))
>>     f.write(output + '\n')  # don't forget the newline after each row
>> f.close()
>
> Steven, thanks for the advice.
> I see the point. But now I have problems with the Danish characters. I get
> this:
>
> Traceback (most recent call last):
>   File "C:/pythonlib/kursus/kommuner-regioner_ny.py", line 36, in <module>
>     f.write(output + '\n')  # don't forget the newline after each row
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in
> position 5: ordinal not in range(128)
>
# -*- coding: utf-8 -*- to the top of the script, but
> it doesn't help.
The coding cookie only affects unicode string literals in the source code;
it doesn't change how the unicode data coming from BeautifulSoup is handled.
As I suspected in my other post, you have to convert your data to a specific
encoding (I use UTF-8 below) before you can write it to a file:
import urllib2
import codecs
from BeautifulSoup import BeautifulSoup
html = urllib2.urlopen(
    'http://www.kaasogmulvad.dk/unv/python/tabeltest.htm').read()
soup = BeautifulSoup(html)

with codecs.open('tabeltest.txt', "w", encoding="utf-8") as f:
    rows = soup.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        print >> f, "#".join(col.string for col in cols)
The with statement implicitly closes the file, so you can avoid f.close() at
the end of the script.
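To make the error itself concrete, here is a minimal sketch of what the
ASCII codec chokes on (written in Python 3 syntax for the byte/text
demonstration; the sample word 'Hjørring' is just an illustrative Danish
string, not from the original page):

```python
# u'\xf8' is the Danish letter 'ø' from the traceback above.
text = u'Hj\xf8rring'

# The default ASCII codec cannot represent 'ø', which is exactly the
# UnicodeEncodeError the traceback reports:
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print('ascii failed on:', exc.object[exc.start])

# Encoding to UTF-8 succeeds; 'ø' becomes the two bytes 0xc3 0xb8.
# codecs.open(..., encoding="utf-8") does this step for you on every write.
data = text.encode('utf-8')
print(data)
```

Opening the file through codecs.open with an explicit encoding simply
performs that .encode('utf-8') step automatically for each string written.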
Peter
_______________________________________________
Tutor maillist - [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor