Gilles Ganault wrote:
> Hello
> 
> Data that I download from the web seems to be using different code
> pages at times, and Python doesn't like this.

Not sure what this has to do with SQLite...

Basically, encoding handling is a little messy both in Python (2.x; 3.0
finally gets sane) and in HTML/HTTP.

HTML pages have too many ways to specify their encoding and often lie about
it. The encoding can be set in the HTTP headers, in a meta tag inside the
page, or someone could put it into the XML declaration (<?xml ... ?>) at the
top of an XHTML page.
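
As a rough sketch of where those declarations can be found, the following
snippet (written for Python 3) collects whatever the HTTP Content-Type
header, the XML declaration, and a meta tag claim. The function name
declared_encodings and the 2048-byte scan window are assumptions made for
illustration, and the regular expressions are deliberately naive. Treat the
result as a list of claims, not facts.

```python
import re
from email.message import Message


def declared_encodings(content_type, body):
    """Return the encodings a page claims, in header / XML / meta order.

    content_type: value of the HTTP Content-Type header, or None.
    body: the raw, undecoded bytes of the page.
    """
    found = []

    # 1. HTTP header, e.g. "text/html; charset=ISO-8859-1"
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()
        if charset:
            found.append(charset)

    head = body[:2048]  # declarations normally sit near the top (assumption)

    # 2. XML declaration, e.g. <?xml version="1.0" encoding="UTF-8"?>
    m = re.search(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', head)
    if m:
        found.append(m.group(1).decode("ascii").lower())

    # 3. meta tag, either <meta charset=...> or the http-equiv form
    m = re.search(rb'<meta[^>]*charset=["\']?([A-Za-z0-9._-]+)', head, re.I)
    if m:
        found.append(m.group(1).decode("ascii").lower())

    return found
```

If the three sources disagree, that is already a hint that the page is
mislabeled.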

So, basically you make an educated guess.
A tool like chardet helps:
http://chardet.feedparser.org/
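
A minimal sketch of such a guess, assuming the third-party chardet package
is installed and using its detect() function. The wrapper name
guess_encoding and the "utf-8" default are hypothetical choices, not part of
chardet.

```python
def guess_encoding(data, default="utf-8"):
    """Guess the encoding of raw bytes; fall back to `default`."""
    try:
        import chardet  # third-party; may not be installed
    except ImportError:
        return default
    # detect() returns a dict; its "encoding" entry can be None
    # when chardet has no idea (for example on empty input).
    guess = chardet.detect(data).get("encoding")
    return guess or default
```

The result is still only a statistical guess, so it is worth keeping the
raw bytes around in case the guess turns out to be wrong.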

Once you have done that, use the Python codecs module to handle the encoding.
If it still blows up, complain to the source that creates such mislabeled
data. And look at the errors='replace' option for Python's encode/decode.
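
To illustrate, here is a small sketch (Python 3) that tries a list of
candidate encodings in order and only falls back to errors='replace' when
none of them works. The function name decode_page and the "utf-8" fallback
are assumptions for the example.

```python
def decode_page(data, candidates, fallback="utf-8"):
    """Decode bytes using the first candidate encoding that works.

    Returns (text, encoding_used). If every candidate fails, the bytes
    are decoded with `fallback` and undecodable bytes become U+FFFD.
    """
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: bytes are invalid in this encoding.
            # LookupError: the page named an encoding Python does not know.
            continue
    return data.decode(fallback, errors="replace"), fallback
```

Note that single-byte encodings such as latin-1 accept any byte sequence,
so they should come last in the candidate list; otherwise they hide the
error instead of reporting it.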

And if you want to put the data into SQLite, to come back to the topic of
this mailing list, you could either stuff the raw bytes into a BLOB field or
find out the encoding and put the decoded text into a normal TEXT field.
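
Both options can be sketched with the sqlite3 module from the Python
standard library (Python 3 here, where bytes are bound as BLOB and str as
TEXT). The table layout and the helper names open_store and store_page are
made up for this example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a database with one table holding raw and decoded pages."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT, raw BLOB, body TEXT)"
    )
    return conn


def store_page(conn, url, raw, encoding=None):
    """Always keep the raw bytes; store decoded text only if the
    encoding is known. Decoding errors are left to the caller."""
    body = raw.decode(encoding) if encoding else None
    conn.execute(
        "INSERT INTO pages (url, raw, body) VALUES (?, ?, ?)",
        (url, raw, body),
    )
    conn.commit()
```

Keeping the BLOB next to the TEXT column costs some space, but it means a
wrong guess can be corrected later without downloading the page again.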

Michael

-- 
Michael Schlenker
Software Engineer

CONTACT Software GmbH           Tel.:   +49 (421) 20153-80
Wiener Straße 1-3               Fax:    +49 (421) 20153-41
28359 Bremen
http://www.contact.de/          E-Mail: [EMAIL PROTECTED]

Registered office: Bremen
Managing directors: Karl Heinz Zachries, Ralf Holtgrefe
Registered in the commercial register of the Amtsgericht Bremen under HRB 13215
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
