Folks:

Thanks so much for your replies. I have absorbed a lot of information about code pages and Unicode in the last couple of days. My understanding is far from complete, but I'm ahead of where I was...

In the end, my best answer was to set the text_factory property of the connection object to str. That way, the bytes are not translated on their way from memory into the database or on their way back out. It does assume (I suppose) that the code page in use by the workstation that posted the page matches the code page selected by my server machine. Since that should almost always be the case for me (at least in the near term), I have decided to just set text_factory to str.

I am now able to store résumé to my Sqlite database and have it returned faithfully in the same condition (which was the challenge that brought me to you in the first place).

Thanks again for all your help and insight!

Doug

wcmadness wrote:
>
> Surely there is an answer to this question...
>
> I'm using Python and PySqlite. I'm trying to store the word résumé to a
> text field. I'm really doing this as a test to see how to handle
> diacritical letters, such as umlaut characters (from German) or accented
> characters (from French). I can produce French é on my keyboard with
> Alt-130...
>
> If I were coding a string literal, I would send through the data as
> unicode, as in: u'résumé'. But, I'm not that lucky. The data is coming
> from an HTML form or from a flat file. It will take on the default codec
> used on my machine (latin-1). If I just send it through as is, it has
> problems either when I fetchall or when I try to print what I've fetched.
> So, for example:
>
> Insert Into tblTest (word) values ('résumé')
>
> will cause problems.
>
> I know that Sqlite stores text data as utf-8. I know that in Python (on
> my machine, at least) strings are stored as latin-1. So, for example, in
> Python code:
>
> v = 'résumé'
>
> v would be of type str, using latin-1 encoding.
>
> So, I have tried sending through my data as follows:
>
> cur.execute("Insert Into tblTest (word) values (?)",
>     ("résumé".decode("latin-1").encode("utf-8"),))
>
> That stores the data just fine, but when I fetchall, I still have
> problems. Say, I select * from tblTest and then do:
>
> l = cur.fetchall()
>
> Doing print l[0][1] (to print the word résumé) will give a nasty message
> about ascii codec can't convert character \x082 (or some variation of that
> message).
>
> I've tried doing:
>
> print l[0][1].decode('utf-8').encode('latin-1')
>
> But to no avail.
>
> The simple question is this:
>
> How do I store the word résumé to a Sqlite DB without using a unicode
> literal (e.g. u'résumé'), such that printing the results retrieved from
> fetchall will not crash?
>
> Surely someone is doing this... Say you get data from an HTML page that
> contains diacritical characters. You need to store it to Sqlite and
> retrieve it back out for display. What do you do???
>
> I'm stuck!
>
> Doug
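
For readers on Python 3, where the standard sqlite3 module descends from pysqlite, the round trip discussed in this thread can be sketched roughly as follows. This is a minimal sketch under assumptions, not the exact code from the thread: the two-column layout of tblTest (id, word) is a guess based on the l[0][1] indexing above, the in-memory database is for illustration only, and bytes stands in for the role str played as a text_factory in Python 2. It decodes the incoming latin-1 bytes once at the boundary, stores Unicode text, and then reads the row back both as raw bytes and as text.

```python
import sqlite3

# Bytes as they might arrive from an HTML form or flat file in latin-1.
# "\u00e9" is e with an acute accent; escapes keep this snippet ASCII-only.
incoming = "r\u00e9sum\u00e9".encode("latin-1")

con = sqlite3.connect(":memory:")  # throwaway database, for illustration only
# Assumed schema: an id column first, so the word is at index 1 of each row.
con.execute("CREATE TABLE tblTest (id INTEGER PRIMARY KEY, word TEXT)")

# Decode once at the boundary, using the encoding the data really arrived in.
con.execute("INSERT INTO tblTest (word) VALUES (?)", (incoming.decode("latin-1"),))

# With text_factory = bytes, TEXT values come back as undecoded UTF-8 bytes.
con.text_factory = bytes
raw_word = con.execute("SELECT * FROM tblTest").fetchall()[0][1]

# With the default text_factory (str), the same value comes back as Unicode text.
con.text_factory = str
word = con.execute("SELECT * FROM tblTest").fetchall()[0][1]

# Encode only when writing to an output that needs a specific code page.
latin1_out = word.encode("latin-1")
con.close()
```

The idea behind this variant is to keep everything as Unicode inside the program and to choose an encoding only at the edges, so the result should not depend on the client and the server happening to share a code page.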