Thanks, Wikus. Yes, I know (or at least I think I know -- unicode sometimes makes me feel really stupid) how to encode and decode. I think I even know when to encode and decode, too.

In this case, I'm doing a file upload from a component. The upload is getting "stored". Just before storing, I'm trying to decode and re-encode into utf-8 to ensure compatibility with the rest of my app.

The questions are:
- Am I encoding the right file (request.vars.name.file)?
- Am I using the right method (text = decoder.decoder(request.vars.name.file.getvalue()))?
- Am I doing it at the right time, i.e. just before the insert?
- Why isn't decoder.decoder figuring out the encoding? The text in the upload file is "Español". I don't know the precise encoding, but I will look into that today.
- Is there a better way?

On Mar 12, 10:06 pm, Wikus van de Merwe <dupakrop...@googlemail.com> wrote:
> To convert a string to utf-8 you need to do two operations:
> - decode the string to unicode (using the original file codec)
> - encode the unicode string using the utf-8 codec
>
> This is what the decoder.decoder function is doing, but it is guessing the
> original codec.
> You need to either provide the right codec for decoding (if you know it is
> always the same) or guess it better (e.g. by catching the exception and
> trying different codecs in order).
>
> input_codec = "iso-8859-1"
> output = text.decode(input_codec).encode("utf-8")
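
For reference, the "catch the exception and try different codecs in order" idea from the quoted message might look roughly like the sketch below. It is written for Python 3, where the uploaded data is a bytes object. The function name to_utf8 and the candidate list are hypothetical, not part of web2py or decoder.decoder, and the order matters: ISO-8859-1 accepts every byte sequence, so it has to come last or nothing after it is ever tried.

```python
def to_utf8(raw, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Decode raw bytes with the first candidate codec that succeeds,
    then re-encode as UTF-8. Returns (utf8_bytes, codec_used).

    The candidate list is an assumption; adjust it to the encodings
    your uploads are likely to use.
    """
    for codec in candidates:
        try:
            text = raw.decode(codec)
        except UnicodeDecodeError:
            # This codec cannot represent the bytes; try the next one.
            continue
        return text.encode("utf-8"), codec
    raise ValueError("none of the candidate codecs could decode the input")


# "Español" saved as Latin-1 / Windows-1252 stores the n-tilde as one byte (0xF1).
utf8_bytes, used = to_utf8(b"Espa\xf1ol")
print(used, utf8_bytes.decode("utf-8"))
```

A successful decode only shows that the bytes are valid in that codec, not that the codec is the one the file was written in, so if the upload source is known it is safer to pass a single explicit codec.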