Hi all, I'm looking for some general hints and tips on how to handle Unicode input and output. The software I'm writing takes input from various untrusted, exotic sources, many of which give me non-ASCII characters in various encodings. I want to store this data in the database and then redisplay it, un-mangled, on my website.
As a simplified example, when the client POSTs the URL of this page:

    http://en.wikipedia.org/wiki/K%C5%99i%C5%A1%C5%A5an_of_Prachatice

to my controller, I get errors of the form:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 3297: ordinal not in range(128)

The page is UTF-8 encoded, but the title parameter to my controller is of unicode type. I've tried every combination of manually encoding and decoding the parameters that I can think of, but I keep getting variants of this same error. And this is for a page whose encoding I know up front!

For right now, I'd even be content to lose the characters that can't be processed, but passing 'ignore' or 'replace' to unicode.encode still doesn't help...

This must be quite a common problem. How should I treat this incoming data? What type should the database columns be? How should it be redisplayed in my controller?

Any help appreciated!

Thanks,
James
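
For reference, here is a minimal sketch of the failure outside TurboGears, in case it helps narrow things down. It is written for Python 3 so that it runs as-is; the u'\xae' in the traceback above suggests Python 2, where the same ASCII encode can also be triggered implicitly (for example by calling str() on a unicode object). The helper names title_from_url and encode_for_output are made up for illustration and are not part of TurboGears or any library. The assumed pattern is: decode bytes to text once on the way in, keep text internally, and encode explicitly on the way out.

```python
from urllib.parse import unquote

URL = "http://en.wikipedia.org/wiki/K%C5%99i%C5%A1%C5%A5an_of_Prachatice"


def title_from_url(url):
    """Return the last path segment as text, reading %XX escapes as UTF-8 bytes."""
    return unquote(url.rsplit("/", 1)[-1], encoding="utf-8")


def encode_for_output(text, encoding="utf-8"):
    """Encode text to bytes at the output boundary.

    Characters the target encoding cannot represent are replaced with '?'
    instead of raising UnicodeEncodeError.
    """
    return text.encode(encoding, "replace")


title = title_from_url(URL)

# Encoding non-ASCII text as ASCII with the default (strict) error handler
# raises the same kind of error as in the traceback.
try:
    title.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Lossy fallback: every character outside ASCII becomes '?'.
lossy = encode_for_output(title, "ascii")

# Lossless path: UTF-8 can represent every character, so the round trip is exact.
round_trip = encode_for_output(title).decode("utf-8")
```

If an explicit encode with 'replace' succeeds in isolation like this but the error still shows up in the application, one possibility (an assumption, not something I have confirmed) is that the failing encode is an implicit one happening somewhere other than the call being changed.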

