Ok - resolved the issue: the Python Solr wrapper
(http://wiki.apache.org/solr/SolPython) was invoking str() without
checking for unicode first.
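
For anyone hitting the same thing, here is a minimal sketch of the kind of check that was missing. It is not the wrapper's actual code: `to_utf8` is a hypothetical helper name, and the sketch is written for Python 3, where the failure shows up on an explicit `.encode("ascii")` rather than on an implicit `str()` call as it did for me on Python 2.

```python
from urllib.parse import unquote


def to_utf8(value):
    """Return value as UTF-8 bytes, never going through the ASCII codec.

    Hypothetical helper for illustration; not part of the Solr wrapper.
    """
    if isinstance(value, bytes):
        # Assumption: bytes passed in are already encoded, so leave them alone.
        return value
    if not isinstance(value, str):
        # Non-text values (numbers, dates, ...) are safe to stringify first.
        value = str(value)
    return value.encode("utf-8")


# The page title from the URL in the original post, percent-decoded as UTF-8.
title = unquote("K%C5%99i%C5%A1%C5%A5an_of_Prachatice")

# Forcing the text through ASCII reproduces the same class of error.
failure = None
try:
    title.encode("ascii")
except UnicodeEncodeError as exc:
    failure = str(exc)

# Encoding explicitly as UTF-8 succeeds and round-trips without loss.
encoded = to_utf8(title)
```

The point is simply to decide on an encoding explicitly at the boundary instead of letting a default ASCII conversion happen somewhere inside a library.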

What a kerfuffle!

James

On Feb 1, 9:19 pm, James <[EMAIL PROTECTED]> wrote:
> Hi all,
> I'm looking for some general hints and tips on how to handle unicode
> input and output. The software I'm writing takes input from various
> untrusted, exotic sources (lots of which are giving me unicode
> characters, in various encodings). I want to store this data in the
> database and then redisplay it, un-mangled, on my website.
>
> As a simplified example, when the client POSTs up the URL of this
> page: http://en.wikipedia.org/wiki/K%C5%99i%C5%A1%C5%A5an_of_Prachatice
> to my controller, I get errors of the form:
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in
> position 3297: ordinal not in range(128)
>
> The page is UTF-8 encoded, but the title parameter to my controller is
> of unicode type. I've tried every combination of manual encoding and
> decoding of parameters I can think of, but can't help getting variants
> on this same error. And this is for a page I know the encoding of up
> front!!
>
> For right now, I'd even be content to lose the characters that can't
> be processed, but passing 'ignore' or 'replace' to unicode.encode
> still doesn't help...
>
> This must be quite a common problem - how should I treat this incoming
> data? What type should the database columns be? How should it be
> redisplayed in my controller?
>
> Any help appreciated!!
> Thanks,
> James