Hi,

To diagnose this properly, you need to work out whether you're dealing with encoded bytes or Unicode strings, and what Django does with them. See http://www.joelonsoftware.com/articles/Unicode.html for background.
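
A minimal sketch of that first check, assuming Python 3, where bytestrings are `bytes` and unicode strings are `str` (under Python 2 the corresponding types are `str` and `unicode`). The helper name `describe` is made up for illustration:

```python
def describe(value):
    """Report whether value is encoded bytes or decoded text, with its repr."""
    if isinstance(value, bytes):
        kind = "encoded bytes"
    elif isinstance(value, str):
        kind = "unicode text"
    else:
        kind = "neither (%s)" % type(value).__name__
    return "%s: %r" % (kind, value)


# A right single quotation mark (U+2019), the kind Word substitutes for an apostrophe.
text = "it\u2019s"
print(describe(text))                   # decoded text
print(describe(text.encode("utf-8")))   # the same text as UTF-8 bytes
print(describe(text.encode("cp1252")))  # the same text as Windows-1252 bytes
```

Printing the repr at each hop (form handler, model save, database driver) should show where text turns into bytes and in which encoding.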

As a short-term workaround, you can force everything to ASCII (non-ASCII characters are silently dropped) using:

str(s.decode('ascii', 'ignore')) # assuming s is a bytestring
u.encode('ascii', 'ignore') # assuming u is a unicode string
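
Because 'ignore' drops every non-ASCII character, a curly apostrophe simply disappears ("it's" comes out as "its"). A slightly gentler variant is to map the usual Word punctuation to ASCII equivalents before forcing to ASCII. This is only a sketch: it assumes Python 3 and text that has already been decoded, the names `WORD_PUNCTUATION` and `to_ascii` are hypothetical, and the list of characters is a guess at the usual culprits rather than a complete set.

```python
# Common "smart" punctuation mapped to plain ASCII (illustrative, not exhaustive).
WORD_PUNCTUATION = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
}

_TABLE = str.maketrans(WORD_PUNCTUATION)


def to_ascii(text):
    """Replace known Word punctuation, then drop any remaining non-ASCII."""
    replaced = text.translate(_TABLE)
    return replaced.encode("ascii", "ignore").decode("ascii")


print(to_ascii("\u201cIt\u2019s fine\u201d \u2013 caf\u00e9"))
```

Anything not in the table (the accented letter in the example, say) is still dropped, so this remains a stopgap rather than a fix.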

-Mike

On 15-Jun-07, at 2:45 AM, vanderkerkoff wrote:


Hi Mike
The characters that are giving us problems are the old favourites:
apostrophes and quotes pasted from Microsoft Word into a Django web site.
I'm not using Django's newforms yet; I'm still using the old forms.

Any help, tips, or pointers to sites I should read would be much
appreciated, Mike.

I'm coming round to the idea that I might have to strip these odd
characters out with Python before they get sent to the database. That
would be the most sensible option, I think.



Mike Klaas wrote:

I've dealt with tons of issues with Python and Unicode, but I need
more information before proceeding with tips.

Specifically, what is the format of the "shit" being copied and
pasted into your app, and what Python datatype is handling it?  I
suspect it is encoded somehow, which could be problematic.  Is it
going through a web browser?  How is it getting into MySQL?

-Mike





--
View this message in context: http://www.nabble.com/problems-getting-data-into-solr-index-tf3915542.html#a11136156
Sent from the Solr - User mailing list archive at Nabble.com.

