On 18-Jun-07, at 6:27 AM, vanderkerkoff wrote:
Cheers Mike, I read the page; it's starting to get into my brain now.
Django was giving me unicode strings, so I did some encoding and
decoding, and now the data is getting into Solr. It simply doesn't pass
the characters that were causing problems, which is great.
Glad to hear that it is working.
Two little things. First, I'm getting an error when it tries to
optimise the index:
AttributeError: SolrConnection instance has no attribute 'optimise'
You don't know what that is about, do you?
Er, it means exactly that: SolrConnection has no optimise method. Instead, do:
conn.commit(optimize=True)
I'm still on Solr 1.1, as we were having trouble getting this sort of
interaction to work with 1.2; not sure if it's related.
Second, I've used your suggestions to force the output into ASCII, but
if I try to force it into UTF-8, which I thought Solr would accept, it
fails. I'm not sure why, though.
Perhaps this is why: solr.py expects unicode. You can pass it an ASCII
byte string and it will be converted to unicode transparently, because
ASCII is Python's default codec. If you pass it a UTF-8 byte string
containing non-ASCII characters, Python will try to convert it to
unicode using the ASCII codec, and that fails.
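To illustrate the failure, here is a small sketch. It is written for Python 3, which no longer coerces byte strings implicitly, so the ASCII decode that Python 2 performed behind the scenes is spelled out explicitly. The helper name `to_unicode_like_py2` is hypothetical, and solr.py itself is not involved.

```python
def to_unicode_like_py2(data: bytes) -> str:
    """Decode bytes with the ASCII codec, mimicking Python 2's implicit coercion."""
    return data.decode("ascii")


ascii_bytes = "plain text".encode("ascii")
utf8_bytes = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9': two bytes for the accent

# Pure ASCII input decodes without complaint.
print(to_unicode_like_py2(ascii_bytes))  # plain text

# UTF-8 input with non-ASCII bytes makes the ASCII codec fail.
try:
    to_unicode_like_py2(utf8_bytes)
except UnicodeDecodeError as exc:
    print("failed:", exc.reason)  # failed: ordinal not in range(128)

# Decoding with the codec that was actually used works fine.
print(utf8_bytes.decode("utf-8") == "caf\u00e9")  # True
```

The fix in either Python version is the same idea: decode with the codec the bytes were encoded in, or never leave unicode in the first place.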
So you could skip the .encode('ascii', 'ignore') line entirely and pass
the unicode string straight through. Of course, the characters would then
stay in the text. I'm not quite sure what you're after, since leaving it
in UTF-8 would keep the funny characters that you wanted to strip.
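A short Python 3 sketch of the two choices: stripping non-ASCII characters with `encode('ascii', 'ignore')`, versus encoding to UTF-8, which is lossless and so strips nothing. The helper name `strip_to_ascii` and the sample title are made up for illustration.

```python
def strip_to_ascii(text: str) -> str:
    """Drop every non-ASCII character, as encode('ascii', 'ignore') does."""
    return text.encode("ascii", "ignore").decode("ascii")


title = "Caf\u00e9 \u201cspecials\u201d"  # an accented e plus curly quotes

# Option 1: strip. The accented letter and the curly quotes disappear.
stripped = strip_to_ascii(title)
print(stripped)  # Caf specials

# Option 2: UTF-8 round-trips every character, so nothing is removed.
utf8 = title.encode("utf-8")
print(utf8.decode("utf-8") == title)  # True
```

Note that "ignore" drops the accented letter outright rather than replacing it with a plain "e", which is why "Caf\u00e9" becomes "Caf".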
-Mike