I double-checked all the code on that page, and everything is in UTF-8 and works perfectly. The problematic URLs are always requested by bots such as Googlebot, which appear to be using a different encoding. The page itself has a UTF-8 meta tag.
So it looks like I have to find a way to check the encoding of the incoming term and re-encode it appropriately. This should be a common Solr problem if all search engines treat UTF-8 that way, right? Any ideas how to fix that? Is there maybe some special Solr functionality for this?

2011/8/27 François Schiettecatte <fschietteca...@gmail.com>

> Merlin
>
> Ü encodes to two characters in utf-8 (C39C), and one in iso-8859-1 (%DC) so
> it looks like there is a charset mismatch somewhere.
>
> Cheers
>
> François
>
> On Aug 27, 2011, at 6:34 AM, Merlin Morgenstern wrote:
>
> > Hello,
> >
> > I am having problems with searches that are issued from spiders that contain
> > the ASCII encoded character "ü"
> >
> > For example in: "Übersetzung"
> >
> > The solr log shows following query request: /suche/%DCbersetzung
> > which has been translated into solr query: q=?ersetzung
> >
> > If you enter the search term directly as a user into the search box it will
> > result into:
> > /suche/Übersetzung which returns perfect results.
> >
> > I am decoding the URL within PHP: $term = trim(urldecode($q));
> >
> > Somehow urldecode() translates the character Ü (%DC) into a ? which is an
> > illegal first character in Solr.
> >
> > I tried it without urldecode(), with rawurldecode() and with utf8_decode()
> > but all of those did not help.
> >
> > Thank you for any help or hint on how to solve that problem.
> >
> > Regards, Merlin
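
For the check-and-re-encode idea at the top of this message, here is a minimal sketch on the PHP side, before the term ever reaches Solr. It assumes the stray requests are ISO-8859-1, as François suggests, and that the mbstring extension is installed. The function name `decodeSearchTerm` is hypothetical and not part of the existing code. It is a heuristic, not a guarantee: a few ISO-8859-1 byte sequences also happen to be valid UTF-8 and would pass through unconverted.

```php
<?php
// Hypothetical helper: decode a percent-encoded search term and make sure
// the result is UTF-8 before it is handed to Solr.
function decodeSearchTerm(string $q): string
{
    // Percent-decoding yields raw bytes; %DC becomes the single byte 0xDC.
    $term = urldecode($q);

    // 0xDC followed by "b" is not a valid UTF-8 sequence, so the check fails
    // for the bot requests and passes for terms that are already UTF-8.
    if (!mb_check_encoding($term, 'UTF-8')) {
        // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
        $term = mb_convert_encoding($term, 'UTF-8', 'ISO-8859-1');
    }

    return trim($term);
}
```

With this in place, both `/suche/%DCbersetzung` and `/suche/%C3%9Cbersetzung` should end up as the same UTF-8 term, so the existing `trim(urldecode($q))` call could be swapped for `decodeSearchTerm($q)`.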