Ahh, thank you for the hints, Martin... German stopwords without umlauts work
correctly.
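
Since only the stopwords with umlauts fail, one thing worth ruling out first is the encoding of stopwords-de.txt itself. Here is a minimal Python sketch of that check. The helper names are hypothetical, and the byte strings stand in for the real file contents, which you would read with open(path, "rb").read().

```python
import codecs


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def starts_with_bom(data: bytes) -> bool:
    """Return True if the bytes begin with a UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


# Stand-ins for file contents: the same stopword saved in two encodings.
saved_as_utf8 = "für\n".encode("utf-8")
saved_as_latin1 = "für\n".encode("latin-1")

print(is_valid_utf8(saved_as_utf8))    # True
print(is_valid_utf8(saved_as_latin1))  # False: 0xFC alone is not valid UTF-8
print(starts_with_bom(saved_as_utf8))  # False
```

If the real file fails this check, re-saving it as UTF-8 would be the first thing to try. A passing check does not prove the stopword filter reads the file correctly, it only rules out one cause.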

So I'm trying to figure out where the UTF-8 chars are getting messed up.
Using the Solr admin web UI, I did a search for title:für and the XML (or
JSON) output in the browser shows the query with the proper encoding, but
the Solr logs show this:

INFO: [page_30d_de] webapp=/solr path=/select
params={explainOther=&fl=*,score&indent=on&start=0&q=title:f?r&hl.fl=&qt=standard&wt=xml&fq=&version=2.2&rows=10}
hits=76 status=0 QTime=2

Notice the title:f?r.  How do I fix that?  I'm using Jetty, by the way...
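
To narrow down where the character could be lost, here is a small Python sketch of the usual failure modes for a percent-encoded query parameter. It is only an illustration under assumptions: the helper decode_query is hypothetical, and nothing here reproduces what Jetty or Solr actually do internally. Results are printed with ascii() so the output is unambiguous whatever the terminal encoding is.

```python
from urllib.parse import unquote


def decode_query(raw: str, charset: str) -> str:
    """Percent-decode a query string, replacing undecodable bytes."""
    return unquote(raw, encoding=charset, errors="replace")


# Case A: the client sends UTF-8 and the server decodes UTF-8.
utf8_ok = decode_query("title:f%C3%BCr", "utf-8")

# Case B: the client sends ISO-8859-1 (%FC) but the server decodes UTF-8.
latin1_as_utf8 = decode_query("title:f%FCr", "utf-8")

# Case C: the client sends UTF-8 but the server decodes ISO-8859-1.
mojibake = decode_query("title:f%C3%BCr", "latin-1")

# Case D: the query was decoded correctly, but the log is written in ASCII.
logged_ascii = utf8_ok.encode("ascii", errors="replace").decode("ascii")

print(ascii(utf8_ok))         # 'title:f\xfcr'  (a correct u-umlaut)
print(ascii(latin1_as_utf8))  # 'title:f\ufffdr' (one replacement character)
print(ascii(mojibake))        # 'title:f\xc3\xbcr' (two wrong characters)
print(ascii(logged_ascii))    # 'title:f?r'
```

A single ? in the log looks like case B or case D rather than case C, which would show two garbled characters. Case D would mean only the log output is affected, so the log line alone may not tell the two apart.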

Thanks for the help.

On Fri, Mar 25, 2011 at 3:05 AM, Martin Rödig <r...@shi-gmbh.com> wrote:

> I have some questions about your config:
>
> Is the stopwords-de.txt in the same directory as the schema.xml?
> Is the title field of type text?
> Do you have the same problem with German stopwords without an umlaut
> (ü, ö, ä), like the word "denn"?
>
> One possible problem is that stopwords-de.txt is not saved as UTF-8, so
> the filter cannot read the umlaut ü in the file.
>
>
> Kind regards
> M.Sc. Dipl.-Inf. (FH) Martin Rödig
>
> SHI Elektronische Medien GmbH
>
> -----Original Message-----
> From: Christopher Bottaro [mailto:cjbott...@onespot.com]
> Sent: Friday, March 25, 2011 05:37
> To: solr-user@lucene.apache.org
> Subject: stopwords not working in multicore setup
>
> Hello,
>
> I'm running a Solr server with 5 cores.  Three are for English content and
> two are for German content.  The default stopwords setup works fine for the
> English cores, but the German stopwords aren't working.
>
> The German stopwords file is stopwords-de.txt and resides in the same
> directory as stopwords.txt.  The German cores use a different schema (named
> schema.page.de.xml) which has the following text field definition:
> http://pastie.org/1711866
>
> The stopwords-de.txt file looks like this:  http://pastie.org/1711869
>
> The query I'm doing is this:  q => "title:für"
>
> And it's returning documents with für in the title.  Title is a text field
> which should use the stopwords-de.txt, as seen in the aforementioned pastie.
>
> Any ideas?  Thanks for the help.
>
