Have you tried the Solr Admin Analysis page, entering the word plus a few 
words of context for index analysis and the word alone for query analysis?
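
The same comparison can also be requested over HTTP. Below is a minimal sketch 
in Python that only builds the request URL for Solr's field analysis handler. 
It assumes the handler is registered at /analysis/field, as in the stock example 
solrconfig.xml, and that Solr runs at localhost:8983 with a core named 
collection1. The host, the core name, and the helper name field_analysis_url are 
illustrative assumptions, not details from the original report.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, field_name, index_text, query_text):
    """Build a URL for Solr's field analysis handler.

    Assumes the handler is registered at /analysis/field (hypothetical for
    this setup; check solrconfig.xml).
    """
    params = {
        "analysis.fieldname": field_name,   # field whose analyzers are applied
        "analysis.fieldvalue": index_text,  # text run through the index analyzer
        "analysis.query": query_text,       # text run through the query analyzer
        "wt": "json",
    }
    # urlencode percent-encodes non-ASCII characters as UTF-8.
    return base_url.rstrip("/") + "/analysis/field?" + urlencode(params)


# Assumed base URL; adjust host, port, and core name to the actual install.
url = field_analysis_url(
    "http://localhost:8983/solr/collection1",
    "features",
    "diyerek baştan savma reklamlarla",  # the word with a few words of context
    "baş",                               # the word alone
)
print(url)
```

Opening the printed URL in a browser should list the tokens produced at each 
analysis stage, so you can check whether the index side and the query side end 
up with the same term.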

And be sure to fully reindex if you change ANYTHING in the schema fields or 
field types.

-- Jack Krupansky

From: Erol Akarsu 
Sent: Sunday, December 02, 2012 10:38 PM
To: solr-user@lucene.apache.org 
Subject: Luke and SOLR search giving different results

Hi,

I am trying to use SOLR with the Turkish language for my research.

Instead of using language identification, I manually assigned the Turkish 
language to a sample test document. I configured the SOLR schema.xml and 
activated the part below. I then added the attached document, 
testTurkishDoc.xml, to the SOLR index.

But searching the raw Lucene index through Luke and searching through the SOLR 
4.0 GUI give different results. In picture Selection_006.png, the word "baş" is 
listed as the top term. When I search for the word "baş" in Luke, I get one 
result, the only document, shown in Selection_004.png.

But in the SOLR GUI, I get an empty result for the word "baş", as shown in 
picture Selection_002.png.

The document has a features field that contains the word "baştan", which is 
derived from the root word "baş" in Turkish grammar. Somehow, the SOLR GUI is 
searching differently than Luke. I could not figure out why SOLR does not find 
the document while Luke does. The same thing happens for the words "umut", 
"bul" and "gör". 

I would appreciate any help in getting the same results from the SOLR UI.


<field name="features">
       Firmalarsa “Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!” 
diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve 
büyük umutlarla Türkiye’ye getirilen Paris Hilton’un oynatıldığı giyim firması 
reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı reklam 
Arda’nın kabinde papağan gibi tekrarladığı “My darling!” repliği, sonunda 
Paris’i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de Paris’in 
ancak 5 kez izledikten sonra anlaşılan “Paris seçti, firma yaptı, Arda 
bayıldı.” sözleriyle kazındı hafızalara, “Keşke unutabilsek!” dedirterek.
  </field>



Added to schema.xml for SOLR:

<field name="features" type="text_tr" indexed="true" stored="true"
       multiValued="true"/>
<fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
</fieldType>

