Jack,

I see something interesting here now.

In the "q" field of the SOLR GUI, I searched for "features:baş" instead of "baş", and this time I got a result!

The one document has some fields of type text_eng and text_general, and one field, "features", of type text_tr. So it seems that if I don't specify a field name, SOLR uses the EnglishAnalyzer, and if I do, it uses the analyzer of the field named in the query string. Is this true?

Erol Akarsu

On Mon, Dec 3, 2012 at 1:30 PM, Erol Akarsu <eaka...@gmail.com> wrote:

> Jack,
>
> I have the following in schema.xml, which defines "features" as type text_tr.
> Unfortunately, the search still fails.
>
> <field name="features" type="text_tr" indexed="true" stored="true"
>        multiValued="true"/>
> <copyField source="features" dest="text"/>
>
> <fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.TurkishLowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.TurkishLowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
>   </analyzer>
> </fieldType>
>
> On Mon, Dec 3, 2012 at 1:15 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>
>> Ah! See where it says "<str name="parsedquery_toString">text:baş</str>"?
>> Your query is against the "text" field, which probably doesn't have the
>> Turkish analysis.
>>
>> There is probably a copyField from "features" to "text". You use the
>> "text_tr" field type for "features", but probably not for the "text" field.
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: Erol Akarsu
>> Sent: Monday, December 03, 2012 1:06 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Luke and SOLR search giving different results
>>
>> Jack,
>>
>> I have already set up the Tomcat server for UTF-8 encoding. I added
>> URIEncoding="UTF-8" to all <Connector ..> elements in server.xml in Tomcat 7.
>>
>> As you can see below, when I search for the word "baş" in debug mode I get
>> an empty response. But when I search for the word "baştan", I get the
>> correct response.
>>
>> It seems to me that the TurkishAnalyzer is not being used in the SOLR
>> search, because only a search for the full word "baştan" succeeds, not one
>> for the root word "baş". Probably the EnglishAnalyzer is being used and
>> cannot find the root word. For example, in Luke, if I change "Analyser to
>> use for query parsing" to EnglishAnalyzer, it cannot find the word "baş";
>> it can only with TurkishAnalyzer. I guess SOLR is not using TurkishAnalyzer.
>>
>> Is this assumption true?
>> I could not find any other reason.
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <response>
>>   <lst name="responseHeader">
>>     <int name="status">0</int>
>>     <int name="QTime">58</int>
>>     <lst name="params">
>>       <str name="debugQuery">true</str>
>>       <str name="q">baş</str>
>>       <str name="wt">xml</str>
>>     </lst>
>>   </lst>
>>   <result name="response" numFound="0" start="0" />
>>   <lst name="debug">
>>     <str name="rawquerystring">baş</str>
>>     <str name="querystring">baş</str>
>>     <str name="parsedquery">text:baş</str>
>>     <str name="parsedquery_toString">text:baş</str>
>>     <lst name="explain" />
>>     <str name="QParser">LuceneQParser</str>
>>     <lst name="timing">
>>       <double name="time">38.0</double>
>>       <lst name="prepare">
>>         <double name="time">16.0</double>
>>         <lst name="org.apache.solr.handler.component.QueryComponent">
>>           <double name="time">3.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.FacetComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.HighlightComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.StatsComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.DebugComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>       </lst>
>>       <lst name="process">
>>         <double name="time">10.0</double>
>>         <lst name="org.apache.solr.handler.component.QueryComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.FacetComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.HighlightComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.StatsComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.DebugComponent">
>>           <double name="time">10.0</double>
>>         </lst>
>>       </lst>
>>     </lst>
>>   </lst>
>> </response>
>>
>> <response>
>>   <lst name="responseHeader">
>>     <int name="status">0</int>
>>     <int name="QTime">2</int>
>>     <lst name="params">
>>       <str name="debugQuery">true</str>
>>       <str name="q">baştan</str>
>>       <str name="wt">xml</str>
>>     </lst>
>>   </lst>
>>   <result name="response" numFound="1" start="0">
>>     <doc>
>>       <str name="url">htt://111.a.b1</str>
>>       <str name="id">6H500F0XXXX</str>
>>       <str name="lang">tr</str>
>>       <str name="name">Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300</str>
>>       <str name="manu">Maxtor Corp.</str>
>>       <str name="manu_id_s">maxtor</str>
>>       <arr name="cat">
>>         <str>electronics</str>
>>         <str>hard drive</str>
>>       </arr>
>>       <arr name="features">
>>         <str>SATA 3.0Gb/s, NCQ</str>
>>         <str>8.5ms seek</str>
>>         <str>16MB cache</str>
>>         <str>
>>           Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!"
>>           diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu
>>           Arda Turan ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un
>>           oynatıldığı giyim firması reklamı da tam bir fiyasko. Birbirinden
>>           ünlü bu iki ismin oynadığı reklam Arda'nın kabinde papağan gibi
>>           tekrarladığı "My darling!" repliği, sonunda Paris'i görünce anlam
>>           veremediğimiz uyduruk bayılma sahnesi, bir de Paris'in ancak 5 kez
>>           izledikten sonra anlaşılan "Paris seçti, firma yaptı, Arda
>>           bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
>>           dedirterek.
>>         </str>
>>       </arr>
>>       <float name="price">350.0</float>
>>       <str name="price_c">350,USD</str>
>>       <int name="popularity">6</int>
>>       <bool name="inStock">true</bool>
>>       <date name="manufacturedate_dt">2006-02-13T15:26:37Z</date>
>>       <long name="_version_">1420300467908378624</long>
>>     </doc>
>>   </result>
>>   <lst name="debug">
>>     <str name="rawquerystring">baştan</str>
>>     <str name="querystring">baştan</str>
>>     <str name="parsedquery">text:baştan</str>
>>     <str name="parsedquery_toString">text:baştan</str>
>>     <lst name="explain">
>>       <str name="6H500F0XXXX">
>> 0.028767452 = (MATCH) weight(text:baştan in 0) [DefaultSimilarity], result of:
>>   0.028767452 = fieldWeight in 0, product of:
>>     1.0 = tf(freq=1.0), with freq of:
>>       1.0 = termFreq=1.0
>>     0.30685282 = idf(docFreq=1, maxDocs=1)
>>     0.09375 = fieldNorm(doc=0)
>>       </str>
>>     </lst>
>>     <str name="QParser">LuceneQParser</str>
>>     <lst name="timing">
>>       <double name="time">2.0</double>
>>       <lst name="prepare">
>>         <double name="time">1.0</double>
>>         <lst name="org.apache.solr.handler.component.QueryComponent">
>>           <double name="time">1.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.FacetComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.HighlightComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.StatsComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.DebugComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>       </lst>
>>       <lst name="process">
>>         <double name="time">1.0</double>
>>         <lst name="org.apache.solr.handler.component.QueryComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.FacetComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.HighlightComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.StatsComponent">
>>           <double name="time">0.0</double>
>>         </lst>
>>         <lst name="org.apache.solr.handler.component.DebugComponent">
>>           <double name="time">1.0</double>
>>         </lst>
>>       </lst>
>>     </lst>
>>   </lst>
>> </response>
>>
>> On Mon, Dec 3, 2012 at 12:30 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>>
>>> Two points:
>>>
>>> 1. Possibly an encoding problem with your container? Is UTF-8 encoding
>>> enabled?
>>> 2. Add &debugQuery=true to your query (from the browser) and see if the
>>> parsedquery has the expected term that matches what Luke reports for the
>>> index and what Solr Admin Analysis also reports for index analysis.
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message-----
>>> From: Erol Akarsu
>>> Sent: Monday, December 03, 2012 11:35 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Luke and SOLR search giving different results
>>>
>>> Jack,
>>>
>>> Yes.
>>>
>>> I expect SOLR to give the same search results as Luke does.
>>>
>>> The term analyzer gives the correct answer in SOLR, as expected. But SOLR
>>> does not return the correct search results.
>>>
>>> I don't know why.
>>>
>>> Erol Akarsu
>>>
>>> On Mon, Dec 3, 2012 at 11:21 AM, Jack Krupansky <j...@basetechnology.com> wrote:
>>>
>>>> So, does that highlight the problem for you or not? Is the term analyzed
>>>> as you expected?
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> From: Erol Akarsu
>>>> Sent: Monday, December 03, 2012 8:44 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Luke and SOLR search giving different results
>>>>
>>>> Jack,
>>>>
>>>> Thanks for help.
>>>>
>>>> I removed the SOLR data folder and indexed this sample doc from scratch,
>>>> so it is now the only document in SOLR.
>>>>
>>>> When I ran the analysis, I could see that the stemming is correct: the
>>>> words "bul", "baş", "gör" and "umut" appear in the SF row.
>>>> I attached the analysis screens.
>>>>
>>>> Erol Akarsu
>>>>
>>>> On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>>>>
>>>> Have you tried using the Solr Admin Analysis page, using the word and a
>>>> few words of context for index analysis and the word alone for query
>>>> analysis?
>>>>
>>>> And be sure to fully reindex if you change ANYTHING in the schema fields
>>>> or field types.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> From: Erol Akarsu
>>>> Sent: Sunday, December 02, 2012 10:38 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Luke and SOLR search giving different results
>>>>
>>>> Hi,
>>>>
>>>> I am trying to apply SOLR to the Turkish language for my research.
>>>>
>>>> Instead of using language identification, I manually assigned the Turkish
>>>> language to a sample test document. I configured the SOLR schema.xml and
>>>> activated the part below. I added the attached document
>>>> testTurkishDoc.xml to the SOLR index.
>>>>
>>>> But searching the raw Lucene index through Luke and searching through the
>>>> SOLR 4.0 GUI give different results. In picture Selection_006.png, the
>>>> word "baş" is listed as the top term. When I search for the word "baş" in
>>>> Luke, I get the only document as the result, as shown in
>>>> Selection_004.png.
>>>>
>>>> But in the SOLR GUI, I get an empty result for the word "baş", as shown in
>>>> picture Selection_002.png.
>>>>
>>>> The features field of the document contains the word "baştan", which is
>>>> derived from the root word "baş" in Turkish grammar. Somehow, the SOLR GUI
>>>> searches differently than Luke. I could not figure out why SOLR does not
>>>> find the document while Luke does. The same thing happens for the words
>>>> "umut", "bul" and "gör".
>>>>
>>>> I would appreciate it if you could help me get the same results from the
>>>> SOLR UI.
>>>>
>>>> <field name="features">
>>>>   Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!"
>>>>   diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda
>>>>   Turan ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un
>>>>   oynatıldığı giyim firması reklamı da tam bir fiyasko. Birbirinden ünlü
>>>>   bu iki ismin oynadığı reklam Arda'nın kabinde papağan gibi tekrarladığı
>>>>   "My darling!" repliği, sonunda Paris'i görünce anlam veremediğimiz
>>>>   uyduruk bayılma sahnesi, bir de Paris'in ancak 5 kez izledikten sonra
>>>>   anlaşılan "Paris seçti, firma yaptı, Arda bayıldı." sözleriyle kazındı
>>>>   hafızalara, "Keşke unutabilsek!" dedirterek.
>>>> </field>
>>>>
>>>> Added to schema.xml for SOLR:
>>>>
>>>> <field name="features" type="text_tr" indexed="true" stored="true"
>>>>        multiValued="true"/>
>>>> <fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
>>>>   <analyzer type="index">
>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>     <filter class="solr.TurkishLowerCaseFilterFactory"/>
>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>             words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>>>>     <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
>>>>   </analyzer>
>>>>   <analyzer type="query">
>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>     <filter class="solr.TurkishLowerCaseFilterFactory"/>
>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>             words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>>>>     <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
>>>>   </analyzer>
>>>> </fieldType>
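
The debug output in this thread shows the unqualified query being parsed as text:baş, that is, against the "text" field rather than "features". One possible way to make an unqualified query go through the Turkish analysis is sketched below. This is only a sketch under assumptions: the field name "text_tr_all" is hypothetical, the search handler is assumed to be the one named "/select", and the rest of schema.xml and solrconfig.xml is assumed to be unchanged from what is quoted above.

```xml
<!-- schema.xml (sketch): a separate catch-all field that reuses the text_tr
     analysis chain. The name "text_tr_all" is hypothetical. -->
<field name="text_tr_all" type="text_tr" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="features" dest="text_tr_all"/>

<!-- solrconfig.xml (sketch): use that field as the default search field for
     queries that name no field, such as q=baş. Assumes a handler named
     "/select"; merge the df entry into the existing defaults if one exists. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">text_tr_all</str>
  </lst>
</requestHandler>
```

Without any configuration change, the same effect can be had per request by qualifying the term (q=features:baş) or by passing the df parameter (q=baş&df=features). After a schema change like the one sketched, a full reindex is needed, as noted earlier in the thread.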