Re: Alphanumeric wildcard search problem

Hasnain Wed, 01 Sep 2010 13:35:29 -0700

Thank you for your suggestions.

Before removing WordDelimiterFilterFactory, q=R-* returned perfect results,
but q=R-1* did not. After removing WordDelimiterFilterFactory, q=R-* did not
return any results either.


The results before removing WordDelimiterFilterFactory, using debugQuery=on,
were:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">78</int>
<lst name="params">
<str name="debugQuery">on</str>
<str name="fl">mat_nr</str>
<str name="q">R-1*</str>
<str name="qt">standard2</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="debug">
<str name="rawquerystring">R-1*</str>
<str name="querystring">R-1*</str>
<str name="parsedquery">
+DisjunctionMaxQuery((ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 |
description:r-1*^0.4 | prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
manufact_mat:r-1*^0.4)~0.6) ()
</str>
<str name="parsedquery_toString">
+(ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 | description:r-1*^0.4 |
prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
manufact_mat:r-1*^0.4)~0.6 ()
</str>
<lst name="explain"/>
<str name="QParser">DisMaxQParser</str>
<null name="altquerystring"/>
<null name="boostfuncs"/>
<lst name="timing">
<double name="time">31.0</double>
<lst name="prepare">
<double name="time">15.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">15.0</double>
</lst>
<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">0.0</double>
</lst>
</lst>
<lst name="process">
<double name="time">16.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">16.0</double>
</lst>
<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">0.0</double>
</lst>
</lst>
</lst>
</lst>
</response>

And after removing WordDelimiterFilterFactory:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">78</int>
<lst name="params">
<str name="debugQuery">on</str>
<str name="fl">mat_nr</str>
<str name="q">R-1*</str>
<str name="qt">standard2</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="debug">
<str name="rawquerystring">R-1*</str>
<str name="querystring">R-1*</str>
<str name="parsedquery">
+DisjunctionMaxQuery((ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 |
description:r-1*^0.4 | prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
manufact_mat:r-1*^0.4)~0.6) ()
</str>
<str name="parsedquery_toString">
+(ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 | description:r-1*^0.4 |
prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
manufact_mat:r-1*^0.4)~0.6 ()
</str>
<lst name="explain"/>
<str name="QParser">DisMaxQParser</str>
<null name="altquerystring"/>
<null name="boostfuncs"/>
<lst name="timing">
<double name="time">31.0</double>
<lst name="prepare">
<double name="time">15.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">15.0</double>
</lst>
<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">0.0</double>
</lst>
</lst>
<lst name="process">
<double name="time">16.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">16.0</double>
</lst>
<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">0.0</double>
</lst>
</lst>
</lst>
</lst>
</response>

Also, the WordDelimiterFilterFactory configuration I used at first was this:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>

Before removing WordDelimiterFilterFactory, the Solr admin analysis page showed:

Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position   1
term text       R-1110
term type       word
source start,end        0,6
payload         
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
ignoreCase=true, enablePositionIncrements=true}
term position   1
term text       R-1110
term type       word
source start,end        0,6
payload         
org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0,
catenateNumbers=1}
term position   1       2
term text       R       1110
term type       word    word
source start,end        0,1     2,6
payload                 
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position   1       2
term text       r       1110
term type       word    word
source start,end        0,1     2,6
payload                 
org.apache.solr.analysis.EnglishPorterFilterFactory
{protected=protwords.txt}
term position   1       2
term text       r       1110
term type       word    word
source start,end        0,1     2,6
payload                 
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position   1       2
term text       r       1110
term type       word    word
source start,end        0,1     2,6
payload                 
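
One way to read the dump above (a sketch of my understanding, not a verified fix): the index ends up holding the separate terms r and 1110, so no single indexed term begins with r-1, and a prefix match on R-1 has nothing to hit. The variant below keeps the settings shown earlier and only adds preserveOriginal="1", which should also emit the unsplit token. Whether the later filters leave that token intact is an assumption I have not checked.

```xml
<!-- Sketch only: same settings as before plus preserveOriginal="1",
     so "R-1110" should be kept next to the parts "R" and "1110". -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
<!-- Lowercasing afterwards would turn the kept token into "r-1110". -->
<filter class="solr.LowerCaseFilterFactory"/>
```

The field would need to be reindexed before the analysis page and the query results reflect such a change.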



After removing WordDelimiterFilterFactory, the Solr admin analysis page looks like this:

Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position   1
term text       R-1110
term type       word
source start,end        0,6
payload         
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
ignoreCase=true, enablePositionIncrements=true}
term position   1
term text       R-1110
term type       word
source start,end        0,6
payload         
org.apache.solr.analysis.EnglishPorterFilterFactory
{protected=protwords.txt}
term position   1
term text       R-1110
term type       word
source start,end        0,6
payload         
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position   1
term text       R-1110
term type       word
source start,end        0,6
payload 
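
What stands out in this second dump is that there is no LowerCaseFilterFactory step, so the indexed term stays R-1110, while most clauses in the parsed query above are lowercase (r-1*). A minimal sketch of a field type that avoids both the splitting and the case mismatch could look like the following. The name partNumberText is made up for illustration, and I have not tested this against my index.

```xml
<!-- Hypothetical field type (the name "partNumberText" is an assumption).
     No analyzer type is given, so the same chain applies at index and query time. -->
<fieldType name="partNumberText" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on whitespace only, so "R-1110" stays one token. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Index "R-1110" as "r-1110". -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With such a type the index would hold r-1110 as a single term after reindexing. As far as I understand, wildcard terms are not run through the analyzer, so the prefix would probably have to be sent already lowercased (for example q=mat_nr:r-1* with the standard query parser). Whether the dismax parser treats a trailing * as a wildcard at all is something I still need to confirm.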


Any suggestions?

Thank you


Erick Erickson wrote:
> 
> Really look at the analysis page in solr admin for how your
> analyzer chain handles things, or you'll spend time until you're
> really old having trouble :).
> 
> Here's what I see on a quick scan:
> 
>> StandardTokenizer tries to, among other things, preserve
> email addresses. The kinds of strings you're working with may
> trip something up here.
> 
>> Remove WordDelimiterFactory altogether. The point of WDF
> is to break words apart at transitions.
> 
>> Remove EnglishPorterFilterFactory too. What the effect
> of applying an algorithmic stemming process to words like
> you're interested in is...er...not obvious.
> 
> All that said, I took a quick look at the analysis page with your definition
> and nothing jumped out at me. Are you sure that:
>> you're getting to the request handler you think? What does adding
> &debugQuery=on show?
>> you've indexed the data after you've made the changes you outlined above?
> The SOLR
> admin page can help here, especially the [full interface] link, with debug
> info on.
> 
> If nothing shows up, can you post the results of &debugQuery=on?
> 
> Best
> Erick
> 
> On Tue, Aug 31, 2010 at 6:11 AM, Hasnain <hasn...@hotmail.com> wrote:
> 
>>
>> I have gone through all of the related posts, but could not find a
>> proper
>> answer that works, so I'm writing this post.
>>
>> Is there any way of using wildcard searches on alphanumeric text
>> like...R-1*
>> ?
>>
>> Let me share the relevant information.
>>
>>
>> <fieldType name="textShoaib" class="solr.TextField"
>> positionIncrementGap="100">
>>      <analyzer type="index">
>>        <tokenizer class="solr.StandardTokenizerFactory"/>   <!--This was
>> originally <tokenizer class="solr.WhitespaceTokenizerFactory"/> just
>> playing
>> around-->
>>        <filter class="solr.StopFilterFactory"
>>                ignoreCase="true"
>>                words="stopwords.txt"
>>                enablePositionIncrements="true"
>>                />
>>        <filter class="solr.WordDelimiterFilterFactory"
>> generateWordParts="0" generateNumberParts="0" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
>> preserveOriginal="1"/>
>>        <filter class="solr.LowerCaseFilterFactory"/>
>>        <filter class="solr.EnglishPorterFilterFactory"
>> protected="protwords.txt"/>
>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>      </analyzer>
>>      <analyzer type="query">
>>        <tokenizer class="solr.StandardTokenizerFactory"/>
>> <!--This was originally <tokenizer
>> class="solr.WhitespaceTokenizerFactory"/>-->
>>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> ignoreCase="true" expand="true"/>
>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords.txt"/>
>>        <filter class="solr.WordDelimiterFilterFactory"
>> generateWordParts="0" generateNumberParts="0" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
>> preserveOriginal="1"/>
>>        <filter class="solr.LowerCaseFilterFactory"/>
>>        <filter class="solr.EnglishPorterFilterFactory"
>> protected="protwords.txt"/>
>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>      </analyzer>
>>    </fieldType>
>>
>>
>>
>>
>> my requestHandler is...
>>
>>
>>
>>
>>
>>  <requestHandler name="standard2" class="solr.SearchHandler">
>>    <!-- default values for query parameters -->
>>     <lst name="defaults">
>>  <str name="defType">dismax</str>
>>       <str name="echoParams">explicit</str>
>>  <str name="tie">0.6</str>
>>  <str name="pf">name^2.3 mat_nr^0.4</str>
>>  <str name="mm">0%</str>
>>       <!--
>>       <int name="rows">10</int>
>>       <str name="fl">*</str>
>>       <str name="version">2.1</str>
>>        -->
>>     </lst>
>>
>>  </requestHandler>
>>
>>
>>
>> and also the field on which I want to apply searching on
>>
>>
>>
>>  <field name="mat_nr"  type="textShoaib" indexed="true" stored="true"
>> omitNorms="true"/>
>>
>>
>>
>> and the query I'm using is
>>
>>
>>
>> qt=standard2&q=R-1*
>>
>>
>>
>> but this still doesn't work.
>>
>>
>> any suggestions on this?
>>
>> thanks
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Alphanumeric-wildcard-search-problem-tp1393332p1393332.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Alphanumeric-wildcard-search-problem-tp1393332p1402772.html
Sent from the Solr - User mailing list archive at Nabble.com.
