Re: Is this a bug? Wildcard with PatternReplaceFilterFactory

Mike Phillips Fri, 21 Feb 2020 14:24:14 -0800

It looks like the debug result you are showing me is the result for Rod's, not Rod’s, but in answer to your question:

This is why I think "Rod’s finds fields Rod's and Rod’s that are now in the index as rod's":


The analysis page shows Rod’s gets stored in the index as:
rod's rods rod s

Field Value (Index): Rod’s
Fieldname / FieldType: _text_
Schema Browser: https://centos1:8985/solr/#/rat_11/schema?field=_text_

Verbose Output:

WT
text   raw_bytes               start  end  positionLength  type  termFrequency  position
Rod’s  [52 6f 64 e2 80 99 73]  0      5    1               word  1              1

SF
text   raw_bytes               start  end  positionLength  type  termFrequency  position
Rod’s  [52 6f 64 e2 80 99 73]  0      5    1               word  1              1

WDGF
text   raw_bytes               start  end  positionLength  type  termFrequency  position  keyword
Rod’s  [52 6f 64 e2 80 99 73]  0      5    2               word  1              1         false
Rods   [52 6f 64 73]           0      5    2               word  1              1         false
Rod    [52 6f 64]              0      3    1               word  1              1         false
s      [73]                    4      5    1               word  1              2         false

FGF
text   raw_bytes               start  end  positionLength  type  termFrequency  position  keyword
Rod’s  [52 6f 64 e2 80 99 73]  0      5    2               word  1              1         false
Rods   [52 6f 64 73]           0      5    2               word  1              1         false
Rod    [52 6f 64]              0      3    1               word  1              1         false
s      [73]                    4      5    1               word  1              2         false

PRF
text   raw_bytes               start  end  positionLength  type  termFrequency  position  keyword
Rod’s  [52 6f 64 e2 80 99 73]  0      5    2               word  1              1         false
Rods   [52 6f 64 73]           0      5    2               word  1              1         false
Rod    [52 6f 64]              0      3    1               word  1              1         false
s      [73]                    4      5    1               word  1              2         false

PRF
text   raw_bytes               start  end  positionLength  type  termFrequency  position  keyword
Rod's  [52 6f 64 27 73]        0      5    2               word  1              1         false
Rods   [52 6f 64 73]           0      5    2               word  1              1         false
Rod    [52 6f 64]              0      3    1               word  1              1         false
s      [73]                    4      5    1               word  1              2         false

PRF
text   raw_bytes               start  end  positionLength  type  termFrequency  position  keyword
Rod's  [52 6f 64 27 73]        0      5    2               word  1              1         false
Rods   [52 6f 64 73]           0      5    2               word  1              1         false
Rod    [52 6f 64]              0      3    1               word  1              1         false
s      [73]                    4      5    1               word  1              2         false

PRF
text   raw_bytes               start  end  positionLength  type  termFrequency  position  keyword
Rod's  [52 6f 64 27 73]        0      5    2               word  1              1         false
Rods   [52 6f 64 73]           0      5    2               word  1              1         false
Rod    [52 6f 64]              0      3    1               word  1              1         false
s      [73]                    4      5    1               word  1              2         false

LCF
text   raw_bytes               start  end  positionLength  type  termFrequency  position  keyword
rod's  [72 6f 64 27 73]        0      5    2               word  1              1         false
rods   [72 6f 64 73]           0      5    2               word  1              1         false
rod    [72 6f 64]              0      3    1               word  1              1         false
s      [73]                    4      5    1               word  1              2         false

This is what we were trying to achieve with the <filter class="solr.PatternReplaceFilterFactory" pattern="’" replacement="'"/> filter.
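
As a rough illustration (plain Python, not Solr code), the effect of that filter followed by the lowercase step on this one token can be mimicked as below. The names `index_form` and `raw_bytes` are made up for this sketch, and it models only the apostrophe replacement and the lowercasing, not the word-delimiter splitting.

```python
import re


def index_form(token: str) -> str:
    """Hypothetical stand-in for two steps of the index-time chain."""
    # Mimics pattern="’" replacement="'" from the filter definition above.
    replaced = re.sub("’", "'", token)
    # Mimics the final lowercase step (LCF on the analysis page).
    return replaced.lower()


def raw_bytes(token: str) -> str:
    """Render the UTF-8 bytes the way the analysis page displays them."""
    return "[" + token.encode("utf-8").hex(" ") + "]"


print(raw_bytes("Rod’s"))              # [52 6f 64 e2 80 99 73]
print(raw_bytes(index_form("Rod’s")))  # [72 6f 64 27 73]
```

The second line of output matches the raw_bytes shown for rod's at the LCF stage.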



The problem is that the wildcard query *Rod’s* gets no hits:

"responseHeader": {
  "status": 0, "QTime": 2,
  "params": { "q": "*Rod’s*", "debugQuery": "on", "_": "1582315262594" }
},
"response": { "numFound": 0, "start": 0, "docs": [] },
"debug": {
  "rawquerystring": "*Rod’s*",
  "querystring": "*Rod’s*",
  "parsedquery": "_text_:*rod’s*",
  "parsedquery_toString": "_text_:*rod’s*",
  "explain": {},
  "QParser": "LuceneQParser",
  ...
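
To make the mismatch concrete: the parsedquery in this response is lowercased but still carries the curly apostrophe, while the analysis page shows the index holding rod's. The sketch below uses Python's standard `fnmatch` module as a stand-in for wildcard term matching. That is an assumption for illustration only (Solr does not use fnmatch), and `wildcard_hits` is a made-up helper name.

```python
from fnmatch import fnmatchcase
from typing import List

# Terms the analysis page reports for the indexed value Rod’s.
indexed_terms = ["rod's", "rods", "rod", "s"]


def wildcard_hits(pattern: str, terms: List[str]) -> List[str]:
    """Return the indexed terms matched by a *-style wildcard pattern."""
    return [t for t in terms if fnmatchcase(t, pattern)]


# Term as it appears in parsedquery: lowercased, curly apostrophe kept.
print(wildcard_hits("*rod’s*", indexed_terms))  # []
# The same wildcard with a straight apostrophe.
print(wildcard_hits("*rod's*", indexed_terms))  # ["rod's"]
```

In this toy model the only difference between zero hits and one hit is whether the apostrophe in the wildcard term was replaced before matching.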







On 2/21/2020 11:52 AM, Erick Erickson wrote:

Why do you say “…that are now in the index as rod’s”? You have 
WordDelimiterGraphFilterFactory, which breaks things up. When I put your field 
definition in the schema and use the analysis page, it turns “rod’s” into the 
following 4 tokens:

rod’s
rods
rod
s

And querying on field:”*Rod’s*” works just fine. I’m using 8.x, and when I add 
“&debug=query” to the URL, I see:
{
  "responseHeader": {
    "status": 0, "QTime": 10,
    "params": { "q": "eoe:\"*Rod's*\"", "debug": "query" }
  },
  "response": {
    "numFound": 1, "start": 0,
    "docs": [
      { "id": "1", "eoe": "Rod's", "_version_": 1659176849231577088 }
    ]
  },
  "debug": {
    "rawquerystring": "eoe:\"*Rod's*\"",
    "querystring": "eoe:\"*Rod's*\"",
    "parsedquery": "SynonymQuery(Synonym(eoe:*rod's* eoe:rod))",
    "parsedquery_toString": "Synonym(eoe:*rod's* eoe:rod)",
    "QParser": "LuceneQParser"
  }
}
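
For anyone reproducing this, here is a small sketch of building such a request URL in Python with the query properly percent-encoded. The host, port, and core name in `BASE` are placeholders and not taken from either setup; `debug_query_url` is a made-up helper name.

```python
from urllib.parse import urlencode

# Assumed base URL: substitute your own host, port, and core.
BASE = "http://localhost:8983/solr/mycore/select"


def debug_query_url(q: str, base: str = BASE) -> str:
    """Build a select URL that asks Solr to include query debug output."""
    # urlencode percent-encodes the wildcard and the non-ASCII apostrophe.
    return base + "?" + urlencode({"q": q, "debug": "query"})


url = debug_query_url("*Rod’s*")
print(url)
```

Comparing the "parsedquery" value returned for the curly and the straight apostrophe shows what the query-side analysis did to the wildcard term.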

What do you see?

Best,
Erick

On Feb 21, 2020, at 12:57 PM, Mike Phillips <m.phill...@prosperodigital.com> 
wrote:

Rod’s finds fields Rod's and Rod’s that are now in the index as rod's

but *Rod’s* finds nothing because the index now only contains rod's
