Thanks, Erick.
I am actually not trying to use a string field; I would prefer a TextField 
here. But in my comparisons with TextField, it seems that phrase matching 
with whitespace and a trailing wildcard (like 'my do*', 'my dog*', or 
'my dog has*') can only be accomplished with a string-type field, because 
with a WhitespaceTokenizer in a TextField the space is lost and each token 
is considered individually. Am I missing something? 
SRK 
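
For reference, the ComplexPhraseQueryParser that Reth mentioned earlier in the thread is the documented route for a phrase with a trailing wildcard on a tokenized field. Below is a minimal, untested sketch of building such a request in Python. The field name `field` is a placeholder taken from the examples in this thread, `complexphrase_params` is a hypothetical helper, and whether the query matches still depends on the field's analysis chain.

```python
from urllib.parse import urlencode

def complexphrase_params(field, phrase, in_order=True):
    """Build Solr request params for a phrase query that may contain wildcards.

    Hypothetical helper: it only assembles the query string and does not
    escape quotes or other special characters inside `phrase`.
    """
    local_params = "{!complexphrase inOrder=%s}" % ("true" if in_order else "false")
    return {"q": '%s%s:"%s"' % (local_params, field, phrase)}

params = complexphrase_params("field", "my dog*")
print(params["q"])
# URL-encoded form, to append to the collection's /select handler
print("/select?" + urlencode(params))
```

The value printed for `q` can also be pasted directly into the q box of the Solr admin query screen.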

    On Friday, November 11, 2016 10:05 PM, Erick Erickson 
<erickerick...@gmail.com> wrote:
 

 You have to query text and string fields differently; that's just the
way it works. The problem is getting the query string through the
parser as a _single_ token or as multiple tokens.

Let's say you have a string field with the "a b" example. You have a
single token:
  "a b", which starts at offset 0

But with a text field, you have two tokens:
  "a" at position 0
  "b" at position 1

But when the query parser sees "a b" (without quotes) it splits it
into two tokens, and only the text field has both tokens so the string
field won't match.

OTOH, when the query parser sees "a\ b" it passes this through as a
single token, which only matches the string field as there's no
_single_ token "a b" in the text field.
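
The behaviour described above can be mimicked with a toy model in Python. This is only a sketch: `index_string_field`, `index_text_field`, `parse_query` and `matches` are made-up names, the "parser" just splits on unescaped whitespace, and real Solr analysis and query parsing do considerably more.

```python
import re

def index_string_field(value):
    # A string field keeps the whole value as a single token.
    return [value]

def index_text_field(value):
    # A whitespace-tokenized text field yields one token per word.
    return value.split()

def parse_query(q):
    # Split on whitespace that is NOT preceded by a backslash,
    # then turn each escaped space back into a plain space.
    parts = re.split(r'(?<!\\)\s+', q.strip())
    return [p.replace('\\ ', ' ') for p in parts if p]

def matches(query, tokens, op="OR"):
    # A term matches only if it equals an indexed token exactly.
    hits = [term in tokens for term in parse_query(query)]
    return all(hits) if op == "AND" else any(hits)

string_tokens = index_string_field("a b")   # ['a b']
text_tokens = index_text_field("a b")       # ['a', 'b']

print(matches("a b", string_tokens))    # False: no token "a" or "b"
print(matches("a b", text_tokens))      # True
print(matches(r"a\ b", string_tokens))  # True: single token "a b"
print(matches(r"a\ b", text_tokens))    # False: no single token "a b"
```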

But a more interesting question is why you want to search this way.
String fields are intended for keywords, machine-generated IDs and the
like. They're pretty useless for searching anything except
1> exact tokens
2> prefixes

If you have "my dog has fleas" in a string field, you _can_
search "*dog*" and get a hit, but performance is poor once the corpus
gets large. Performance for "my*" will be pretty good, though.
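
To illustrate why the two cases differ, here is a rough Python sketch that treats the term dictionary as a plain sorted list. That is an assumption made for illustration only (Lucene's real term dictionary is more sophisticated), and `prefix_matches` and `infix_matches` are hypothetical helpers. The idea carries over, though: a prefix lets you seek straight to the first candidate term, while a leading wildcard forces a look at every term.

```python
from bisect import bisect_left

def prefix_matches(sorted_terms, prefix):
    # Binary-search to the first term >= prefix, then walk forward
    # only while the terms still start with the prefix.
    i = bisect_left(sorted_terms, prefix)
    found = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        found.append(sorted_terms[i])
        i += 1
    return found

def infix_matches(sorted_terms, fragment):
    # Sort order does not help here: every term has to be examined.
    return [t for t in sorted_terms if fragment in t]

terms = sorted(["my dog has fleas", "my cat naps", "your dog barks", "zebra"])

print(prefix_matches(terms, "my"))   # like my*
print(infix_matches(terms, "dog"))   # like *dog*
```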

All in all, this sounds like an XY problem. What's the use case you're
trying to solve?

Best,
Erick



On Thu, Nov 10, 2016 at 10:11 PM, Sandeep Khanzode
<sandeep_khanz...@yahoo.com.invalid> wrote:
> Hi Erick, Reth,
>
> The 'a\ b*' as well as the q.op=AND approach worked (successfully) only for 
> StrField for me.
>
> Any attempt at creating a 'a\ b*' for a TextField does not match any 
> documents. The parsedQuery in debug mode does show 'field:a b*'. I am sure 
> there are documents that should match.
> Another (maybe unrelated) observation is that if I have 'field:a\ b', then 
> the parsedQuery is field:a field:b, which does not match as I expected (the 
> terms match individually).
>
> Can you please provide an example that I can use in Solr Query dashboard? 
> That will be helpful.
>
> I have also seen that wildcard queries work irrespective of field type, i.e. 
> StrField as well as TextField. That makes sense because a 
> WhitespaceTokenizer only creates word boundaries when we do not use an 
> EdgeNGramFilter. If I am not wrong, that is.
> SRK
>
>    On Friday, November 11, 2016 5:00 AM, Erick Erickson 
><erickerick...@gmail.com> wrote:
>
>
>  You can escape the space with a backslash as  'a\ b*'
>
> Best,
> Erick
>
> On Thu, Nov 10, 2016 at 2:37 PM, Reth RM <reth.ik...@gmail.com> wrote:
>> I don't think you can do wildcard on StrField. For text field, if your
>> query is "category:(test m*)"  the parsed query will be  "category:test OR
>> category:m*"
>> You can add q.op=AND to make an AND between those terms.
>>
>> For phrase type wild card query support, as per docs, it
>> is ComplexPhraseQueryParser that supports it. (I haven't tested it myself)
>>
>> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>>
>> On Thu, Nov 10, 2016 at 11:40 AM, Sandeep Khanzode <
>> sandeep_khanz...@yahoo.com.invalid> wrote:
>>
>>> Hi,
>>> How does a search like abc* work in StrField? Since the entire value is
>>> stored as a single token, is it some kind of trie structure that allows
>>> such wildcard matching?
>>> How can searches with space like 'a b*' be executed for text fields
>>> (tokenized on whitespace)? If we specify this type of query, it is broken
>>> down into two queries with field:a and field:b*. I would like them to be
>>> contiguous, sort of, like a phrase search with wild card.
>>> SRK
>
>
>

   
