What is the fieldType of those records?

On Tue, Nov 22, 2016 at 4:18 AM, Sandeep Khanzode <sandeep_khanz...@yahoo.com.invalid> wrote:
> Hi Erick,
> I gave this a try.
> These are my results. There is a record with "John D. Smith", and another
> named "John Doe".
>
> 1.] {!complexphrase inOrder=true}name:"John D.*" ... does not fetch any
> results.
>
> 2.] {!complexphrase inOrder=true}name:"John D*" ... fetches both results.
>
> Second observation: There is a record with "John D Smith"
> 1.] {!complexphrase inOrder=true}name:"John*" ... does not fetch any
> results.
>
> 2.] {!complexphrase inOrder=true}name:"John D*" ... fetches that record.
>
> 3.] {!complexphrase inOrder=true}name:"John D S*" ... fetches that record.
>
> SRK
>
> On Sunday, November 13, 2016 7:43 AM, Erick Erickson
> <erickerick...@gmail.com> wrote:
>
> Right, for that kind of use case you want complexPhraseQueryParser,
> see: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>
> Best,
> Erick
>
> On Sat, Nov 12, 2016 at 9:39 AM, Sandeep Khanzode
> <sandeep_khanz...@yahoo.com> wrote:
> > Thanks, Erick.
> >
> > I am actually not trying to use the String field (prefer a TextField here).
> > But, in my comparisons with TextField, it seems that something like phrase
> > matching with whitespace and wildcard (like, 'my do*' or say, 'my dog*', or
> > say, 'my dog has*') can only be accomplished with a string type field,
> > especially because, with a WhitespaceTokenizer in TextField, the space will
> > be lost, and all tokens will be individually considered. Am I missing
> > something?
> >
> > SRK
> >
> > On Friday, November 11, 2016 10:05 PM, Erick Erickson
> > <erickerick...@gmail.com> wrote:
> >
> > You have to query text and string fields differently, that's just the
> > way it works. The problem is getting the query string through the
> > parser as a _single_ token or as multiple tokens.
> >
> > Let's say you have a string field with the "a b" example. You have a
> > single token
> > a b that starts at offset 0.
> >
> > But with a text field, you have two tokens,
> > a at position 0
> > b at position 1
> >
> > But when the query parser sees "a b" (without quotes) it splits it
> > into two tokens, and only the text field has both tokens so the string
> > field won't match.
> >
> > OTOH, when the query parser sees "a\ b" it passes this through as a
> > single token, which only matches the string field as there's no
> > _single_ token "a b" in the text field.
> >
> > But a more interesting question is why you want to search this way.
> > String fields are intended for keywords, machine-generated IDs and the
> > like. They're pretty useless for searching anything except
> > 1> exact tokens
> > 2> prefixes
> >
> > While if you have "my dog has fleas" in a string field, you _can_
> > search "*dog*" and get a hit but the performance is poor when you get
> > a large corpus. Performance for "my*" will be pretty good though.
> >
> > In all this sounds like an XY problem, what's the use-case you're
> > trying to solve?
> >
> > Best,
> > Erick
> >
> > On Thu, Nov 10, 2016 at 10:11 PM, Sandeep Khanzode
> > <sandeep_khanz...@yahoo.com.invalid> wrote:
> >> Hi Erick, Reth,
> >>
> >> The 'a\ b*' as well as the q.op=AND approach worked (successfully) only
> >> for StrField for me.
> >>
> >> Any attempt at creating a 'a\ b*' for a TextField does not match any
> >> documents. The parsedQuery in debug mode does show 'field:a b*'. I am sure
> >> there are documents that should match.
> >> Another (maybe unrelated) observation is if I have 'field:a\ b', then the
> >> parsedQuery is field:a field:b. Which does not match as expected (matches
> >> individually).
> >>
> >> Can you please provide an example that I can use in Solr Query dashboard?
> >> That will be helpful.
> >>
> >> I have also seen that wildcard queries work irrespective of field type
> >> i.e. StrField as well as TextField.
> >> That makes sense because a WhitespaceTokenizer only creates word
> >> boundaries when we do not use an EdgeNGramFilter. If I am not wrong,
> >> that is.
> >>
> >> SRK
> >>
> >> On Friday, November 11, 2016 5:00 AM, Erick Erickson
> >> <erickerick...@gmail.com> wrote:
> >>
> >> You can escape the space with a backslash as 'a\ b*'
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Nov 10, 2016 at 2:37 PM, Reth RM <reth.ik...@gmail.com> wrote:
> >>> I don't think you can do wildcard on StrField. For text field, if your
> >>> query is "category:(test m*)" the parsed query will be
> >>> "category:test OR category:m*"
> >>> You can add q.op=AND to make an AND between those terms.
> >>>
> >>> For phrase type wild card query support, as per docs, it
> >>> is ComplexPhraseQueryParser that supports it. (I haven't tested it
> >>> myself)
> >>>
> >>> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
> >>>
> >>> On Thu, Nov 10, 2016 at 11:40 AM, Sandeep Khanzode
> >>> <sandeep_khanz...@yahoo.com.invalid> wrote:
> >>>
> >>>> Hi,
> >>>> How does a search like abc* work in StrField? Since the entire thing is
> >>>> stored as a single token, is it a type of a trie structure that allows
> >>>> such wildcard matching?
> >>>> How can searches with space like 'a b*' be executed for text fields
> >>>> (tokenized on whitespace)? If we specify this type of query, it is
> >>>> broken down into two queries with field:a and field:b*. I would like
> >>>> them to be contiguous, sort of, like a phrase search with wild card.
> >>>> SRK
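
For anyone who wants to replay the complexphrase queries from this thread outside the Solr admin UI, here is a minimal Python sketch that builds the request URL. It only assembles and encodes the query string; it does not contact Solr. The host, port and the collection name "people" are assumptions made up for illustration, and complexphrase_url is a hypothetical helper, not part of any Solr client. The {!complexphrase inOrder=true} local params and the name field are taken from the messages above.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumed values for illustration only: the thread names neither a host
# nor a collection.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "people"  # hypothetical collection name


def complexphrase_url(field, phrase, in_order=True):
    """Build a /select URL for a phrase query that may contain wildcards.

    Hypothetical helper: it wraps the phrase in double quotes and prefixes
    the complexphrase local params quoted in the thread.
    """
    local_params = "{!complexphrase inOrder=%s}" % ("true" if in_order else "false")
    query = '%s%s:"%s"' % (local_params, field, phrase)
    # debugQuery=true asks Solr to return the parsed query with the response.
    params = {"q": query, "debugQuery": "true"}
    return "%s/%s/select?%s" % (SOLR_BASE, COLLECTION, urlencode(params))


url = complexphrase_url("name", "John D*")
print(url)

# Round trip: decode the URL again to confirm the q parameter survived encoding.
print(parse_qs(urlsplit(url).query)["q"][0])
```

Pasting the decoded q value into the q box of the admin query screen should be equivalent. Whether a given phrase matches still depends on the fieldType and its analyzer, which is the question at the top of this thread.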