Thanks for the clarification. You may be able to get by using an ngram filter at index time - but not at query time.

Then, assuming minGramSize=2 and a maxGramSize of at least 5, "Tom" would be indexed at position 0 as "to", "om", and "tom", and "Hanks" would be indexed at position 1 as "ha", "an", "nk", "ks", "han", "ank", "nks", "hank", "anks", and "hanks", permitting all of your queries, as unquoted terms or quoted simple phrases, such as "to ank".

Use the standard tokenizer combined with the NGramFilterFactory and lower case filter, but only use the ngram filter at index time.

See:
http://lucene.apache.org/core/4_10_2/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html
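
To make the index-time-only idea concrete, here is a rough Python simulation of such an analysis chain. It is only a sketch and not Solr code: the function names are made up for illustration, the regex is a crude stand-in for the standard tokenizer, the gram sizes (2 to 15) are assumed values, and unquoted terms are all treated as required, which may differ from your default operator.

```python
import re

def ngrams(token, min_gram=2, max_gram=15):
    """Return every substring of token whose length is in [min_gram, max_gram]."""
    grams = set()
    longest = min(max_gram, len(token))
    for size in range(min_gram, longest + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams

def tokenize(text):
    # Crude stand-in for the standard tokenizer followed by the lower case filter.
    return [word.lower() for word in re.findall(r"\w+", text)]

def analyze_for_index(text):
    # Index time: one set of grams per word; the list index is the token position.
    return [ngrams(word) for word in tokenize(text)]

def analyze_for_query(text):
    # Query time: no ngram filter, only tokenizing and lowercasing.
    return tokenize(text)

def matches_terms(indexed, query):
    # Unquoted terms, all treated as required: each must occur at some position.
    return all(any(term in grams for grams in indexed)
               for term in analyze_for_query(query))

def matches_phrase(indexed, query):
    # Quoted phrase: the terms must occur at consecutive positions, in order.
    terms = analyze_for_query(query)
    return any(all(term in indexed[start + offset] for offset, term in enumerate(terms))
               for start in range(len(indexed) - len(terms) + 1))

indexed = analyze_for_index("Tom Hanks")
print(sorted(indexed[0]))                 # ['om', 'to', 'tom']
print(matches_phrase(indexed, "to ank"))  # True
print(matches_phrase(indexed, "ank to"))  # False: wrong order for a phrase
print(matches_terms(indexed, "ank to"))   # True: order is irrelevant for bare terms
```

Because the query side is not ngrammed, a query term such as "ank" has to equal one of the indexed grams exactly, which is why the filter belongs only in the index-time chain.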

But be aware that use of the ngram filter dramatically increases the index size, so don't use it on large text fields, just short text fields like names.
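
As a rough illustration of why the index grows, the sketch below counts how many grams a single token produces. The helper name and the gram sizes (2 to 15) are assumptions chosen for the example, not values taken from any particular schema.

```python
def gram_count(token_length, min_gram=2, max_gram=15):
    """Number of grams emitted for one token of the given length."""
    longest = min(max_gram, token_length)
    # A token of length n contains (n - size + 1) substrings of each size.
    return sum(token_length - size + 1 for size in range(min_gram, longest + 1))

for length in (3, 5, 10, 20):
    print(length, gram_count(length))
```

The count grows roughly with the square of the token length until the maximum gram size caps it, so short name fields stay manageable while long free-text fields do not.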

-- Jack Krupansky

-----Original Message-----
From: Dinesh Babu
Sent: Sunday, December 7, 2014 2:58 PM
To: solr-user@lucene.apache.org
Subject: RE: How to stop Solr tokenising search terms with spaces

Hi Alex,

My requirement is that I should be able to search for a person, for example Tom Hanks, by any of the following:

1) the whole first name (Tom)
2) a partial first name that includes the start of the name (To)
3) a partial first name that does not include the start of the name (om)
4) the whole surname (Hanks)
5) a partial surname that includes the start of the name (Han)
6) a partial surname that does not include the start of the name (ank)
7) the whole name (Tom Hanks)
8) a partial first name plus a partial surname, each with or without the start of the name (To Han, om ank)
9) all of the above, as a case-insensitive search

Thanks in advance for your help

Regards,
Dinesh Babu.


-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: 07 December 2014 01:20
To: solr-user
Subject: Re: How to stop Solr tokenising search terms with spaces

There is no spoon. And, there is no "phrase search". Certainly not a single approach that fits all cases.

What is actually happening is that you seem to want both phrase and prefix search. In your original question you did not explain the second part. So, you were given a solution for the first one.

To get the second part, you now need to put some sort of NGram into the index-time analyzer chain. But the problem is, you need to be very clear on what you want there. Do you want:
1) Major Hanks
2) Major Ha
3) Hanks Ma (swapped)
4) Hanks random text Major (swapped and apart)
5) Ha Ma (prefix on both words)
6) ha ma (lower case searches too)
Or only some of those?

Each of these things has implications and trade-offs. Once you know what you want to find, we can help you get there.

Regards,
  Alex.
P.s. If you are not sure what I am talking about with the analyzer chain, may I recommend my own book:
http://www.amazon.ca/Instant-Apache-Solr-Indexing-Data-ebook/dp/B00D85K9XC
It seems to be on sale right now.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 6 December 2014 at 19:17, Dinesh Babu <dinesh.b...@pb.com> wrote:

Just curious, why does Solr not provide a simple mechanism for phrase search? It is a very common use case, and it is very surprising that there is no straightforward way to do it in Solr, or at least I have not found one after much research.

Regards,
Dinesh


-----Original Message-----
From: Dinesh Babu [mailto:dinesh.b...@pb.com]
Sent: 05 December 2014 17:29
To: solr-user@lucene.apache.org
Subject: RE: How to stop Solr tokenising search terms with spaces

Hi Erik,

I probably celebrated too soon. When I tested {!field}, it only seemed
to work because of the particular data the query ran against. Using
the example that I originally mentioned, searching for
Tom Hanks Major:

1) If I search {!field f=displayName}: Hanks Major, it works

2) If I provide a partial word, {!field f=displayName}: Hanks Ma, it
does not work

Is this how {!field} is designed to work?

I also tried with and without escaping the space, as you suggested. It
has the same issue:

1) q=field1:"Hanks Major", it works
2) q=field1:"Hanks Maj", it does not work

Regards,
Dinesh Babu.



-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: 05 December 2014 16:44
To: solr-user@lucene.apache.org
Subject: Re: How to stop Solr tokenising search terms with spaces

But also, to spell out the more typical way to do that:

   q=field1:"…" OR field2:"…"

The nice thing about {!field} is that the value doesn’t have to have quotes and deal with escaping issues, but if you just want phrase queries and quote/escaping isn’t a hassle maybe that’s cleaner for you.
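
If you build that quoted form in client code, the quoting and escaping can be handled in one small helper. This is only a sketch: the function names are hypothetical, field1 and field2 are the placeholder field names used above, and it escapes just backslashes and double quotes inside the phrase.

```python
from urllib.parse import urlencode

def quote_phrase(value):
    # Escape backslashes first, then double quotes, and wrap the result in quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

def multi_field_phrase_query(fields, value):
    # One quoted phrase clause per field, joined with OR.
    return " OR ".join(f"{field}:{quote_phrase(value)}" for field in fields)

def select_params(fields, value):
    # URL-encode the q parameter so it can be appended to a select request.
    return urlencode({"q": multi_field_phrase_query(fields, value)})

print(multi_field_phrase_query(["field1", "field2"], "Hanks Major"))
print(select_params(["field1", "field2"], "Hanks Major"))
```

The first print shows the raw query string and the second shows the same query encoded for use in a URL.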

        Erik


On Dec 5, 2014, at 11:30 AM, Dinesh Babu <dinesh.b...@pb.com> wrote:

One more quick question Erik,

If I want to search on multiple fields using {!field}, is there a
query similar to what {!prefix} has:
   q={!prefix f=field1 v=$f1_val} OR {!prefix f=field2 v=$f2_val}
where &f1_val=<field 1 value>&f2_val=<field2 value>?

Regards,
Dinesh Babu.



-----Original Message-----
From: Dinesh Babu
Sent: 05 December 2014 16:26
To: solr-user@lucene.apache.org
Subject: RE: How to stop Solr tokenising search terms with spaces

Thanks a lot, Erik. {!field} seems to solve our issue. I much appreciate
your help.


Regards,
Dinesh Babu.



-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: 05 December 2014 16:00
To: solr-user@lucene.apache.org
Subject: Re: How to stop Solr tokenising search terms with spaces

Try using {!field} instead of {!prefix}.  {!field} will create a
phrase query (or term query if it’s just one term) after analysis.
[it also could construct other query types if the analysis overlaps
tokens, but maybe not relevant here]

Also note that you can use multiple of these expressions if needed:
q={!prefix f=field1 v=$f1_val} OR {!prefix f=field2 v=$f2_val} where
&f1_val=<field 1 value>&f2_val=<field2 value>

       Erik



On Dec 5, 2014, at 10:45 AM, Dinesh Babu <dinesh.b...@pb.com> wrote:

Hi,

We are using Solr 4.10.2 to store user names from LDAP. I want Solr
not to tokenise a search term that has a space in it. E.g., if there is
a user by the name Tom Hanks Major, then:

1) When I query for "Tom Hanks Major", I don't want Solr to break
this search phrase up and search for the individual words (i.e. Tom,
Hanks, Major), but to search for the whole phrase and return the Tom
Hanks Major user

2) Also if I query for "Hanks Major" I should get the Tom Hanks
Major user back

We used {!prefix}, but that does not allow scenario 2. Also, {!prefix} restricts the search to one field and cannot search on multiple fields. Any solutions?

Regards,
Dinesh Babu.
