Re: Strategies for effective prefix queries?

Alexandre Rafalovitch Wed, 16 Jul 2014 23:50:24 -0700

I guess you did not bother clicking through the link then, because
that's exactly the filter I was using. :-) I am glad you found it this
way.


You can also find the full list of filters and tokenizers at:
http://www.solr-start.com/info/analyzers/

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Thu, Jul 17, 2014 at 12:53 PM, Hayden Muhl <haydenm...@gmail.com> wrote:
> Thank you Jorge. I didn't know about that filter. It's just what I was
> looking for.
>
> - Hayden
>
>
> On Wed, Jul 16, 2014 at 4:35 PM, Jorge Luis Betancourt Gonzalez <
> jlbetanco...@uci.cu> wrote:
>
>> Perhaps what you’re trying to do could be addressed by using the
>> EdgeNGramFilterFactory filter? For query suggestions I’m using a very
>> similar approach; this is an extract of the configuration I’m using:
>>
>> <tokenizer class="solr.StandardTokenizerFactory"/>
>> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>> catenateAll="0" splitOnCaseChange="1"/>
>> <filter class="solr.LowerCaseFilterFactory"/>
>> <filter class="solr.EdgeNGramFilterFactory" maxGramSize="10"
>> minGramSize="1"/>
>>
>> Basically this lets you match on a prefix of any word in the string. Say
>> the field gets this content at index time: "A brown fox". That document
>> will then be matched by the query "bro", for instance. My personal
>> recommendation is to use this in a separate field that gets populated
>> through a copyField; that way you can apply different boosts.
>>
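
To make the matching concrete, here is a rough Python sketch of what an edge
n-gram filter emits at index time. It is not Solr's code: the regex tokenizer
is only a stand-in for the analyzer chain quoted above, and the function names
(tokenize, edge_ngrams, index_terms) are made up for illustration. The defaults
mirror minGramSize="1" and maxGramSize="10".

```python
import re

def tokenize(text):
    """Split on non-alphanumerics and lowercase (stand-in for the real analyzer chain)."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the leading substrings of token with lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

def index_terms(text, min_gram=1, max_gram=10):
    """Collect every edge n-gram that would be indexed for a field value."""
    terms = set()
    for token in tokenize(text):
        terms.update(edge_ngrams(token, min_gram, max_gram))
    return terms

terms = index_terms("A brown fox")
print(sorted(terms))
print("bro" in terms)  # "bro" is a leading substring of "brown"
print("own" in terms)  # "own" is not a leading substring of any token
```

Because every prefix is already an indexed term, a plain term query such as
"bro" can match without a trailing wildcard.
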
>> Greetings,
>>
>> On Jul 16, 2014, at 2:00 PM, Hayden Muhl <haydenm...@gmail.com> wrote:
>>
>> > A copy field does not address my problem, and this has nothing to do with
>> > stored fields. This is a query parsing problem, not an indexing problem.
>> >
>> > Here's the use case.
>> >
>> > If someone has a username like "bob-smith", I would like it to match
>> > prefixes of "bo" and "sm". I tokenize the username into the tokens "bob"
>> > and "smith". Everything is fine so far.
>> >
>> > If someone enters "bo sm" as a search string, I would like "bob-smith" to
>> > be one of the results. The query to do this is straightforward:
>> > "username:bo* username:sm*". Here's the problem. In order to construct that
>> > query, I have to tokenize the search string "bo sm" **on the client**. I
>> > don't want to reimplement tokenization on the client. Is there any way to
>> > give Solr the string "bo sm", have Solr do the tokenization, then treat
>> > each token like a prefix?
>> >
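
For readers following along, the client-side workaround described above might
look roughly like this in Python. It is only a guess at the shape of it: the
regex and the function name prefix_query are assumptions, not the actual client
code from this thread.

```python
import re

def prefix_query(field, user_input):
    """Build 'field:tok1* field:tok2*' from raw input with a regex tokenizer."""
    # Hypothetical rule: any run of letters or digits counts as one token.
    tokens = re.findall(r"[A-Za-z0-9]+", user_input.lower())
    return " ".join(f"{field}:{tok}*" for tok in tokens)

print(prefix_query("username", "bo sm"))
print(prefix_query("username", "solr-u"))
```

The weak spot is the regex: it has to be kept in step with whatever tokenizer
the field uses on the server, which is exactly the duplication the question is
trying to avoid.
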
>> >
>> > On Tue, Jul 15, 2014 at 4:55 PM, Alexandre Rafalovitch <arafa...@gmail.com>
>> > wrote:
>> >
>> >> So copyField it to another and apply alternative processing there. Use
>> >> eDismax to search both. No need to store the copied field, just index it.
>> >>
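
As a sketch of the "use eDismax to search both" suggestion, the request
parameters could be put together like this in Python. q, defType, qf and wt are
standard Solr request parameters; the field name username_prefix, the boost
values and the helper edismax_params are purely illustrative assumptions.

```python
from urllib.parse import urlencode

def edismax_params(user_input, fields):
    """Build Solr request parameters that search several fields with eDismax.

    fields maps a field name to its boost, e.g. {"username": 2.0}.
    """
    qf = " ".join(f"{name}^{boost}" for name, boost in fields.items())
    return {"q": user_input, "defType": "edismax", "qf": qf, "wt": "json"}

# "username_prefix" is a hypothetical copyField target, not a field from this thread.
params = edismax_params("bo sm", {"username": 2.0, "username_prefix": 1.0})
print(urlencode(params))
```

The raw user input goes into q unchanged, so any tokenizing is left to the
query analyzer configured for each field in qf.
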
>> >> Regards,
>> >>     Alex
>> >> On 16/07/2014 2:46 am, "Hayden Muhl" <haydenm...@gmail.com> wrote:
>> >>
>> >>> Both fields? There is only one field here: username.
>> >>>
>> >>>
>> >>> On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch <arafa...@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> Search against both fields (one split, one not split)? Keep original
>> >>>> and tokenized form? I am doing something similar with class name
>> >>>> autocompletes here:
>> >>>>
>> >>>> https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24
>> >>>>
>> >>>> Regards,
>> >>>>   Alex.
>> >>>> Personal: http://www.outerthoughts.com/ and @arafalov
>> >>>> Solr resources: http://www.solr-start.com/ and @solrstart
>> >>>> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>> >>>>
>> >>>>
>> >>>> On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl <haydenm...@gmail.com> wrote:
>> >>>>> I'm working on using Solr for autocompleting usernames. I'm running into a
>> >>>>> problem with the wildcard queries (e.g. username:al*).
>> >>>>>
>> >>>>> We are tokenizing usernames so that a username like "solr-user" will be
>> >>>>> tokenized into "solr" and "user", and will match both "sol" and "use"
>> >>>>> prefixes. The problem is when we get "solr-u" as a prefix, I'm having to
>> >>>>> split that up on the client side before I construct a query "username:solr*
>> >>>>> username:u*". I'm basically using a regex as a poor man's tokenizer.
>> >>>>>
>> >>>>> Is there a better way to approach this? Is there a way to tell Solr to
>> >>>>> tokenize a string and use the parts as prefixes?
>> >>>>>
>> >>>>> - Hayden
>> >>>>
>> >>>
>> >>
>>
