Re: Fast autocomplete for large dataset

Erick Erickson Sat, 01 Aug 2015 18:49:12 -0700

Here's some background:

http://lucidworks.com/blog/solr-suggester/


Basically, the limitation is that to build the suggester, all docs in
the index need to be read to pull out the stored field and build
either the FST or the sidecar Lucene index. That can be a _very_
costly operation (minutes or hours for a large dataset).
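As a rough illustration of why the build is the expensive part, here is a toy Python model. It is not Solr's FST or sidecar-index code: a sorted list stands in for the lookup structure, and the function names are hypothetical. Building it touches every stored value once, while a lookup is only a binary search plus a short scan.

```python
import bisect
from typing import Iterable, List


def build_suggest_dict(stored_values: Iterable[str]) -> List[str]:
    """Hypothetical build step: every stored value is visited once, so the
    cost grows with the number of documents (plus the sort)."""
    # A set removes duplicates; sorting lets lookups use binary search.
    return sorted({v.lower() for v in stored_values})


def suggest(sorted_entries: List[str], prefix: str, limit: int = 5) -> List[str]:
    """Hypothetical lookup step: binary-search to the first entry >= prefix,
    then scan forward while entries still start with the prefix."""
    prefix = prefix.lower()
    start = bisect.bisect_left(sorted_entries, prefix)
    results: List[str] = []
    for i in range(start, len(sorted_entries)):
        entry = sorted_entries[i]
        if not entry.startswith(prefix) or len(results) == limit:
            break
        results.append(entry)
    return results


if __name__ == "__main__":
    d = build_suggest_dict(["Solr suggester", "solr spellcheck", "Lucene FST"])
    print(suggest(d, "solr s"))
```

In this model, adding documents means re-running the whole build, which mirrors the cost described above.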

bq: The requirement is that the autocomplete should be fast (not
slow down as the dataset becomes bigger)

Well, in some alternate universe this may be possible. But the larger
the corpus, the slower the processing will be; there's just no way
around that. Whether it's fast enough for your application is a better
question ;).
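Whether the TermsComponent route from earlier in the thread is fast enough is easy to measure, since it enumerates terms that are already in the index and needs no separate build step. A minimal sketch of building such a request follows. The base URL, core name `mycore`, field `title_s`, and helper name are hypothetical, and it assumes a `/terms` request handler with the TermsComponent is configured in solrconfig.xml (as some of Solr's example configs do).

```python
from urllib.parse import urlencode


def terms_prefix_url(base_url: str, field: str, prefix: str, limit: int = 10) -> str:
    """Hypothetical helper: build a TermsComponent request returning up to
    `limit` indexed terms from `field` that start with `prefix`."""
    params = {
        "terms.fl": field,       # field whose indexed terms are enumerated
        "terms.prefix": prefix,  # only terms beginning with this string
        "terms.limit": limit,    # maximum number of terms returned
        "terms.sort": "count",   # most frequent terms first
        "wt": "json",            # response format
    }
    # Assumes the handler is mounted at /terms under the core URL.
    return f"{base_url.rstrip('/')}/terms?{urlencode(params)}"


if __name__ == "__main__":
    print(terms_prefix_url("http://localhost:8983/solr/mycore", "title_s", "sol"))
```

Note that the TermsComponent returns whole indexed terms, so phrase-level suggestions would presumably need the field indexed without tokenization (for example, a string field).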

Best,
Erick


On Sat, Aug 1, 2015 at 2:05 PM, Olivier Austina
<olivier.aust...@gmail.com> wrote:
> Thank you, Erick,
>
> I would like to implement an autocomplete for a large dataset. The
> autocomplete should show the phrase or question the user wants as the
> user types. The requirement is that the autocomplete should be fast (not
> slow down as the dataset becomes bigger) and easy to maintain. The
> autocomplete can have its own Solr server. It is an autocomplete like
> any other; it just needs to be fast and easy to maintain.
>
> What are the limitations of the suggesters mentioned in the article? Thank you.
>
> Regards
> Olivier
>
>
> 2015-08-01 19:41 GMT+02:00 Erick Erickson <erickerick...@gmail.com>:
>
>> Not really. There's no need to use ngrams as the article suggests if the
>> TermsComponent does what you need, which is why I asked what
>> autocomplete means in your context. You have not clarified that. Have you
>> even looked at the TermsComponent, especially the terms.prefix option?
>>
>> The TermsComponent has its limitations, but performance isn't one of them.
>> The suggesters mentioned in the article have other limitations. It's really
>> useless to discuss those limitations, though, until the problem you're
>> trying to solve is clearly stated.
>> On Aug 1, 2015 1:01 PM, "Olivier Austina" <olivier.aust...@gmail.com>
>> wrote:
>>
>> > Thank you, Erick, for your reply.
>> > If I understand correctly, these approaches use the index to hold
>> > terms. As the index grows bigger, that could become a performance issue.
>> > Is that right? Could you please look at this article
>> > <http://www.norconex.com/serving-autocomplete-suggestions-fast/> to see
>> > what I mean? Thank you.
>> >
>> > Regards
>> > Olivier
>> >
>> >
>> > 2015-08-01 17:42 GMT+02:00 Erick Erickson <erickerick...@gmail.com>:
>> >
>> > > Well, defining what you mean by "autocomplete" would be a start. If it's
>> > > just that a user types some letters and you suggest the next N terms in
>> > > the list, TermsComponent will fix you right up.
>> > >
>> > > If it's more complicated, the AutoSuggest functionality might help.
>> > >
>> > > If it's correcting spelling, there's the spellchecker.
>> > >
>> > > Best,
>> > > Erick
>> > >
>> > > On Sat, Aug 1, 2015 at 10:00 AM, Olivier Austina
>> > > <olivier.aust...@gmail.com> wrote:
>> > > > Hi,
>> > > >
>> > > > I am looking for a fast, easy-to-maintain way to do autocomplete for a
>> > > > large dataset in Solr. I heard about the Ternary Search Tree (TST)
>> > > > <https://en.wikipedia.org/wiki/Ternary_search_tree>.
>> > > > But I would like to know if there is something I missed, such as best
>> > > > practices or a new Solr feature. Any suggestion is welcome. Thank you.
>> > > >
>> > > > Regards
>> > > > Olivier
>> > >
>> >
>>
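Since the original question mentioned ternary search trees, here is a minimal Python sketch of a TST with prefix completion, for illustration only. Solr's suggester does ship a TST-based lookup (TSTLookupFactory), but this is not that code; the class and method names are hypothetical. A lookup walks one node per prefix character plus any left/right branches, then collects completions in sorted order. Every key still has to be inserted up front, so the build cost grows with the data.

```python
from typing import List, Optional


class _Node:
    __slots__ = ("ch", "lo", "eq", "hi", "is_end")

    def __init__(self, ch: str) -> None:
        self.ch = ch
        self.lo: Optional["_Node"] = None  # subtree for characters < ch
        self.eq: Optional["_Node"] = None  # subtree for the next character
        self.hi: Optional["_Node"] = None  # subtree for characters > ch
        self.is_end = False                # True if a key ends at this node


class TernarySearchTree:
    """Hypothetical minimal TST supporting insert and prefix completion."""

    def __init__(self) -> None:
        self.root: Optional[_Node] = None

    def insert(self, key: str) -> None:
        if key:
            self.root = self._insert(self.root, key, 0)

    def _insert(self, node: Optional[_Node], key: str, i: int) -> _Node:
        ch = key[i]
        if node is None:
            node = _Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, key, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, key, i)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1)
        else:
            node.is_end = True
        return node

    def complete(self, prefix: str, limit: int = 10) -> List[str]:
        if not prefix:
            return []
        # Walk down to the node holding the last character of the prefix.
        node, i = self.root, 0
        while node is not None:
            ch = prefix[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(prefix):
                node, i = node.eq, i + 1
            else:
                break
        if node is None:
            return []
        results = [prefix] if node.is_end else []
        self._collect(node.eq, prefix, results, limit)
        return results[:limit]

    def _collect(self, node: Optional[_Node], path: str, out: List[str], limit: int) -> None:
        # Visit lo, this node, eq, then hi so keys come out in sorted order.
        if node is None or len(out) >= limit:
            return
        self._collect(node.lo, path, out, limit)
        if node.is_end and len(out) < limit:
            out.append(path + node.ch)
        self._collect(node.eq, path + node.ch, out, limit)
        self._collect(node.hi, path, out, limit)


if __name__ == "__main__":
    t = TernarySearchTree()
    for w in ["solr", "solrcloud", "solar", "sort", "lucene"]:
        t.insert(w)
    print(t.complete("sol"))
```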
