[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756050#action_12756050
 ] 

Andrzej Bialecki  commented on SOLR-1316:
-----------------------------------------

bq. These enable suffix compression and create much smaller word graphs.

DAWGs are problematic, because they are essentially immutable once created (the 
cost of insert / delete is very high). So I propose to stick to TSTs for now.

Also, I think that populating the TST from the index would have to be 
selective, perhaps based on a threshold (so that it only adds terms with a 
large enough docFreq). It would also be good to adjust the content of the tree 
based on actual queries that return some results (poor man's auto-learning), 
gradually removing the least frequent strings to save space. We could also use 
a field with 1-3 word shingles as the source (no tf, unstored, to save space in 
the source index, with a similar thresholding mechanism).
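
To make the thresholding idea concrete, here is a minimal sketch in Python. It is not the TernaryTree from Lucene contrib and not the attached implementation; the names Node, TST, populate and min_doc_freq are hypothetical, and the term -> docFreq map stands in for whatever would be read from the index. Only terms whose docFreq reaches the threshold are inserted, and suggestions come back ordered by docFreq.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    """One character of a key, with lower / equal / higher children."""

    def __init__(self, ch: str) -> None:
        self.ch = ch
        self.lo: Optional["Node"] = None
        self.eq: Optional["Node"] = None
        self.hi: Optional["Node"] = None
        self.weight: Optional[int] = None  # set only where a key ends


class TST:
    """Minimal ternary search tree mapping non-empty strings to weights."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, key: str, weight: int) -> None:
        if key:  # empty keys are ignored in this sketch
            self.root = self._insert(self.root, key, 0, weight)

    def _insert(self, node: Optional[Node], key: str, i: int, weight: int) -> Node:
        ch = key[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, key, i, weight)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, key, i, weight)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1, weight)
        else:
            node.weight = weight
        return node

    def suggest(self, prefix: str) -> List[str]:
        """Return stored keys starting with prefix, highest weight first."""
        if not prefix:
            return []
        node, i = self.root, 0
        while node is not None:
            ch = prefix[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            else:
                i += 1
                if i == len(prefix):
                    break  # node now matches the last prefix character
                node = node.eq
        if node is None:
            return []
        found: List[Tuple[int, str]] = []
        if node.weight is not None:
            found.append((node.weight, prefix))
        self._collect(node.eq, prefix, found)
        found.sort(key=lambda item: (-item[0], item[1]))
        return [key for _, key in found]

    def _collect(self, node: Optional[Node], prefix: str,
                 found: List[Tuple[int, str]]) -> None:
        if node is None:
            return
        self._collect(node.lo, prefix, found)
        if node.weight is not None:
            found.append((node.weight, prefix + node.ch))
        self._collect(node.eq, prefix + node.ch, found)
        self._collect(node.hi, prefix, found)


def populate(term_doc_freqs: Dict[str, int], min_doc_freq: int) -> TST:
    """Build a tree from (term -> docFreq), skipping terms below the threshold."""
    tree = TST()
    for term, doc_freq in term_doc_freqs.items():
        if doc_freq >= min_doc_freq:
            tree.insert(term, doc_freq)
    return tree


# Made-up docFreq values, for illustration only.
tree = populate({"solr": 120, "solaris": 4, "sole": 35, "lucene": 90}, 10)
print(tree.suggest("sol"))  # ['solr', 'sole']
```

The same filter would apply unchanged to a shingle field, and the query-driven adjustment could reuse insert with hit counts as weights. Removing rarely used strings is not shown here.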

Ankul, I'm not sure what the behavior of your implementation is when dynamically 
adding / removing keys. Does it still remain balanced?

I also found an MIT-licensed implementation of a radix tree here: 
http://code.google.com/p/radixtree, which looks good too, one spelling mistake 
in the API notwithstanding ;)


> Create autosuggest component
> ----------------------------
>
>                 Key: SOLR-1316
>                 URL: https://issues.apache.org/jira/browse/SOLR-1316
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: TernarySearchTree.tar.gz
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
