[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756050#action_12756050 ]
Andrzej Bialecki commented on SOLR-1316:
-----------------------------------------

bq. These enable suffix compression and create much smaller word graphs.

DAWGs are problematic because they are essentially immutable once created (the cost of insert / delete is very high), so I propose to stick to TSTs for now.

Also, I think that populating the TST from the index would have to be discriminative, perhaps based on a threshold (so that it only adds terms with a large enough docFreq). It would also be good to adjust the content of the tree based on actual queries that return some results (poor man's auto-learning), gradually removing the least frequent strings to save space.

We could also use as a source a field with 1-3 word shingles (no tf, unstored, to save space in the source index, with a similar thresholding mechanism).

Ankul, I'm not sure how your implementation behaves when keys are added / removed dynamically. Does it still remain balanced?

I also found an MIT-licensed impl. of a radix tree here: http://code.google.com/p/radixtree, which looks good too, one spelling mistake in the API notwithstanding ;)

> Create autosuggest component
> ----------------------------
>
>                 Key: SOLR-1316
>                 URL: https://issues.apache.org/jira/browse/SOLR-1316
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: TernarySearchTree.tar.gz
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib.
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
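
For illustration only, the thresholded TST population proposed in the comment above could look roughly like the sketch below. It is written in Python for brevity and is not the attached TernarySearchTree code or any Solr/Lucene API. The names `TernarySearchTree`, `build_from_terms`, and `min_doc_freq` are hypothetical, and the (term, docFreq) pairs are assumed to come from enumerating the index terms.

```python
from typing import Iterable, List, Optional, Tuple


class _Node:
    """One character of a stored term, with lower / equal / higher children."""

    def __init__(self, ch: str) -> None:
        self.ch = ch
        self.lo: Optional["_Node"] = None
        self.eq: Optional["_Node"] = None
        self.hi: Optional["_Node"] = None
        self.freq = 0  # > 0 marks the end of a stored term


class TernarySearchTree:
    """Hypothetical minimal TST; no balancing and no deletion."""

    def __init__(self) -> None:
        self.root: Optional[_Node] = None

    def insert(self, term: str, freq: int) -> None:
        if not term or freq <= 0:
            return
        if self.root is None:
            self.root = _Node(term[0])
        node, i = self.root, 0
        while True:
            ch = term[i]
            if ch < node.ch:
                if node.lo is None:
                    node.lo = _Node(ch)
                node = node.lo
            elif ch > node.ch:
                if node.hi is None:
                    node.hi = _Node(ch)
                node = node.hi
            else:
                i += 1
                if i == len(term):
                    node.freq = freq
                    return
                if node.eq is None:
                    node.eq = _Node(term[i])
                node = node.eq

    def _find(self, prefix: str) -> Optional[_Node]:
        """Return the node matching the last character of prefix, if any."""
        node, i = self.root, 0
        while node is not None:
            ch = prefix[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            else:
                i += 1
                if i == len(prefix):
                    return node
                node = node.eq
        return None

    def suggest(self, prefix: str, limit: int = 5) -> List[Tuple[str, int]]:
        """Return up to `limit` stored terms starting with prefix, most frequent first."""
        if not prefix:
            return []
        start = self._find(prefix)
        if start is None:
            return []
        found: List[Tuple[str, int]] = []
        if start.freq > 0:
            found.append((prefix, start.freq))
        stack = [(start.eq, prefix)]
        while stack:
            node, path = stack.pop()
            if node is None:
                continue
            stack.append((node.lo, path))            # siblings share the same path
            stack.append((node.hi, path))
            stack.append((node.eq, path + node.ch))  # eq child extends the path
            if node.freq > 0:
                found.append((path + node.ch, node.freq))
        found.sort(key=lambda item: (-item[1], item[0]))
        return found[:limit]


def build_from_terms(term_doc_freqs: Iterable[Tuple[str, int]],
                     min_doc_freq: int) -> TernarySearchTree:
    """Add only terms whose docFreq reaches the (assumed) threshold."""
    tree = TernarySearchTree()
    for term, doc_freq in term_doc_freqs:
        if doc_freq >= min_doc_freq:
            tree.insert(term, doc_freq)
    return tree
```

With terms such as ("solr", 120), ("solar", 40) and ("sold", 3) and `min_doc_freq=10`, `suggest("so")` would return the first two and never see "sold". Note that this sketch does no rebalancing: feeding it terms in sorted order, as an index term enumeration typically would, degrades the lo/hi chains, which is the same concern as the balance question raised in the comment.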