The lack of response to this question makes me think that either there is no good answer, or maybe the question was too obtuse. So I'll give it one more go with some more detail ...

My main goal is to implement autocompletion with a mix of words and short phrases, where the words are drawn from the text of largish documents, and the phrases are author names and document titles.

I think the best way to accomplish this is to concoct a single field that contains data from these other "source" fields (as usual with copyField), but with some of the fields treated as keywords (i.e. with their values inserted as single tokens), and others tokenized. I believe this would be possible at the Lucene level by calling Document.add() with multiple fields having the same name: some marked as TOKENIZED and others not. I think the tokenized fields would have to share the same analyzer, but that's OK for my case.
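To make the idea concrete, here is a minimal pure-Java sketch (no Lucene dependency, so the field names and tokenization rule are stand-ins): it builds the token list for a single hypothetical "suggest" field, keeping some source values whole as keyword-style tokens and splitting others into words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the single mixed field: some values kept whole
// (keyword-style), others split into word tokens. The field contents
// and the whitespace tokenizer are assumptions for illustration only.
public class MixedSuggestTokens {

    // Treat the whole value as one token, the way KeywordAnalyzer would.
    static List<String> asKeyword(String value) {
        return List.of(value.trim());
    }

    // Naive lowercase whitespace split, standing in for a real analyzer.
    static List<String> tokenize(String value) {
        return Arrays.asList(value.toLowerCase().split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> suggest = new ArrayList<>();
        suggest.addAll(asKeyword("Michael Sokolov"));    // author: single token
        suggest.addAll(asKeyword("Lucene in Action"));   // title: single token
        suggest.addAll(tokenize("the quick brown fox")); // body text: word tokens
        System.out.println(suggest);
    }
}
```

In real Lucene this would correspond to adding several fields with the same name to one Document, with per-field analysis choices; the sketch only shows the token stream you would want to end up with.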

I can't see how this could be made to happen in Solr without a lot of custom coding though. It seems as if the conversion from Solr fields to Lucene fields is not an easy thing to influence. If anyone has an idea how to achieve the subgoal, or perhaps a different way of getting at the main goal, I'd love to hear about it.

So far my only other idea is to write some kind of custom analyzer that treats short texts as keywords and tokenizes longer ones, which is probably what I'll look at if nothing else comes up.
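A rough pure-Java sketch of that fallback idea, with an arbitrary length cutoff standing in for whatever heuristic a real custom analyzer would use: short values come out as one keyword token, longer ones get tokenized.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of a length-switching analyzer: values at or under a cutoff
// are emitted whole, longer ones are split into words. The cutoff and
// the whitespace tokenizer are assumptions, not a real Lucene Analyzer.
public class LengthSwitchTokenizer {
    static final int KEYWORD_MAX_LENGTH = 30; // arbitrary cutoff

    static List<String> analyze(String value) {
        if (value.length() <= KEYWORD_MAX_LENGTH) {
            return List.of(value.trim()); // short: keep whole, keyword-style
        }
        return Arrays.asList(value.toLowerCase().split("\\s+")); // long: word tokens
    }

    public static void main(String[] args) {
        System.out.println(analyze("Michael Sokolov"));
        System.out.println(analyze("a longer run of full text that gets broken into words"));
    }
}
```

A real implementation would wrap this decision in a Lucene Analyzer/Tokenizer subclass, but the branching logic is the whole trick.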

Thanks

Mike


On 4/9/2014 4:16 PM, Michael Sokolov wrote:
I think I would like to do something like copyField from a bunch of fields into a single field, but with different analysis for each source, and I'm pretty sure that's not a thing. Is there some alternate way to accomplish my goal?

Which is to have a suggester that suggests words from my full text field and complete phrases drawn from my author and title fields, all at the same time. So if I could index author and title using KeywordAnalyzer, and the full text tokenized, that would be the bee's knees.

-Mike
