I think the second way is probably the most robust, and it's surprisingly uncomplicated. You wouldn't really be using copyField in that case; you'd just add the words directly to the appropriate field in the document.
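To make the second approach concrete, here is a minimal Java sketch of only the counting step. It assumes the step would be called from the `processAdd(AddUpdateCommand cmd)` method of a custom `UpdateRequestProcessor`: read the source text from `cmd.getSolrInputDocument()`, compute the terms, add each one to a target field with `addField(...)`, then call `super.processAdd(cmd)`. That Solr wiring is left out so the sketch runs standalone. The class name `TopTermsSketch`, the hard-coded stopword set and the regex tokenizer are hypothetical stand-ins; a real processor would take its tokens from the field's configured analyzer, which is what keeps it in step with the schema's stopwords.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class TopTermsSketch {

    // Hypothetical stopword list for illustration only; a real update
    // processor would reuse the stopwords configured for the field in Solr.
    public static final Set<String> STOPWORDS = Set.of(
            "a", "an", "and", "the", "of", "to", "in", "is", "it", "that", "on");

    /** Returns up to n most frequent non-stopword terms, most frequent first. */
    public static List<String> topTerms(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        // Naive tokenization: lowercase, then split on runs of non-letter/non-digit chars.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty() || STOPWORDS.contains(token)) {
                continue; // skip empty fragments and stopwords
            }
            counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort by descending count; break ties alphabetically so the result is deterministic.
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        String body = "The cat sat on the mat. The cat ate the rat, and the rat ran.";
        System.out.println(topTerms(body, 5)); // prints [cat, rat, ate, mat, ran]
    }
}
```

Ties are broken alphabetically on purpose: without that, the iteration order of the `HashMap` would decide which equally frequent words make the top five.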
Anything you do outside of the update chain would have to be kept in sync with the stopwords, etc. That would be a pain to maintain, whereas putting your own element in the chain lets Solr/Lucene do a lot of that work for you...

Best
Erick

On Sun, Jul 8, 2012 at 4:01 PM, Pranav Prakash <pra...@gmail.com> wrote:
> Hi,
>
> I want to store the top 5 high-frequency non-stopword words. I use DIH to
> import data. Now I have two approaches:
>
> 1. Use DIH JavaScript to find the top 5 most frequent words and put them in a
> copy field. The copy field will then stem them and remove stop words based on
> the appropriate tokenizers.
> 2. Write a custom function for the same and add it to the
> UpdateRequestProcessor chain.
>
> Which of the two would be better suited? I find the first approach rather
> simple, but the issue is that I won't have access to stop
> words/synonyms etc. at DIH time.
>
> In the second approach, if I add it to the UpdateRequestProcessor chain and
> insert the function after StopWordsFilterFactory and
> DuplicateRemoveFilterFactory, would that be a good way of doing this?
>
> --
> *Pranav Prakash*
>
> "temet nosce"