Hi,

I want to store the five most frequent non-stop words. I use DIH to
import data. I see two possible approaches:

   1. Use a DIH JavaScript transformer to find the five most frequent words
   and put them in a copy field. The copy field's analysis chain would then
   stem them and remove stop words.
   2. Write a custom processor that does the same and add it to the
   UpdateRequestProcessor chain.

Which of the two would be better suited? The first approach seems simpler,
but the issue is that I won't have access to the stop words, synonyms, etc.
at DIH time.
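
To make the first approach concrete, here is a rough sketch of the kind of
script I have in mind. It works around the missing stop words by carrying its
own small list, which would have to be kept in sync with the real
stopwords.txt by hand. The column names 'text' and 'top_words', the stop-word
list and the function names are all made up for illustration, and the
tokenizing is a naive split on non-alphanumerics rather than what the Solr
analyzers do.

```javascript
// Hypothetical stop-word list; in practice it would have to mirror the
// stopwords.txt used by the field type.
var STOP_WORDS = { 'a': true, 'an': true, 'and': true, 'the': true,
                   'of': true, 'to': true, 'in': true, 'is': true };

// Return the n most frequent tokens in text that are not in stopWords.
// Ties are broken alphabetically so the result is deterministic.
function topWords(text, stopWords, n) {
  var has = Object.prototype.hasOwnProperty;
  // Naive tokenizer: lowercase, then split on anything non-alphanumeric.
  var tokens = String(text).toLowerCase().split(/[^a-z0-9]+/);
  var counts = {};
  for (var i = 0; i < tokens.length; i++) {
    var t = tokens[i];
    if (t === '' || has.call(stopWords, t)) continue;
    counts[t] = has.call(counts, t) ? counts[t] + 1 : 1;
  }
  var words = [];
  for (var w in counts) {
    if (has.call(counts, w)) words.push(w);
  }
  words.sort(function (a, b) {
    if (counts[b] !== counts[a]) return counts[b] - counts[a];
    return a < b ? -1 : (a > b ? 1 : 0);
  });
  return words.slice(0, n);
}

// Transformer function in the shape DIH's ScriptTransformer expects:
// it receives the row, may modify it, and returns it.
function addTopWords(row) {
  var text = row.get('text');            // 'text' is an assumed source column
  if (text != null) {
    // Stored as one space-separated string; 'top_words' is an assumed field.
    row.put('top_words', topWords(text, STOP_WORDS, 5).join(' '));
  }
  return row;
}
```

The entity would then reference it with something like
transformer="script:addTopWords". No stemming happens here, so "cat" and
"cats" are counted separately.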

For the second approach, if I add it to the UpdateRequestProcessor chain and
insert the function after StopFilterFactory and
RemoveDuplicatesTokenFilterFactory, would that be a good way of doing this?

--
*Pranav Prakash*

"temet nosce"
