You may want something more like "significant terms": terms that are statistically significant in a document, possibly scored on more than just document frequency.
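To make the idea concrete, here is a rough Python sketch of one way to score terms for a word cloud without a stop word list. It is not Solr's significant terms implementation: the function names, the smoothing, and the scoring formula are all illustrative assumptions. It compares how often a term shows up in a "foreground" set of documents (say, the current result set) with how often it shows up in a "background" set (say, the whole collection), so terms that are common everywhere, such as articles, score at or near zero.

```python
import math
from collections import Counter


def doc_freq(docs):
    """Count how many documents each term appears in (document frequency)."""
    df = Counter()
    for doc in docs:
        # Naive whitespace tokenization; a real analyzer would do much more.
        df.update(set(doc.lower().split()))
    return df


def significant_terms(foreground, background, min_fg_docs=2, top_n=10):
    """Rank terms that are more common in `foreground` than in `background`.

    Hypothetical score: fg_share * log(fg_share / bg_share), with add-one
    smoothing on the background counts. Terms scoring <= 0 are dropped.
    """
    fg_df = doc_freq(foreground)
    bg_df = doc_freq(background)
    scored = []
    for term, fg_count in fg_df.items():
        if fg_count < min_fg_docs:
            continue  # skip terms that are too rare in the foreground
        fg_share = fg_count / len(foreground)
        bg_share = (bg_df[term] + 1) / (len(background) + 1)
        score = fg_share * math.log(fg_share / bg_share)
        if score > 0:
            scored.append((term, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy data: the foreground is a subset of the background collection.
foreground = ["the solr facet is slow", "the solr query is fast"]
background = foreground + [
    "the cat is on the mat",
    "the dog is in the garden",
    "the train is late",
    "the weather is nice",
]
print([term for term, _ in significant_terms(foreground, background)])
# → ['solr']
```

In this toy data "the" and "is" appear in every document, so their foreground and background shares match and they drop out without any language-specific stop list.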
https://saumitra.me/blog/solr-significant-terms/

On Fri, May 15, 2020 at 2:16 PM A Adel <aa.0...@gmail.com> wrote:

> Hi Walter,
>
> Thank you for your explanation, I understand the point and agree with you.
> However, the use case at hand is building a word cloud based on faceting
> the multilingual text field (very simple) which in case of not using stop
> words returns many generic terms, articles, etc. If stop words filter is
> not used, is there any other/better technique to be used instead to build a
> meaningful word cloud?
>
>
> On Fri, May 15, 2020, 5:20 PM Walter Underwood <wun...@wunderwood.org>
> wrote:
>
> > Just don’t use stop words. That will give much better relevance and works
> > for all languages.
> >
> > Stop words are an obsolete hack from the days of search engines running
> > on 16 bit CPUs. They save space by throwing away important information.
> >
> > The classic example is “to be or not to be”, which is made up entirely of
> > stop words. Remove them and it is impossible to search for that phrase.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/ (my blog)
> >
> > > On May 14, 2020, at 10:47 PM, A Adel <aa.0...@gmail.com> wrote:
> > >
> > > Hi - Is there a way to configure stop words to be dynamic for each document
> > > based on the language detected of a multilingual text field? Combining all
> > > languages stop words in one set is a possibility however it introduces
> > > false positives for some language combinations, such as German and English.
> > > Thanks, A.
> > >
> >
>

--
*Doug Turnbull **| CTO* | OpenSource Connections <http://opensourceconnections.com>, LLC | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>; Contributor: *AI Powered Search <http://aipoweredsearch.com>*