You may want something more like "significant terms": terms that are
statistically significant in a document compared with the rest of the
index, so possibly not ranked just on doc freq.

https://saumitra.me/blog/solr-significant-terms/
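
To make the idea concrete, here is a rough sketch in Python of what I mean by
scoring terms against a background set. This is not Solr's actual scoring
formula. The function names (doc_freq, significant_terms), the smoothing, and
the score itself are assumptions made up for illustration. It compares how
often a term shows up in a "foreground" set of documents (say, the docs you
want a word cloud for) with how often it shows up in the whole corpus.

```python
import math
import re
from collections import Counter


def doc_freq(docs):
    """Count, for each term, how many documents contain it at least once."""
    counts = Counter()
    for text in docs:
        # \w+ matches Unicode word characters, so no per-language rules here.
        counts.update(set(re.findall(r"\w+", text.lower())))
    return counts


def significant_terms(foreground, background, top_n=10, min_fg_df=2):
    """Rank foreground terms by how over-represented they are.

    Hypothetical scoring, chosen only for illustration:
        score = fg_rate * log(fg_rate / bg_rate)
    where each rate is the fraction of documents containing the term.
    """
    fg_df = doc_freq(foreground)
    bg_df = doc_freq(background)
    n_fg, n_bg = len(foreground), len(background)

    scored = []
    for term, df in fg_df.items():
        if df < min_fg_df:
            continue  # skip terms too rare in the foreground to trust
        fg_rate = df / n_fg
        # Add-one smoothing so a term absent from the background
        # does not cause a division by zero.
        bg_rate = (bg_df[term] + 1) / (n_bg + 1)
        scored.append((term, fg_rate * math.log(fg_rate / bg_rate)))

    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


corpus = [
    "the cat sat on the mat",
    "the dog ate the bone",
    "the solr index is on the server",
    "the solr facet counts the terms",
    "the bird is in the tree",
    "the fish is in the sea",
]
subset = corpus[2:4]  # the documents we want a word cloud for
ranked = significant_terms(subset, corpus)
```

With this toy data, "the" appears in every document of both sets, so its
score is zero, while "solr" is concentrated in the subset and ranks first.
Generic terms fall to the bottom without a per-language stop word list.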

On Fri, May 15, 2020 at 2:16 PM A Adel <aa.0...@gmail.com> wrote:

> Hi Walter,
>
> Thank you for your explanation, I understand the point and agree with you.
> However, the use case at hand is building a word cloud by faceting on the
> multilingual text field (very simple). Without stop words, the facets
> return many generic terms, articles, etc. If a stop word filter is not
> used, is there any other/better technique for building a meaningful word
> cloud?
>
>
> On Fri, May 15, 2020, 5:20 PM Walter Underwood <wun...@wunderwood.org>
> wrote:
>
> > Just don’t use stop words. That will give much better relevance and works
> > for all languages.
> >
> > Stop words are an obsolete hack from the days of search engines running
> > on 16-bit CPUs. They save space by throwing away important information.
> >
> > The classic example is “to be or not to be”, which is made up entirely of
> > stop words. Remove them and it is impossible to search for that phrase.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> > > On May 14, 2020, at 10:47 PM, A Adel <aa.0...@gmail.com> wrote:
> > >
> > > Hi - Is there a way to configure stop words to be dynamic for each
> > > document based on the language detected of a multilingual text field?
> > > Combining all languages stop words in one set is a possibility however
> > > it introduces false positives for some language combinations, such as
> > > German and English.
> > > Thanks, A.
> >
> >
>


-- 
*Doug Turnbull **| CTO* | OpenSource Connections
<http://opensourceconnections.com>, LLC | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>; Contributor: *AI
Powered Search <http://aipoweredsearch.com>*
