Stop words for CLDR

Marius Spix via Unicode Thu, 23 Jan 2020 10:33:18 -0800

I wonder whether there is any interest in adding stop words to CLDR.
Stop words are very common words that natural language processing
algorithms ignore; typical use cases include search engines, word
clouds and text classification.
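
To illustrate the idea, here is a minimal Python sketch of stop word
filtering. The names STOP_WORDS and remove_stop_words are hypothetical,
made up for this example, and the word list is a tiny hand-picked
sample, not data from CLDR or from any existing collection.

```python
import re

# Hypothetical, deliberately tiny English stop word list for illustration;
# a real list would be much longer and specific to each language.
STOP_WORDS = frozenset({"a", "an", "and", "are", "by", "is", "of", "the", "to"})


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into letter-only tokens, drop stop words."""
    # [^\W\d_]+ matches runs of Unicode letters (no digits, no underscore).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [token for token in tokens if token not in stop_words]


print(remove_stop_words("The lists of stop words are used by search engines"))
# prints ['lists', 'stop', 'words', 'used', 'search', 'engines']
```

A search engine or word cloud would then index or count only the
remaining tokens. With per-locale lists, the stop_words argument could
be chosen according to the language of the text.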


There are existing stop word collections, such as [1] and [2], that
could be used, but I believe that Unicode CLDR would be the best place
for such lists.

Regards,

Marius Spix

[1] https://pypi.org/project/stop-words/
[2] https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/stopwords.zip
