Stopwords are commonly occurring words that don't add _much_ value to search, such as "the", "an", and "a", and they are usually removed during analysis. Protwords (protected words) are words that the English Porter stemmer would otherwise stem but that you do not want stemmed.
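
To make the two lists concrete, here is a minimal Python sketch of an analysis chain that drops stopwords and then stems every token that is not on a protected list. It is not Solr's code; in Solr this work is done by filters configured in schema.xml. The word lists are small illustrative stand-ins for stopwords.txt and protwords.txt, and `toy_stem` is a hypothetical crude suffix stripper, not the actual Porter algorithm.

```python
import re

# Illustrative stopword list (assumption), not Solr's stopwords.txt.
STOPWORDS = {"a", "an", "and", "are", "the", "of", "to", "in", "is"}

# Hypothetical protected-words list, playing the role of protwords.txt.
PROTWORDS = {"news"}


def toy_stem(token):
    """Crude suffix stripper standing in for a real stemmer (NOT Porter)."""
    for suffix in ("ing", "es", "s"):
        # Only strip when a reasonably long stem would remain.
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text, stopwords=STOPWORDS, protwords=PROTWORDS):
    """Tokenize, lowercase, drop stopwords, then stem unprotected tokens."""
    out = []
    for tok in re.findall(r"\w+", text.lower()):
        if tok in stopwords:
            continue  # stopword: removed from the token stream entirely
        # Protected words pass through untouched; everything else is stemmed.
        out.append(tok if tok in protwords else toy_stem(tok))
    return out


print(analyze("Searching the news indexes"))
print(analyze("Searching the news indexes", protwords=set()))
```

Without the protected list, the toy stemmer turns "news" into "new", which is exactly the kind of conflation protwords are meant to prevent.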

In the end, removing stopwords may keep your index smaller and can keep some queries from taking a long time, but it also means you can't query for those words. As for protwords, you would add a word to that list if you felt the results for that token were "off" after stemming.
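
The trade-off can be seen with a tiny inverted index. The Python sketch below assumes a simple AND search and an illustrative stopword set (not Solr's list): dropping stopwords shrinks the term dictionary, but a query made entirely of stopwords, such as the famous line from Hamlet, has nothing left to match.

```python
import re

# Illustrative stopword set (assumption), not Solr's stopwords.txt.
STOPWORDS = {"a", "an", "and", "be", "not", "or", "the", "to"}


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_index(docs, remove_stopwords):
    """Map each term to the set of doc ids containing it (a tiny inverted index)."""
    index = {}
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            if remove_stopwords and tok in STOPWORDS:
                continue  # stopword never reaches the index
            index.setdefault(tok, set()).add(doc_id)
    return index


def search_all_terms(index, query, remove_stopwords):
    """Return doc ids containing every query term (AND semantics)."""
    terms = [t for t in tokenize(query)
             if not (remove_stopwords and t in STOPWORDS)]
    if not terms:
        return set()  # the whole query was stopwords: nothing left to search
    result = set(index.get(terms[0], set()))
    for t in terms[1:]:
        result &= index.get(t, set())
    return result


docs = {
    1: "To be or not to be",
    2: "The band played to the crowd",
}

full = build_index(docs, remove_stopwords=False)
small = build_index(docs, remove_stopwords=True)
print(len(full), len(small))
print(search_all_terms(full, "to be or not to be", remove_stopwords=False))
print(search_all_terms(small, "to be or not to be", remove_stopwords=True))
```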

Many people use stopwords; many don't. Personally, I don't think removing them is the right thing to do, as there isn't always a way to recover them and they do provide meaning (otherwise, why would they be needed in the language?). Often the best approach is to keep stopwords but handle them intelligently on the query side (in phrases, etc.). However, since you're a beginner, it probably makes sense to just throw out stopwords for now.
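
One way to keep stopwords and still handle them on the query side is sketched below in Python. The index keeps every token along with its position; at query time, bare stopword terms are ignored, while quoted phrases are matched word for word, stopwords included. This is a hypothetical illustration of the idea, not how Solr's query parsers actually behave, and the `search` and `phrase_docs` helpers and the stopword set are made up for the example.

```python
import re

# Illustrative stopword set (assumption), only consulted at query time.
STOPWORDS = {"a", "an", "and", "be", "not", "of", "or", "the", "to"}


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_positional_index(docs):
    """term -> {doc_id: [positions]}; stopwords are KEPT at index time."""
    index = {}
    for doc_id, text in docs.items():
        for pos, tok in enumerate(tokenize(text)):
            index.setdefault(tok, {}).setdefault(doc_id, []).append(pos)
    return index


def phrase_docs(index, words):
    """Doc ids where all words appear at consecutive positions."""
    if not words:
        return set()
    candidates = set(index.get(words[0], {}))
    for w in words[1:]:
        candidates &= set(index.get(w, {}))
    hits = set()
    for doc_id in candidates:
        for start in index[words[0]][doc_id]:
            if all(start + i in index[w][doc_id] for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits


def search(index, query):
    """Quoted phrases keep stopwords; bare terms drop them (AND semantics)."""
    phrases = re.findall(r'"([^"]*)"', query)
    bare = re.sub(r'"[^"]*"', " ", query)
    clauses = [phrase_docs(index, tokenize(p)) for p in phrases]
    clauses += [set(index.get(t, {})) for t in tokenize(bare)
                if t not in STOPWORDS]
    if not clauses:
        return set()
    result = clauses[0]
    for c in clauses[1:]:
        result &= c
    return result


docs = {
    1: "To be or not to be",
    2: "Not to be missed: the band plays tonight",
}
index = build_positional_index(docs)
print(search(index, '"to be or not to be"'))
print(search(index, '"not to be" band'))
```

The cost of this approach is a larger index, since the stopword postings are stored; in exchange, a phrase like "to be or not to be" stays searchable.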

-Grant

On May 21, 2008, at 1:50 AM, Akeel wrote:

Hi,

I am a beginner to Solr. I have successfully indexed my DB in Solr. I want to know what stopwords and protwords are, and how much effect they have on my search results.

Thanks in advance.

--

Akeel

