On Mar 16, 2010, at 9:51 PM, blargy wrote:
>
> I was reading "Scaling Lucene and Solr"
> (http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/)
> and I came across the section StopWords.
>
> In there it mentioned that i
010 11:13 AM
To:
Subject: Re: Stopwords
That discussion cites a paper via a URL:
http://doc.rero.ch/lm.php?url#16;00,43,4,20091218142456-GY/Dolamic_Ljiljana__When_Stopword_Lists_Make_the_Difference_20091218.pdf
Unfortunately when I go to this URL I get:
"L'accès à ce document est limité." (Access to this document is restricted.)
lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/)
>> and I came across the section StopWords.
>>
>> In there it mentioned that it's not recommended to remove
>> stop words at index
>> time. Why is this the case? Don't all the ex
> I was reading "Scaling Lucene and Solr"
> (http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/)
> and I came across the section StopWords.
>
> In there it mentioned that it's not recommended to remove
> stop words at i
I was reading "Scaling Lucene and Solr"
(http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/)
and I came across the section StopWords.
In there it mentioned that it's not recommended to remove stop words at index
time. Why is this the case?
> > Does anybody know how to provide this ability to
> search for stopwords
>
> CommonGramsFilterFactory [1] may help.
>
Sorry, Solr 1.4 has this filter.
> Hi,
>
> I have some common stopwords defined like [a,the,of] etc.
> Our users need the
> ability to include stopwords in their search. I tried using
> + sign like,
> [Bank +of America] to get accurate results, but it does not
> work.
>
> Does anybody know
Hi,
I have some common stopwords defined like [a,the,of] etc. Our users need the
ability to include stopwords in their search. I tried using + sign like,
[Bank +of America] to get accurate results, but it does not work.
Does anybody know how to provide this ability to search for stopwords - we
@Mahout experts: could you please elaborate on that?
It seems that I am stopping successfully quite some words with the stopwords
mechanism in Solr (I do not get search results when querying with stopwords
with the localhost/solr/select interface) but this somehow is not effective
when Solr index
Fields are both stored and indexed. The stored copy is exactly what
you sent in. The index is built with the "text" type's analysis stack
and is not stored. This output has the stopwords removed. The output
is not stored in one place, but parts of it are scattered around the
Lu
ut when I index
some documents the index contains stopwords - I can see this with the Luke
tool.
Am I supposed to see these terms in the index after they are declared in the
stopwords.txt file?
What could be wrong?
Best regards,
Bogdan
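The stored-versus-indexed distinction described in the reply above can be sketched as a toy model. This is not how Lucene lays out its index; the stopword list, the `analyze` function, and the dictionaries are illustrative assumptions only.

```python
# Toy model: the stored copy keeps the original text, while the
# inverted index only receives the tokens that survive analysis.
STOPWORDS = {"a", "the", "of"}  # hypothetical stopword list


def analyze(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


def index_document(doc_id, text, stored, inverted):
    """Store the raw text and add the analyzed tokens to the inverted index."""
    stored[doc_id] = text  # stored copy: exactly what was sent in
    for tok in analyze(text):
        inverted.setdefault(tok, set()).add(doc_id)


stored, inverted = {}, {}
index_document(1, "The Bank of America", stored, inverted)
```

In this sketch the stored value still contains "The" and "of", but the inverted index has entries only for "bank" and "america". If stopwords do show up as indexed terms, one thing worth checking is whether the field being inspected really uses the analyzer that has the stop filter.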
> From: Pooja Verlani
> Subject: Phrase stopwords
> To: solr-user@lucene.apache.org
> Date: Wednesday, September 23, 2009, 1:15 PM
> Hi,
> Is it possible to have a phrase as a stopword in solr? In
> case, please share
> how to do so?
>
> regards,
> Pooja
Hi,
Is it possible to have a phrase as a stopword in solr? In case, please share
how to do so?
regards,
Pooja
field=content}que" you bypass the query
> parsers (which is respecting your stopwords file) and see all docs that
> contain the raw term "que" in the content field.
>
> if you look at some of the docs that match, and paste their content field
> into the analysis tool,
: When indexing or querying text, I'm using the solr.StopFilterFactory; it
: seems to work just fine...
:
: But I want to use the text field as a facet, and get all the commonly
: used words in a set of results, without the stopwords. As far as I
: tried, I always get stopwords, and nume
ou do a query for "{!raw field=content}que" you bypass the query
parsers (which is respecting your stopwords file) and see all docs that
contain the raw term "que" in the content field.
if you look at some of the docs that match, and paste their content field
into the a
I'm using the
solr.StopFilterFactory; it seems to work just fine...
But I want to use the text field as a facet, and get all the
commonly used words in a set of results, without the stopwords. As
far as I tried, I always get stopwords, and numerical terms, that
pollute my facets results. How can
Hello,
When indexing or querying text, I'm using the solr.StopFilterFactory; it seems
to work just fine...
But I want to use the text field as a facet, and get all the commonly used
words in a set of results, without the stopwords. As far as I tried, I always
get stopwords, and nume
che.solr.analysis.LowerCaseFilterFactory args:{}
4. org.apache.solr.analysis.SnowballPorterFilterFactory args:{languange:
Spanish }
5. org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
The field is indexed, tokenized, stored and termvectors are stored.
So why are the stopwords in the index?
--
: Date: Tue, 9 Jun 2009 16:04:03 -0700 (PDT)
: From: JCodina
: Subject: facets and stopwords
: I have a text field from where I remove stop words, as a first approximation
: I use facets to see the most common words in the text, but.. stopwords are
: there, and if I search documents having the
I have a text field from where I remove stop words, as a first approximation
I use facets to see the most common words in the text, but.. stopwords are
there, and if I search documents having the stopwords, then , there are no
documents in the answer.
You can test it in this address (using
Thanks, I'm inclined not to even bother with stopwords as in my case I've got
a fairly small dataset and leaving them in doesn't seem to have a noticeable
effect on performance.
On Thursday 12 February 2009 15:41:05 Jeff Newburn wrote:
> Unfortunately, the stopword filter act
Unfortunately, the stopword filter acts funny (depending on who you ask) in
dismax. The short version is that the stopwords filter has to be on all
fields being queried on for minimum matches to work. We have the same issue
with one of our brands. We require all word matching so "The North
If a naive user enters a string that contains typical stopwords like "and"
and "the", these seem to be included in the word count for the must
match criteria of the dismax query.
So, if for example the mm parameter is the default "2<-1 5<-2
6<90%" an
orks against the way one would reasonably expect it to - that stopwords
: shouldn't impact the counts for mm (so, "the 7449078" would count as 1 term
: for mm since "the" is a stopword).
this is back to the original "problem"...
"stopwords" is an analyzer
gainst the way one would reasonably expect
it to - that stopwords shouldn't impact the counts for mm (so, "the
7449078" would count as 1 term for mm since "the" is a stopword).
Would there be a way around this? Could we possibly get it reworked?
What would the down
: Would this mean that, for example, if we wanted to search productId (long)
: we'd need to make a field type that had stopwords in it rather than simply
: using (long)?
not really ... that's kind of a special usecase. if someone searches for
a productId that's usually *all* the
Would this mean that, for example, if we wanted to search productId
(long) we'd need to make a field type that had stopwords in it rather
than simply using (long)?
Thanks for your time!
Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833
On Dec 12, 2008, at 11:
: such that if there are 2 words then both are required. Unfortunately, the
you haven't mentioned what qf you're using, and you only listed one field
type, which includes stopwords -- but i suspect your qf contains at least
one field that *doesn't* remove stopwords.
this is in fact
there are 2 words then both are required. Unfortunately, the
stopwords are pulled resulting in "the" being removed and then solr is
requiring 2 words when only 1 exists to match on. Is there a way around
this? I really need it to either require only non-stopwords or not filter
out stopwords. We
EMAIL PROTECTED]
Sent: Saturday, December 06, 2008 1:17 AM
To: solr-user@lucene.apache.org
Subject: RE: Russian stopwords
Hi Steve,
You were right, it turned out to be an encoding issue but a really weird
one. I was using Windows Notepad to save the stopwords file in UTF-8
encoding. On the other h
Hi Steve,
You were right, it turned out to be an encoding issue but a really weird
one. I was using Windows Notepad to save the stopwords file in UTF-8
encoding. On the other hand I was using EditPlus to save the synonyms file. That
was the only difference. The moment I switched to EditPlus for
Hi Tushar,
On 12/05/2008 at 5:18 AM, tushar kapoor wrote:
> I am trying to filter russian stopwords but have not been
> successful with that.
[...]
> words="stopwords.txt"/>
>ignoreCase="true" expand="false"
I am trying to filter Russian stopwords but have not been successful with
that. I am using the following schema entry -
.
..
Interestingly, Russian synonyms are working fine. English and Russian
synonyms get
See https://issues.apache.org/jira/browse/SOLR-879
we never enabled position increments in the query parser.
-Yonik
On Mon, Nov 24, 2008 at 9:48 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Ack! I tried it too, and it failed for me also.
> The analysis page indicates that the tokens are all in
Ack! I tried it too, and it failed for me also.
The analysis page indicates that the tokens are all in the same
positions... need to look into this deeper.
Could you open up a JIRA issue?
-Yonik
On Mon, Nov 24, 2008 at 5:58 PM, Robert Haschart <[EMAIL PROTECTED]> wrote:
> Yonik,
>
> I did make s
: Subject: Phrase query search with stopwords
: In-Reply-To: <[EMAIL PROTECTED]>
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email. E
Yonik,
I did make sure enablePositionIncrements="true" for both indexing and
queries and just did a test where I re-indexed a couple of test record
sets, and submitted a query from the solr admin page, this time
searching for title_text:"gone with the wind" which should return
three hits,
Robert,
I've reproduced (sort of) this bad behavior with the example schema.
There was an example configuration "bug" introduced in SOLR-521
where enablePositionIncrements="true" was only set on the index
analyzer but not the query analyzer for the "text" fieldType.
A query on the example data of
Greetings all,
I'm having trouble tracking down why a particular query is not
working. A user is trying to do a search for
alternate_form_title_text:"three films by louis malle" specifically to
find the 4 records that contain the phrase "Three films by Louis Malle"
in their alternate_form_
Norberto Meijome wrote:
On Tue, 07 Oct 2008 09:27:30 -0700
Jon Drukman <[EMAIL PROTECTED]> wrote:
Yep, you can "fake" it by only using fieldsets (qf) that have a
consistent set of stopwords.
does that mean changing the query or changing the schema?
Jon,
- you change sche
Hi,
I'm not sure if that will work, but have you tried using a full path to the
stopwords file?
If that doesn't work, you can always just create symbolic links to a single
stopwords file to avoid having duplicate files.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - So
I use Solr 1.2.
The synonyms and stopwords files are in the conf directory.
When I have more than one webapp, I must configure synonyms and stopwords
for each one.
I want to define a single directory of synonyms and stopwords for all webapps,
so that all webapps share one set of synonyms and stopwords.
How can I do it?
On Sun, 10 Aug 2008 19:58:24 -0700 (PDT)
SoupErman <[EMAIL PROTECTED]> wrote:
> I needed to run a search with a query containing the word "not", so I removed
> "not" from the stopwords.txt file. Which seemed to work, at least as far as
> parsing the query. It was now successfully searching for tha
bably something really simple..
Any help is appreciated
--
View this message in context:
http://www.nabble.com/Still-no-results-after-removing-from-stopwords-tp18919496p18919496.html
Sent from the Solr - User mailing list archive at Nabble.com.
to change something there in solrconfig.xml? Please
help me in this regard.
Thanks in advance
--
Akeel
On Wed, May 21, 2008 at 4:11 PM, Grant Ingersoll <[EMAIL PROTECTED]>
wrote:
Stopwords are commonly occurring words that don't add _much_ value to
search, such as the, an, a and
Hi Akeel
-Stopwords are common words of a language that carry little meaning in
searches, such as a, an, the, where, who, am, etc. The analyzer in
Lucene ignores such words and does not index them. You can also specify your
own stopwords in stopwords.txt in Solr.
-Protwords are the
help me in this regards
Thanks in advance
--
Akeel
On Wed, May 21, 2008 at 4:11 PM, Grant Ingersoll <[EMAIL PROTECTED]>
wrote:
> Stopwords are commonly occurring words that don't add _much_ value to
> search, such as the, an, a and are usually removed during analysis.
> Protwo
Stopwords are commonly occurring words that don't add _much_ value to
search, such as the, an, a and are usually removed during analysis.
Protwords (protected words) are words that would be stemmed by the
English porter stemmer that you do not want to be stemmed.
In the end, rem
Hi,
I am a beginner with Solr. I have successfully indexed my db in Solr. I want
to know what stopwords and protwords are, and how much effect they have
on my search results.
Thanks in advance.
--
Akeel
t: Wednesday, April 23, 2008 4:51:55 AM
Subject: Multi language, one "body" field, multi stopwords ?
Hi all,
we are in the situation that we want to store documents from x number of
languages but in the query we want to quer
Hi all,
we are in the situation that we want to store documents from x number of
languages but in the query we want to query the same field,
but at indexing time we want a different stopwords text file to be used for the
language of the uploaded document.
I thought perhaps creating a body field
>
> Phil
>
>
--
View this message in context:
http://www.nabble.com/stopwords-and-phrase-queries-tp16204254p16287383.html
Sent from the Solr - User mailing list archive at Nabble.com.
Music is another domain where this is a real problem. E.g., "The The",
"The Who", not to mention the song and album names.
-Sean
Walter Underwood wrote:
We do a similar thing with a no stopword, no stemming field.
There are a surprising number of movie titles that ar
We do a similar thing with a no stopword, no stemming field.
There are a surprising number of movie titles that are entirely
stopwords. "Being There" was the first one I noticed, but
"To be and to have" wins the prize for being all-stopwords
in two languages.
See
Yes. Our in-house example is the movie title "The Sound Of Music". Given in
quotes as a phrase this will pull up "anystopword Sound anystopword Music".
For example, "A Sound With Music". Your example is also a test case of ours.
For some Lucenicious reason "si
Am I correct that if I index with stop words: "to", "be", "or" and "not"
then phrase query "to be or not to be" will not retrieve any documents?
Is there any documentation that discusses the interaction of stop words
and phrase queries? Thanks.
Phil
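The interaction raised in this thread can be illustrated with a small sketch. It assumes position increments are preserved when stopwords are dropped (each removed stopword leaves a gap), which matches the "anystopword Sound anystopword Music" behaviour reported above. The stopword list and both functions are hypothetical, not Lucene or Solr APIs.

```python
# Toy phrase matcher: stopwords are removed but their positions are kept
# as gaps, so any word may fill the slot of a removed stopword.
STOPWORDS = {"to", "be", "or", "not", "the", "of", "a", "with"}  # illustrative


def analyze(text):
    """Return (token, position) pairs for non-stopwords, keeping the gaps."""
    return [
        (tok, pos)
        for pos, tok in enumerate(text.lower().split())
        if tok not in STOPWORDS
    ]


def phrase_match(query, doc):
    """True if the analyzed query tokens occur in doc at the same relative positions."""
    q = analyze(query)
    if not q:
        return False  # every query term was a stopword: nothing left to match
    positions = {}
    for tok, pos in analyze(doc):
        positions.setdefault(tok, set()).add(pos)
    first_tok, first_pos = q[0]
    for start in positions.get(first_tok, ()):
        offset = start - first_pos
        if all(pos + offset in positions.get(tok, ()) for tok, pos in q):
            return True
    return False
```

Under these assumptions, the phrase "The Sound Of Music" matches a document containing "A Sound With Music", while "to be or not to be" analyzes to an empty query and matches nothing in this toy model. A field without stopword removal avoids both effects.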
Thank you very much Daniel!
Maria
Daniel Alheiros wrote:
If you do want more stopwords sources, there is this one too:
http://snowball.tartarus.org/algorithms/
And I would go for the language identification and then I would apply the
proper set.
Cheers,
Daniel
On 18/10/07 16:18, "
If you do want more stopwords sources, there is this one too:
http://snowball.tartarus.org/algorithms/
And I would go for the language identification and then I would apply the
proper set.
Cheers,
Daniel
On 18/10/07 16:18, "Maria Mosolova" <[EMAIL PROTECTED]> wrote:
>
> http://members.unine.ch/jacques.savoy/clef/index.html for stopwords for
> several languages or check in some standard programming modules like:
> http://search.cpan.org/~fabpot/Lingua-StopWords-0.02/lib/Lingua/StopWords.pm
>
>
>
> On 10/18/07, Maria Mosolova <[EMAIL PROTECTE
Maria,
It's perfectly reasonable to build a single list, sort it, and scan it for
especially bad cases. See for example,
http://members.unine.ch/jacques.savoy/clef/index.html for stopwords for
several languages or check in some standard programming modules like:
http://search.cpan.org/~f
Original Message-
> From: Maria Mosolova [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 18, 2007 8:48 AM
> To: solr-user@lucene.apache.org
> Subject: Re: multilingual list of stopwords
>
> Thanks a lot to everyone who responded. Yes, I agree that eventually we
solr-user@lucene.apache.org
Subject: Re: multilingual list of stopwords
Thanks a lot to everyone who responded. Yes, I agree that eventually we
need to use separate stopword lists for different languages.
Unfortunately the data we are trying to index at the moment does not
contain any direct co
"die" in German and English. --wunder
>
> On 10/18/07 4:16 AM, "Andrzej Bialecki" <[EMAIL PROTECTED]> wrote:
>
> > One example that I'm familiar with: words "is" and "by" in English and
> > in Swedish. Both words are
Also "die" in German and English. --wunder
On 10/18/07 4:16 AM, "Andrzej Bialecki" <[EMAIL PROTECTED]> wrote:
> One example that I'm familiar with: words "is" and "by" in English and
> in Swedish. Both words are stopwords in English,
of multilingual stop words list before. What
should be the
purpose of it? This seems too odd to me :-)
That's because a multilingual stopword list doesn't make sense ;)
One example that I'm familiar with: words "is" and "by" in English
and in Swedish. Both word
"by" in English and
in Swedish. Both words are stopwords in English, but they are content
words in Swedish (ice and village, respectively). Similarly, "till" in
Swedish is a stopword (to, towards), but it's a content word in English.
So, as Lukas correctly suggested, you
to merge the various language stopword
> files I need to one and use it. But the main problem in this case is,
> having collisions with words which are stopwords in one language and in
> the other not.
>
> Cheers,
> Joe
>
>
> Maria Mosolova schrieb:
> >
Hi Maria,
this is a "me too". ;)
At the moment I'll take the way to merge the various language stopword
files I need to one and use it. But the main problem in this case is,
having collisions with words which are stopwords in one language and in
the other not.
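The collision problem with a merged list can be shown with a short sketch. The word lists below are tiny illustrative samples built around the examples from this thread ("is", "by", "till", "die"), not real stopword files, and the function name is hypothetical.

```python
# Compare per-language stopword removal with a single merged list.
STOPWORDS_BY_LANG = {
    "en": {"is", "by", "the", "and"},
    "sv": {"till", "och", "att"},
    "de": {"die", "und", "der"},
}

# The merged list is simply the union of every per-language list.
MERGED = set().union(*STOPWORDS_BY_LANG.values())


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the given stopword set."""
    return [tok for tok in tokens if tok not in stopwords]


# Swedish tokens: "is" (ice) and "by" (village) are content words here.
swedish_tokens = ["is", "till", "by"]
per_language = remove_stopwords(swedish_tokens, STOPWORDS_BY_LANG["sv"])
merged = remove_stopwords(swedish_tokens, MERGED)
```

With the Swedish list only "till" is dropped, so "is" and "by" remain searchable. With the merged list all three tokens disappear, which is the loss the thread warns about. Identifying the language first and then applying the matching list avoids it.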
Hi,
I am looking for a multilingual list of stopwords to use with
Solr/Lucene and would greatly appreciate advice on where I could
find it.
Thanks,
Maria
You're absolutely right. I missed one field, which did not have the
solr.StopFilterFactory applied to it. I must have missed that while reading the
post yesterday. Anyway, I ensured all the fields that dismax was searching
across had the stopwords applied, and now everything works great!
Thanks
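Why one field without the stop filter breaks the match can be sketched with a simplified model. It assumes every remaining query clause is required (an mm of 100%) and that each query term becomes one clause spanning all queried fields. The field names, analyzers, and `dismax_match` function are hypothetical and are not Solr's implementation.

```python
# Simplified dismax model: a query term produces a required clause if at
# least one field's analyzer keeps it; the clause matches if any field does.
STOPWORDS = {"the", "model"}  # illustrative; "model" was added in the thread


def with_stop(term):
    """Analyzer with a stop filter: stopwords yield no token."""
    return None if term in STOPWORDS else term


def no_stop(term):
    """Analyzer without a stop filter: every term is kept."""
    return term


def dismax_match(query, doc, analyzers):
    """doc maps field name -> set of indexed tokens; all clauses are required."""
    clauses = []
    for term in query.lower().split():
        per_field = {f: analyze(term) for f, analyze in analyzers.items()}
        per_field = {f: tok for f, tok in per_field.items() if tok is not None}
        if per_field:  # the term survived in at least one field
            clauses.append(per_field)
    if not clauses:
        return False
    return all(
        any(tok in doc.get(field, ()) for field, tok in clause.items())
        for clause in clauses
    )


doc = {"title": {"ipod"}, "sku": {"12345"}}
mixed = {"title": with_stop, "sku": no_stop}
consistent = {"title": with_stop, "sku": with_stop}
```

With the `mixed` analyzers, "model ipod" produces a required clause for "model" that can only be satisfied in the `sku` field, so the document does not match. With the `consistent` analyzers the "model" clause disappears entirely and the document matches on "ipod" alone.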
: I'm having the same issues. We are using Dismax, with a stopword list.
: Currently we are having customers typing in "model ipod", we added model to
: the stopwords list and tested with the standard handler..works fine, but not
: with dismax (MM = 3<-1 5<-2 6<90%). W
I'm having the same issues. We are using Dismax, with a stopword list.
Currently we are having customers typing in "model ipod", we added model to
the stopwords list and tested with the standard handler..works fine, but not
with dismax (MM = 3<-1 5<-2 6<90%). When i comm
Thank you! That makes sense.
--Casey
>>> Mike Klaas <[EMAIL PROTECTED]> 6/7/2007 2:35 PM >>>
On 7-Jun-07, at 1:41 PM, Casey Durfee wrote:
> It appears that if your search terms include stopwords and you use
> the DisMax request handler, you get no results wh
On 7-Jun-07, at 1:41 PM, Casey Durfee wrote:
It appears that if your search terms include stopwords and you use
the DisMax request handler, you get no results whereas the same
search with the standard request handler does give you results. Is
this a bug or by design?
There is a subtlety
[EMAIL PROTECTED]> 6/7/2007 2:12 PM >>>
: It appears that if your search terms include stopwords and you use the
: DisMax request handler, you get no results whereas the same search with
: the standard request handler does give you results. Is this a bug or by
: design?
dismax works j
: It appears that if your search terms include stopwords and you use the
: DisMax request handler, you get no results whereas the same search with
: the standard request handler does give you results. Is this a bug or by
: design?
dismax works just fine with stop words ... can you give a
It appears that if your search terms include stopwords and you use the DisMax
request handler, you get no results whereas the same search with the standard
request handler does give you results. Is this a bug or by design?
Thanks,
--Casey