Re: stopwords in solr

2012-11-27 Thread Joe Zhang
that is really strange. so basic stopwords such as "a", "the" are not eliminated from the index? On Tue, Nov 27, 2012 at 11:16 PM, 曹霖 cao...@babytree-inc.com wrote: just no stopwords are considered in that case

Re: stopwords in solr

2012-11-27 Thread Andy Lester
On Nov 28, 2012, at 12:33 AM, Joe Zhang smartag...@gmail.com wrote: that is really strange. so basic stopwords such as "a", "the" are not eliminated from the index? There is no list of basic stopwords anywhere. If you want stop words, you have to put them in the file yourself

Re: stopwords in solr

2012-11-27 Thread Walter Underwood
Eliminating stopwords is generally a bad idea. It means you cannot search for "vitamin a". Back in the 1970s, search engines eliminated stopwords so they could work on 16-bit machines. That isn't a problem any more. wunder On Nov 27, 2012, at 10:33 PM, Joe Zhang wrote: that is really strange
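The "vitamin a" point can be illustrated with a toy stop filter. This is only a sketch in Python, not Solr's StopFilterFactory; the stopword set and the function name remove_stopwords are assumptions made up for illustration.

```python
# Toy model of stopword removal; NOT Solr's implementation.
STOPWORDS = {"a", "an", "the", "of"}  # hypothetical list for illustration


def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, and drop tokens found in stopwords."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


# With "a" as a stopword, both queries reduce to the same single token,
# so a search can no longer distinguish "vitamin a" from "vitamin".
print(remove_stopwords("vitamin a"))
print(remove_stopwords("vitamin"))
```

If the same filter runs at index time, the token "a" never reaches the index either, so no query can get it back.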

Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Daniel Brügge
words with umlauts in the text box for indexing and queries. Lance - Original Message - | From: Daniel Brügge daniel.brue...@googlemail.com | To: solr-user@lucene.apache.org | Sent: Wednesday, November 7, 2012 8:45:45 AM | Subject: SolrCloud, Zookeeper and Stopwords with Umlaute

Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Daniel Brügge
...@googlemail.com | To: solr-user@lucene.apache.org | Sent: Wednesday, November 7, 2012 8:45:45 AM | Subject: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters | | Hi, | | i am running a SolrCloud cluster with the 4.0.0 version. I have a | stopwords | file | which

Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Robert Muir
On Wed, Nov 7, 2012 at 11:45 AM, Daniel Brügge daniel.brue...@googlemail.com wrote: Hi, i am running a SolrCloud cluster with the 4.0.0 version. I have a stopwords file which is in the correct encoding. What makes you think that? Note: Because I can read it is not the correct answer
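"Because I can read it" says nothing about the bytes on disk. A rough way to check, assuming the file is expected to be UTF-8 (an assumption here, not something stated in the thread), is to decode the raw bytes strictly. The helper name is_valid_utf8 is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same word encoded two ways: an editor may display both identically.
utf8_bytes = "für".encode("utf-8")      # two bytes for the umlaut
latin1_bytes = "für".encode("latin-1")  # one byte for the umlaut

print(is_valid_utf8(utf8_bytes))
print(is_valid_utf8(latin1_bytes))
```

For a real file the bytes would come from something like open(path, "rb").read(), where path is whatever location the stopwords file actually has.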

Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Daniel Brügge
at 12:12 PM, Robert Muir rcm...@gmail.com wrote: On Wed, Nov 7, 2012 at 11:45 AM, Daniel Brügge daniel.brue...@googlemail.com wrote: Hi, i am running a SolrCloud cluster with the 4.0.0 version. I have a stopwords file which is in the correct encoding. What makes you think that? Note

Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Daniel Brügge
issue, which somehow destroys my file. Will check. On Thu, Nov 8, 2012 at 12:12 PM, Robert Muir rcm...@gmail.com wrote: On Wed, Nov 7, 2012 at 11:45 AM, Daniel Brügge daniel.brue...@googlemail.com wrote: Hi, i am running a SolrCloud cluster with the 4.0.0 version. I have a stopwords

Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Daniel Brügge
AM, Daniel Brügge daniel.brue...@googlemail.com wrote: Hi, i am running a SolrCloud cluster with the 4.0.0 version. I have a stopwords file which is in the correct encoding. What makes you think that? Note: Because I can read it is not the correct answer. Ensure any of your

SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-07 Thread Daniel Brügge
Hi, I am running a SolrCloud cluster with the 4.0.0 version. I have a stopwords file which is in the correct encoding. It contains German umlauts, e.g. 'ü'. I am also running a standalone Zookeeper which contains this stopwords file. In my schema I am using the stopwords file in the standard

Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-07 Thread Lance Norskog
: Wednesday, November 7, 2012 8:45:45 AM | Subject: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters | | Hi, | | i am running a SolrCloud cluster with the 4.0.0 version. I have a | stopwords | file | which is in the correct encoding. It contains german Umlaute like | e.g. 'ü

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-18 Thread Spadez
form within Solr -- View this message in context: http://lucene.472066.n3.nabble.com/Taking-a-full-text-then-truncate-and-duplicate-with-stopwords-tp4008269p4008580.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-18 Thread Erick Erickson
still learning about this, but by importing it twice, I think I remove the need to ever store the unnecessary fulltext document in its original form within Solr

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-18 Thread Ahmet Arslan
Hi James, In order to do the copyfield technique, I need to store the original full text document within Solr, like this: <field name="truncated_description" indexed="false" stored="false"/> <field name="keyword_description" indexed="true" stored="true"/> No, that's not

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-18 Thread Jack Krupansky
@lucene.apache.org Subject: Re: Taking a full text, then truncate and duplicate with stopwords Ok, I’ve been doing a bit more research. In order to do the copyfield technique, I need to store the original full text document within Solr, like this: field name=truncated_description indexed=false stored

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-18 Thread Spadez

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-18 Thread Jack Krupansky
Krupansky -Original Message- From: Spadez Sent: Tuesday, September 18, 2012 10:33 AM To: solr-user@lucene.apache.org Subject: Re: Taking a full text, then truncate and duplicate with stopwords Ok, thank you for the reply. I have one more question then I think everything is cleared up. If I

Taking a full text, then truncate and duplicate with stopwords

2012-09-17 Thread Spadez
stopwords to remove common words):* How should I be doing this? Purely with index analyzers?

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-17 Thread Ahmet Arslan
Form (using stopwords to remove common words):* Are you going to use this keyword form for searching or displaying purposes?

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-17 Thread Spadez
Purely for searching. The truncated form is just to show to the user as a preview, and the keyword form is for the keyword searching.

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-17 Thread Spadez
it first into truncated_description and then again into keyword_description.

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-17 Thread Ahmet Arslan
--- On Mon, 9/17/12, Spadez james_will...@hotmail.com wrote: From: Spadez james_will...@hotmail.com Subject: Re: Taking a full text, then truncate and duplicate with stopwords To: solr-user@lucene.apache.org Date: Monday, September 17, 2012, 5:32 PM In an attempt to answer my own question

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-17 Thread Spadez
bench rings man engages hands-free speaker function begins talk Everyone else

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-17 Thread Ahmet Arslan
The trouble is, I want the truncated description to still have the keywords. copyField copies raw text, it has nothing to do with analysis.

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-17 Thread Spadez
<copyField source="keyword_description" dest="truncated_description" maxChars="3000"/>

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-17 Thread Ahmet Arslan
--- On Mon, 9/17/12, Spadez james_will...@hotmail.com wrote: From: Spadez james_will...@hotmail.com Subject: Re: Taking a full text, then truncate and duplicate with stopwords To: solr-user@lucene.apache.org Date: Monday, September 17, 2012, 7:10 PM Maybe I dont understand, but if you

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-17 Thread Jack Krupansky
and duplicate with stopwords The trouble is, I want the truncated description to still have the keywords. copyField copies raw text, it has nothing to do with analysis.

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-17 Thread Spadez
into keyword_document which uses stopwords to remove words like "and", "it", "this". Now I only have 3000 words for example. Then if I do a copy command to move it into truncate_document then even though I can reduce it down to say 100 words, it is lacking words like "and", "it" and "this" because it has been copied from

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-17 Thread Ahmet Arslan
Then if I do a copy command to move it into truncate_document then even though I can reduce it down to say 100 words, it is lacking words like "and", "it" and "this" because it has been copied from the keyword_document. That's not true. The copy operation is performed before analysis (stopword removal,
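The point that copying happens before analysis can be pictured with a toy model. This is a Python sketch, not Solr code: the field names come from the thread, while the stopword set, the sample text, and the functions analyze_keywords and index_document are assumptions for illustration only.

```python
STOPWORDS = {"and", "it", "this"}  # hypothetical list for illustration


def analyze_keywords(text):
    """Toy index-time analysis: lowercase, split, drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def index_document(full_text, max_chars):
    """Toy model: the copy receives the raw source text, not analyzed tokens."""
    return {
        # What the keyword field would index, after analysis.
        "keyword_description": analyze_keywords(full_text),
        # What the copy destination receives: raw text, only truncated.
        "truncated_description": full_text[:max_chars],
    }


doc = index_document("This is the bench and it rings", max_chars=17)
print(doc["keyword_description"])
print(doc["truncated_description"])
```

In this model the truncated copy still starts with "This", even though "this" was dropped from the keyword tokens, because the two fields only share the same raw source value.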

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-17 Thread Jack Krupansky
the value in different ways. -- Jack Krupansky -Original Message- From: Spadez Sent: Monday, September 17, 2012 12:29 PM To: solr-user@lucene.apache.org Subject: Re: Taking a full text, then truncate and duplicate with stopwords I'm really confused here. I have a document which is say 4000

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-17 Thread Spadez
source value even if they analyze and index the value in different ways. -- Jack Krupansky -Original Message- From: Spadez Sent: Monday, September 17, 2012 12:29 PM To: solr-user@.apache Subject: Re: Taking a full text, then truncate and duplicate with stopwords I'm really

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-17 Thread Jack Krupansky
@lucene.apache.org Subject: Re: Taking a full text, then truncate and duplicate with stopwords Ah, ok this is news to me and makes a lot more sense. If I can just run this back past you to make sure I understand. If I move my full_text to If I move my fulltext document from my SQL database

Re: are stopwords indexed?

2012-07-17 Thread Erick Erickson
Two things: 1) Did you re-index after you got your stopwords file set up? And I'd blow away the index directory before re-indexing. 2) If you _store_ your field, the stopwords will be in your results lists, but _not_ in your index. As a secondary check, try going into your admin/schema browser

Re: are stopwords indexed?

2012-07-16 Thread Lance Norskog
Look at the index with the Schema Browser in the Solr UI. This pulls the terms for each field. On Sun, Jul 15, 2012 at 8:38 PM, Giovanni Gherdovich g.gherdov...@gmail.com wrote: Hi all, are stopwords from the stopwords.txt config file supposed to be indexed? I would say

Re: are stopwords indexed?

2012-07-16 Thread Michael Belenki
Hi Giovanni, you have entered the stopwords into stopword.txt file, right? But in the definition of the field type you are referencing stopwords_FR.txt.. best regards, Michael On Mon, 16 Jul 2012 05:38:04 +0200, Giovanni Gherdovich g.gherdov...@gmail.com wrote: Hi all, are stopwords from

Re: are stopwords indexed?

2012-07-16 Thread Giovanni Gherdovich
, but... they were all there. Michael: Hi Giovanni, you have entered the stopwords into stopword.txt file, right? But in the definition of the field type you are referencing stopwords_FR.txt.. good catch Michael, but that's not the problem. In my message I referred to stopwords.txt, but actually my

are stopwords indexed?

2012-07-15 Thread Giovanni Gherdovich
Hi all, are stopwords from the stopwords.txt config file supposed to be indexed? I would say no, but this is the situation I am observing on my Solr instance: * I have a bunch of stopwords in stopwords.txt * my fields are of fieldType text from the example schema.xml, i.e. I have -- -- 8

Re: Filter facet_fields with Solr similar to stopwords

2012-03-08 Thread Chris Hostetter
: I am using a solr.StopFilterFactory in a query filter for a text_general : field (here: content). It works fine, when I query the field for the : stopword, then I am getting no results. ... : used in the text. What I am trying to achieve is, to also filter the : stopwords from

Filter facet_fields with Solr similar to stopwords

2012-03-06 Thread Daniel Brügge
to achieve is, to also filter the stopwords from the facet_fields, but it's not working. It would only work if the stopwords are also used during the indexing of the text_general field, right? The problem here is, that it's too much data to re-index every time I add a new stopword. My current

Re: Filter facet_fields with Solr similar to stopwords

2012-03-06 Thread Daniel Brügge
doing a facet.field=content call to get the words which are used in the text. What I am trying to achieve is, to also filter the stopwords from the facet_fields, but it's not working. It would only work if the stopwords are also used during the indexing of the text_general field, right

Re: Highlighting stopwords

2012-02-15 Thread O. Klein
rather than the value of q. There's no tricks, I think. koji -- Apache Solr Query Log Visualizer http://soleami.com/ Field definitions: content_text (no stopwords, only synonyms in index) content_hl (stopwords, synonyms in index and query, and only field in hl.fl) Searching is done

Re: Highlighting stopwords

2012-02-14 Thread O. Klein
O. Klein wrote Hmm, now the synonyms aren't highlighted anymore. OK back to basics (I'm using trunk and FVH). What is the way to go about it if I want to search on a field without stopwords, but still want to highlight the stopwords (and still highlight synonyms and stemmed words)? I

Re: Highlighting stopwords

2012-02-14 Thread O. Klein
O. Klein wrote O. Klein wrote Hmm, now the synonyms aren't highlighted anymore. OK back to basics (I'm using trunk and FVH). What is the way to go about it if I want to search on a field without stopwords, but still want to highlight the stopwords? (and still highlight synonyms

Re: Highlighting stopwords

2012-02-14 Thread Koji Sekiguchi
. There's no tricks, I think. When using hl.q=content_hl:(spell Check) I now get highlighting including stopwords. but when using hl.q=content_hl:(SC) where SC is synonym I get no highlighting. Can you verify if synonyms work when using hl.q? : OK I got it working by using hl.q=content_hl

Re: Highlighting stopwords

2012-02-14 Thread O. Klein
flexibility.

Re: Highlighting stopwords

2012-02-13 Thread O. Klein
to explain a bit more how hl.q is supposed to work and with some examples? Thanx.

Re: Highlighting stopwords

2012-02-13 Thread Koji Sekiguchi
I got it fixed now I think. I thought that if you used it like hl.q=spell Checker it would use the query analysis of the field that was being highlighted as default. But in my case it needs to be hl.q=content_hl:(spell Checker) for it to work. The behaviour I got by default made no sense whatsoever.

Re: Highlighting stopwords

2012-02-13 Thread O. Klein
Hmm, now the synonyms aren't highlighted anymore. OK back to basics (I'm using trunk and FVH). What is the way to go about it if I want to search on a field without stopwords, but still want to highlight the stopwords (and still highlight synonyms and stemmed words)?

Re: Highlighting stopwords

2012-02-11 Thread O. Klein
someone confirm whether this is a bug? Thank you.

Re: Highlighting stopwords

2012-02-11 Thread Koji Sekiguchi
(12/02/11 21:19), O. Klein wrote: Koji Sekiguchi wrote (12/01/24 9:31), O. Klein wrote: Let's say I search for spellcheck solr on a website that only contains info about Solr, so solr was added to the stopwords.txt. The query that will be parsed then (dismax) will not contain the term solr.

Re: Highlighting stopwords

2012-01-24 Thread Koji Sekiguchi
(12/01/24 9:31), O. Klein wrote: Let's say I search for spellcheck solr on a website that only contains info about Solr, so solr was added to the stopwords.txt. The query that will be parsed then (dismax) will not contain the term solr. So fragments won't contain highlights of the term solr. So

Re: Highlighting stopwords

2012-01-24 Thread O. Klein
Ah, I never used hl.q. That did the trick. Thanx!

solr stopwords issue - documents are not matching

2012-01-24 Thread Ankita Patil
Hi, I am using solr-3.4. My part of the schema looks like: <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StopFilterFactory"

Highlighting stopwords

2012-01-23 Thread O. Klein
I'm using trunk and FVH and even though I filter stopwords when searching, I would like to highlight stopwords in fragments. Using a different field without the stopwords filter did not have the desired effect. Is there a way to do this?

Re: Highlighting stopwords

2012-01-23 Thread Koji Sekiguchi
(12/01/23 23:14), O. Klein wrote: I'm using trunk and FVH and even though I filter stopwords when searching, I would like to highlight stopwords in fragments. Using a different field without the stopwords filter did not have the desired effect. Please provide more info. In particular, how your

Re: Highlighting stopwords

2012-01-23 Thread O. Klein
Koji Sekiguchi wrote (12/01/23 23:14), O. Klein wrote: I'm using trunk and FVH and even though I filter stopwords when searching, I would like to highlight stopwords in fragments. Using a different field without the stopwords filter did not have the desired effect. Please provide more

Re: stopwords as privacy measure

2012-01-10 Thread Michael Lissner
It's a bit of a privacy through obscurity measure, unfortunately. The problem is that American courts do a lousy job of removing social security numbers from cases that I put on my site. I do anonymization before sending the cases to Solr, but if you're clever (and the stopwords weren't

Re: stopwords as privacy measure

2012-01-09 Thread Erik Hatcher
at index and query time, so sounds like I'm all set. I'm doing anonymization of social security numbers, converting them to xxx-xx-. I don't *think* users can find a way of identifying these docs if the stopwords-based block works. Thank you both for the confirmation. Mike On Sun 08

stopwords as privacy measure

2012-01-08 Thread Michael Lissner
I have a unique use case where I have words in my corpus that users shouldn't ever be allowed to search for. My theory is that if I add these to the stopwords list, that should do the trick. I'm using the edismax parser and it seems to be working in my dev environment. Is there any risk

Re: stopwords as privacy measure

2012-01-08 Thread Ted Dunning
On Sun, Jan 8, 2012 at 3:33 PM, Michael Lissner mliss...@michaeljaylissner.com wrote: I have a unique use case where I have words in my corpus that users shouldn't ever be allowed to search for. My theory is that if I add these to the stopwords list, that should do the trick. That should do

Re: stopwords as privacy measure

2012-01-08 Thread Gora Mohanty
On Mon, Jan 9, 2012 at 5:03 AM, Michael Lissner mliss...@michaeljaylissner.com wrote: I have a unique use case where I have words in my corpus that users shouldn't ever be allowed to search for. My theory is that if I add these to the stopwords list, that should do the trick. Yes, that should

Re: stopwords as privacy measure

2012-01-08 Thread Michael Lissner
I've got them configured at index and query time, so sounds like I'm all set. I'm doing anonymization of social security numbers, converting them to xxx-xx-. I don't *think* users can find a way of identifying these docs if the stopwords-based block works. Thank you both

Re: StopWords coming in Top 10 terms despite using StopFilterFactory

2011-09-23 Thread Pranav Prakash
intentions of using both of them is - first I want to use phrase queries so used CommonGramsFilterFactory. Secondly, I don't want those stopwords in my index, so I have used StopFilterFactory to remove them. The commongrams filter turns each found occurrence of a word in the file into two tokens

Re: StopWords coming in Top 10 terms despite using StopFilterFactory

2011-09-23 Thread Shawn Heisey
On 9/23/2011 1:45 AM, Pranav Prakash wrote: Maybe I am wrong. But my intentions of using both of them is - first I want to use phrase queries so used CommonGramsFilterFactory. Secondly, I don't want those stopwords in my index, so I have used StopFilterFactory to remove them. CommonGrams

StopWords coming in Top 10 terms despite using StopFilterFactory

2011-09-22 Thread Pranav Prakash
Hi List, I included StopFilterFactory and I can see it taking action in the Analyzer Interface. However, when I go to Schema Analyzer, I see those stop words in the top 10 terms. Is this normal? fieldType name=text_commongrams class=solr.TextField analyzer charFilter

Re: StopWords coming in Top 10 terms despite using StopFilterFactory

2011-09-22 Thread Shawn Heisey
On 9/22/2011 3:54 AM, Pranav Prakash wrote: Hi List, I included StopFilterFactory and I can see it taking action in the Analyzer Interface. However, when I go to Schema Analyzer, I see those stop words in the top 10 terms. Is this normal? fieldType name=text_commongrams class=solr.TextField

KeywordTokenizerFactory and stopwords

2011-06-08 Thread Matt Mitchell
Hi, I have an autocomplete fieldType that works really well, but because the KeywordTokenizerFactory (if I understand correctly) is emitting a single token, the stopword filter will not detect any stopwords. Anyone know of a way to strip out stopwords when using KeywordTokenizerFactory? I did try
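Why a stop filter finds nothing after a keyword tokenizer can be shown with a toy model. The Python below is only a sketch of the behaviour described in the message, not Solr's classes; the stopword set and all function names are assumptions.

```python
STOPWORDS = {"the", "of"}  # hypothetical list for illustration


def keyword_tokenize(text):
    """Emit the whole input as one single token."""
    return [text]


def whitespace_tokenize(text):
    """Emit one token per whitespace-separated word."""
    return text.split()


def stop_filter(tokens):
    """Drop tokens that exactly match a stopword."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


title = "the lord of the rings"
# The single token "the lord of the rings" is not itself a stopword.
print(stop_filter(keyword_tokenize(title)))
# Per-word tokens can each be compared against the list.
print(stop_filter(whitespace_tokenize(title)))
```

The filter compares whole tokens, so with a single-token stream it has nothing it can match; stripping words inside that token needs some other step, such as the pattern replacement mentioned in the thread.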

Re: KeywordTokenizerFactory and stopwords

2011-06-08 Thread Erik Hatcher
, the stopword filter will not detect any stopwords. Anyone know of a way to strip out stopwords when using KeywordTokenizerFactory? I did try the reg-exp replace filter, but I'm not sure I want to add a bunch of reg-exps for replacing every stopword. Thanks, Matt Here's the fieldType definition

Re: KeywordTokenizerFactory and stopwords

2011-06-08 Thread Matt Mitchell
Hi Erik. Yes something like what you describe would do the trick. I did find this: http://lucene.472066.n3.nabble.com/Concatenate-multiple-tokens-into-one-td1879611.html I might try the pattern replace filter with stopwords, even though that feels kinda clunky. Matt On Wed, Jun 8, 2011 at 11

Re: Dismax Minimum Match/Stopwords Bug

2011-05-02 Thread Chris Hostetter
opinion) is not to give up on stop words -- if you want to use stop words, by all means use stop words. BUT! You must use them in all the fields of your qf ... even fields where you think "why in god's name would I need stopwords on this field, those terms will never exist in this field!" ... you may

Dismax Minimum Match/Stopwords Bug

2011-04-15 Thread Jan Høydahl
A thread with this same subject from 2008/2009 is here: http://search-lucene.com/m/jkBgXnSsla We're seeing customers being bitten by this bug now and then, and normally my workaround is to simply not use stopwords at all. However, is there an actual fix in the 3.1 eDisMax parser which solves

Re: stopwords not working in multicore setup

2011-03-28 Thread Martin Rödig
: stopwords not working in multicore setup Ahh, thank you for the hints Martin... German stopwords without Umlaut work correctly. So I'm trying to figure out where the UTF-8 chars are getting messed up. Using the Solr admin web UI, I did a search for title:für and the xml (or json) output

Re: stopwords not working in multicore setup

2011-03-25 Thread Martin Rödig
I have some questions about your config: Is the stopwords-de.txt in the same directory as the schema.xml? Is the title field of type text? Do you have the same problem with German stopwords without umlauts (ü, ö, ä), like the word "denn"? A problem can be that the stopwords-de.txt is not saved as UTF

Re: stopwords not working in multicore setup

2011-03-25 Thread Christopher Bottaro
Ahh, thank you for the hints Martin... German stopwords without Umlaut work correctly. So I'm trying to figure out where the UTF-8 chars are getting messed up. Using the Solr admin web UI, I did a search for title:für and the xml (or json) output in the browser shows the query with the proper

stopwords not working in multicore setup

2011-03-24 Thread Christopher Bottaro
Hello, I'm running a Solr server with 5 cores. Three are for English content and two are for German content. The default stopwords setup works fine for the English cores, but the German stopwords aren't working. The German stopwords file is stopwords-de.txt and resides in the same directory

How to get stopwords and synonyms files for several lanuages

2011-03-18 Thread abiratsis
Hello everyone, I am developing a multilingual index so there is a need for different languages support. I need some answers to the following questions: 1. Which steps should I follow in order to get (download) all the stopwords-synonyms files for several languages? 2. Is there any site

Re: How to get stopwords and synonyms files for several lanuages

2011-03-18 Thread Markus Jelsma
On Friday 18 March 2011 17:09:35 abiratsis wrote: Hello everyone, I am developing a multilingual index so there is a need for different languages support. I need some answers to the following questions: 1. Which steps should I follow in order to get (download) all the stopwords-synonyms

Re: How to get stopwords and synonyms files for several lanuages

2011-03-18 Thread abiratsis
OK thanx Markus, is clear enough now

Re: How to get stopwords and synonyms files for several lanuages

2011-03-18 Thread abiratsis

Re: How to get stopwords and synonyms files for several lanuages

2011-03-18 Thread Markus Jelsma
depend on what you're indexing you mean that I probably need to implement a mechanism for handling synonyms right? If yes, you have any suggestions how to implement this? Thanx, Alex

Question regarding indexing multiple languages, stopwords, etc.

2011-02-21 Thread Greg Georges
Hello all, I have gotten my DataImportHandler to index my data from my MySQL database. I was looking at the schema tool and noticing that stopwords in different languages are being indexed as terms. The 6 languages we have are English, French, Spanish, Chinese, German and Italian. Right now I

Re: Question regarding indexing multiple languages, stopwords, etc.

2011-02-21 Thread Otis Gospodnetic
Greg, You need to get stopword lists for your 6 languages. Then you need to create new field types just like that 'text' type, one for each language. Point them to the appropriate stopwords files and instead of English specify each one of your languages. You can either index each language

Re: stopwords file configuration

2010-11-16 Thread alendo
I reply to myself because I found the mistake. The Italian stopwords file that I found on the Apache site contains, on the same line as each stopword, a shell-style comment; the stopwords tokenizer is probably basic and doesn't accept comments on the same line as stopwords. I dropped them
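Dropping same-line comments by hand can also be scripted. A minimal Python sketch, assuming "#" as the comment marker (the message only says "shell style"; the marker, the sample lines, and the name load_stopwords are assumptions):

```python
def load_stopwords(lines, comment_marker="#"):
    """Return stopwords with trailing comments and blank lines removed."""
    words = []
    for line in lines:
        # Keep only the part before the first comment marker.
        word = line.split(comment_marker, 1)[0].strip()
        if word:
            words.append(word)
    return words


# Hypothetical sample: one word with a trailing comment, one comment-only
# line, one blank line, and one plain word.
sample = ["ad      # to, at", "# whole-line comment", "", "al"]
print(load_stopwords(sample))
```

Writing the cleaned list back out, one word per line, gives a file with nothing but stopwords on each line.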

stopwords file configuration

2010-11-16 Thread alendo
I'm using Lucid Imagination installation kit for SOLR (the last one with SOLR 1.4). I would like to use stopwords, and I installed in LucidWorks/lucidworks/solr/conf/stopwords.txt the italian version of the file. Moreover the field where I want to clean stopwords is declared in schema.xml

subquery with stopwords

2010-10-07 Thread Rodrigo Rezende
I'm not sure but it seems to me that subqueries query(.) [ http://wiki.apache.org/solr/FunctionQuery#query ] with only stopwords are evaluated for all documents. Example: q={!func}myFunction(query(field:the))&fq=field:(helloworld) Since "the" is a stopword for the field "field", query(field:the

Re: subquery with stopwords

2010-10-07 Thread Otis Gospodnetic
- Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Rodrigo Rezende rcreze...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Thu, October 7, 2010 4:03:57 PM Subject: subquery with stopwords I'm not sure but it seems to me that subqueries

stopwords in AND clauses

2010-09-13 Thread Xavier Noria
Let's suppose we have a regular search field body_t, and an internal boolean flag flag_t not exposed to the user. I'd like body_t:foo AND flag_t:true to be an intersection, but if foo is a stopword I get all documents for which flag_t is true, as if the first clause was dropped, or if
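The effect described here, an AND clause that vanishes when its only term is a stopword, can be mimicked with a toy model. This Python sketch is not how Solr evaluates queries; the documents, the stopword set, and the function search_and are assumptions, with only the field names taken from the message.

```python
STOPWORDS = {"foo"}  # pretend "foo" is a stopword, as in the message

# Hypothetical mini index: document id -> field -> set of terms.
DOCS = {
    1: {"body_t": {"bar"}, "flag_t": {"true"}},
    2: {"body_t": {"baz"}, "flag_t": {"true"}},
    3: {"body_t": {"bar"}, "flag_t": {"false"}},
}


def search_and(clauses):
    """AND together (field, term) clauses, dropping analyzed-away clauses."""
    live = [(f, t) for f, t in clauses
            if not (f == "body_t" and t in STOPWORDS)]
    return sorted(doc_id for doc_id, fields in DOCS.items()
                  if all(t in fields[f] for f, t in live))


# The stopword clause disappears, leaving only flag_t:true.
print(search_and([("body_t", "foo"), ("flag_t", "true")]))
# A normal term keeps the intersection intact.
print(search_and([("body_t", "bar"), ("flag_t", "true")]))
```

In this model the first query matches every document with the flag set, which is the surprising result the message reports.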

Re: stopwords in AND clauses

2010-09-13 Thread Simon Willnauer
On Mon, Sep 13, 2010 at 3:27 PM, Xavier Noria f...@hashref.com wrote: Let's suppose we have a regular search field body_t, and an internal boolean flag flag_t not exposed to the user. I'd like body_t:foo AND flag_t:true this is solr right? why don't you use a filter query for your unexposed

Re: stopwords in AND clauses

2010-09-13 Thread Xavier Noria
On Mon, Sep 13, 2010 at 4:29 PM, Simon Willnauer simon.willna...@googlemail.com wrote: On Mon, Sep 13, 2010 at 3:27 PM, Xavier Noria f...@hashref.com wrote: Let's suppose we have a regular search field body_t, and an internal boolean flag flag_t not exposed to the user. I'd like    

Re: Prefix-Search with Stopwords - no results?

2010-05-31 Thread Gert Brinkmann
filter the EdgeNGramTokenFilter field? Otherwise I would run into the same problems again, won't I? Or if stopword filtering is ok on this field: Do you filter the stopwords before or after EdgeNGram tokenizing? Thanks, Gert

Re: Prefix-Search with Stopwords - no results?

2010-05-29 Thread Gert Brinkmann
dismax... q=word1 word2 word3 qf=text text_prefix mm=100% tie=0 Ok, I will think about this. But I wonder if this will be more efficient than just not filtering stopwords? (But I have to study the EdgeNGram thing first. AFAIK it indexes all WORDS as WORDS, WORD, WOR, WO. So the index

Re: Prefix-Search with Stopwords - no results?

2010-05-29 Thread Erick Erickson
=0 Ok, I will think about this. But I wonder if this will be more efficient than just not filtering stopwords? (But I have to study the EdgeNGram thing first. AFAIK it indexes all WORDS as WORDS, WORD, WOR, WO. So the index will be blown up, too?) What I do not understand in your idea, why I

Prefix-Search with Stopwords - no results?

2010-05-28 Thread Gert Brinkmann
Hello, I am having some problems with solr 1.4. I am indexing and querying data using the following fieldType: fieldType name=text_de_de class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter

Re: Prefix-Search with Stopwords - no results?

2010-05-28 Thread Erick Erickson
Hmmm, I don't really see the problem here. I'll have to use English examples... Searching on the* (assuming "the" is a stopword) will search on (them OR theory OR thespian) assuming those three words are in your index. It will NOT search on "the". So I think you're OK, or are you seeing anomalous
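The expansion described in this message can be sketched as a lookup against the terms that actually made it into the index. This is a toy Python illustration, not Lucene's prefix query; the term list and the name expand_prefix are assumptions.

```python
# Hypothetical indexed terms; "the" itself is absent because it was
# removed as a stopword at index time.
INDEX_TERMS = ["solr", "them", "theory", "thespian"]


def expand_prefix(prefix, terms=INDEX_TERMS):
    """Return the indexed terms that start with the given prefix."""
    return [t for t in terms if t.startswith(prefix)]


# the* expands to the longer terms only, never to "the".
print(expand_prefix("the"))
```

Under this model a prefix that happens to equal a stopword still matches longer terms, as long as the prefix itself is not run through the stop filter first.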

Re: Prefix-Search with Stopwords - no results?

2010-05-28 Thread Chris Hostetter
: Searching on the* (assuming the is a stopword) will search on : (them OR theory OR thespian) assuming those three words are in : your index. It will NOT search on the. So I think you're OK, or are : you seeing anomalous results? i think the missing pieces to the puzzle here are: 1) wildcard

Re: Stopwords

2010-03-17 Thread Ahmet Arslan
I was reading Scaling Lucene and Solr (http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/) and I came across the section StopWords. In there it mentioned that it's not recommended to remove stop words at index time. Why is this the case? Don't

Re: Stopwords

2010-03-17 Thread Glen Newton
-the-Experts/Articles/Scaling-Lucene-and-Solr/) and I came across the section StopWords. In there it mentioned that it's not recommended to remove stop words at index time. Why is this the case? Don't all the extraneous stopwords bloat the index and lead to less relevant results? Can someone

Re: Stopwords

2010-03-17 Thread Anthony Serfes
11:13 AM To: solr-user@lucene.apache.org Subject: Re: Stopwords That discussion cites a paper via a URL: http://doc.rero.ch/lm.php?url#16;00,43,4,20091218142456-GY/Dolamic_Ljiljana__When_Stopword_Lists_Make_the_Difference_20091218.pdf Unfortunately when I go to this URL I get: L'accès à ce

Re: Stopwords

2010-03-17 Thread Grant Ingersoll
On Mar 16, 2010, at 9:51 PM, blargy wrote: I was reading Scaling Lucene and Solr (http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/) and I came across the section StopWords. In there it mentioned that it's not recommended to remove stop
