Re: copyField at search time / multi-language support

2011-03-29 Thread lboutros
Subject: Re: copyField at search time / multi-language support To: [hidden email] Cc: Andy [hidden email] Date: Tuesday, March 29, 2011, 1:29 AM https

Re: copyField at search time / multi-language support

2011-03-29 Thread Erick Erickson
This may not be all that helpful, but have you looked at edismax? https://issues.apache.org/jira/browse/SOLR-1553 It allows the full Solr query syntax while preserving the goodness of dismax. This is standard equipment on 3.1, which is being released even as we speak, and I also know it's being
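
A minimal sketch of what an edismax request across several language-specific fields might look like, assuming a Solr 3.1 or later instance where the parser is selected with defType=edismax and the searched fields are listed in qf. The field names mytext_de and mytext_cjk come from the original question in this thread; mytext_en, the helper name build_edismax_url, and the base URL are hypothetical.

```python
from urllib.parse import urlencode

# Only mytext_de and mytext_cjk appear in the thread; mytext_en is an assumption.
LANGUAGE_FIELDS = ["mytext_en", "mytext_de", "mytext_cjk"]


def build_edismax_url(user_query, base_url="http://localhost:8983/solr/select"):
    """Build a select URL that searches every language field at once.

    base_url is a placeholder for a local test instance, not a real endpoint.
    """
    params = {
        "q": user_query,                  # the user's query, passed through as typed
        "defType": "edismax",             # select the extended dismax parser
        "qf": " ".join(LANGUAGE_FIELDS),  # query fields, space separated
    }
    return base_url + "?" + urlencode(params)


url = build_edismax_url("haus OR house")
print(url)
```

Each field listed in qf is analyzed with its own field type, so one request can cover all languages without copying text into a shared field.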

copyField at search time / multi-language support

2011-03-28 Thread Tom Mortimer
Hi, Here's my problem: I'm indexing a corpus with text in a variety of languages. I'm planning to detect these at index time and send the text to one of several suitably configured fields (e.g. mytext_de for German, mytext_cjk for Chinese/Japanese/Korean etc.) At search time I want to search all of
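
The index-time routing described in this message could be sketched roughly as follows. This is only an illustration: it assumes the language has already been detected elsewhere and is available as a two-letter code. The mapping table, the fallback field mytext_general, and the function name route_document are hypothetical; only mytext_de and mytext_cjk are taken from the message.

```python
# Hypothetical mapping from a detected language code to a language-specific field.
FIELD_BY_LANGUAGE = {
    "de": "mytext_de",
    "zh": "mytext_cjk",
    "ja": "mytext_cjk",
    "ko": "mytext_cjk",
}

# Assumed catch-all field for languages without a dedicated analyzer chain.
FALLBACK_FIELD = "mytext_general"


def route_document(doc_id, text, language_code):
    """Return a document dict with the text placed in the field for its language."""
    field = FIELD_BY_LANGUAGE.get(language_code, FALLBACK_FIELD)
    return {"id": doc_id, field: text}


print(route_document("1", "Guten Tag", "de"))
print(route_document("2", "Bonjour", "fr"))
```

The resulting dict has one text field per document, so each document is analyzed by exactly one language-specific field type.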

Re: copyField at search time / multi-language support

2011-03-28 Thread Gora Mohanty
On Mon, Mar 28, 2011 at 2:15 PM, Tom Mortimer t...@flax.co.uk wrote: Hi, Here's my problem: I'm indexing a corpus with text in a variety of languages. I'm planning to detect these at index time and send the text to one of several suitably configured fields (e.g. mytext_de for German, mytext_cjk for

Re: copyField at search time / multi-language support

2011-03-28 Thread Andy
Tom, Could you share the method you use to perform language detection? Any open source tools that do that? Thanks. --- On Mon, 3/28/11, Tom Mortimer t...@flax.co.uk wrote: From: Tom Mortimer t...@flax.co.uk Subject: copyField at search time / multi-language support To: solr-user

Re: copyField at search time / multi-language support

2011-03-28 Thread Markus Jelsma
at search time / multi-language support To: solr-user@lucene.apache.org Date: Monday, March 28, 2011, 4:45 AM Hi, Here's my problem: I'm indexing a corpus with text in a variety of languages. I'm planning to detect these at index time and send the text to one of several suitably configured

Re: copyField at search time / multi-language support

2011-03-28 Thread Andy
Thanks Markus. Do you know if this patch is good enough for production use? Thanks. Andy --- On Tue, 3/29/11, Markus Jelsma markus.jel...@openindex.io wrote: From: Markus Jelsma markus.jel...@openindex.io Subject: Re: copyField at search time / multi-language support To: solr-user

Re: copyField at search time / multi-language support

2011-03-28 Thread Markus Jelsma
From: Markus Jelsma markus.jel...@openindex.io Subject: Re: copyField at search time / multi-language support To: solr-user@lucene.apache.org Cc: Andy angelf...@yahoo.com Date: Tuesday, March 29, 2011, 1:29 AM https://issues.apache.org/jira/browse/SOLR-1979 Tom, Could you

Re: Help on Multi-language support

2011-03-06 Thread Jan Høydahl
this message in context: http://lucene.472066.n3.nabble.com/Help-on-Multi-language-support-tp2636054p2636054.html Sent from the Solr - User mailing list archive at Nabble.com.

Help on Multi-language support

2011-03-04 Thread cyang2010

Re: Help on Multi-language support

2011-03-04 Thread cyang2010
This is the Solr schema: -- View this message in context: http://lucene.472066.n3.nabble.com/Help-on-Multi-language-support-tp2636054p2636065.html

Re: Multi language support

2010-01-13 Thread Robert Muir
right, but we should not encourage users to significantly degrade overall relevance for all movies due to a few movies and a band (very special cases, as I said). In English, by not using stopwords, it doesn't really degrade relevance that much, so it's a reasonable decision to make. This is not

Re: Multi language support

2010-01-13 Thread Paul Libbrecht
Isn't the conclusion here that some stopword and stemming free matching should be the best match if ever and to then gently degrade to weaker forms of matching? paul On 13 Jan 2010, at 07:08, Walter Underwood wrote: There is a band named The The. And a producer named Don Was. For a

Re: Multi language support

2010-01-13 Thread Lance Norskog
Robert Muir: Thank you for the pointer to that paper! On Wed, Jan 13, 2010 at 6:29 AM, Paul Libbrecht p...@activemath.org wrote: Isn't the conclusion here that some stopword and stemming free matching should be the best match if ever and to then gently degrade to weaker forms of matching?

Re: Multi language support

2010-01-12 Thread Lance Norskog
There are a lot of projects that don't use stopwords any more. You might consider dropping them altogether. On Mon, Jan 11, 2010 at 2:25 PM, Don Werve d...@madwombat.com wrote: This is the way I've implemented multilingual search as well. 2010/1/11 Markus Jelsma mar...@buyways.nl Hello,

Re: Multi language support

2010-01-12 Thread Robert Muir
I don't think this is something to consider across the board for all languages. The same grammatical units that are part of a word in one language (and removed by stemmers) are independent morphemes in others (and should be stopwords) so please take this advice on a case-by-case basis for each

Re: Multi language support

2010-01-12 Thread Robert Muir
Sorry, I forgot to include this 2009 paper comparing what stopwords do across 3 languages: http://doc.rero.ch/lm.php?url=1000,43,4,20091218142456-GY/Dolamic_Ljiljana_-_When_Stopword_Lists_Make_the_Difference_20091218.pdf In my opinion, if stopwords annoy your users for very special cases like

Re: Multi language support

2010-01-12 Thread Walter Underwood
There is a band named The The. And a producer named Don Was. For a list of all-stopword movie titles at Netflix, see this post: http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html My favorite is To Be and To Have (Être et Avoir), which is all stopwords in two languages.
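
To illustrate why such titles become invisible, here is a toy sketch of stopword removal. It is not Solr's stop filter; the stopword list and the function name remove_stopwords are made up for the example, and tokenization is a plain whitespace split.

```python
# Illustrative English stopword list; real lists are longer and language-specific.
STOPWORDS = {"a", "and", "be", "have", "of", "the", "to", "was"}


def remove_stopwords(text):
    """Lowercase, split on whitespace, and drop every token in STOPWORDS."""
    return [token for token in text.lower().split() if token not in STOPWORDS]


for title in ["The The", "To Be and To Have", "Don Was"]:
    print(title, "->", remove_stopwords(title))
```

With this list, "The The" and "To Be and To Have" are reduced to no tokens at all, so a query for either title has nothing left to match, while "Don Was" keeps only "don".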

Multi language support

2010-01-11 Thread Daniel Persson
Hi Solr users. I'm trying to set up a site with Solr search integrated. And I use the SolrJ Java API to feed the index with search documents. At the moment I have only activated search on the English portion of the site. I'm interested in using as many features of Solr as possible. Synonyms,

Re: Multi language support

2010-01-11 Thread Markus Jelsma
Hello, We have implemented language-specific search in Solr using language-specific fields and field types. For instance, an en_text field type can use an English stemmer and a list of stopwords and synonyms. We, however, did not use specific stopwords; instead we used one list shared by both

Re: Multi language support

2010-01-11 Thread Don Werve
This is the way I've implemented multilingual search as well. 2010/1/11 Markus Jelsma mar...@buyways.nl Hello, We have implemented language-specific search in Solr using language-specific fields and field types. For instance, an en_text field type can use an English stemmer and a list of

Re: Multi-language support

2009-04-14 Thread Grant Ingersoll
On Apr 9, 2009, at 7:09 AM, revas wrote: Hi, To reframe my earlier question: Some languages have analyzers only but no stemmer from Snowball Porter; then does the analyzer take care of stemming as well? Some languages only have the stemmer from Snowball but no analyzer? Some have both.

Multi-language support

2009-04-09 Thread revas
Hi, To reframe my earlier question: Some languages have analyzers only but no stemmer from Snowball Porter; then does the analyzer take care of stemming as well? Some languages only have the stemmer from Snowball but no analyzer? Some have both. Can we say then that Solr supports all the