Re: Language support

2016-08-23 Thread Walter Underwood
Synonyms are also domain specific. A synonym set for one area may be completely wrong in another. In cooking, arugula and rocket are the same thing. In military or aerospace, missile and rocket are very similar. I would start with librarians. They maintain controlled vocabularies (called
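A minimal Python sketch of the point, assuming each domain keeps its own synonym groups. The dictionary and function names are hypothetical; the term pairs come from the post above. The same query term expands differently depending on which domain's list is in force. In Solr this would roughly correspond to giving each collection or field type its own synonyms file.

```python
# Hypothetical per-domain synonym groups; the pairs are taken from the post above.
SYNONYMS_BY_DOMAIN = {
    "cooking": [{"arugula", "rocket"}],
    "aerospace": [{"missile", "rocket"}],
}

def expand(term: str, domain: str) -> set[str]:
    """Return the term plus any synonyms defined for it in the given domain."""
    expanded = {term}
    for group in SYNONYMS_BY_DOMAIN.get(domain, []):
        if term in group:
            expanded |= group
    return expanded

print(sorted(expand("rocket", "cooking")))    # ['arugula', 'rocket']
print(sorted(expand("rocket", "aerospace")))  # ['missile', 'rocket']
```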

Language support

2016-08-23 Thread Bradley Belyeu
Hi, I’m trying to find a synonym list for any of the following languages: Catalan, Farsi, Hindi, Korean, Latvian, Dutch, Romanian, Thai, and Turkish. Does anyone know of resources where I can get a synonym list for these languages?

Re: What are the best practices on Multiple Language support in Solr Cloud ?

2014-05-05 Thread shamik
Sent from the Solr - User mailing list archive at Nabble.com.

Re: What are the best practices on Multiple Language support in Solr Cloud ?

2014-05-02 Thread Nicole Lacoste
Hi Shamik, I don't have an answer for you, just a couple of comments. Why not use dynamic field definitions in the schema? As you say, most of your fields are not analysed, so you just add a language tag (_en, _fr, _de, ...) to the field name when you index or query. Then you can add languages as you need
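A minimal Python sketch of the index-side half of this suggestion. It assumes the language of each document is already known and that the schema declares matching dynamic fields (for example "*_fr" mapped to a French text field type). The helper names and the SUPPORTED_LANGS set are hypothetical.

```python
# Languages for which the schema is assumed to declare a dynamic field, e.g. "*_fr".
SUPPORTED_LANGS = {"en", "fr", "de"}

def lang_field(base: str, lang: str) -> str:
    """Return the language-suffixed field name, e.g. ('title', 'fr') -> 'title_fr'."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"no field type configured for language {lang!r}")
    return f"{base}_{lang}"

def to_solr_doc(doc_id: str, lang: str, fields: dict[str, str]) -> dict[str, str]:
    """Build a document whose analysed fields carry the language suffix."""
    solr_doc = {"id": doc_id}
    for name, value in fields.items():
        solr_doc[lang_field(name, lang)] = value
    return solr_doc

print(to_solr_doc("42", "fr", {"title": "Le Petit Prince"}))
# {'id': '42', 'title_fr': 'Le Petit Prince'}
```

Adding a language then means declaring one more dynamic field in the schema and adding its code to the set. The same suffix rule can be applied at query time to pick the field to search.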

What are the best practices on Multiple Language support in Solr Cloud ?

2014-04-30 Thread Shamik Bandopadhyay
Hi, I'm trying to implement multiple-language support in SolrCloud (4.7). Although we have documents in several languages in the index, so far we have supported only English at both index and query time. To provide some context, our current index is 35 GB with close to 15 million documents. We have two shards

Re: eDisMax, multiple language support and stopwords

2013-11-11 Thread Liu Bo

eDisMax, multiple language support and stopwords

2013-11-07 Thread Tom Mortimer
Hi all, Thanks for the help and advice I've got here so far! Another question: I want to support stopwords at search time, so that e.g. the query "oscar and wilde" is equivalent to "oscar wilde" (this is with lowercaseOperators=false). Fair enough, I have the stopword "and" in the query analyser chain.
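The equivalence Tom describes can be illustrated with a minimal Python sketch of a query analyser that lowercases, splits on whitespace and drops stopwords. The tiny stopword set is an assumption; in Solr this step would be a stop filter in the field type's query analyzer. Both queries reduce to the same token list.

```python
# Assumed, deliberately tiny stopword list for illustration.
STOPWORDS = {"and", "the", "of", "a"}

def analyse_query(query: str) -> list[str]:
    """Lowercase, split on whitespace, then drop stopwords."""
    return [tok for tok in query.lower().split() if tok not in STOPWORDS]

print(analyse_query("oscar and wilde"))  # ['oscar', 'wilde']
print(analyse_query("oscar wilde"))      # ['oscar', 'wilde']
```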

RE: eDisMax, multiple language support and stopwords

2013-11-07 Thread Markus Jelsma
https://issues.apache.org/jira/browse/SOLR-3085

Re: eDisMax, multiple language support and stopwords

2013-11-07 Thread Tom Mortimer

Re: copyField at search time / multi-language support

2011-03-29 Thread lboutros

Re: copyField at search time / multi-language support

2011-03-29 Thread Erick Erickson
This may not be all that helpful, but have you looked at edismax? https://issues.apache.org/jira/browse/SOLR-1553 It allows the full Solr query syntax while preserving the goodness of dismax. This is standard equipment on 3.1, which is being released even as we speak, and I also know it's being

copyField at search time / multi-language support

2011-03-28 Thread Tom Mortimer
Hi, Here's my problem: I'm indexing a corpus with text in a variety of languages. I'm planning to detect these at index time and send the text to one of several suitably configured fields (e.g. mytext_de for German, mytext_cjk for Chinese/Japanese/Korean, etc.). At search time I want to search all of
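A minimal Python sketch of searching every per-language field at once with the edismax parser mentioned in this thread. It only builds the request parameters (q, defType and qf are standard Solr parameters); the field list is an assumption based on the post, and no Solr client library is used. Because each field in qf applies its own query analyzer, the German field would analyse the query with its German chain, the CJK field with its CJK chain, and so on.

```python
from urllib.parse import urlencode

# Assumed per-language fields, following the naming in the post above.
LANG_FIELDS = ("mytext_en", "mytext_de", "mytext_cjk")

def edismax_params(user_query: str, fields=LANG_FIELDS) -> str:
    """Build a URL query string that searches every language field with eDisMax."""
    params = {
        "q": user_query,
        "defType": "edismax",    # select the extended DisMax query parser
        "qf": " ".join(fields),  # query all per-language fields at once
    }
    return urlencode(params)

print(edismax_params("berlin"))
# q=berlin&defType=edismax&qf=mytext_en+mytext_de+mytext_cjk
```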

Re: copyField at search time / multi-language support

2011-03-28 Thread Gora Mohanty

Re: copyField at search time / multi-language support

2011-03-28 Thread Andy
Tom, could you share the method you use to perform language detection? Are there any open source tools that do that? Thanks.

Re: copyField at search time / multi-language support

2011-03-28 Thread Markus Jelsma

Re: copyField at search time / multi-language support

2011-03-28 Thread Andy
Thanks Markus. Do you know if this patch is good enough for production use? Thanks. Andy

Re: copyField at search time / multi-language support

2011-03-28 Thread Markus Jelsma
https://issues.apache.org/jira/browse/SOLR-1979
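The issue linked above is about full language identification. A much cruder stand-in is sketched below in Python: it guesses only the dominant Unicode script of a text. That is enough to route text to a field such as mytext_cjk, but it cannot distinguish, say, German from English. The bucket names, prefix table and target_field helper are assumptions for illustration.

```python
import unicodedata
from collections import Counter

# Prefixes of Unicode character names mapped to routing buckets (illustrative, incomplete).
SCRIPT_BUCKETS = {
    "CJK": "cjk",
    "HIRAGANA": "cjk",
    "KATAKANA": "cjk",
    "HANGUL": "cjk",
    "DEVANAGARI": "devanagari",
    "LATIN": "latin",
}

def dominant_script(text: str) -> str:
    """Return the script bucket covering the most letters in text, or 'unknown'."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, spaces and combining marks
        name = unicodedata.name(ch, "")
        for prefix, bucket in SCRIPT_BUCKETS.items():
            if name.startswith(prefix):
                counts[bucket] += 1
                break
    return counts.most_common(1)[0][0] if counts else "unknown"

def target_field(text: str, base: str = "mytext") -> str:
    """Hypothetical routing rule: e.g. Japanese text goes to 'mytext_cjk'."""
    return f"{base}_{dominant_script(text)}"

print(target_field("東京タワー"))   # mytext_cjk
print(target_field("Hello Welt"))  # mytext_latin
```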

Re: Help on Multi-language support

2011-03-06 Thread Jan Høydahl

Help on Multi-language support

2011-03-04 Thread cyang2010

Re: Help on Multi-language support

2011-03-04 Thread cyang2010
This is the solr schema:

slovene language support

2010-07-19 Thread Markus Goldbach
Hi, I want to set up a Solr instance with support for several languages. The language list includes Slovene; unfortunately, I found nothing about it in the wiki. Does anyone have experience with Solr 1.4 and Slovene? Thanks for your help, Markus

Re: slovene language support

2010-07-19 Thread Robert Muir
Hello, There is some information here (a prototype stemmer) about Slovene support in Snowball. But Martin Porter had some unanswered questions/reservations, so nothing was ever added to Snowball: http://snowball.tartarus.org/archives/snowball-discuss/0725.html

Polish language support?

2010-07-09 Thread Peter Wolanin
In IRC, I was trying to help someone find Polish-language support for Solr. It seems Lucene has nothing to offer? I found one stemmer that looks to be compatibly licensed, in case someone wants to take a shot at incorporating it: http://www.getopt.org/stempel/ -Peter -- Peter M. Wolanin, Ph.D. Momentum

Re: Polish language support?

2010-07-09 Thread Robert Muir

Re: Hindi language support in solr

2010-01-22 Thread Ranveer kumar
Hi Robert, Thanks for the reply. As you suggested, I used textgen, but I am still not able to search Hindi text. I might be missing some important configuration. Following is my schema.xml configuration: <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index">

Hindi language support in solr

2010-01-21 Thread Ranveer kumar
Hi all, I am very new to Solr. I downloaded the latest release (1.4) and installed it. For indexing and searching I am using the SolrJ API. My question is: how do I enable Solr to search Hindi text? Please help me. Thanks, with regards, Ranveer K Kumar

Re: Hindi language support in solr

2010-01-21 Thread Robert Muir
Hello, take a look at the field type textgen (a general unstemmed text field). The WhitespaceTokenizer + WordDelimiterFilter used by this type will work correctly for Hindi tokenization and punctuation.
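A rough Python approximation of the whitespace-then-punctuation splitting described above. It is not Lucene's WordDelimiterFilter, which also splits on case changes and letter/digit transitions, among other options; the function names are hypothetical. The detail that matters for Hindi is that Devanagari vowel signs and the virama are Unicode combining marks, so they must be treated as part of the word (Python's str.isalnum(), for example, returns False for many of them).

```python
import unicodedata

def is_word_char(ch: str) -> bool:
    # Letters (L*), combining marks (M*) and numbers (N*) stay inside a token.
    # Devanagari vowel signs and the virama are marks, so they must not split words.
    return unicodedata.category(ch)[0] in "LMN"

def tokenize(text: str) -> list[str]:
    """Split on whitespace, then split each chunk on punctuation and symbols."""
    tokens = []
    for chunk in text.split():
        current = []
        for ch in chunk:
            if is_word_char(ch):
                current.append(ch)
            elif current:
                tokens.append("".join(current))
                current = []
        if current:
            tokens.append("".join(current))
    return tokens

print(tokenize("नमस्ते, दुनिया।"))  # ['नमस्ते', 'दुनिया']
```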

Re: Multi language support

2010-01-13 Thread Robert Muir
Right, but we should not encourage users to significantly degrade overall relevance for all movies because of a few movies and a band (very special cases, as I said). In English, not using stopwords doesn't really degrade relevance that much, so it's a reasonable decision to make. This is not

Re: Multi language support

2010-01-13 Thread Paul Libbrecht
Isn't the conclusion here that matching without stopword removal or stemming should be the best match, if there is one, and that we should then gently degrade to weaker forms of matching? paul
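A minimal Python sketch of that graded approach: try the strictest normalisation first, then fall back to looser ones, recording which tier matched so that stricter matches can rank higher. The stopword set and the one-rule crude_stem are placeholders, not a real stemmer, and whole-title equality is used as the match test for simplicity. In Solr the same idea could roughly be expressed by copying text into several fields with progressively looser analysis and giving the stricter fields higher boosts in qf.

```python
# Assumed stopword set; a real deployment would use a proper per-language list.
STOPWORDS = {"the", "a", "an", "and", "to", "be", "have", "of"}

def crude_stem(token: str) -> str:
    """Placeholder 'stemmer' (assumption): strips one trailing 's' from longer words."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token

def no_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

# Normalisers ordered from strictest to loosest; an earlier tier is a better match.
TIERS = [
    ("exact", lambda toks: toks),
    ("no_stopwords", no_stopwords),
    ("stemmed", lambda toks: [crude_stem(t) for t in no_stopwords(toks)]),
]

def match_tier(query: str, title: str):
    """Return the name of the strictest tier at which query and title agree, or None."""
    q, d = query.lower().split(), title.lower().split()
    for name, normalise in TIERS:
        nq, nd = normalise(q), normalise(d)
        if nq and nq == nd:  # an all-stopword query cannot match at the looser tiers
            return name
    return None

print(match_tier("the the", "The The"))      # exact
print(match_tier("beatles", "The Beatles"))  # no_stopwords
print(match_tier("beatle", "The Beatles"))   # stemmed
```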

Re: Multi language support

2010-01-13 Thread Lance Norskog
Robert Muir: Thank you for the pointer to that paper!

Re: Multi language support

2010-01-12 Thread Lance Norskog
There are a lot of projects that don't use stopwords any more. You might consider dropping them altogether.

Re: Multi language support

2010-01-12 Thread Robert Muir
I don't think this is something to consider across the board for all languages. The same grammatical units that are part of a word in one language (and removed by stemmers) are independent morphemes in others (and should be stopwords), so please take this advice on a case-by-case basis for each

Re: Multi language support

2010-01-12 Thread Robert Muir
Sorry, I forgot to include this 2009 paper comparing what stopwords do across three languages: http://doc.rero.ch/lm.php?url=1000,43,4,20091218142456-GY/Dolamic_Ljiljana_-_When_Stopword_Lists_Make_the_Difference_20091218.pdf In my opinion, if stopwords annoy your users for very special cases like

Re: Multi language support

2010-01-12 Thread Walter Underwood
There is a band named The The. And a producer named Don Was. For a list of all-stopword movie titles at Netflix, see this post: http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html My favorite is To Be and To Have (Être et Avoir), which is all stopwords in two languages.
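A minimal Python sketch of why such titles become invisible: with stopword removal at index time, they produce no terms at all, so no title query can match them. The stopword set is a trimmed, assumed mix of English and French entries, not any particular shipped list.

```python
# Assumed, trimmed mix of English and French stopwords for illustration.
STOPWORDS = {"the", "to", "be", "and", "have", "was", "et", "avoir", "être"}

def analyse(title: str) -> list[str]:
    """Lowercase, split on whitespace and drop stopwords, as an index analyser might."""
    return [t for t in title.lower().split() if t not in STOPWORDS]

for title in ["The The", "To Be and To Have", "Être et Avoir", "Don Was"]:
    print(title, "->", analyse(title))
# The The -> []
# To Be and To Have -> []
# Être et Avoir -> []
# Don Was -> ['don']
```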

Multi language support

2010-01-11 Thread Daniel Persson
Hi Solr users. I'm trying to set up a site with integrated Solr search, and I use the Solr Java API to feed the index with search documents. At the moment I have only activated search on the English portion of the site. I'm interested in using as many features of Solr as possible. Synonyms,

Re: Multi language support

2010-01-11 Thread Markus Jelsma
Hello, we have implemented language-specific search in Solr using language-specific fields and field types. For instance, an en_text field type can use an English stemmer and a list of stopwords and synonyms. We did not, however, use language-specific stopwords; instead we used one list shared by both

Re: Multi language support

2010-01-11 Thread Don Werve
This is the way I've implemented multilingual search as well.

Re: Multi-language support

2009-04-14 Thread Grant Ingersoll

Multi-language support

2009-04-09 Thread revas
Hi, to reframe my earlier question: some languages have only analyzers but no stemmer from Snowball/Porter; in that case, does the analyzer take care of stemming as well? Some languages have only the Snowball stemmer but no analyzer? Some have both. Can we say, then, that Solr supports all the

Re: Multiple language support

2008-12-29 Thread Otis Gospodnetic
Hi All, I have a schema supporting multiple languages, in which there is a separate field for every language. I have a field product_name to store the product name and its description, which can be in any user-preferred

RE: Language support

2008-03-20 Thread nicolas . dessaigne

Re: Language support

2008-03-20 Thread David King

Re: Language support

2008-03-20 Thread Benson Margulies

Re: Language support

2008-03-20 Thread David King
people solving the problem of searching over multiple languages? What is the canonical way to do this? Nicolas

Re: Language support

2008-03-20 Thread Benson Margulies

Re: Language support

2008-03-20 Thread Benson Margulies
Token-by-token seems a bit extreme. Are you concerned with macaronic documents? On Thu, Mar 20, 2008 at 12:42 PM, Walter Underwood wrote: Nice list. You may still need to mark the language of each document. There are plenty of cross-language collisions: die and boot have

Re: Language support

2008-03-20 Thread Walter Underwood
Extreme, but guaranteed to work, and it avoids bad IDF when there are inter-language collisions. In Ultraseek, we only stored the hash, so the size of the source token didn't matter. Trademarks are a bad source of collisions and anomalous IDF. If you have LaserJet support docs in 20 languages, the
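A minimal Python sketch of how language tagging avoids the IDF problem. If every token is prefixed with its language before document frequencies are counted, English "die" and German "die" become separate terms with separate statistics. Here each token simply inherits its document's language, which simplifies the token-by-token scheme discussed above. The "lang:token" term format is an assumption, and Ultraseek's hashing is not reproduced.

```python
import math
from collections import Counter

def tag_tokens(text: str, lang: str) -> set[str]:
    """Prefix each distinct token with its language, e.g. 'die' -> 'de:die'."""
    return {f"{lang}:{tok}" for tok in text.lower().split()}

def document_frequencies(docs: list[tuple[str, str]], tagged: bool = True) -> Counter:
    """Count in how many documents each term occurs (the input to IDF)."""
    df = Counter()
    for text, lang in docs:
        terms = tag_tokens(text, lang) if tagged else set(text.lower().split())
        df.update(terms)
    return df

def idf(term: str, df: Counter, n_docs: int) -> float:
    """Plain log(N / df); Lucene's own formula differs in detail."""
    return math.log(n_docs / df[term])

docs = [
    ("Die Hard is a film", "en"),
    ("die Katze schläft", "de"),
    ("die Maus läuft", "de"),
    ("the boot is full", "en"),
]
plain = document_frequencies(docs, tagged=False)
tagged = document_frequencies(docs)
print(plain["die"], tagged["en:die"], tagged["de:die"])  # 3 1 2
```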

Re: Language support

2008-03-20 Thread Benson Margulies
Oh, Walter! Hello! I thought that name was familiar. Greetings from Basis. All that makes sense.

Language support

2008-03-19 Thread David King
This has probably been asked before, but I'm having trouble finding it. Basically, we want to be able to search for content across several languages, given that we know what language a datum and a query are in. Is there an obvious way to do this? Here's the longer version: I am trying to