Synonyms are also domain specific. A synonym set for one area may be completely
wrong in another.
In cooking, arugula and rocket are the same thing. In military or aerospace,
missile and rocket are very similar.
I would start with librarians. They maintain controlled vocabularies (called
Hi, I’m trying to find a synonym list for any of the following languages:
Catalan, Farsi, Hindi, Korean, Latvian, Dutch, Romanian, Thai, and Turkish
Does anyone know of resources where I can get a synonym list for these
languages?
-practices-on-Multiple-Language-support-in-Solr-Cloud-tp4134006p4134743.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hi Shamik,
I don't have an answer for you, just a couple of comments.
Why not use dynamic field definitions in the schema? Since, as you say, most of
your fields are not analysed, you just add a language tag (_en, _fr, _de,
...) to the field name when you index or query. Then you can add languages as
you need
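To make the language-tag idea concrete, here is a minimal Python sketch of the client side only. The helper names and the set of languages are my own assumptions, not anything from Shamik's setup; the matching dynamic field patterns (*_en, *_fr, *_de) would still have to be declared in the schema.

```python
# Hypothetical helpers: append a language tag to field names so they match
# dynamic field patterns such as "*_en", "*_fr", "*_de" declared in the schema.
SUPPORTED_LANGUAGES = {"en", "fr", "de"}  # assumed set; extend as needed


def localized_field(base_name: str, lang: str) -> str:
    """Return e.g. 'title_fr' for base_name='title' and lang='fr'."""
    lang = lang.lower()
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {lang!r}")
    return f"{base_name}_{lang}"


def localize_document(doc: dict, lang: str, text_fields: set) -> dict:
    """Rename only the analysed text fields; leave all other fields untouched."""
    return {
        (localized_field(name, lang) if name in text_fields else name): value
        for name, value in doc.items()
    }
```

The same localized_field call can be used at query time to pick the field to search, so adding a language means one new dynamic field pattern and one new entry in the set.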
Hi,
I'm trying to implement multiple language support in Solr Cloud (4.7).
Although we have different languages in the index, we were only supporting
English for indexing and querying. To provide some context, our current
index size is 35 GB with close to 15 million documents. We've two shards
November 2013 12:50
To: solr-user@lucene.apache.org
Subject: eDisMax, multiple language support and stopwords
Hi all,
Thanks for the help and advice I've got here so far!
Another question - I want to support stopwords at search time, so that
e.g.
the query oscar and wilde
Hi all,
Thanks for the help and advice I've got here so far!
Another question - I want to support stopwords at search time, so that e.g.
the query oscar and wilde is equivalent to oscar wilde (this is with
lowercaseOperators=false). Fair enough, I have stopword and in the query
analyser chain.
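For anyone following along, this is roughly what "equivalent" means here, as a toy Python sketch. It is not Solr's analysis chain; the stopword list and function name are assumptions made purely for illustration.

```python
# Toy stand-in for a query analyser with a stopword filter.
STOPWORDS = {"and"}  # assumed list; the post only mentions "and"


def strip_stopwords(query: str, stopwords=STOPWORDS) -> list:
    """Lowercase, split on whitespace, and drop stopword tokens."""
    return [tok for tok in query.lower().split() if tok not in stopwords]
```

With this, strip_stopwords("oscar and wilde") and strip_stopwords("oscar wilde") produce the same token list, which is the behaviour the question is asking for at search time.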
-Minimum-Match-Stopwords-Bug-td493483.html
https://issues.apache.org/jira/browse/SOLR-3085
-Original message-
From:Tom Mortimer tom.m.f...@gmail.com
Sent: Thursday 7th November 2013 12:50
To: solr-user@lucene.apache.org
Subject: eDisMax, multiple language support and stopwords
Hi all
-
From:Tom Mortimer tom.m.f...@gmail.com
Sent: Thursday 7th November 2013 12:50
To: solr-user@lucene.apache.org
Subject: eDisMax, multiple language support and stopwords
Hi all,
Thanks for the help and advice I've got here so far!
Another question - I want to support stopwords
-user=t
Subject: Re: copyField at search time / multi-language support
To: [hidden email]
Cc: Andy [hidden email]
Date: Tuesday, March 29, 2011, 1:29 AM
https
This may not be all that helpful, but have you looked at edismax?
https://issues.apache.org/jira/browse/SOLR-1553
It allows the full Solr query syntax while preserving the goodness of
dismax.
This is standard equipment on 3.1, which is being released even as we
speak, and I also know it's being
Hi,
Here's my problem: I'm indexing a corpus with text in a variety of
languages. I'm planning to detect these at index time and send the
text to a suitably configured field (e.g. mytext_de for
German, mytext_cjk for Chinese/Japanese/Korean etc.).
At search time I want to search all of
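A rough Python sketch of the routing described above, plus one way to search across all the language fields with edismax. Language detection is assumed to happen elsewhere; the function names and the grouping of language codes are my own assumptions, while defType, q and qf are the usual Solr request parameters.

```python
# Assumed grouping: Chinese, Japanese and Korean share one field.
CJK = {"zh", "ja", "ko"}


def field_for_language(lang: str, base: str = "mytext") -> str:
    """Map a detected language code to its field, e.g. 'de' -> 'mytext_de'."""
    suffix = "cjk" if lang in CJK else lang
    return f"{base}_{suffix}"


def edismax_params(user_query: str, langs) -> dict:
    """Build request parameters that search every language field at once."""
    fields = sorted({field_for_language(lang) for lang in langs})
    return {"defType": "edismax", "q": user_query, "qf": " ".join(fields)}
```

Listing the fields in qf avoids needing a copyField at search time, at the cost of one query clause per language field.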
On Mon, Mar 28, 2011 at 2:15 PM, Tom Mortimer t...@flax.co.uk wrote:
Hi,
Here's my problem: I'm indexing a corpus with text in a variety of
languages. I'm planning to detect these at index time and send the
text to a suitably configured field (e.g. mytext_de for
German, mytext_cjk for
Tom,
Could you share the method you use to perform language detection? Any open
source tools that do that?
Thanks.
--- On Mon, 3/28/11, Tom Mortimer t...@flax.co.uk wrote:
From: Tom Mortimer t...@flax.co.uk
Subject: copyField at search time / multi-language support
To: solr-user
at search time / multi-language support
To: solr-user@lucene.apache.org
Date: Monday, March 28, 2011, 4:45 AM
Hi,
Here's my problem: I'm indexing a corpus with text in a
variety of
languages. I'm planning to detect these at index time and
send the
text to one of a suitably-configured
Thanks Markus.
Do you know if this patch is good enough for production use? Thanks.
Andy
--- On Tue, 3/29/11, Markus Jelsma markus.jel...@openindex.io wrote:
From: Markus Jelsma markus.jel...@openindex.io
Subject: Re: copyField at search time / multi-language support
To: solr-user
:
From: Markus Jelsma markus.jel...@openindex.io
Subject: Re: copyField at search time / multi-language support
To: solr-user@lucene.apache.org
Cc: Andy angelf...@yahoo.com
Date: Tuesday, March 29, 2011, 1:29 AM
https://issues.apache.org/jira/browse/SOLR-1979
Tom,
Could you
this message in context:
http://lucene.472066.n3.nabble.com/Help-on-Multi-language-support-tp2636054p2636054.html
Sent from the Solr - User mailing list archive at Nabble.com.
This is the solr schema:
--
View this message in context:
http://lucene.472066.n3.nabble.com/Help-on-Multi-language-support-tp2636054p2636065.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hi,
I want to set up a Solr instance with support for several languages.
The language list includes Slovene; unfortunately I found nothing about it in
the wiki.
Does anyone have experience with Solr 1.4 and Slovene?
Thanks for the help
Markus
Hello,
There is some information here (prototype stemmer) about support in
snowball.
But Martin Porter had some unanswered questions/reservations so nothing ever
got added to snowball:
http://snowball.tartarus.org/archives/snowball-discuss/0725.html
In IRC trying to help someone find Polish-language support for Solr.
Seems lucene has nothing to offer? Found one stemmer that looks to be
compatibly licensed in case someone wants to take a shot at
incorporating it: http://www.getopt.org/stempel/
-Peter
--
Peter M. Wolanin, Ph.D.
Momentum
...@acquia.com wrote:
In IRC trying to help someone find Polish-language support for Solr.
Seems lucene has nothing to offer? Found one stemmer that looks to be
compatibly licensed in case someone wants to take a shot at
incorporating it: http://www.getopt.org/stempel/
-Peter
--
Peter M. Wolanin
Hi Robert,
Thanks for the reply.
As you suggested, I used textgen but I am still not able to search Hindi text.
I might be missing some important configuration.
The following is my schema.xml configuration:
<fieldType name="textgen" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
Hi all,
I am very new to Solr.
I downloaded the latest release (1.4) and installed it. For indexing and
searching I am using the SolrJ API.
My question is: how do I enable Solr to search Hindi-language text?
Please help me.
Thanks,
with regards
Ranveer K Kumar
Hello, take a look at the field type textgen (a general unstemmed text field).
The WhitespaceTokenizer + WordDelimiterFilter used by this type will
work correctly for Hindi tokenization and punctuation.
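To get a feel for why whitespace-based tokenization is a reasonable start for Hindi, here is a small Python approximation. It is only a sketch: it splits on whitespace and trims punctuation from token edges, which is far less than what WordDelimiterFilter really does, and the function names are my own.

```python
import unicodedata


def is_punct(ch: str) -> bool:
    """True for any character in a Unicode punctuation category (P*)."""
    return unicodedata.category(ch).startswith("P")


def tokenize(text: str) -> list:
    """Split on whitespace, then trim punctuation from both ends of each token."""
    tokens = []
    for raw in text.split():
        start, end = 0, len(raw)
        while start < end and is_punct(raw[start]):
            start += 1
        while end > start and is_punct(raw[end - 1]):
            end -= 1
        if start < end:
            tokens.append(raw[start:end])
    return tokens
```

Devanagari vowel signs are combining marks rather than punctuation, so they stay attached to their words, while the danda (।) and ASCII punctuation are trimmed.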
On Thu, Jan 21, 2010 at 10:55 AM, Ranveer kumar
ranveer.k.ku...@gmail.com wrote:
Hi all,
I am
right, but we should not encourage users to significantly degrade
overall relevance for all movies due to a few movies and a band (very
special cases, as I said).
In English, not using stopwords doesn't really degrade
relevance that much, so it's a reasonable decision to make. This is not
Isn't the conclusion here that some stopword and stemming free
matching should be the best match if ever and to then gently degrade
to weaker forms of matching?
paul
Le 13-janv.-10 à 07:08, Walter Underwood a écrit :
There is a band named The The. And a producer named Don Was. For
a
Robert Muir: Thank you for the pointer to that paper!
On Wed, Jan 13, 2010 at 6:29 AM, Paul Libbrecht p...@activemath.org wrote:
Isn't the conclusion here that some stopword and stemming free matching
should be the best match if ever and to then gently degrade to weaker forms
of matching?
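One way to picture that "gently degrade" idea is as ranked tiers, sketched here in Python. The stopword list, the toy stemmer and the tier numbers are all assumptions made for illustration; a real setup would express this as boosts across differently analysed fields.

```python
STOPWORDS = {"the", "a", "to", "be", "and", "have"}  # assumed toy list


def naive_stem(token: str) -> str:
    """Toy suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def match_tier(query: str, title: str) -> int:
    """3 = exact tokens, 2 = equal without stopwords, 1 = equal after stemming."""
    q, t = query.lower().split(), title.lower().split()
    if q == t:
        return 3
    q_content = [x for x in q if x not in STOPWORDS]
    t_content = [x for x in t if x not in STOPWORDS]
    if not q_content or not t_content:
        return 0  # nothing left to compare once stopwords are gone
    if q_content == t_content:
        return 2
    if [naive_stem(x) for x in q_content] == [naive_stem(x) for x in t_content]:
        return 1
    return 0
```

Because the exact tier is checked first, all-stopword titles such as "The The" can still match, while weaker forms of matching only apply when the exact one fails.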
There are a lot of projects that don't use stopwords any more. You
might consider dropping them altogether.
On Mon, Jan 11, 2010 at 2:25 PM, Don Werve d...@madwombat.com wrote:
This is the way I've implemented multilingual search as well.
2010/1/11 Markus Jelsma mar...@buyways.nl
Hello,
I don't think this is something to consider across the board for all
languages. The same grammatical units that are part of a word in one
language (and removed by stemmers) are independent morphemes in others
(and should be stopwords)
so please take this advice on a case-by-case basis for each
sorry, i forgot to include this 2009 paper comparing what stopwords do
across 3 languages:
http://doc.rero.ch/lm.php?url=1000,43,4,20091218142456-GY/Dolamic_Ljiljana_-_When_Stopword_Lists_Make_the_Difference_20091218.pdf
in my opinion, if stopwords annoy your users for very special cases
like
There is a band named The The. And a producer named Don Was. For a list of
all-stopword movie titles at Netflix, see this post:
http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html
My favorite is To Be and To Have (Être et Avoir), which is all stopwords in
two languages.
Hi Solr users.
I'm trying to set up a site with Solr search integrated. And I use the
SolJava API to feed the index with search documents. At the moment I
have only activated search on the English portion of the site. I'm
interested in using as many features of solr as possible. Synonyms,
Hello,
We have implemented language specific search in Solr using language
specific fields and field types. For instance, an en_text field type can
use an English stemmer, and list of stopwords and synonyms. We, however
did not use specific stopwords, instead we used one list shared by both
This is the way I've implemented multilingual search as well.
2010/1/11 Markus Jelsma mar...@buyways.nl
Hello,
We have implemented language specific search in Solr using language
specific fields and field types. For instance, an en_text field type can
use an English stemmer, and list of
On Apr 9, 2009, at 7:09 AM, revas wrote:
Hi,
To reframe my earlier question:
Some languages have analyzers only but no stemmer from Snowball/Porter;
does the analyzer then take care of stemming as well?
Some languages only have the stemmer from Snowball but no analyzer?
Some have both.
Hi,
To reframe my earlier question:
Some languages have analyzers only but no stemmer from Snowball/Porter;
does the analyzer then take care of stemming as well?
Some languages only have the stemmer from Snowball but no analyzer?
Some have both.
Can we say then that solr supports all the
: Monday, December 29, 2008 4:52:19 AM
Subject: Multiple language support
Hi All,
I have a multiple language supporting schema in which there is a separate
field
for every language.
I have a field product_name to store product name and its description that
can
be in any user preferred
@lucene.apache.org
Objet : Language support
This has probably been asked before, but I'm having trouble finding
it. Basically, we want to be able to search for content across several
languages, given that we know what language a datum and a query are
in. Is there an obvious way to do this?
Here's the longer
]
Envoyé : mercredi 19 mars 2008 20:07
À : solr-user@lucene.apache.org
Objet : Language support
This has probably been asked before, but I'm having trouble finding
it. Basically, we want to be able to search for content across several
languages, given that we know what language a datum
people solving the problem of
searching over multiple languages? What is the canonical way to do
this?
Nicolas
-Message d'origine-
De : David King [mailto:[EMAIL PROTECTED]
Envoyé : mercredi 19 mars 2008 20:07
À : solr-user@lucene.apache.org
Objet : Language support
This has probably
]
Envoyé : mercredi 19 mars 2008 20:07
À : solr-user@lucene.apache.org
Objet : Language support
This has probably been asked before, but I'm having trouble finding
it. Basically, we want to be able to search for content across
several
languages, given that we know what language a datum
Token-by-token seems a bit extreme. Are you concerned with macaronic
documents?
On Thu, Mar 20, 2008 at 12:42 PM, Walter Underwood [EMAIL PROTECTED]
wrote:
Nice list.
You may still need to mark the language of each document. There are
plenty of cross-language collisions: die and boot have
Extreme, but guaranteed to work and it avoids bad IDF when there are
inter-language collisions. In Ultraseek, we only stored the hash, so
the size of the source token didn't matter.
Trademarks are a bad source of collisions and anomalous IDF. If you have
LaserJet support docs in 20 languages, the
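A small Python sketch of the token-level language marking discussed in this thread. This is not how Ultraseek implemented it; the tagging format, the use of SHA-1 and the function names are assumptions chosen only to show how tagged tokens keep document frequencies separate per language.

```python
import hashlib
from collections import Counter


def tag_tokens(text: str, lang: str) -> list:
    """Prefix each token with its language, e.g. 'die' -> 'de:die'."""
    return [f"{lang}:{tok}" for tok in text.lower().split()]


def token_hash(tagged: str) -> str:
    """Fixed-size key for a tagged token, independent of the token's length."""
    return hashlib.sha1(tagged.encode("utf-8")).hexdigest()[:16]


def document_frequencies(docs) -> Counter:
    """Count, per tagged token, how many (text, lang) documents contain it."""
    df = Counter()
    for text, lang in docs:
        df.update(set(tag_tokens(text, lang)))
    return df
```

With tagged tokens, German "die" and English "die" are counted separately, so a very common article in one language no longer distorts the IDF of a content word in another.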
Oh, Walter! Hello! I thought that name was familiar. Greetings from Basis.
All that makes sense.
On Thu, Mar 20, 2008 at 1:00 PM, Walter Underwood [EMAIL PROTECTED]
wrote:
Extreme, but guaranteed to work and it avoids bad IDF when there are
inter-language collisions. In Ultraseek, we only
This has probably been asked before, but I'm having trouble finding
it. Basically, we want to be able to search for content across several
languages, given that we know what language a datum and a query are
in. Is there an obvious way to do this?
Here's the longer version: I am trying to
49 matches