Re: Multi-language solr1.3 what would you reckon?

Hannes Carl Meyer Wed, 15 Oct 2008 05:46:03 -0700

Hi,

Yes, if you don't handle a specific language (stopwords, stemming, etc.), you
should create a general core for it.


In my project I support 10 languages, and any document in an unsupported
language is logged and discarded right away!
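
As a rough illustration of that "log and discard" step, here is a minimal
sketch. The language codes, the core names in SUPPORTED_CORES and the "lang"
and "id" keys are assumptions made up for this example; they are not taken
from my actual project or from Solr itself.

```python
import logging

logger = logging.getLogger("indexer")

# Hypothetical mapping from language code to core name (assumption for
# illustration only; extend it with whatever languages you support).
SUPPORTED_CORES = {"fr": "core0", "en": "core1"}


def route_document(doc):
    """Return the target core for doc, or None if its language is unsupported.

    Unsupported documents are logged so they can be reviewed later, and the
    caller is expected to skip them instead of indexing them.
    """
    lang = doc.get("lang")
    core = SUPPORTED_CORES.get(lang)
    if core is None:
        logger.warning(
            "Discarding document %r: unsupported language %r",
            doc.get("id"),
            lang,
        )
    return core
```

If you prefer a general core over discarding, the None branch could return
the name of that core instead.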

Boosting across multiple cores is indeed a problem. One idea would be to merge
the result sets from core0 and core1 on the client and sort by score?
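
A minimal sketch of that merge on the client side, assuming each core's
response has already been parsed into a list of dicts carrying "id" and
"score" (you have to ask Solr for the score field explicitly). The function
name and the optional per-core weights are my own assumptions, added only to
show where a language boost could go. Keep in mind that each core computes
scores from its own term statistics, so scores from different cores are only
roughly comparable.

```python
def merge_by_score(result_sets, weights=None, limit=10):
    """Merge per-core result lists and sort them by (weighted) score.

    result_sets: dict mapping a core name to a list of docs, where each doc
                 is a dict with at least "id" and "score".
    weights:     optional dict mapping a core name to a multiplier, used as
                 a crude per-language boost (hypothetical, not a Solr feature).
    """
    weights = weights or {}
    merged = []
    for core, docs in result_sets.items():
        weight = weights.get(core, 1.0)
        for doc in docs:
            # Copy the doc so the caller's data is not modified.
            merged.append({**doc, "core": core, "score": doc["score"] * weight})
    merged.sort(key=lambda doc: doc["score"], reverse=True)
    return merged[:limit]


results = {
    "core0": [{"id": "a", "score": 1.0}, {"id": "b", "score": 0.4}],
    "core1": [{"id": "c", "score": 0.8}],
}
top = merge_by_score(results, weights={"core1": 2.0})
```

With the weight of 2.0 on core1, document "c" moves ahead of "a"; without
weights the order follows the raw scores.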

Regards

Hannes

On Wed, Oct 15, 2008 at 1:50 PM, sunnyfr <[EMAIL PROTECTED]> wrote:

>
>
> OK, MultiCore is indeed handy for avoiding one big index that manages every
> language, but when you have one modification to make, you have to make it
> in all of the cores.
>
> The other point is that it's complicated to boost one language more than
> another. For example, for a video search in Italian, if we don't have many
> Italian videos, it might be more interesting to bring back English ones.
>
> And there are languages like Slovak that are not managed by the website,
> but people can come from there, so those videos would be stored in core0,
> which would hold every language that is not English, Spanish, German, ...
> or French. So it becomes a kind of garbage core for every language that is
> not managed, and I think it might be hard to manage.
>
> What do you think?
>
>
>
> Hannes Carl Meyer-2 wrote:
> >
> > I attached an example for you.
> >
> > The challenge with MultiCore lies in the client's search logic. It helps
> > if you know which language the person wants to search in. If not, you
> > have to send requests to multiple cores. The usual logic would be:
> >
> > 1. search "chien" in core0 (english)
> > 2. if #1 returned zero results search for "chien" in core1 (french)
> >
> > ---
> >
> > In your client you could even parallelize the requests to minimize
> > waiting time.
> >
> > *One feature I haven't tried yet is DistributedSearch (and how it would
> > help with multiple cores)*; you can find it here:
> > http://wiki.apache.org/solr/DistributedSearch
> >
> > Regards,
> >
> > Hannes
> >
> > On Tue, Oct 14, 2008 at 4:26 PM, sunnyfr <[EMAIL PROTECTED]> wrote:
> >
> >>
> >> Thanks for this explanation, but just to make sure I understand it:
> >>
> >> One core per language, with the same fields and schema, where only the
> >> language-specific part and its management differ?
> >> And one core that holds every language not managed by Solr,
> >> like Russian or ???
> >> So different requests to the database...
> >> OK.
> >>
> >> What I don't really get: when you look for the word 'chien' on the
> >> English website, I want to get back results from French videos, because
> >> 'chien' is French. So if it doesn't find any English video with 'chien',
> >> I need my French videos.
> >>
> >> Exactly the same for the users core: if somebody looks for 'chien' and
> >> there is a user with exactly that username, I would like to show it.
> >>
> >> thanks for your time, really,
> >>
> >>
> >>
> >> John E. McBride wrote:
> >> >
> >> > Fairly nebulous requirements, but I recently was involved in a
> >> > multilingual search platform.
> >> >
> >> > The approach, translated to solr 1.3 would be to use multicore - one
> >> > core per geography.  Then a schema.xml per core, each with a different
> >> > language in the porter algorithm, stopwords etc - taken from snowball.
> >> >
> >> > Then on the german front end you make requests to the de core, on the
> >> > english front end make requests to the english core.
> >> >
> >> > This is much simpler than storing every language in the one index, for
> >> > example german queries will need to be run through the german query
> >> > filters etc.  If you have all languages in one schema, then you will
> >> > have to do some front end logic to map the query to the correct field.
> >> >
> >> > You have failed to consider internationalisation of the query side of
> >> > the process - your field types merely have analysis filters.
> >> >
> >> > Additionally, if the data source for each different geography is
> >> > different it makes sense to separate the indexes and subsequently the
> >> > ingestion mechanisms and schedules.
> >> >
> >> > Just a few thoughts.
> >> >
> >> > John
> >> >
> >> > sunnyfr wrote:
> >> >> Hi,
> >> >>
> >> >> I would like to set up a proper multi-language search engine,
> >> >> and I would like your advice on what I have done.
> >> >>
> >> >> Solr1.3
> >> >> tomcat55
> >> >>
> >> >> http://www.nabble.com/file/p19954805/schema.xml schema.xml
> >> >>
> >> >> Thanks a lot,
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >>
> >> http://www.nabble.com/Multi-language-solr1.3-what-would-you-reckon--tp19954805p19974618.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> > Solr1.3 MultiCore Scenario
> >
> > core0 (french)        core1 (english)       ...   core8 (russian)
> > |schema.xml           schema.xml                  schema.xml
> > |- analyzers          |- analyzers                |- analyzers
> > |-- FrenchAnalyzer    |-- EnglishAnalyzer         |-- RussianAnalyzer
> > |-- FrenchStops       |-- EnglishStops            |-- RussianStops
> > |- fields             |- fields                   |- fields
> > |-- title             |-- title                   |-- title
> > |-- description       |-- description             |-- description
> > |-- id                |-- id                      |-- id
> >
>
> --
> View this message in context:
> http://www.nabble.com/Multi-language-solr1.3-what-would-you-reckon--tp19954805p19991949.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
