Well, it's the section shown below that would change from geography
to geography.
Parameterise the EnglishPorterFilterFactory and protwords for each language.
You could introduce logic in the front end which checks whether the number
of results is zero and, if so, makes a call to the English index, but that
doesn't make logical sense. Why would a search in Italian bring up
anything in the English index?
I think you need to explain your application in a little more detail.
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!--
in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory"
synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<!--
Case insensitive stop word removal.
enablePositionIncrements=true ensures that a 'gap' is left to
allow for accurate phrase queries.
-->
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="0" catenateNumbers="0"
catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
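As a rough sketch of that parameterisation, an Italian variant of the field type might differ only in its language-specific lines. This assumes the stock solr.SnowballPorterFilterFactory and its language attribute; the type name text_it and the file name stopwords_it.txt are hypothetical, and the synonym and word-delimiter filters are left out for brevity.

```xml
<!-- Sketch only: text_it and stopwords_it.txt are assumed names, not taken from the posted schema -->
<fieldType name="text_it" class="solr.TextField" positionIncrementGap="100">
  <!-- An analyzer without a type attribute is used at both index and query time -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_it.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Snowball stemmer chosen by language, in place of EnglishPorterFilterFactory -->
    <filter class="solr.SnowballPorterFilterFactory" language="Italian"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```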
sunnyfr wrote:
Hi,
Thanks guys for your answers, but I don't think I can use a separate core
for each language,
because, for example, if somebody connects from Italy and there are not
that many Italian books,
then by default I will show the few Italian books but all the English ones as
well.
Do you have an example?
I'm quite lost about it,
John E. McBride wrote:
Fairly nebulous requirements, but I recently was involved in a
multilingual search platform.
The approach, translated to Solr 1.3, would be to use multicore: one
core per geography. Then a schema.xml per core, each with a different
language in the Porter algorithm, stopwords etc., taken from Snowball.
Then on the German front end you make requests to the de core, and on the
English front end you make requests to the English core.
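A minimal sketch of what such a multicore setup could look like in solr.xml, assuming the Solr 1.3 multicore layout. The core names and instance directories (en, de, it) are hypothetical; each instanceDir would hold its own conf/schema.xml with that language's stemmer and stopwords.

```xml
<!-- Sketch only: core names and directories are assumptions for illustration -->
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="en" instanceDir="en"/>
    <core name="de" instanceDir="de"/>
    <core name="it" instanceDir="it"/>
  </cores>
</solr>
```

With a layout like this, the German front end would send its searches to a path such as /solr/de/select and the English front end to /solr/en/select, assuming the webapp is deployed under /solr.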
This is much simpler than storing every language in the one index; for
example, German queries will need to be run through the German query
filters etc. If you have all languages in one schema, then you will
have to do some front-end logic to map the query to the correct field.
You have failed to consider internationalisation of the query side of
the process: your field types merely have analysis filters.
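For comparison, the single-schema alternative could be sketched roughly as follows. The field names title_en and title_it and the types text_en and text_it are hypothetical and would each need a matching fieldType with that language's analysis chain.

```xml
<!-- Sketch only: one field per language in a single schema; all names are assumed -->
<field name="title_en" type="text_en" indexed="true" stored="true"/>
<field name="title_it" type="text_it" indexed="true" stored="true"/>
```

The front end would then have to pick the field itself, for example by querying title_it for a user on the Italian site, which is the mapping logic the multicore approach avoids.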
Additionally, if the data source for each different geography is
different it makes sense to separate the indexes and subsequently the
ingestion mechanisms and schedules.
Just a few thoughts.
John
sunnyfr wrote:
Hi,
I would like to properly manage a multi-language search engine,
and I would like your advice on what I have done.
Solr 1.3
Tomcat 5.5
http://www.nabble.com/file/p19954805/schema.xml schema.xml
Thanks a lot,