If you expect a large number of documents, splitting the index into shards
is a sound strategy. That split is independent of the documents' language.
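
To illustrate why the split does not depend on language, here is a minimal
sketch of hash-based shard assignment. It is not Solr's actual router; the
function name shard_for and the use of MD5 are assumptions made only for this
example. The point is that the shard is derived from the document id alone.

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard from the document id only (illustrative, not Solr's router)."""
    # Hash the id so documents spread evenly regardless of their content.
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    # Use the first 4 bytes as an unsigned integer and map it onto a shard.
    return int.from_bytes(digest[:4], "big") % num_shards

# The same id always lands on the same shard, whatever language the text is in.
print(shard_for("doc-123", 4))
```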

Documents in different languages need language-specific analyzers to give
good search results. For example, the _text_ field of English documents
should ideally use the text_en fieldType; likewise, the fields of
Chinese/Japanese/Korean documents should use text_cjk. If you mix documents
of different languages in the same shard, you have to define a field with
the matching fieldType for each language, and at query time you have to make
sure each query goes to the respective fields.
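
As a rough sketch of that query-time bookkeeping, the snippet below maps a
language code to a per-language field, both when building a document and when
building query parameters. The field names content_en, content_cjk and
content_general, the lang_s field, and the helper names are all hypothetical;
they assume your schema declares those fields with the text_en, text_cjk and
a general-purpose fieldType respectively. Only the q, defType and qf request
parameters are standard Solr.

```python
# Hypothetical mapping from language code to a per-language field name.
LANG_FIELDS = {
    "en": "content_en",   # assumed to be declared with fieldType text_en
    "zh": "content_cjk",  # assumed to be declared with fieldType text_cjk
    "ja": "content_cjk",
    "ko": "content_cjk",
}
DEFAULT_FIELD = "content_general"  # assumed fallback field for other languages

def field_for(lang: str) -> str:
    """Return the field that holds text in the given language."""
    return LANG_FIELDS.get(lang.lower(), DEFAULT_FIELD)

def to_solr_doc(doc_id: str, text: str, lang: str) -> dict:
    """Build a document whose text goes into the language-specific field."""
    return {"id": doc_id, "lang_s": lang.lower(), field_for(lang): text}

def query_params(user_query: str, lang: str) -> dict:
    """Build request parameters that search only the matching field."""
    return {"q": user_query, "defType": "edismax", "qf": field_for(lang)}

print(to_solr_doc("1", "hello world", "en"))
print(query_params("hello", "ja"))
```

This assumes the language of both the document and the query is known up
front; if it is not, you would need language detection before this step.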

There are several strategies for multilingual search. Slide 19 of this
presentation explains them:
http://www.slideshare.net/treygrainger/semantic-multilingual-strategies-in-lucenesolr

There is also an article that assumes you know the language of the incoming
document and the language the query could be in:
https://support.lucidworks.com/hc/en-us/articles/203718886-How-to-implement-Multilingual-Search-using-Solr

On Mon, Oct 10, 2016 at 8:08 AM, Customer <mailinglists...@gmail.com> wrote:

> Hi,
>
>
> I've started working on a project that will likely have lots of
> documents in every single language, and because of that I'm a bit worried
> about storing everything in one single shard. What would be the best way
> to store the data? Any advice on how I should split it? I was thinking
> about splitting by alphabet (a shard for every single letter), but
> given that there will be lots of languages, not only English, this
> is not an option.
>
> Thank you in advance for your advice.
>
