Hi,

What is the correct process to only store documents in a desired language?

I'm currently doing this:

<property>
<name>http.accept.language</name>
<value>ja-jp, en-us,en-gb,en;q=0.7,*;q=0.3</value>
<description>Value of the "Accept-Language" request header field.
This allows selecting non-English language as default one to retrieve.
It is a useful setting for search engines built for a certain national group.
</description>
</property>
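
To show how I understand that value, here is a minimal Python sketch of how an Accept-Language header is usually read. It is not part of Nutch, and the function name parse_accept_language is made up for illustration. It assumes the standard HTTP rule that a tag without an explicit q-value gets weight 1.0. As far as I know the header is only a preference sent with each request, and the server is free to ignore it.

```python
def parse_accept_language(value):
    """Return (language_tag, q) pairs, highest preference first."""
    prefs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # a tag without an explicit q-value defaults to 1.0
        for param in params.split(";"):
            name, _, val = param.strip().partition("=")
            if name.strip() == "q":
                q = float(val)
        prefs.append((tag.strip(), q))
    # sorted() is stable, so tags with equal q keep their listed order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


# The value from my config above
for tag, q in parse_accept_language("ja-jp, en-us,en-gb,en;q=0.7,*;q=0.3"):
    print(tag, q)
```

Read this way, ja-jp, en-us and en-gb all carry the same weight of 1.0 in my current value, so Japanese is listed first but not weighted above English.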

 

I'm using a seed.txt containing URLs that I know are in the language I want,
but as the crawl grows I'm getting more and more documents in other
languages.

Thanks in advance
