Hi,

Looking at the code, you are right. Whitelist processing is only applied to 
detected languages, not to the fallback or fallbackFields languages, since 
these are assumed to be correct. You should therefore not pass in a fallback 
language that your schema cannot handle, either in the input document (via 
langid.fallbackFields) or with langid.fallback.
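
To make the order of checks concrete, here is a rough model of the behaviour 
described above. It is a simplified sketch, not Solr's actual source; the class 
and method names (LangIdModel, resolve) are made up for illustration.

```java
import java.util.Set;

class LangIdModel {
    // Simplified model of the resolution order; names are hypothetical.
    static String resolve(String detected, String fallbackFieldValue,
                          String fallback, Set<String> whitelist) {
        // A value from langid.fallbackFields wins over langid.fallback.
        // Neither of them is checked against the whitelist.
        String fallbackLang =
                (fallbackFieldValue != null && !fallbackFieldValue.isEmpty())
                        ? fallbackFieldValue
                        : fallback;

        // Nothing detected: use the fallback as-is.
        if (detected == null) {
            return fallbackLang;
        }
        // Only the detected language goes through the whitelist.
        if (whitelist.isEmpty() || whitelist.contains(detected)) {
            return detected;
        }
        return fallbackLang;
    }
}
```

With a whitelist of en,it, no usable detection result and user_lan set to 'sk', 
this model returns 'sk', which matches what you are seeing.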

This is by design. However, I can also see an argument for making 
fallbackFields subject to whitelist logic, especially if you do not control the 
application that populates this field, to safeguard against exceptions. Also, 
such a change would not harm any of the existing functionality, so it would be 
safe to introduce.

Feel free to write a JIRA issue for it.

A workaround could be to write a simple UpdateProcessor which removes any 
illegal value from langid.fallbackFields before the LangId processor.
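
The core of such a processor could look roughly like the sketch below. To keep 
it self-contained, a plain Map stands in for the Solr input document, and the 
class and method names (FallbackFieldSanitizer, sanitize) are hypothetical. In 
a real setup I would expect this logic to sit in the processAdd method of an 
UpdateRequestProcessor subclass, working on the SolrInputDocument, and placed 
before the LangId processor in the update chain.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class FallbackFieldSanitizer {
    // Drops the fallback field when its value is not a whitelisted language
    // code, so that the plain langid.fallback value is used instead.
    static void sanitize(Map<String, Object> doc, String fallbackField,
                         Set<String> whitelist) {
        Object value = doc.get(fallbackField);
        if (value == null) {
            return; // nothing to check
        }
        // Assumption: whitelist entries are lower-case codes such as "en".
        String lang = value.toString().trim().toLowerCase(Locale.ROOT);
        if (!whitelist.contains(lang)) {
            doc.remove(fallbackField);
        }
    }
}
```

With the whitelist en,it, a document carrying user_lan=sk would lose that 
field, and the LangId processor would then fall back to langid.fallback (en).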

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 7 July 2013 at 18:05, adfel70 <adfe...@gmail.com> wrote:

> Hi
> I'm trying to index a set of documents with solr's language detection
> component.
> I set
> <langid.fallbackFields>user_lan</langid.fallbackFields>
> <langid.whitelist>en,it</langid.whitelist>
> <langid.fallback>en</langid.fallback>
> 
> In some documents user_lan has 'sk'. Solr falls back to 'sk', which is not
> in the whitelist, and instead of falling back to 'en' as stated here
> <http://wiki.apache.org/solr/LanguageDetection#langid.fallbackFields>, I
> get an exception about not having a text_sk field in the schema.
> 
> Anyone encountered this behavior?
> 
> thanks.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/lang-fallback-doesn-t-work-when-using-lang-fallbackFields-tp4076048.html
> Sent from the Solr - User mailing list archive at Nabble.com.
