Team,

Any recommendations or input on the language detection question below?

From: Neha Kamat
Sent: Thursday, February 15, 2024 3:19 PM
To: Tika User <[email protected]>
Subject: Language detection in TIKA

Hi Team,

I am using Tika 2.9.1 for language detection in my application. I have a few 
Excel files (containing a mix of numbers and short words) whose content is 
extracted using Tika, and the extracted content is then processed to determine 
the primary language of the document. Surprisingly, Tika detects the language 
as German, whereas Google detects it as English. Is there any setting or 
configuration in Tika that can be tweaked to get more accurate results? I tried 
raising the minimum confidence for language detection from 0.9 to 0.99, but the 
result is the same. Any pointers or help would be very much appreciated.

Thanks,
Neha
