Team, any recommendations/inputs on the language detection question below?
From: Neha Kamat
Sent: Thursday, February 15, 2024 3:19 PM
To: Tika User <[email protected]>
Subject: Language detection in TIKA

Hi Team,

I am using TIKA 2.9.1 for language detection in my application. I have a few Excel files (containing a mix of numbers and short words) whose content is extracted using TIKA, and the extracted content is then processed further to decide the primary language of the document.

Surprisingly, the language is detected as German by the TIKA engine, whereas Google detects it as English. Is there any setting or configuration in TIKA that can be tweaked to get accurate results? I tried changing the minimum confidence for language detection from 0.9 to 0.99 but got the same result.

Any pointers/help would be very much appreciated.

Thanks,
Neha
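Since the extracted content is described as mostly numbers and short words, one thing that may be worth checking is how much real text is left for the detector to work with. The sketch below is plain Java and does not call any TIKA API. The class name TextSignalFilter, the method keepWordTokens, and the minLetters threshold are all hypothetical names chosen for illustration. It is an assumption, not something confirmed in this thread, that dropping digits and very short tokens before detection would change the result.

```java
import java.util.ArrayList;
import java.util.List;

public class TextSignalFilter {

    /**
     * Hypothetical helper (not part of TIKA): keeps only runs of letters
     * that are at least minLetters characters long and joins them with
     * single spaces. Digits, punctuation and whitespace act as separators.
     */
    public static String keepWordTokens(String text, int minLetters) {
        List<String> kept = new ArrayList<>();
        // Split on any run of non-letter characters (Unicode-aware).
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.length() >= minLetters) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // Made-up sample resembling spreadsheet cell content.
        String extracted = "123 Total 45.6 Qty Amount 2024";
        System.out.println(keepWordTokens(extracted, 4));
    }
}
```

With the made-up sample above and a threshold of 4, only "Total Amount" remains. If the filtered string for a real file turns out to be very short, that alone may explain an unstable result, and the filtered string could be passed to whichever detector is in use to compare against the unfiltered outcome.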
