[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann resolved TIKA-2520.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.19

{noformat}
nonas:tika2.0.0 mattmann$ git push -u origin branch_1x
Counting objects: 14, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (14/14), 1.72 KiB | 252.00 KiB/s, done.
Total 14 (delta 4), reused 0 (delta 0)
remote: Resolving deltas: 100% (4/4), completed with 4 local objects.
To github.com:/apache/tika.git
   cdca0f726..7e3e34caf  branch_1x -> branch_1x
Branch 'branch_1x' set up to track remote branch 'branch_1x' from 'origin'.
nonas:tika2.0.0 mattmann$
{noformat}

> OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request
> ------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2520
>                 URL: https://issues.apache.org/jira/browse/TIKA-2520
>             Project: Tika
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 1.16
>            Reporter: Vincent van Donselaar
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>              Labels: performance
>             Fix For: 1.19
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Tika REST server's `/language` resource invokes the relatively heavy
> `loadModels` operation for every language detection call:
> {code:title=LanguageResource.java}
>   public String detect(final String string) throws IOException {
>     LanguageResult language = new OptimaizeLangDetector().loadModels().detect(string);
>     String detectedLang = language.getLanguage();
>     LOG.info("Detecting language for incoming resource: [{}]", detectedLang);
>     return detectedLang;
>   }
> {code}
> This could be optimized by (lazily?) loading the models only once and keeping
> them in memory. I assume the `LanguageDetector` is not thread-safe, so I expect
> this requires an ExecutorService with language detectors.
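
The "load once, keep in memory" idea from the issue description could be sketched roughly as follows. This is not the change that was committed for TIKA-2520. It is a minimal illustration that uses only the JDK and swaps the ExecutorService suggested by the reporter for a simpler per-thread cache (ThreadLocal), which likewise avoids sharing one detector across threads. The `Detector` interface, `loadDetector()` and the toy detection rule are hypothetical stand-ins for Tika's detector and its `loadModels()` call.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyDetectorSketch {

    /** Hypothetical stand-in for a detector whose models are costly to load. */
    interface Detector {
        String detect(String text);
    }

    /** Counts how often the simulated heavy load step has run. */
    static final AtomicInteger LOADS = new AtomicInteger();

    /** Simulates the expensive step (assumed analogous to loading language models). */
    static Detector loadDetector() {
        LOADS.incrementAndGet();
        // Toy rule for illustration only; a real detector would consult its models.
        return text -> text.contains("the ") ? "en" : "unknown";
    }

    // One detector per thread, created lazily on that thread's first request
    // and reused afterwards. No instance is shared between threads, so the
    // sketch does not rely on the detector being thread-safe.
    private static final ThreadLocal<Detector> DETECTOR =
            ThreadLocal.withInitial(LazyDetectorSketch::loadDetector);

    /** Request handler analogue: no model loading on the hot path after first use. */
    static String detect(String text) {
        return DETECTOR.get().detect(text);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            detect("the quick brown fox");
        }
        System.out.println("loads=" + LOADS.get());
    }
}
```

With this arrangement the load cost is paid once per worker thread instead of once per request, at the price of holding one set of models in memory per thread. If the detector turned out to be safe for concurrent use, a single shared instance would be cheaper still.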