[ https://issues.apache.org/jira/browse/TIKA-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284419#comment-17284419 ]
Luís Filipe Nassif edited comment on TIKA-3300 at 2/18/21, 1:12 PM:
--------------------------------------------------------------------

I also set OMP_THREAD_LIMIT=1 because my app is already multithreaded (it OCRs many files simultaneously). That gave me about a 2x-2.5x overall speedup. But if the client app is single-threaded, I would keep the default value, so Tesseract uses multiple threads to OCR each submitted file. Maybe only tika-server should set this to 1 by default?

was (Author: lfcnassif):
I also set OMP_THREAD_LIMIT=1 because my app is already multithreaded (it OCRs many files simultaneously). That gave me about a 2x-2.5x overall speedup. But if the client app is single-threaded, I would keep the default value, so Tesseract uses multiple threads to OCR each submitted file. Maybe tika-server and tika-app should set this?

> Figure out if we can improve tesseract parallelization
> -------------------------------------------------------
>
>                 Key: TIKA-3300
>                 URL: https://issues.apache.org/jira/browse/TIKA-3300
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> https://github.com/tesseract-ocr/tesseract/issues/2609
> https://twitter.com/jbaiter_/status/1360266497864704008?s=20
> Not sure if this affects us? h/t [~jbaiter]
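A minimal sketch of the setting discussed in the comment. It assumes the application launches the tesseract CLI itself through Java's ProcessBuilder; `TesseractLauncher` and `buildCommand` are hypothetical names and are not part of Tika's API. OMP_THREAD_LIMIT=1 is set only when the caller already OCRs several files concurrently. Otherwise, the inherited environment is left alone, so Tesseract can use its default multithreading on a single file.

```java
import java.nio.file.Path;
import java.util.List;

/**
 * Hypothetical helper (not Tika API): builds a tesseract command line and
 * caps OpenMP threads when the calling app already parallelizes OCR jobs.
 */
public final class TesseractLauncher {

    private TesseractLauncher() {}

    /**
     * @param image                 image file to OCR
     * @param callerIsMultithreaded true if the app runs several OCR jobs at once
     */
    public static ProcessBuilder buildCommand(Path image, boolean callerIsMultithreaded) {
        // "stdout" as the output base makes tesseract print the recognized
        // text to standard output instead of writing a .txt file.
        ProcessBuilder pb = new ProcessBuilder(
                List.of("tesseract", image.toString(), "stdout"));
        if (callerIsMultithreaded) {
            // One OpenMP thread per tesseract process; parallelism comes from
            // running many processes side by side instead.
            pb.environment().put("OMP_THREAD_LIMIT", "1");
        }
        // Single-threaded caller: keep the inherited environment unchanged so
        // tesseract may use multiple threads for the one file it is given.
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = buildCommand(Path.of("page.png"), true);
        System.out.println(pb.command() + " OMP_THREAD_LIMIT="
                + pb.environment().get("OMP_THREAD_LIMIT"));
    }
}
```

Setting the variable per child process, rather than globally, keeps the choice with whoever knows how many OCR jobs run at once. That is the trade-off behind the question of whether tika-server should default to 1.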