[ 
https://issues.apache.org/jira/browse/TIKA-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284419#comment-17284419
 ] 

Luís Filipe Nassif edited comment on TIKA-3300 at 2/18/21, 1:12 PM:
--------------------------------------------------------------------

I also set OMP_THREAD_LIMIT = 1 because my app is already multithreaded (it OCRs 
many files simultaneously). That gave me about a 2x-2.5x overall speedup. But if 
the client app is single-threaded, I would keep the default value, so that Tesseract 
can use multiple threads to OCR each submitted file. Maybe just tika-server 
should set this to 1 by default?
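A minimal sketch of the idea, assuming a Java client that launches the tesseract command-line tool itself through `ProcessBuilder`. The class `TesseractLauncher`, the method `buildTesseractProcess` and the flag `callerIsMultithreaded` are hypothetical names for illustration; this is not necessarily how Tika or tika-server configure Tesseract.

```java
import java.util.List;
import java.util.Map;

public class TesseractLauncher {

    /**
     * Builds (but does not start) a ProcessBuilder for "tesseract <image> <outputBase>".
     * Hypothetical helper: if the caller already runs many OCR jobs concurrently,
     * OMP_THREAD_LIMIT=1 keeps each tesseract process on one OpenMP thread so the
     * processes do not oversubscribe the CPU cores. Otherwise the inherited
     * environment is left unchanged, so tesseract keeps its default threading.
     */
    static ProcessBuilder buildTesseractProcess(String imagePath, String outputBase,
                                                boolean callerIsMultithreaded) {
        ProcessBuilder pb = new ProcessBuilder(List.of("tesseract", imagePath, outputBase));
        Map<String, String> env = pb.environment(); // copy of this JVM's environment
        if (callerIsMultithreaded) {
            env.put("OMP_THREAD_LIMIT", "1");
        }
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = buildTesseractProcess("page.png", "page", true);
        System.out.println(pb.command() + " OMP_THREAD_LIMIT="
                + pb.environment().get("OMP_THREAD_LIMIT"));
        // Calling pb.start() here would actually run tesseract; omitted in this sketch.
    }
}
```

Because the limit is set per child process rather than globally, a multithreaded server and a single-threaded client can share the same launcher and differ only in the flag they pass.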


was (Author: lfcnassif):
I also set OMP_THREAD_LIMIT = 1 because my app is already multithreaded (it OCRs 
many files simultaneously). That gave me about a 2x-2.5x overall speedup. But if 
the client app is single-threaded, I would keep the default value, so that Tesseract 
can use multiple threads to OCR each submitted file. Maybe tika-server and 
tika-app should set this?

> Figure out if we can improve tesseract parallelization 
> -------------------------------------------------------
>
>                 Key: TIKA-3300
>                 URL: https://issues.apache.org/jira/browse/TIKA-3300
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> https://github.com/tesseract-ocr/tesseract/issues/2609
> https://twitter.com/jbaiter_/status/1360266497864704008?s=20
> Not sure if this affects us? h/t [~jbaiter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
