[ https://issues.apache.org/jira/browse/TIKA-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved TIKA-321.
--------------------------------
Resolution: Fixed
Fix Version/s: 0.6
Assignee: Jukka Zitting
I've made a number of optimizations to the type detection code, and as a result
it's already over an order of magnitude faster than before. I believe there's
*still* an order of magnitude of improvement available (checking the most common
types first, short-circuiting matching to only the subtypes of already detected
types, etc.), but I've already reached my performance goals, so I'll mark this
as resolved for Tika 0.6. We can follow up with another issue in case anyone has
stricter performance requirements.
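The two follow-up ideas mentioned above (checking the most common types first, and once a type matches, only testing that type's subtypes) can be sketched as follows. This is a hypothetical, self-contained illustration, not Tika's actual detection code: the class `OrderedMagicDetector`, its `TypeNode` tree, the rank values, and the simplified magic patterns are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch (not Tika's implementation): magic-byte detection that
 * tries the assumed most common types first and, once a type matches, only
 * examines that type's subtypes instead of the whole type registry.
 */
public class OrderedMagicDetector {

    /** A MIME type with a byte-level test and optional, more specific subtypes. */
    static final class TypeNode {
        final String name;
        final int rank; // assumed popularity: lower rank is tried earlier
        final Predicate<byte[]> matcher;
        final List<TypeNode> subtypes = new ArrayList<>();

        TypeNode(String name, int rank, Predicate<byte[]> matcher) {
            this.name = name;
            this.rank = rank;
            this.matcher = matcher;
        }

        TypeNode addSubtype(TypeNode child) {
            subtypes.add(child);
            subtypes.sort(Comparator.comparingInt(t -> t.rank));
            return this;
        }
    }

    private final List<TypeNode> roots = new ArrayList<>();

    void addRoot(TypeNode root) {
        roots.add(root);
        roots.sort(Comparator.comparingInt(t -> t.rank)); // keep common types first
    }

    /** Returns the most specific matching type, or application/octet-stream. */
    String detect(byte[] header) {
        for (TypeNode root : roots) {
            if (root.matcher.test(header)) {
                return refine(root, header); // short-circuit: skip remaining roots
            }
        }
        return "application/octet-stream";
    }

    /** Descends only into subtypes of a type that has already matched. */
    private static String refine(TypeNode type, byte[] header) {
        for (TypeNode sub : type.subtypes) {
            if (sub.matcher.test(header)) {
                return refine(sub, header);
            }
        }
        return type.name;
    }

    static Predicate<byte[]> startsWith(byte... magic) {
        return data -> {
            if (data.length < magic.length) return false;
            for (int i = 0; i < magic.length; i++) {
                if (data[i] != magic[i]) return false;
            }
            return true;
        };
    }

    static Predicate<byte[]> startsWith(String magic) {
        return startsWith(magic.getBytes(StandardCharsets.US_ASCII));
    }

    static Predicate<byte[]> contains(String needle) {
        // ISO-8859-1 maps every byte to one char, so arbitrary binary data is safe here.
        return data -> new String(data, StandardCharsets.ISO_8859_1).contains(needle);
    }

    /** Example registry; ranks and patterns are simplified assumptions. */
    static OrderedMagicDetector withExampleTypes() {
        OrderedMagicDetector d = new OrderedMagicDetector();
        d.addRoot(new TypeNode("application/pdf", 1, startsWith("%PDF-")));
        d.addRoot(new TypeNode("image/png", 2, startsWith((byte) 0x89, (byte) 'P',
                (byte) 'N', (byte) 'G', (byte) 0x0D, (byte) 0x0A, (byte) 0x1A, (byte) 0x0A)));
        d.addRoot(new TypeNode("application/xml", 3, startsWith("<?xml"))
                .addSubtype(new TypeNode("image/svg+xml", 1, contains("<svg"))));
        return d;
    }

    public static void main(String[] args) {
        OrderedMagicDetector d = withExampleTypes();
        byte[] svg = "<?xml version=\"1.0\"?><svg/>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(d.detect(svg)); // image/svg+xml
        System.out.println(d.detect("%PDF-1.4".getBytes(StandardCharsets.US_ASCII))); // application/pdf
    }
}
```

Ordering by rank only pays off if the assumed ranks reflect the real mix of input files; the subtype short-circuit helps regardless, because a file recognized as XML is never tested against unrelated binary signatures.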
> Optimize type detection speed
> -----------------------------
>
> Key: TIKA-321
> URL: https://issues.apache.org/jira/browse/TIKA-321
> Project: Tika
> Issue Type: Improvement
> Components: mime
> Reporter: Jukka Zitting
> Assignee: Jukka Zitting
> Priority: Minor
> Fix For: 0.6
>
>
> It would be good to do some simple benchmarks on the type detection code
> (Tika.detect) to see if there are obvious performance optimizations we could
> make. There are some use cases, like attaching file type information to
> directory listings, where type detection speed is important and not
> necessarily dwarfed by IO waits.
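A simple benchmark of the kind suggested in the issue description could start from a harness like the one below. It is a hypothetical sketch: `DetectionBenchmark`, `run`, and the toy detector are illustrative names, the detector is abstracted as a `Function<byte[], String>` so that any implementation (for example, a wrapper around `Tika.detect`) can be plugged in, and the round counts are arbitrary. For trustworthy numbers a dedicated harness such as JMH is preferable, since it handles warm-up, dead-code elimination, and JVM forking more rigorously.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical micro-benchmark harness for a type detector. A dedicated tool
 * such as JMH gives more trustworthy numbers; this only shows the basic idea.
 */
public class DetectionBenchmark {

    /** Outcome of one run. */
    static final class Result {
        final long detections;   // number of timed detector calls
        final long elapsedNanos; // wall-clock time of the timed rounds
        final long checksum;     // sum of result lengths over all calls, incl. warm-up

        Result(long detections, long elapsedNanos, long checksum) {
            this.detections = detections;
            this.elapsedNanos = elapsedNanos;
            this.checksum = checksum;
        }

        double detectionsPerSecond() {
            return elapsedNanos == 0 ? Double.POSITIVE_INFINITY
                    : detections * 1_000_000_000.0 / elapsedNanos;
        }
    }

    static Result run(Function<byte[], String> detector, List<byte[]> samples,
                      int warmupRounds, int timedRounds) {
        long checksum = 0; // results are consumed so the JIT cannot drop the calls
        for (int r = 0; r < warmupRounds; r++) { // give the JIT a chance to compile the hot path
            for (byte[] sample : samples) {
                checksum += detector.apply(sample).length();
            }
        }
        long start = System.nanoTime();
        for (int r = 0; r < timedRounds; r++) {
            for (byte[] sample : samples) {
                checksum += detector.apply(sample).length();
            }
        }
        long elapsed = System.nanoTime() - start;
        return new Result((long) timedRounds * samples.size(), elapsed, checksum);
    }

    public static void main(String[] args) {
        // Toy stand-in detector; in practice this would wrap the detector under test.
        Function<byte[], String> toy = data ->
                data.length >= 5 && new String(data, 0, 5, StandardCharsets.US_ASCII).equals("%PDF-")
                        ? "application/pdf" : "application/octet-stream";
        List<byte[]> samples = List.of(
                "%PDF-1.4".getBytes(StandardCharsets.US_ASCII),
                "just some plain text".getBytes(StandardCharsets.US_ASCII));
        Result r = run(toy, samples, 10_000, 100_000);
        System.out.printf("%d detections in %.1f ms (%.0f/s)%n",
                r.detections, r.elapsedNanos / 1e6, r.detectionsPerSecond());
    }
}
```

Feeding the harness header bytes taken from a real directory listing, rather than synthetic samples, would better match the use case described in the issue.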