[ https://issues.apache.org/jira/browse/TIKA-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved TIKA-211.
--------------------------------
Resolution: Fixed
Fix Version/s: 0.4
Assignee: Jukka Zitting
Thanks! Fixed in revision 757719.
PS. We don't need to worry about thread-safety as long as the NumberFormat
instances are local to the parse() method, which is how I implemented this for
now.
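A minimal sketch of that approach, for illustration only. The class name, the
parse() signature and the use of Locale.US are assumptions made for this
example and are not taken from the committed Tika code: a single NumberFormat
is created per parse() call and reused for every numeric cell, so it is never
visible to another thread.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Hypothetical stand-in for the extractor; not Tika's actual code. */
final class LocalFormatSketch {

    /** Renders all numeric cells using one NumberFormat per call. */
    static String parse(double[][] rows) {
        // Created once per parse() call: never shared between threads,
        // and not re-instantiated for every cell.
        // Locale.US is an assumption to keep the output deterministic.
        NumberFormat format = NumberFormat.getInstance(Locale.US);
        StringBuilder text = new StringBuilder();
        for (double[] row : rows) {
            for (double cell : row) {
                text.append(format.format(cell)).append('\t');
            }
            text.append('\n');
        }
        return text.toString();
    }
}
```

Because the formatter is a local variable, two threads parsing different
documents each get their own instance and no pooling is required.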
> memory issue in ExcelExtractor
> ------------------------------
>
> Key: TIKA-211
> URL: https://issues.apache.org/jira/browse/TIKA-211
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.3
> Reporter: Daan de Wit
> Assignee: Jukka Zitting
> Fix For: 0.4
>
>
> The Excel extractor consumes a very large amount of memory when given an
> Excel file containing many numeric cells. I tested with a simple sheet of
> 254 columns and 5511 rows (an 8MB file); extraction failed with an
> OutOfMemoryError (OOME) even when given 512MB.
> The memory issue is caused by the Java NumberFormat that is instantiated for
> every numeric cell. A solution would be to cache the NumberFormat instance in
> the TikaHSSFListener class. Since NumberFormat is not thread-safe, it might
> be necessary to pool it.
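If a cached instance did have to outlive a single call, one alternative to an
explicit pool is a per-thread cache. The sketch below is hypothetical and is
not what the issue proposes or what was committed: it uses ThreadLocal rather
than a pool, the class and method names are invented for illustration, and
ThreadLocal.withInitial assumes Java 8 or later.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Hypothetical per-thread cache; swaps a ThreadLocal in for a pool. */
final class PerThreadFormatSketch {

    // Each thread lazily creates its own NumberFormat and keeps reusing it.
    // Locale.US is an assumption to keep the output deterministic.
    private static final ThreadLocal<NumberFormat> FORMAT =
            ThreadLocal.withInitial(() -> NumberFormat.getInstance(Locale.US));

    /** Formats one numeric cell with the calling thread's instance. */
    static String formatCell(double value) {
        return FORMAT.get().format(value);
    }

    /** Exposes the calling thread's instance, for inspection only. */
    static NumberFormat current() {
        return FORMAT.get();
    }
}
```

Repeated calls on one thread reuse the same formatter, while another thread
gets a separate one, so no instance is ever used concurrently.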