memory issue in ExcelExtractor
------------------------------
Key: TIKA-211
URL: https://issues.apache.org/jira/browse/TIKA-211
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 0.3
Reporter: Daan de Wit
The Excel extractor consumes an excessive amount of memory when given an Excel
file containing many numeric cells. I tested with a simple sheet of 254 columns
and 5511 rows, an 8 MB file, and extraction failed with an OutOfMemoryError
(OOME) with 512 MB of memory available.
The memory issue is caused by the Java NumberFormat that is instantiated for
every numeric cell. A solution would be to cache the NumberFormat instance in
the TikaHSSFListener class. Since NumberFormat is not thread-safe, it might be
necessary to pool it or to keep a separate instance per thread.
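
A minimal sketch of the caching idea, assuming a per-thread instance is an
acceptable substitute for a pool. The class name CellNumberFormatter and its
methods are hypothetical and not part of Tika, and Locale.US is chosen here
only to make the output predictable; the real listener may use a different
factory method or locale.

```java
import java.text.NumberFormat;
import java.util.Locale;

/**
 * Hypothetical helper, not part of Tika: keeps one NumberFormat per thread
 * so numeric cells reuse it instead of allocating a new one for every cell.
 */
final class CellNumberFormatter {

    // NumberFormat is not thread-safe, so each thread gets its own instance.
    // Locale.US is an assumption made for this sketch only.
    private static final ThreadLocal<NumberFormat> FORMAT =
            ThreadLocal.withInitial(() -> NumberFormat.getInstance(Locale.US));

    private CellNumberFormatter() {
        // static helper, no instances
    }

    /** Formats a numeric cell value with the calling thread's cached instance. */
    static String format(double value) {
        return FORMAT.get().format(value);
    }

    /** Returns the calling thread's cached instance (exposed to check reuse). */
    static NumberFormat current() {
        return FORMAT.get();
    }
}
```

With this in place, the listener would call CellNumberFormatter.format(value)
for each numeric cell. If a listener instance is only ever used by a single
thread, a plain instance field holding the NumberFormat would likely be enough.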