Hans,
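Sorry for the delay. A bug was found in how POI sets the global max, so we
may have to wait for the next release, but I _think_ this will work for you.
Note that the parser class is OfficeParser rather than OfficeParserConfig,
and the param name starts with a lowercase "b" (byteArrayMaxOverride):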
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser"/>
    <parser class="org.apache.tika.parser.microsoft.OfficeParser">
      <params>
        <param name="byteArrayMaxOverride" type="int">2000000</param>
      </params>
    </parser>
  </parsers>
</properties>
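
If it helps on the Python side, here is a minimal sketch (stdlib only) that writes the config above and checks the override against the array length reported in your warning. The helper names are hypothetical and are not part of Tika or the tika Python library. The sketch only produces the XML text; you still need to start tika-server with that config, which isn't shown here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper constants; the class names mirror the tika-config.xml above.
DEFAULT_PARSER = "org.apache.tika.parser.DefaultParser"
OFFICE_PARSER = "org.apache.tika.parser.microsoft.OfficeParser"


def build_tika_config(max_override: int) -> str:
    """Return tika-config.xml text that sets OfficeParser's byteArrayMaxOverride."""
    props = ET.Element("properties")
    parsers = ET.SubElement(props, "parsers")
    ET.SubElement(parsers, "parser", {"class": DEFAULT_PARSER})
    office = ET.SubElement(parsers, "parser", {"class": OFFICE_PARSER})
    params = ET.SubElement(office, "params")
    # Lowercase "b": the param name must match the parser's setting name.
    param = ET.SubElement(params, "param", {"name": "byteArrayMaxOverride", "type": "int"})
    param.text = str(max_override)
    body = ET.tostring(props, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def requested_length(warning: str) -> int:
    """Extract the array length POI tried to allocate from its warning text."""
    # \s+ tolerates the message being wrapped across lines.
    match = re.search(r"allocate an array of\s+length (\d+)", warning)
    if match is None:
        raise ValueError("no allocation length found in warning")
    return int(match.group(1))


if __name__ == "__main__":
    warning = ("Tried to allocate an array of length 1186960, "
               "but 100000 is the maximum for this record type.")
    override = 2000000
    needed = requested_length(warning)
    print(build_tika_config(override))
    print(f"override {override} covers {needed}: {override >= needed}")
```

Run as-is, it prints the config and then `override 2000000 covers 1186960: True`. The 2048000 in your own config was also larger than 1186960, which suggests the value itself wasn't the issue.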
On Tue, Jan 21, 2020 at 3:44 PM <[email protected]> wrote:
> Hi
>
> I'm still stuck on this issue and am taking it up again to see whether Tika
> can be an option.
>
> I still get the error message, although I have tika-server 1.23 and the
> Python tika library 1.23.
>
> The Python code calls Tika on the file with parser.from_file(filename).
>
> I have tried setting ByteArrayMaxOverride using a tika config file:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
>   <parsers>
>     <parser class="org.apache.tika.parser.microsoft.OfficeParserConfig">
>       <params>
>         <param name="ByteArrayMaxOverride" type="int">2048000</param>
>       </params>
>     </parser>
>   </parsers>
> </properties>
>
> But no luck: the warning is still there. All the content seems to be
> parsed, but I would like to not get the warning message:
>
> WARN Ignoring unexpected exception while parsing summary entry
> DocumentSummaryInformation
> org.apache.poi.util.RecordFormatException: Tried to allocate an array of
> length 1186960, but 100000 is the maximum for this record type.
> If the file is not corrupt, please open an issue on bugzilla to request
> increasing the maximum allowable size for this record type.
> As a temporary workaround, consider setting a higher override value with
> IOUtils.setByteArrayMaxOverride()
> at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:591)
>
> Any hints on how to get rid of it?
>
> Everything is on version 1.23, and I am using the Python library.
>
> I'd really appreciate any hints!
>
> Kind regards
> Hans
>
> *From:* Tim Allison <[email protected]>
> *Sent:* 18 December 2019 14:52
> *To:* [email protected]
> *Cc:* [email protected]
> *Subject:* Re: 100000 is the maximum for this record type
>
> SummaryInformation parsing can be buggy, so we catch pretty much everything
> there and parse the rest of the document.
>
> As of Tika 1.23, you can bump the global ByteArrayMaxOverride via the
> OfficeParserConfig if you're calling Tika programmatically, or via
> tika-config.xml.
>
> On Wed, Dec 18, 2019 at 8:39 AM Hans Meijer <[email protected]>
> wrote:
>
> Tika version 1.23:
> When I try to parse a larger Excel file (10038272 bytes), this error
> occurs:
> WARN Ignoring unexpected exception while parsing summary entry
> DocumentSummaryInformation
> org.apache.poi.util.RecordFormatException: Tried to allocate an array of
> length 1186960, but 100000 is the maximum for this record type.
> If the file is not corrupt, please open an issue on bugzilla to request
> increasing the maximum allowable size for this record type.
> As a temporary workaround, consider setting a higher override value with
> IOUtils.setByteArrayMaxOverride()
>
> However, it seems like all the text gets extracted, yet I still get the
> warning message.
>
> Is there any way to analyze why the warning still appears if the content
> gets extracted from the Excel spreadsheet?
>
> --
> Sent from: http://apache-tika-users.1629097.n2.nabble.com/
>
>