Hi,
Are you using tika-server? If so, and if you can submit the data as a
multipart/form-data payload, that may help: CXF (used by tika-server)
should make a best effort to save the multipart payloads to temporary
locations on disk, and thus minimize the memory requirements.
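Below is a rough sketch of such an upload in Python, using only the standard library. It is not taken from the Tika documentation. It assumes tika-server runs locally on its usual default port 9998 and that the multipart endpoint is `/tika/form`, so check both against your server version. The form field name `upload`, the chunk size and the helper names are illustrative only. The client streams the file in chunks instead of reading a 400MB PDF into memory, so the client does not undo the memory savings that CXF provides on the server.

```python
import os
import urllib.request
import uuid
from typing import Iterator, Tuple

# Assumption: local tika-server on its default port, multipart endpoint /tika/form.
TIKA_FORM_URL = "http://localhost:9998/tika/form"

CHUNK_SIZE = 1024 * 1024  # read the file 1 MiB at a time (illustrative value)


def multipart_stream(path: str, field: str = "upload") -> Tuple[str, int, Iterator[bytes]]:
    """Build a multipart/form-data body for a single file without reading it
    fully into memory. Returns (content_type, content_length, body_chunks)."""
    boundary = uuid.uuid4().hex
    filename = os.path.basename(path)
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    # Compute the exact length up front so no chunked encoding is needed.
    length = len(head) + os.path.getsize(path) + len(tail)

    def chunks() -> Iterator[bytes]:
        yield head
        with open(path, "rb") as f:
            while block := f.read(CHUNK_SIZE):
                yield block  # only one chunk is held in memory at a time
        yield tail

    return f"multipart/form-data; boundary={boundary}", length, chunks()


def extract_text(path: str, url: str = TIKA_FORM_URL) -> str:
    """POST the file as multipart/form-data and return the extracted text."""
    content_type, length, body = multipart_stream(path)
    req = urllib.request.Request(
        url,
        data=body,  # http.client sends an iterable body chunk by chunk
        method="POST",
        headers={
            "Content-Type": content_type,
            "Content-Length": str(length),
            "Accept": "text/plain",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Usage would be something like `print(extract_text("huge.pdf"))`, where `huge.pdf` is a placeholder path. The parsing itself still happens inside tika-server, so the server's own heap settings still apply.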

Cheers, Sergey


On Thu, Nov 14, 2019 at 10:21 AM Ribeaud, Christian (Ext) <
christian.ribe...@novartis.com> wrote:

> Hi,
>
> My application handles all kinds of documents (mainly PDFs). In a very few
> cases, we can expect huge PDFs (< 500MB).
>
> At around 400MB I hit a wall: parsing takes ages (although it is quite
> fast at the beginning). I've tried several ideas, but none of them brought
> the desired improvement.
>
> I have the impression that memory plays a role. I have no more than 3GB of
> memory (and I think this should be enough, as we are streaming the document
> and using an event-based XML parser).
>
> Are there things I should be aware of?
>
> Any hint would be very welcome. Thanks and have a nice day,
>
> christian
>
>
