That would be a more practical alternative. I have time scheduled next week for 
an in-house solution but I will first look properly at ForkParser and see if I 
could make something akin to that in a generic and configurable fashion. If so, I 
will submit the code.

Jim 

> -----Original Message-----
> From: Allison, Timothy B. [mailto:[email protected]]
> Sent: Wednesday, November 29, 2017 23:52
> To: [email protected]
> Subject: RE: Very slow parsing of a few PDF files
> 
> >I am going to have to write my own application specific solution
> 
> Ugh.  I'm sorry.  If there's anything shareable, please do share.
> 
> > ForkParser tries to serialize every class it thinks will be needed across
> > the connection and a lot of third party classes are not serializable. I
> > think that ForkParser is a good enough idea but I am not sure how practical
> > it is in a real-life application.
> 
> You make a very good point.  We've had issues serializing our own
> parsers...let alone user-specific addons.  I wonder if we could modify
> ForkClient to kick off the forkserver process from a user-specified "bin"
> directory (instead of the current bootstrapped jar), and that bin directory
> could include at least the tika-core.jar, tika-fat-parsers.jar and
> tika-serialization.jar but could also include optional dependencies and
> user-specific dependencies.
> 
> Hmmm....
