That would be a more practical alternative. I have time scheduled next week for an in-house solution, but I will first look properly at ForkParser and see whether I could make something akin to it in a generic and configurable fashion. If so, I will submit the code.
Jim

> -----Original Message-----
> From: Allison, Timothy B. [mailto:[email protected]]
> Sent: Wednesday, November 29, 2017 23:52
> To: [email protected]
> Subject: RE: Very slow parsing of a few PDF files
>
> > I am going to have to write my own application-specific solution
>
> Ugh. I'm sorry. If there's anything shareable, please do share.
>
> > ForkParser tries to serialize every class it thinks will be needed across the
> > connection, and a lot of third-party classes are not serializable. I think that
> > ForkParser is a good enough idea, but I am not sure how practical it is in a
> > real-life application.
>
> You make a very good point. We've had issues serializing our own
> parsers...let alone user-specific addons. I wonder if we could modify
> ForkClient to kick off the forkserver process from a user-specified "bin"
> directory (instead of the current bootstrapped jar), and that bin directory
> could include at least the tika-core.jar, tika-fat-parsers.jar and
> tika-serialization.jar but could also include optional dependencies and
> user-specific dependencies.
>
> Hmmm....
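A minimal sketch of the "bin" directory idea, assuming a hypothetical setup in which every jar the forked server needs (tika-core, the parsers, serialization, plus any optional or user-specific add-ons) sits in one directory. Because the child JVM loads classes from its own classpath, nothing beyond the request data would have to be serialized across the connection. The class `BinDirLauncher`, its methods and the `mainClass` argument are hypothetical illustrations, not part of Tika's ForkClient API; only standard JDK calls are used.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: launches a child JVM whose classpath is every jar in a
// user-specified "bin" directory, instead of a single bootstrapped jar.
public class BinDirLauncher {

    // Collects all *.jar files in binDir (non-recursive), sorted for a stable
    // order, joined with the platform path separator (':' or ';').
    static String buildClasspath(Path binDir) throws IOException {
        List<String> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(binDir, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar.toAbsolutePath().toString());
            }
        }
        Collections.sort(jars);
        return String.join(File.pathSeparator, jars);
    }

    // Builds the command line: <java> <jvmArgs...> -cp <jars> <mainClass>.
    // Uses the same JVM installation that is running the parent process.
    static List<String> buildCommand(Path binDir, String mainClass, List<String> jvmArgs)
            throws IOException {
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.addAll(jvmArgs);
        cmd.add("-cp");
        cmd.add(buildClasspath(binDir));
        cmd.add(mainClass);
        return cmd;
    }

    // Starts the child process; its stderr is forwarded to the parent's stderr.
    // The caller owns the returned Process (stdin/stdout carry the protocol).
    static Process launch(Path binDir, String mainClass, List<String> jvmArgs)
            throws IOException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(binDir, mainClass, jvmArgs));
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        return pb.start();
    }
}
```

Listing the jars explicitly keeps the order deterministic; the `java` launcher's `dir/*` classpath wildcard would be a shorter alternative if ordering does not matter.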
