Hi Marian,

Tika also now ships with a JAX-RS REST based service, called tika-server. You can check out some (sparse) documentation for it, here:

http://wiki.apache.org/tika/TikaJAXRS

If you have comments, questions, and/or patches, they are all welcome! :)

Cheers,
Chris

On Jun 24, 2011, at 3:31 AM, Marian Steinbach wrote:

> Hi!
>
> I have tested the Tika client for extraction of content, metadata and
> language and I'm really happy with the results.
>
> For performance reasons when extracting larger numbers of documents I
> think it would be worthwhile to avoid starting the client three times
> for each document, which also includes starting the virtual machine
> etc.
>
> I was thinking about having Tika running as a daemon and pushing
> document path info to it, in order to get the metadata, content and
> language as a response.
>
> Is there a best practice for this? Maybe a servlet/jsp solution? Does
> the current Tika release include an out of the box solution for that?
>
> (I only found https://issues.apache.org/jira/browse/TIKA-169 on this
> topic, which is pretty old and has "won't fix" status.)
>
> Thanks!
>
> Marian

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
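
To illustrate the daemon-style usage discussed in this thread: with tika-server running, a client can send each document over HTTP instead of starting a JVM per document. Below is a minimal sketch, not a definitive client. It assumes the server listens on localhost:9998 and accepts PUT requests with the raw document bytes at /tika (text) and /meta (metadata). The host, port, endpoint names, and the helper names build_request and extract are assumptions for illustration and should be checked against the wiki page above.

```python
import urllib.request

# Assumed base URL of a running tika-server; adjust to your setup.
TIKA_SERVER = "http://localhost:9998"


def build_request(endpoint, data):
    """Build (but do not send) a PUT request carrying raw document bytes.

    `endpoint` is a path such as "tika" or "meta" (assumed endpoint names).
    """
    url = TIKA_SERVER.rstrip("/") + "/" + endpoint.lstrip("/")
    return urllib.request.Request(url, data=data, method="PUT")


def extract(endpoint, path):
    """Send the file at `path` to the server and return the response body."""
    with open(path, "rb") as f:
        data = f.read()
    with urllib.request.urlopen(build_request(endpoint, data)) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

With a server running, a call such as extract("tika", "report.pdf") would return the extracted text, and extract("meta", "report.pdf") the metadata, both served by the same long-lived process.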
