Hi Jason,

On Jul 1, 2012, at 6:05 AM, Jason Judge wrote:

> I see, so tika-app in server mode and tika-server are not the same thing.
> tika-app in server mode is just a way of providing an alternative input
> stream, but offers no control through that stream over what it actually does.
>
> I have downloaded the tika-server and it works like a charm.

Glad to hear it's working for ya!

>
> The one thing I can't see how to do, is how to detect the language. The
> language is neither in the text nor in the metadata. Would I need to fetch
> the XHTML version of the document and get the language out of the header
> section? Not sure how to fetch the XHTML TBH - the documentation only covers
> plain text.

I don't think we added a language detection end point yet, but it's
certainly something we should do. In case we don't get to it as soon as
you get a chance to, feel free to contribute it back by:

1. filing an issue in our JIRA system at:

https://issues.apache.org/jira/browse/TIKA

to record the desire for the language detection end point

2. submitting a patch and/or working with the committers on that issue
you create in #1.

HTH!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
