Hi Jason,

On Jul 1, 2012, at 6:05 AM, Jason Judge wrote:

> I see, so tika-app in server mode and tika-server are not the same thing. 
> tika-app in server mode is just a way of providing an alternative input 
> stream, but offers no control through that stream over what it actually does.
> 
> I have downloaded the tika-server and it works like a charm.

Glad to hear it's working for ya!

> 
> The one thing I can't see how to do, is how to detect the language. The 
> language is neither in the text nor in the metadata. Would I need to fetch 
> the XHTML version of the document and get the language out of the header 
> section? Not sure how to fetch the XHTML TBH - the documentation only covers 
> plain text.

I don't think we've added a language detection endpoint yet, but it's 
certainly something we should do.

If you get a chance to work on it before we do, feel free to contribute it 
back by:

1. filing an issue in our JIRA system at 
https://issues.apache.org/jira/browse/TIKA to request the language 
detection endpoint
2. submitting a patch and/or working with the committers on the issue you 
create in step 1.

HTH!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
