Chris Mattmann wrote:


The best way to get started with Tika is to check out the unit tests,
available at:

http://svn.apache.org/repos/asf/incubator/tika/trunk/src/test

They really do the best job right now of exercising the features of the API.

ok

To get a good idea of what's been contributed/committed to the upcoming 0.1
release, check out the CHANGES.txt file:

http://svn.apache.org/repos/asf/incubator/tika/trunk/CHANGES.txt

In general, the mime type detector, metadata framework, and automatic
parsing framework are currently working.

cool

There is a near-stable Parser interface, along with several implementations
that handle MS Word, MS PowerPoint, PDF, XML, plain text, etc. You can see
the generic Parser interface at:

http://svn.apache.org/repos/asf/incubator/tika/trunk/src/main/java/org/apache/tika/parser/Parser.java
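
As a rough illustration of the pattern described above (a generic parser
interface, per-format implementations, and automatic selection based on the
detected mime type), here is a minimal self-contained Java sketch. It is not
Tika's code: the class name ParserSketch, the methods detectType and
autoParse, and the single-method Parser interface are all assumptions made
up for this example. The real interface is in the Parser.java file linked
above.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ParserSketch {

    /** Hypothetical stand-in for a generic parser interface; not Tika's actual signature. */
    interface Parser {
        String parse(byte[] content);
    }

    /** Returns plain text unchanged. */
    static class PlainTextParser implements Parser {
        public String parse(byte[] content) {
            return new String(content, StandardCharsets.UTF_8);
        }
    }

    /** Crude XML handling: drops anything that looks like a tag or declaration. */
    static class XmlParser implements Parser {
        public String parse(byte[] content) {
            String xml = new String(content, StandardCharsets.UTF_8);
            return xml.replaceAll("<[^>]*>", " ").trim().replaceAll("\\s+", " ");
        }
    }

    /** Registry mapping a mime type to the parser that handles it. */
    private static final Map<String, Parser> PARSERS = new HashMap<>();
    static {
        PARSERS.put("text/plain", new PlainTextParser());
        PARSERS.put("application/xml", new XmlParser());
    }

    /** Toy detector: looks only at the first few bytes of the content. */
    static String detectType(byte[] content) {
        int n = Math.min(content.length, 5);
        String head = new String(content, 0, n, StandardCharsets.US_ASCII);
        if (head.startsWith("<?xml")) return "application/xml";
        if (head.startsWith("%PDF-")) return "application/pdf";
        return "text/plain"; // fallback assumption for this sketch
    }

    /** Detects the type, then dispatches to the registered parser. */
    static String autoParse(byte[] content) {
        String type = detectType(content);
        Parser parser = PARSERS.get(type);
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for " + type);
        }
        return parser.parse(content);
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\"?><doc><title>Hello</title> Tika</doc>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(detectType(xml));
        System.out.println(autoParse(xml));
    }
}
```

The sketch registers no parser for application/pdf, so autoParse rejects PDF
input with an exception. Supporting another format in this toy version would
mean adding one more Parser implementation to the registry.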

ok

Tika is also currently used within Nutch as the mime type detection
framework, since the commit of NUTCH-562 [1]. Checking out Nutch will give
you an idea of how the mime framework works.

will do that

If you have any further questions, please let me (and others on this list)
know. We'd love to help. Again, welcome!

thanks very much for all your help. I think this should get me started.

Cheers

Michael

Cheers,
 Chris

[1] http://issues.apache.org/jira/browse/NUTCH-562

On 12/6/07 2:38 PM, "Michael Wechner" <[EMAIL PROTECTED]> wrote:

Hi

I guess Tika has been fully "migrated" from

http://code.google.com/p/tika/

to

http://incubator.apache.org/tika/index.html

right? If so, I would suggest adding a note to the Google Code site, or
closing the project at Google (if possible).

Also, I wanted to ask: what's the best way to get started with Tika?
I tried to find some documentation, but didn't really find anything like
"a first hops example".

Thanks in advance for any pointers

Michael

______________________________________________
Chris Mattmann, Ph.D.
[EMAIL PROTECTED]
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.




--
Michael Wechner
Wyona      -   Open Source Content Management - Yanel, Yulup
http://www.wyona.com
[EMAIL PROTECTED], [EMAIL PROTECTED]
+41 44 272 91 61
