Hi, I just checked out Tika for the first time,
and since it would suit my needs very well, I have a few newbie questions.
1. When will Tika switch to Apache's PDFBox (or is it still not mature enough)?
2. Is it possible to skip HTML tags with Tika (say I don't want to have
<script> or <style> contents in my resulting plain text)?
And most importantly:
3. Are there any plans for outputting the result as RDF? I'm currently using
Aperture, but I would be more than happy to switch to an Apache project,
and I'm also willing to contribute on that one.
Any insight is appreciated.
wkr www.turnguard.com