hi, i just checked out Tika for the first time.

it looks like i could put it to good use, so i have a few newbie questions.

1. when will Tika switch to Apache PDFBox (or is it still not mature enough)?
2. is it possible to skip html tags with Tika (say i don't want to have
<script> or <style> contents in my resulting plain text)?

and most important

3. are there any plans for outputting the result as RDF? (currently i'm using
Aperture, but i would be more than happy to switch to an Apache project,
and i'm also willing to contribute on that one.)

any insight appreciated
wkr www.turnguard.com    



      
