Re: Parser does not produce proper sentence breaks?

2013-06-03 Thread Michael McCandless
First off, those three *'s you see are annoying :) They come from the master slide, due to this issue: https://issues.apache.org/jira/browse/TIKA-1067 It would be nice to figure out how to stop this "false text" from coming out. Second, I think the PPT/X parsers do not put any information ab…

Parser does not produce proper sentence breaks?

2013-05-29 Thread Shai Erera
Hi, I started using Tika a couple of days ago, so it could very well be that I'm using the wrong ContentHandler, Parser configuration, and whatnot. I hope I am, and that there's a simple fix for the following problem: I index documents (for this discussion, PPT) and then search and produce search hig…
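One possible client-side workaround, not something proposed in this thread: Tika's `Parser.parse` accepts any SAX `ContentHandler`, so a custom handler can insert a line break whenever a block-level XHTML element such as `<p>` closes, giving downstream sentence/highlight logic a boundary to work with. This is a minimal sketch that assumes the parser actually emits such elements around each paragraph or bullet you care about, which may not hold for every format. The class name `BlockBreakHandler` and the chosen set of block elements are hypothetical. It uses only JDK SAX classes so it runs without Tika; the `main` method feeds it a small hand-written XHTML string in place of real parser output.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects text and appends '\n' when a block element ends.
public class BlockBreakHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    // Assumed set of XHTML elements treated as paragraph-like boundaries.
    private static boolean isBlock(String name) {
        switch (name) {
            case "p": case "div": case "li": case "br": case "td": case "tr":
            case "h1": case "h2": case "h3": case "h4": case "h5": case "h6":
                return true;
            default:
                return false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // localName is set when the SAX parser is namespace-aware; fall back to qName.
        String name = localName.isEmpty() ? qName : localName;
        // Add a break only if one is not already there, to avoid blank lines.
        if (isBlock(name) && text.length() > 0 && text.charAt(text.length() - 1) != '\n') {
            text.append('\n');
        }
    }

    public String getText() {
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the XHTML a parser might emit for three slide paragraphs.
        String xhtml = "<html><body><p>Title slide</p><p>First bullet</p>"
                + "<p>Second bullet</p></body></html>";
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        BlockBreakHandler handler = new BlockBreakHandler();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        System.out.print(handler.getText());
    }
}
```

With Tika on the classpath, the same instance would be passed as the handler argument to `parse(InputStream, ContentHandler, Metadata, ParseContext)` instead of being driven by the JDK `SAXParser`. If the PPT/X parsers do not emit per-paragraph elements, this handler has nothing to split on, and the fix would have to happen in the parser itself.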