Re: MP4Parser Triggers no ContentHandler.startDocument() and ContentHandler.endDocument() in one case

2013-05-29 Thread Christian Reuschling
You are right - I checked this, and it's a .mov file under a URL, but the length of the file is zero. Nevertheless, in this case an Exception (like in all other parsers) or a tika body with length zero, which is indicated at least by handler.endDocu

[jira] [Created] (TIKA-1128) Replace line tabulation with line break

2013-05-29 Thread Privezentsev Konstantin (JIRA)
Privezentsev Konstantin created TIKA-1128. Summary: Replace line tabulation with line break Key: TIKA-1128 URL: https://issues.apache.org/jira/browse/TIKA-1128 Project: Tika Issue Ty

[jira] [Updated] (TIKA-1128) Replace line tabulation with line break

2013-05-29 Thread Privezentsev Konstantin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Privezentsev Konstantin updated TIKA-1128. Attachment: 0001-TIKA-1128-Replace-line-tabular-by-line-break-when-ex.patch

Re: MP4Parser Triggers no ContentHandler.startDocument() and ContentHandler.endDocument() in one case

2013-05-29 Thread Nick Burch
On Wed, 29 May 2013, Christian Reuschling wrote: "Nevertheless, in this case an Exception (like in all other parsers) or a tika body with length zero, which is indicated at least by handler.endDocument() would be the appropriate way, isn't it?" From the ContentHandler's point of view, there is n
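
The three outcomes discussed in this thread (an exception, an empty document bracketed by startDocument()/endDocument(), or no callbacks at all) can be told apart on the caller's side with a small tracking handler. The following is only a sketch, not part of Tika: the class name LifecycleTrackingHandler and its helper methods are hypothetical, and it relies only on the standard SAX DefaultHandler from the JDK.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not a Tika class): records which SAX lifecycle
// callbacks a parser actually invoked on this handler.
class LifecycleTrackingHandler extends DefaultHandler {
    private boolean started;
    private boolean ended;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startDocument() {
        started = true;
    }

    @Override
    public void endDocument() {
        ended = true;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Collect character data so an empty body can be detected later.
        text.append(ch, start, length);
    }

    // True when both startDocument() and endDocument() were seen.
    public boolean sawCompleteDocument() {
        return started && ended;
    }

    // True for a well-formed but empty document (both callbacks, no text).
    public boolean isEmptyDocument() {
        return sawCompleteDocument() && text.length() == 0;
    }

    // True when the parser returned without calling either callback,
    // which is the case reported in this thread for the zero-length file.
    public boolean parserWasSilent() {
        return !started && !ended;
    }
}
```

A caller could pass such a handler to a parser's parse(...) call and, if no exception was thrown, check parserWasSilent() afterwards to detect input for which no document events were emitted.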

Parser does not produce proper sentence breaks?

2013-05-29 Thread Shai Erera
Hi, I started using Tika a couple of days ago, so it could very well be that I'm using the wrong ContentHandler, Parser configuration and what not. I hope I am, and there's a simple fix to the following problem: I index documents (for this discussion PPT) and then search and produce search hig