On Wed, 29 May 2013, Nick Burch wrote:
I'm not sure if we do have a properly documented policy on what a parser
should do if it receives a file it can't handle. For ones that are
invalid (e.g. corrupt), I believe an exception is the expected result. The
case when the file seems valid, but can't b
On Wed, 29 May 2013, Christian Reuschling wrote:
Nevertheless, in this case an Exception (like in all other parsers) or a
Tika body with length zero, which is indicated at least by
handler.endDocument(), would be the appropriate way, wouldn't it? From the
ContentHandler's point of view, there is n
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
You are right. I checked this, and it's a .mov file at a URL, but the
length of the file is zero.
Nevertheless, in this case an Exception (like in all other parsers) or a
Tika body with length zero, which is indicated at least by handler.endDocu
On Tue, 28 May 2013, Christian Reuschling wrote:
This works like a charm, but inside MP4Parser there are these lines
of code:
Lines 146-154, parse() method:
    MovieBox moov = getOrNull(isoFile, MovieBox.class);
    if (moov == null) {
        // Bail out
        return;
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
Normally, parsers trigger the ContentHandler.startDocument() method in their
parse(... ContentHandler ...) method for sure. This is also true in the case
of an error, which normally throws an Exception.
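To illustrate that expectation, here is a minimal sketch using only the SAX classes that ship with the JDK, not Tika itself. The class EmptyInputSketch, its parse() and demo() methods, and CountingHandler are hypothetical names made up for this example, and the String argument merely stands in for whatever a real parser would find in the file. It shows one way a parser could still call startDocument() and endDocument() when there is nothing to extract, so the handler sees an empty but complete document.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a parser; this is NOT Tika's MP4Parser.
class EmptyInputSketch {

    // Counts SAX events so a caller can tell whether the document was closed.
    static class CountingHandler extends DefaultHandler {
        int started = 0;
        int ended = 0;
        int chars = 0;

        @Override
        public void startDocument() { started++; }

        @Override
        public void endDocument() { ended++; }

        @Override
        public void characters(char[] ch, int start, int length) { chars += length; }
    }

    // 'content' is an assumption: null plays the role of "nothing found in the file".
    static void parse(String content, ContentHandler handler) throws SAXException {
        handler.startDocument();
        if (content != null) {
            char[] text = content.toCharArray();
            handler.characters(text, 0, text.length);
        }
        // Reached for empty input too, instead of returning early.
        handler.endDocument();
    }

    // Runs parse() and returns {startDocument calls, endDocument calls, characters seen}.
    static int[] demo(String content) {
        CountingHandler handler = new CountingHandler();
        try {
            parse(content, handler);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return new int[] { handler.started, handler.ended, handler.chars };
    }

    public static void main(String[] args) {
        int[] counts = demo(null);
        System.out.println("startDocument=" + counts[0]
                + ", endDocument=" + counts[1]
                + ", characters=" + counts[2]);
    }
}
```

Throwing an exception instead would be the other option raised in this thread; which of the two is right for a zero-length file is a policy question this sketch does not settle.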
We wrote and maintain an open source crawler