Using Tika 1.5, I am getting java.io.IOException: Resetting to invalid mark while
resetting the stream I passed in.
The IOException occurs mostly when parsing PDF and ZIP formats.
The code snippet I have used is:
try {
// I have set the stream as a BufferedInputStream of some sample.pdf
On Mon, 30 Jun 2014, PRANEESH KUMAR wrote:
Using Tika 1.5, I am getting java.io.IOException: Resetting to invalid mark while
resetting the stream I passed in.
What kind of thing is the stream you're passing?
stream.mark(Integer.MAX_VALUE);
Does your stream support marking? And does it support marking that much?
Hi Nick,
Thank you,
What kind of thing is the stream you're passing?
I am passing a BufferedInputStream.
Does your stream support marking? And does it support marking that much?
Yes, it supports marking, and marking the stream is not a problem.
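A note on how "Resetting to invalid mark" can appear even when the stream supports marking: `java.io.BufferedInputStream` keeps only a single mark. If any code that receives the same stream calls `mark()` again with a small read limit and then reads past that limit, the caller's earlier `mark(Integer.MAX_VALUE)` is silently replaced and then invalidated, so the caller's later `reset()` throws exactly this exception. The sketch below reproduces that with JDK classes only. `innerConsumer` is a hypothetical stand-in for whatever the parser does internally; whether Tika 1.5's detectors or parsers behave this way is an assumption here, not something the thread confirms.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkResetDemo {

    // Hypothetical stand-in for code that peeks at a header and then reads the whole stream.
    static void innerConsumer(InputStream in) throws IOException {
        in.mark(16);                 // overwrites the caller's mark: BufferedInputStream holds only one
        byte[] header = new byte[4];
        in.read(header);             // peek at the first few bytes
        in.reset();                  // rewind to the start of the header
        while (in.read() != -1) {
            // consume everything, far past the 16-byte read limit, which invalidates the mark
        }
    }

    // Returns the message of the IOException thrown by the caller's reset(), or "reset ok".
    static String runScenario() {
        byte[] data = new byte[100_000];    // placeholder for the bytes of sample.pdf
        InputStream stream = new BufferedInputStream(new ByteArrayInputStream(data));
        try {
            stream.mark(Integer.MAX_VALUE); // the caller's mark, as in the original snippet
            innerConsumer(stream);
            stream.reset();                 // fails: the caller's mark was replaced and then invalidated
            return "reset ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(runScenario()); // prints: Resetting to invalid mark
    }
}
```

The point of the sketch is that "supports marking" (`markSupported()` returning true) says nothing about whether someone else moved the one mark the stream has.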
Also, you could consider wrapping it with a
On Mon, 30 Jun 2014, PRANEESH KUMAR wrote:
What kind of thing is the stream you're passing?
I am passing a BufferedInputStream.
What kind of stream is underneath that though?
As TikaInputStream does not reset the position of the stream to zero for all
document types, I need to do stream
What kind of stream is underneath that though?
It internally uses a ByteArrayInputStream.
Tika will normally consume all of the stream when it parses a file
But in my case, the stream used for parsing is also needed for other
processing afterwards.
Praneesh
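Since the data already sits in a ByteArrayInputStream underneath, one option that avoids mark/reset entirely is to keep the bytes and give each consumer its own fresh stream. This is a minimal sketch, assuming the whole document fits comfortably in memory. `parseWithTika` and `otherProcessing` are hypothetical placeholders (here they only count bytes) standing in for the real parse call and the other processing; no Tika API is used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReusableBytesDemo {

    // Reads the source to the end once, so the bytes can be replayed any number of times.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Hypothetical stand-in for the Tika parse call: reads the stream to the end and counts bytes.
    static long parseWithTika(InputStream in) throws IOException {
        long total = 0;
        while (in.read() != -1) {
            total++;
        }
        return total;
    }

    // Hypothetical stand-in for the "other processing": also counts bytes, for illustration only.
    static long otherProcessing(InputStream in) throws IOException {
        long total = 0;
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            total += n;
        }
        return total;
    }

    // Each consumer gets its own ByteArrayInputStream starting at byte 0; no mark/reset involved.
    static long[] run() {
        try {
            byte[] data = readAll(new ByteArrayInputStream(new byte[50_000])); // placeholder for sample.pdf
            long parsed = parseWithTika(new ByteArrayInputStream(data));
            long other = otherProcessing(new ByteArrayInputStream(data));
            return new long[] {parsed, other};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] counts = run();
        System.out.println(counts[0] + " " + counts[1]); // prints: 50000 50000
    }
}
```

This trades memory for simplicity: it no longer matters how far the parser reads or whether it moves a mark, because nobody shares a stream position.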
DefaultHandler is effectively a NullHandler; it doesn't store or do anything.
Try BodyContentHandler, ToXMLContentHandler, or maybe WriteOutContentHandler.
If you want to write out each embedded file as a binary, try subclassing
EmbeddedResourceHandler.
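To illustrate why `DefaultHandler` yields nothing: `org.xml.sax.helpers.DefaultHandler` implements every SAX callback as a no-op, and Tika delivers its output as XHTML SAX events, so those events are simply dropped. A handler has to override `characters()` (and keep the text somewhere) to capture anything. The sketch below uses only the JDK's SAX parser on a small hypothetical XHTML string; `TextCollectingHandler` is an illustrative name and only a rough analogue of what `BodyContentHandler` does, not its implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerDemo {

    // Hypothetical handler (illustrative name): overrides characters() so text is kept, not dropped.
    static class TextCollectingHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // A tiny XHTML document standing in for the SAX events a parser would emit.
    static final String XHTML =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>Hello</p><p>Tika</p></body></html>";

    // Feeds the document through the JDK SAX parser into the given handler.
    static void parse(String xml, DefaultHandler handler) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static String collectText(String xml) {
        parse(xml, new DefaultHandler());   // succeeds, but every callback is a no-op: nothing is kept
        TextCollectingHandler handler = new TextCollectingHandler();
        parse(xml, handler);                // characters() appends each text node
        return handler.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(collectText(XHTML)); // prints: HelloTika
    }
}
```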
QUOTE:
Might want to look into RecursiveMetadata Parser
http://wiki.apache.org/tika/RecursiveMetadata
Or
https://issues.apache.org/jira/browse/TIKA-1329
From: yeshwanth kumar [mailto:yeshwant...@gmail.com]
Sent: Monday, June 30, 2014 3:24 PM
To: Allison,