Getting IOException: Resetting to invalid mark while reseting the stream

2014-06-30 Thread PRANEESH KUMAR
Using Tika 1.5, I am getting java.io.IOException: Resetting to invalid mark when resetting the stream I passed in. The IOException occurs mostly when parsing PDF and ZIP formats. The code snippet I used is: try { // I have set the stream as BufferedInputStream of some sample.pdf

Re: Getting IOException: Resetting to invalid mark while reseting the stream

2014-06-30 Thread Nick Burch
On Mon, 30 Jun 2014, PRANEESH KUMAR wrote:
> Using Tika 1.5 getting java.io.IOException: Resetting to invalid mark while reseting the stream passed.
What kind of thing is the stream you're passing?
> stream.mark(Integer.MAX_VALUE);
Does your stream support marking? And does it support marking
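
The question about whether the stream supports marking "that much" can be illustrated with plain java.io, without Tika. The sketch below is only a demonstration of how BufferedInputStream discards a mark once reads run past the read limit and its buffer; the class name MarkLimitDemo, the 8-byte buffer and the 64-byte input are assumptions chosen for illustration, and nothing here confirms that this is the cause of the exception in the original report.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkLimitDemo {

    /**
     * Marks the stream with the given read limit, reads bytesToRead bytes
     * one at a time, then calls reset(). Returns true if reset() succeeded.
     */
    static boolean resetSucceeds(int readLimit, int bytesToRead) {
        byte[] data = new byte[64];
        // Deliberately tiny 8-byte buffer so the read limit is easy to exceed.
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data), 8)) {
            in.mark(readLimit);
            for (int i = 0; i < bytesToRead; i++) {
                in.read();
            }
            in.reset();
            return true;
        } catch (IOException e) {
            // BufferedInputStream throws IOException when the mark was discarded.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(resetSucceeds(4, 3));
        System.out.println(resetSucceeds(4, 20));
        System.out.println(resetSucceeds(Integer.MAX_VALUE, 20));
    }
}
```

Note also that a later call to mark() on the same stream replaces the earlier mark, so any code that marks the stream in between can change what reset() returns to.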

Re: Getting IOException: Resetting to invalid mark while reseting the stream

2014-06-30 Thread PRANEESH KUMAR
Hi Nick, Thank you,
> What kind of thing is the stream you're passing?
I am passing a BufferedInputStream.
> Does your stream support marking? And does it support marking that much?
Yes, it supports mark, and marking the stream is not the problem.
> Also, you could consider wrapping it with a

Re: Getting IOException: Resetting to invalid mark while reseting the stream

2014-06-30 Thread Nick Burch
On Mon, 30 Jun 2014, PRANEESH KUMAR wrote:
>> What kind of thing is the stream you're passing?
> I am passing BufferedInputStream
What kind of stream is underneath that though?
> TikaInputStream does not reset the stream position to zero for every document type, so I need to do stream

Re: Getting IOException: Resetting to invalid mark while reseting the stream

2014-06-30 Thread PRANEESH KUMAR
> What kind of stream is underneath that though?
It internally uses ByteArrayInputStream.
> Tika will normally consume all of the stream when it parses a file
But in my case the stream that is used for parsing is also used for some other processing too.
Praneesh
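
Since the data is said to sit in a ByteArrayInputStream, one way to avoid depending on mark/reset at all is to keep the byte array and hand each consumer its own stream. The sketch below is a hedged illustration, not something proposed in the thread: it assumes the caller can get at the underlying byte[], and drain() is a hypothetical stand-in for both the parse step and the "other processing".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class FreshStreamPerConsumer {

    /** Hypothetical consumer: reads to the end and returns the byte count. */
    static int drain(InputStream in) {
        try {
            int total = 0;
            while (in.read() != -1) {
                total++;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Both consumers share one stream; returns what the second one sees. */
    static int secondReadOnSharedStream(byte[] data) {
        InputStream shared = new ByteArrayInputStream(data);
        drain(shared);        // first consumer uses up the stream
        return drain(shared); // second consumer finds it already at the end
    }

    /** Each consumer gets its own stream over the same bytes. */
    static int secondReadOnFreshStream(byte[] data) {
        drain(new ByteArrayInputStream(data));        // first consumer
        return drain(new ByteArrayInputStream(data)); // second starts at zero
    }

    public static void main(String[] args) {
        byte[] data = new byte[10];
        System.out.println(secondReadOnSharedStream(data));
        System.out.println(secondReadOnFreshStream(data));
    }
}
```

Wrapping the same array twice does not copy the bytes, so the cost is one small stream object per consumer.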

RE: Stack Overflow Question

2014-06-30 Thread Allison, Timothy B.
DefaultHandler is effectively a NullHandler; it doesn't store or do anything. Try BodyContentHandler or ToXMLContentHandler or maybe WriteOutContentHandler. If you want to write out each embedded file as a binary, try subclassing EmbeddedResourceHandler.
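
The point about DefaultHandler can be seen with the JDK's SAX parser alone. In the sketch below, TextCollector is a hypothetical stand-in for a handler that keeps text; it is not Tika's BodyContentHandler, and the tiny XHTML string is an assumed input chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerDemo {

    /** Hypothetical handler that keeps the character data it receives. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Runs the JDK SAX parser over the given XML with the given handler. */
    static void parse(String xml, DefaultHandler handler) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses with TextCollector and returns the collected text. */
    static String extractText(String xml) {
        TextCollector collector = new TextCollector();
        parse(xml, collector);
        return collector.text.toString();
    }

    public static void main(String[] args) {
        String xml = "<html><body><p>Hello Tika</p></body></html>";
        // A plain DefaultHandler receives every event and keeps none of them.
        parse(xml, new DefaultHandler());
        System.out.println(extractText(xml));
    }
}
```

The plain DefaultHandler run completes without error but leaves nothing to read back, which is why a handler that stores or writes its output is needed.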

RE: Stack Overflow Question

2014-06-30 Thread Allison, Timothy B.
Might want to look into the RecursiveMetadata parser:
http://wiki.apache.org/tika/RecursiveMetadata
or
https://issues.apache.org/jira/i#browse/TIKA-1329?issueKey=TIKA-1329&serverRenderedViewIssue=true

From: yeshwanth kumar [mailto:yeshwant...@gmail.com]
Sent: Monday, June 30, 2014 3:24 PM
To: Allison,