[ https://issues.apache.org/jira/browse/TIKA-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Keith R. Bennett updated TIKA-45:
---------------------------------
Attachment: RereadableInputStreamTest.java
RereadableInputStream.java
tika45.patch
I've attached both a patch and the patched source files, so the changes are
easy to view.
Changes to the RereadableInputStream include:
* Addresses this issue by reading to the end of the original input stream on
the first rewind by default. A constructor that takes a boolean value lets
callers specify whether or not to do this.
* Added javadoc.
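For readers without the attachments at hand, here is a rough sketch of the behavior described in the first item. It is not the attached code: the class name EagerRewindSketch, the parameter name readToEndOnRewind, and the memory-only store are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch; NOT the attached RereadableInputStream. */
class EagerRewindSketch extends InputStream {
    private final InputStream original;
    private final boolean readToEndOnRewind;   // assumed name for the flag
    private final ByteArrayOutputStream store = new ByteArrayOutputStream();
    private InputStream current;               // where reads are served from
    private boolean firstPass = true;

    /** Defaults to draining the original stream on the first rewind. */
    EagerRewindSketch(InputStream original) {
        this(original, true);
    }

    EagerRewindSketch(InputStream original, boolean readToEndOnRewind) {
        this.original = original;
        this.readToEndOnRewind = readToEndOnRewind;
        this.current = original;
    }

    @Override
    public int read() throws IOException {
        int b = current.read();
        // Only the first pass copies bytes into the store.
        if (firstPass && b != -1) {
            store.write(b);
        }
        return b;
    }

    public void rewind() throws IOException {
        if (firstPass) {
            if (readToEndOnRewind) {
                // Save the bytes the caller has not read yet.
                int b;
                while ((b = original.read()) != -1) {
                    store.write(b);
                }
            }
            original.close();
            firstPass = false;
        }
        // Every later pass reads from a copy of the stored bytes.
        current = new ByteArrayInputStream(store.toByteArray());
    }

    @Override
    public void close() throws IOException {
        if (firstPass) {
            original.close();
        }
    }
}
```

With the flag set to false, a rewind after a partial read leaves only the bytes already read available, which is the behavior this issue reports.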
Thanks to Chris Mattmann for his suggestion regarding this issue.
As you can see, this class has a unit test, but given its importance, more
testing would be a Good Thing.
I'm pasting here a TODO comment from the file because it describes what I think
is a better solution to the problem:
// TODO: At some point it would be better to replace the current approach
// (specifying the above) with more automated behavior. The stream could
// keep the original stream open until EOF was reached. For example, if:
//
// the original stream is 10 bytes, and
// only 2 bytes are read on the first pass
// rewind() is called
// 5 bytes are read
//
// In this case, this instance gets the first 2 from its store,
// and the next 3 from the original stream, saving those additional 3
// bytes in the store. In this way, only the maximum number of bytes
// ever needed must be saved in the store; unused bytes are never read.
// The original stream is closed when EOF is reached, or when close()
// is called, whichever comes first. Using this approach eliminates
// the need to specify the flag (though makes implementation more complex).
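To make the TODO concrete, here is a minimal sketch of the lazy approach it describes. This is only an illustration under stated assumptions: the class name LazyRereadableSketch, the storedCount() accessor, and the demo() helper are hypothetical, and the store is a plain in-memory array rather than the memory-or-file store of the real class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical sketch of the TODO's approach; memory store only. */
class LazyRereadableSketch extends InputStream {
    private final InputStream original;
    private byte[] store = new byte[16];  // bytes pulled from the original so far
    private int stored = 0;               // number of valid bytes in the store
    private int pos = 0;                  // read position within the current pass
    private boolean originalOpen = true;

    LazyRereadableSketch(InputStream original) {
        this.original = original;
    }

    @Override
    public int read() throws IOException {
        if (pos < stored) {
            return store[pos++] & 0xFF;   // served from the store
        }
        if (!originalOpen) {
            return -1;                    // nothing further can be fetched
        }
        int b = original.read();
        if (b == -1) {
            closeOriginal();              // EOF reached: original no longer needed
            return -1;
        }
        if (stored == store.length) {
            store = Arrays.copyOf(store, stored * 2);
        }
        store[stored++] = (byte) b;       // save the newly read byte
        pos++;
        return b;
    }

    /** Restarts reading at the first byte; reads nothing from the original. */
    public void rewind() {
        pos = 0;
    }

    /** Hypothetical accessor: how many bytes have been saved so far. */
    public int storedCount() {
        return stored;
    }

    @Override
    public void close() throws IOException {
        closeOriginal();
    }

    private void closeOriginal() throws IOException {
        if (originalOpen) {
            original.close();
            originalOpen = false;
        }
    }

    /** Runs the TODO's scenario: 10 bytes, read 2, rewind, read 5. */
    static int demo() throws IOException {
        LazyRereadableSketch in =
                new LazyRereadableSketch(new ByteArrayInputStream(new byte[10]));
        in.read();
        in.read();
        in.rewind();
        for (int i = 0; i < 5; i++) {
            in.read();
        }
        return in.storedCount();          // only the bytes actually requested
    }
}
```

No flag is needed here because rewind() never touches the original stream. The cost is that the original stays open until EOF or close().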
- Keith
> RereadableInputStream needs to be able to read to the end of the original
> stream on first rewind.
> -------------------------------------------------------------------------------------------------
>
> Key: TIKA-45
> URL: https://issues.apache.org/jira/browse/TIKA-45
> Project: Tika
> Issue Type: Improvement
> Components: general
> Affects Versions: 0.1-incubator
> Reporter: Keith R. Bennett
> Fix For: 0.1-incubator
>
> Attachments: RereadableInputStream.java,
> RereadableInputStreamTest.java, tika45.patch
>
>
> RereadableInputStream reads a stream's content into a store (memory or file)
> on its first pass. If rewind() is called before end of stream is reached,
> the bytes not yet read will not be available on subsequent reads of the
> RereadableInputStream. This could be a problem, for example, if a parser
> uses it to get metadata from the beginning of a stream and calls rewind(),
> expecting to get the entire document.