[ 
https://issues.apache.org/jira/browse/TIKA-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith R. Bennett updated TIKA-45:
---------------------------------

    Attachment: RereadableInputStreamTest.java
                RereadableInputStream.java
                tika45.patch

I've attached a patch, along with the patched source files for easier 
viewing.

Changes to the RereadableInputStream include:

* Addressed this issue: by default, the first rewind now reads to the end of 
the original input stream.  A constructor taking a boolean value is also 
provided to specify whether or not to do this.

* Added javadoc.
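
To illustrate the first change, here is a minimal, memory-only sketch of the 
read-to-end-on-first-rewind idea.  It is not the attached class: the name 
RewindSketch, its fields, and the flag name readToEndOnRewind are made up for 
illustration, and the file-backed store is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch only; not the attached RereadableInputStream.
class RewindSketch extends InputStream {
    private final InputStream original;
    private final boolean readToEndOnRewind;
    private final ByteArrayOutputStream store = new ByteArrayOutputStream();
    private InputStream current;       // original on first pass, store afterwards
    private boolean firstPass = true;

    // Defaults to reading to the end of the original stream on first rewind.
    RewindSketch(InputStream original) {
        this(original, true);
    }

    RewindSketch(InputStream original, boolean readToEndOnRewind) {
        this.original = original;
        this.readToEndOnRewind = readToEndOnRewind;
        this.current = original;
    }

    @Override
    public int read() throws IOException {
        int b = current.read();
        if (firstPass && b != -1) {
            store.write(b);            // remember bytes seen on the first pass
        }
        return b;
    }

    public void rewind() throws IOException {
        if (firstPass) {
            if (readToEndOnRewind) {
                while (read() != -1) { }   // drain the rest into the store
            }
            original.close();
            firstPass = false;
        }
        current = new ByteArrayInputStream(store.toByteArray());
    }

    @Override
    public void close() throws IOException {
        if (firstPass) {
            original.close();
        }
    }
}
```

With the one-argument constructor, reading 2 bytes of a 5-byte stream and then 
calling rewind() makes all 5 bytes available on the next pass; passing false 
keeps the old behavior, where only the 2 bytes already read are available.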

Thanks to Chris Mattmann for his suggestion regarding this issue.

As you can see, this class has a unit test, but given its importance, more 
testing would be a Good Thing.

I'm pasting here a TODO comment from the file because it describes what I think 
is a better solution to the problem:

    // TODO: At some point it would be better to replace the current approach
    // (specifying the above) with more automated behavior.  The stream could
    // keep the original stream open until EOF was reached.  For example, if:
    //
    // the original stream is 10 bytes, and
    // only 2 bytes are read on the first pass
    // rewind() is called
    // 5 bytes are read
    //
    // In this case, this instance gets the first 2 from its store,
    // and the next 3 from the original stream, saving those additional 3
    // bytes in the store.  In this way, only the maximum number of bytes
    // ever needed must be saved in the store; unused bytes are never read.
    // The original stream is closed when EOF is reached, or when close()
    // is called, whichever comes first.  Using this approach eliminates
    // the need to specify the flag (though makes implementation more complex).
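
A rough sketch of how the approach in that TODO might look, assuming a 
memory-only store.  The class name LazyRewindSketch and the helper 
storedByteCount() are hypothetical and exist only to make the example 
checkable; none of this is in the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of the TODO; not part of the attached patch.
class LazyRewindSketch extends InputStream {

    // Exposes single-byte access to the protected buffer without copying.
    private static final class Store extends ByteArrayOutputStream {
        int byteAt(int index) {
            return buf[index] & 0xFF;
        }
    }

    private final InputStream original;
    private final Store store = new Store();
    private int position = 0;          // read position within the logical stream
    private boolean originalOpen = true;

    LazyRewindSketch(InputStream original) {
        this.original = original;
    }

    @Override
    public int read() throws IOException {
        if (position < store.size()) {
            return store.byteAt(position++);   // already saved: serve from store
        }
        if (!originalOpen) {
            return -1;
        }
        int b = original.read();               // past the store: pull one more byte
        if (b == -1) {
            closeOriginal();                   // EOF reached, original no longer needed
            return -1;
        }
        store.write(b);
        position++;
        return b;
    }

    // Rewinding only resets the position; nothing extra is read.
    public void rewind() {
        position = 0;
    }

    int storedByteCount() {
        return store.size();
    }

    @Override
    public void close() throws IOException {
        closeOriginal();
    }

    private void closeOriginal() throws IOException {
        if (originalOpen) {
            originalOpen = false;
            original.close();
        }
    }
}
```

For the 10-byte example in the comment (read 2, rewind, read 5), this sketch 
ends up with 5 bytes in the store and leaves the remaining 5 unread in the 
original stream.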

- Keith

> RereadableInputStream needs to be able to read to the end of the original 
> stream on first rewind.
> -------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-45
>                 URL: https://issues.apache.org/jira/browse/TIKA-45
>             Project: Tika
>          Issue Type: Improvement
>          Components: general
>    Affects Versions: 0.1-incubator
>            Reporter: Keith R. Bennett
>             Fix For: 0.1-incubator
>
>         Attachments: RereadableInputStream.java, 
> RereadableInputStreamTest.java, tika45.patch
>
>
> RereadableInputStream reads a stream's content into a store (memory or file) 
> on its first pass.  If rewind() is called before end of stream is reached, 
> the bytes not yet read will not be available on subsequent reads of the 
> RereadableInputStream.  This could be a problem, for example, if a parser 
> uses it to get metadata from the beginning of a stream and calls rewind(), 
> expecting to get the entire document.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
