Hi,

On 9/28/07, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:
> On 9/28/07, kbennett <[EMAIL PROTECTED]> wrote:
> > ...It would be nice if there were some implementation of BufferedReader that
> > used disk instead of memory if the readaheadLimit exceeded a threshold. If
> > not, we may need to write our own....
>
> Agreed, a BufferedReader with "unlimited" storage on disk sounds like
> the way to go.
>
> I don't know of any existing implementation, though.
I've implemented such classes a few times before, based on support
classes (like DeferredFileOutputStream) from commons-io. I can dig up
some of my old code and contribute it to commons-io and/or Tika.

There's an interesting question about a potential optimization: if the
stream being processed is based on a File, a URI, or a byte array,
should we still create a temporary copy of the data while parsing, or
can we rely on rereading the source of the data? A temporary copy
introduces quite a bit of overhead, but avoids nasty problems with
files/resources/arrays being overwritten between consecutive parsing
passes.

BR,

Jukka Zitting
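
P.S. To make the spill-to-disk idea concrete, here is a minimal sketch
using only the JDK. It is not the old code mentioned above and it does
not use commons-io; the class name SpillBuffer and its methods
(isOnDisk, openStream, delete) are hypothetical, chosen for
illustration. It assumes the whole input is copied once into the
buffer, which stays in memory up to a threshold and moves to a
temporary file beyond it, so each parsing pass can open a fresh stream
over the same bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/**
 * Hypothetical helper: buffers written bytes in memory until a threshold
 * is exceeded, then moves everything to a temporary file.
 */
class SpillBuffer extends OutputStream {
    private final int threshold;          // max bytes to keep in memory
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream current = memory;
    private File file;                    // null until data has spilled to disk
    private long written;

    SpillBuffer(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Switch to disk before the write that would cross the threshold.
        if (file == null && written + len > threshold) {
            spill();
        }
        current.write(b, off, len);
        written += len;
    }

    private void spill() throws IOException {
        file = File.createTempFile("spill", ".tmp");
        file.deleteOnExit();
        OutputStream out = new FileOutputStream(file);
        memory.writeTo(out);              // copy what was buffered so far
        memory = null;                    // release the in-memory copy
        current = out;
    }

    @Override
    public void flush() throws IOException {
        current.flush();
    }

    @Override
    public void close() throws IOException {
        current.close();
    }

    boolean isOnDisk() {
        return file != null;
    }

    /** Opens a fresh stream over the buffered data; call after close(). */
    InputStream openStream() throws IOException {
        return file != null
                ? new FileInputStream(file)
                : new ByteArrayInputStream(memory.toByteArray());
    }

    /** Removes the temporary file, if one was created. */
    void delete() {
        if (file != null) {
            file.delete();
        }
    }
}
```

A caller would copy the source stream into a SpillBuffer once, close
it, and then call openStream() for every parsing pass. That is the
"temporary copy" side of the trade-off: it costs one extra pass over
the data, but later passes cannot be affected by the original file or
array changing underneath them.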
