[ https://issues.apache.org/jira/browse/SANSELAN-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267182#comment-13267182 ]
Damjan Jovanovic commented on SANSELAN-78:
------------------------------------------

Well, what we really want here is an interface that will allow seeking as well as I/O on any backend representation (byte[], InputStream or File). Such an interface doesn't exist in Java - RandomAccessFile and FileChannel both require local files, while InputStream doesn't allow seeking. Ideally we'd have a SeekableInputStream and some way to get it from a ByteSource and then keep reusing it.

> Improve speed of random-access-file handling for TIFF format, potentially others
> --------------------------------------------------------------------------------
>
>                 Key: SANSELAN-78
>                 URL: https://issues.apache.org/jira/browse/SANSELAN-78
>             Project: Commons Sanselan
>          Issue Type: Improvement
>          Components: Format: TIFF
>            Reporter: Gary Lucas
>
> Large TIFF files can be organized into chunks (either strips or tiles) so
> that the image can be read a piece-at-a-time. In the Apache Imaging
> implementation, each time one of these pieces is read, the TiffReader uses
> the getBlock() method of the ByteSourceFile class. This class opens the file
> using the Java RandomAccessFile class, seeks to the position of the data in
> the file, reads its content, and closes the file. Although this operation
> can be performed several times and thus entails a lot of redundant file opens
> and reads, the file cache performance on modern computers is truly amazing
> and for files of less than 5 megabytes, it often doesn't make a difference.
> On larger files, however, it can be significant.
>
> This Tracker Item proposes to modify the ByteSourceFile class so that an
> access routine can optionally hold the file open between getBlock() method
> calls. It will accomplish this by adding a new method called
> .setPersistent(boolean). By default, persistence will be set to false and
> the ByteSourceFile class will continue to work just as it always has
> (existing code will not be affected). If persistence is set to true, the
> RandomAccessFile will be held open.
>
> To get some sense of the performance difference, I ran several tests. For
> the sample "ron and andy.tif" file provided with the Apache Imaging package,
> which is under 5 megabytes, the change made little difference. However,
> when I tested with larger files, such as the Apache Imaging sample
> 2560-by-1920 pixel PICT2833.TIF file (a blurry picture of a pretty girl),
> and a 2500-by-2500 pixel file I downloaded from the US Geological Survey
> (USGS), I saw notable differences.
>
> I also tested on a fast local disk (my PC) and on a network disk. Not
> surprisingly, the network disk showed the biggest change (in order to keep
> the test environment clean, I ran the network test early in the morning when
> the network was lightly used).
>
> As you can see in the tests below, on the local disk the savings is modest
> even for the largest file. However, when dealing with a network file system,
> the change becomes significant.
>
> {code}
> ron and andy.tif    1500-by-1125      4.8 MB
>     local original:          25.9 ms.
>     local modified:          24.8 ms.
>     network original:       122.7 ms.
>     network modified:       117.6 ms.
>
> PICT2833.TIF        2560-by-1920      14.1 MB
>     local original:          77.7 ms.
>     local modified:          61.7 ms.
>     network original:       774.1 ms.
>     network modified:       463.8 ms.
>
> USGS1               2500-by-2500      18.8 MB
>     local original:         192.3 ms.
>     local modified:          94.5 ms.
>     network original:      3992.8 ms.
>     network modified:      1807.1 ms.
>
> USGS2               10000-by-10000    286 MB
>     local original:        1930.5 ms.
>     local modified:        1344.5 ms.
>     network original:     26627.6 ms.
>     network modified:     13402.1 ms.
> {code}
>
> One consequence of this change is that if persistence is set to true, the
> file will be held open until the ByteSourceFile goes out-of-scope and is
> garbage collected. So this change will also make sure that the TiffReader
> sets the persistence back to false when it is done reading the file in order
> to expedite the release of file resources.
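
For illustration, the persistence behaviour proposed in the quoted description could be sketched roughly as follows. This is not the actual ByteSourceFile code or the attached patch: the class name PersistentFileSource, the isHoldingFile() helper, the explicit close() method and the use of synchronized are all assumptions made for the example. Only the getBlock() and setPersistent(boolean) names come from the issue text.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch of the proposal; not the real Commons Imaging ByteSourceFile.
class PersistentFileSource implements Closeable {
    private final File file;
    private boolean persistent;     // false by default, as the issue proposes
    private RandomAccessFile held;  // non-null only while the file is held open

    PersistentFileSource(File file) {
        this.file = file;
    }

    // Switching persistence off releases any held file handle right away.
    synchronized void setPersistent(boolean persistent) throws IOException {
        this.persistent = persistent;
        if (!persistent) {
            close();
        }
    }

    // Reads `length` bytes starting at byte position `offset`.
    synchronized byte[] getBlock(long offset, int length) throws IOException {
        // Reuse the held handle if there is one, otherwise open the file.
        RandomAccessFile raf = (held != null) ? held : new RandomAccessFile(file, "r");
        try {
            byte[] block = new byte[length];
            raf.seek(offset);
            raf.readFully(block);   // throws EOFException if the file is too short
            if (persistent) {
                held = raf;         // keep the handle for the next call
            }
            return block;
        } finally {
            if (raf != held) {
                raf.close();        // non-persistent mode, or a failed first read
            }
        }
    }

    // Assumed addition: lets a caller release the handle explicitly.
    @Override
    public synchronized void close() throws IOException {
        if (held != null) {
            RandomAccessFile raf = held;
            held = null;
            raf.close();
        }
    }

    // Helper for the example only: reports whether a handle is currently held.
    synchronized boolean isHoldingFile() {
        return held != null;
    }
}
```

A reader following the description would call setPersistent(true) before fetching strips or tiles and setPersistent(false) in a finally block once done, so that the handle is released without waiting for garbage collection. This sketch only covers the File case; it does not provide the seekable abstraction over byte[] and InputStream that the comment above asks for.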