[ https://issues.apache.org/jira/browse/IMAGING-70?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Damjan Jovanovic resolved IMAGING-70.
-------------------------------------

Resolution: Fixed
Fix Version/s: 1.0

Thank you! Patch applied to SVN.
Resolving fixed.

> Reduce memory use of TIFF readers
> ---------------------------------
>
> Key: IMAGING-70
> URL: https://issues.apache.org/jira/browse/IMAGING-70
> Project: Apache Commons Imaging
> Issue Type: Improvement
> Components: Format: TIFF
> Reporter: Gary Lucas
> Fix For: 1.0
>
> Attachments: Tracker_76_Test_5_May_2012.patch
>
> Original Estimate: 80h
> Remaining Estimate: 80h
>
> This Tracker Item proposes changes to the TIFF file readers to address
> memory issues when reading very large images from TIFF files. The TIFF
> format is used extensively in technical applications such as aerial
> photographs, satellite images, and digital raster maps, which feature very
> large image sizes. For example, the public-domain Natural Earth Data set
> features raster files sized 21,600 by 10,800 pixels (222.5 megapixels).
> Although this example is unusually large, image sizes of 25 to 100
> megapixels are common for such applications.
>
> Unfortunately, when Sanselan reads a TIFF image, it consumes nearly twice as
> much memory as is necessary. The reader operates in two stages. First, it
> reads the entire source file into memory; then it builds the output image,
> also in memory. In the example file mentioned above, the source data runs
> from 83.19 to 373 megabytes (depending on compression). Thus Sanselan would
> require a minimum of 83.19 + 4*222.5 = 973 megabytes (approximately) to
> produce an image for one of these files (allowing 4 bytes per pixel in the
> output BufferedImage).
>
> Fortunately, TIFF files are organized so that they can be read a piece at a
> time. TIFF files are divided into either strips or tiles and, if data
> compression is used, each piece is compressed individually. Thus each
> individual piece has no dependency on the others.
>
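The memory estimate in the description can be reproduced with a few lines of Java. This is only a rough sketch of the arithmetic, not code from the patch: the class and method names are made up for illustration, the 4 bytes per pixel figure is the one the report assumes, and the strip size used in `main` (8 rows of uncompressed 3-byte pixels) is a hypothetical value chosen to show the order of magnitude.

```java
/** Illustrative arithmetic only; none of these names exist in Commons Imaging. */
final class TiffMemoryEstimate {
    // The report allows 4 bytes per pixel in the output BufferedImage.
    static final long BYTES_PER_PIXEL = 4;
    static final double MEGABYTE = 1024.0 * 1024.0;

    /** Bytes needed for the decoded output image. */
    static long outputImageBytes(long width, long height) {
        return width * height * BYTES_PER_PIXEL;
    }

    /** Peak use when the whole source file is buffered before decoding. */
    static long wholeFilePeakBytes(long sourceBytes, long width, long height) {
        return sourceBytes + outputImageBytes(width, height);
    }

    /** Peak use when only one strip or tile is buffered at a time. */
    static long pieceWisePeakBytes(long largestPieceBytes, long width, long height) {
        return largestPieceBytes + outputImageBytes(width, height);
    }

    public static void main(String[] args) {
        long width = 21_600;
        long height = 10_800;
        // Smallest source size quoted in the report (83.19 megabytes).
        long sourceBytes = Math.round(83.19 * MEGABYTE);
        // Hypothetical strip: 8 rows, 3 bytes per pixel, uncompressed.
        long stripBytes = width * 8 * 3;

        System.out.printf("output image: %.1f MB%n",
                outputImageBytes(width, height) / MEGABYTE);
        System.out.printf("whole-file read peak: %.1f MB%n",
                wholeFilePeakBytes(sourceBytes, width, height) / MEGABYTE);
        System.out.printf("piece-wise read peak: %.1f MB%n",
                pieceWisePeakBytes(stripBytes, width, height) / MEGABYTE);
    }
}
```

Under these assumptions the source buffer shrinks from tens or hundreds of megabytes to well under one megabyte, while the output image itself still dominates the total.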
> This item proposes to implement two changes:
>
> 1) Allow the TIFF data reader to read the files one piece at a time while
> constructing the buffered image. Thus the memory use for reading would be
> no larger than the piece size. This would be an internal change, so the
> external appearance of the Sanselan getBufferedImage methods would not
> change.
>
> 2) Provide new API elements that permit applications to read the strips or
> tiles from TIFF files individually. This change would support applications
> that need to access very large TIFF files without committing the memory to
> store a BufferedImage for the entire file (a 222.5 megapixel image requires
> 890 megabytes, which is a lot even by contemporary standards).
>
> There is one minor issue in this implementation that is easily addressed.
> Sanselan reads images from ByteSources that can be either random-access
> files or sequential-access input streams. In the case of sequential-access
> input streams, it may be hard to perform a partial read on a TIFF
> directory. In such a case, the TIFF access routines might have to resort to
> reading the entire source data into memory, as they currently do. This
> would simply be a limitation of the implementation.
>
> There is one issue that may make this change a bit problematic. The TIFF
> processors depend on accessing a class called TiffDataElement that contains
> a public array of bytes called "data". The most expeditious way of
> implementing the enhancement is to make this element private and add an
> accessor that either returns the data from internal memory or else loads it
> on demand. Unfortunately, because the data element is scoped as public,
> there is a chance that some existing applications are using it directly. In
> hindsight, it is clear that scoping this element as public was a mistake,
> but it may be too late to fix it. So care will be required to ensure that
> compatibility remains.
> The most likely solution seems to be to implement a new class for passing
> raw data from the source TIFF files to the DataReader implementations.
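The accessor idea described in the report (return the bytes from memory if they are already held, otherwise load them on demand) might look roughly like the sketch below. It is not the class that was committed: `LazyRawData` and all of its members are hypothetical names, and it assumes a random-access file as the source, which is the case the report says is easy to support.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical raw-data holder; not the actual Commons Imaging class. */
final class LazyRawData {
    private final byte[] inMemory; // non-null when the bytes were read up front
    private final File file;       // non-null when the bytes stay on disk
    private final long offset;     // position of the strip or tile in the file
    private final int length;      // number of bytes in the strip or tile

    /** Wraps bytes that are already in memory (e.g. from a sequential stream). */
    LazyRawData(byte[] data) {
        this.inMemory = data;
        this.file = null;
        this.offset = 0;
        this.length = data.length;
    }

    /** Records where the bytes live without reading them yet. */
    LazyRawData(File file, long offset, int length) {
        if (length < 0) {
            throw new IllegalArgumentException("length must not be negative");
        }
        this.inMemory = null;
        this.file = file;
        this.offset = offset;
        this.length = length;
    }

    int getLength() {
        return length;
    }

    /** Returns the held bytes, or reads them from the file on each call. */
    byte[] getData() throws IOException {
        if (inMemory != null) {
            return inMemory;
        }
        byte[] buffer = new byte[length];
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            raf.readFully(buffer); // throws EOFException if the file is too short
        }
        return buffer;
    }
}
```

The file-backed path deliberately does not cache what it reads, so a caller that processes one strip or tile at a time never holds more than one piece; a reader that needed repeated access to the same piece would want a different trade-off.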