[ 
https://issues.apache.org/jira/browse/IMAGING-70?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damjan Jovanovic resolved IMAGING-70.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0

Thank you! Patch applied to SVN. Resolving fixed.

                
> Reduce memory use of TIFF readers
> ---------------------------------
>
>                 Key: IMAGING-70
>                 URL: https://issues.apache.org/jira/browse/IMAGING-70
>             Project: Apache Commons Imaging
>          Issue Type: Improvement
>          Components: Format: TIFF
>            Reporter: Gary Lucas
>             Fix For: 1.0
>
>         Attachments: Tracker_76_Test_5_May_2012.patch
>
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> This Tracker Item proposes changes to the TIFF file readers to address memory 
> issues when reading very large images from TIFF files.  The TIFF format is 
> used extensively in technical applications such as aerial photographs, 
> satellite images, and digital raster maps which feature very large image 
> sizes.  For example, the public-domain Natural Earth Data set features raster 
> files sized 21,600 by 10,800 pixels (222.5 megapixels).   Although this 
> example is unusually large, image sizes of 25 to 100 megapixels are common 
> for such applications.
> Unfortunately, when Sanselan reads a TIFF image, it consumes nearly twice as 
> much memory as is necessary.  The reader operates in two stages.  First, it
> reads the entire source file into memory; then it builds the output image,
> also in memory.   In the example file mentioned above, the source data runs 
> from 83.19 to 373 megabytes (depending on compression).   Thus Sanselan would
> require a minimum of 83.19+4*222.5 = 973 megabytes to produce an image for
> one of these files (allowing 4 bytes per pixel in the output BufferedImage).
> Fortunately, TIFF files are organized so that they can be read a piece at a 
> time.  TIFF files are divided into either strips or tiles and, if data 
> compression is used, each piece is compressed individually.  Thus each
> piece can be decoded without reference to the others.
> This item proposes to implement two changes:
> 1)  Allow the TIFF data reader to read the files one piece at a time while 
> constructing the buffered image.  Thus the memory use for reading would be no 
> larger than the piece size.  This would be an internal change, so the 
> external appearance of the Sanselan getBufferedImage methods would not change.
> 2) Provide new API elements that permit applications to read the strips or 
> tiles from TIFF files individually.     This change would support 
> applications that needed to access very large TIFF files without committing 
> the memory to store a BufferedImage for the entire file (a 222.5 megapixel 
> image requires 890 megabytes, which is a lot even by contemporary standards).
> There is one minor issue in this implementation that is easily addressed.  
> Sanselan reads images from ByteSources that can be either random-access files 
> or sequential-access input streams.  In the case of sequential-access input
> streams, it may be hard to perform a partial read on a TIFF directory.  In
> such a case, the TIFF access routines might have to resort to reading the
> entire source data into memory as they currently do.   This would simply be a
> limitation of the implementation.
> There is one issue that may make this change a bit problematic.  The TIFF 
> processors depend on accessing a class called TiffDataElement that contains a 
> public array of bytes called "data".   The most expeditious way of 
> implementing the enhancement is to make this element private and add an
> accessor that either returns the data from internal memory or else loads it 
> on-demand.  Unfortunately, because the data element is scoped to public, 
> there is a chance that some existing applications are using it directly.   In 
> hindsight, it is clear that scoping this element as public was a mistake, but 
> it may be too late to fix it.  So care will be required to ensure that 
> compatibility remains.   The most likely solution seems to be to implement a 
> new class for passing raw data from the source TIFF files to the DataReader 
> implementations.
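
For illustration only, the on-demand accessor idea from the last paragraph of
the description might look like the sketch below.  It is not the code from the
applied patch: LazyDataElement, getData, isLoaded and the Supplier-based loader
are hypothetical names chosen for this example, and no Commons Imaging class
is used.

```java
import java.util.function.Supplier;

public class LazyElementSketch {

    /**
     * Hypothetical stand-in for a data element whose bytes used to be a
     * public field. The array is private and fetched on first access.
     */
    public static final class LazyDataElement {
        private final Supplier<byte[]> loader; // assumed: reads the bytes from the source
        private byte[] data;                   // null until first requested

        public LazyDataElement(Supplier<byte[]> loader) {
            this.loader = loader;
        }

        /** Returns the bytes, invoking the loader only on the first call. */
        public synchronized byte[] getData() {
            if (data == null) {
                data = loader.get();
            }
            return data;
        }

        /** True once the bytes have been loaded into memory. */
        public synchronized boolean isLoaded() {
            return data != null;
        }
    }

    public static void main(String[] args) {
        LazyDataElement element =
                new LazyDataElement(() -> new byte[] {1, 2, 3});
        System.out.println("loaded before access: " + element.isLoaded());
        System.out.println("length: " + element.getData().length);
        System.out.println("loaded after access: " + element.isLoaded());
    }
}
```

Callers that only inspect directory entries never trigger the loader, so the
raw bytes for a strip or tile are not held in memory unless something asks for
them.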

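Change 1 in the description (building the output image one piece at a time)
can be sketched as follows.  This is a minimal illustration under stated
assumptions, not the patch itself: StripSource and syntheticSource are
hypothetical stand-ins for a real strip decoder, and only the standard
java.awt.image.BufferedImage API is used.

```java
import java.awt.image.BufferedImage;
import java.util.Arrays;

public class StripwiseReadSketch {

    /** Hypothetical source of decoded strips; not a real library interface. */
    public interface StripSource {
        int width();
        int height();
        int rowsPerStrip();
        /** ARGB pixels for one strip, row-major; the last strip may be shorter. */
        int[] readStrip(int stripIndex);
    }

    /** Copies each strip into the output image as soon as it is decoded. */
    public static BufferedImage assemble(StripSource src) {
        BufferedImage image = new BufferedImage(
                src.width(), src.height(), BufferedImage.TYPE_INT_ARGB);
        int stripCount = (src.height() + src.rowsPerStrip() - 1) / src.rowsPerStrip();
        for (int i = 0; i < stripCount; i++) {
            int y0 = i * src.rowsPerStrip();
            int rows = Math.min(src.rowsPerStrip(), src.height() - y0);
            int[] pixels = src.readStrip(i); // only this strip is held besides the image
            image.setRGB(0, y0, src.width(), rows, pixels, 0, src.width());
        }
        return image;
    }

    /** Test source: every pixel in row y is opaque with y in the blue channel. */
    public static StripSource syntheticSource(final int width, final int height,
                                              final int rowsPerStrip) {
        return new StripSource() {
            public int width() { return width; }
            public int height() { return height; }
            public int rowsPerStrip() { return rowsPerStrip; }
            public int[] readStrip(int stripIndex) {
                int y0 = stripIndex * rowsPerStrip;
                int rows = Math.min(rowsPerStrip, height - y0);
                int[] pixels = new int[width * rows];
                for (int r = 0; r < rows; r++) {
                    Arrays.fill(pixels, r * width, (r + 1) * width,
                            0xFF000000 | (y0 + r));
                }
                return pixels;
            }
        };
    }

    public static void main(String[] args) {
        BufferedImage image = assemble(syntheticSource(4, 10, 3));
        System.out.println(image.getWidth() + "x" + image.getHeight());
        System.out.println(Integer.toHexString(image.getRGB(2, 9)));
    }
}
```

In this sketch the working memory beyond the output image is one strip of
decoded pixels, which is the property the description asks for.  A tiled file
would need the same loop over both tile columns and tile rows.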
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
