[ 
https://issues.apache.org/jira/browse/PDFBOX-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245308#comment-14245308
 ] 

Tilman Hausherr edited comment on PDFBOX-785 at 12/13/14 12:41 PM:
-------------------------------------------------------------------

java -jar pdfbox-app-2.0.0-SNAPSHOT.jar PDFSplit -endPage 2300 Default_Table_Formatting-merged.pdf

produces a result file of 36 MB :-(

The content stream is not compressed. Is there a reason why
PDDocument.importPage() does not compress the content stream? With
compression (adding dest.addCompression();), I get a size of 15 MB.
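PDF's /FlateDecode filter stores stream data as zlib/deflate, so the gap between the 36 MB and 15 MB results can be illustrated with java.util.zip alone. The sketch below is a standalone illustration, not PDFBox code and not a reproduction of dest.addCompression(): the table-drawing content stream it builds is hypothetical, and the class and method names are assumptions. It only shows how strongly the repetitive operators of a typical content stream shrink under deflate.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Illustrative sketch (not PDFBox API): compares an uncompressed, PDF-style
 * content stream with the same bytes after zlib/deflate compression, the
 * algorithm behind the /FlateDecode filter.
 */
public class ContentStreamCompression {

    /** Builds a hypothetical content stream that draws one table row per iteration. */
    static byte[] sampleContentStream(int rows) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            int y = 800 - (i % 60) * 12;
            // Cell border: rectangle path, then stroke.
            sb.append("72 ").append(y).append(" 200 12 re S\n");
            // Cell text: select font, position, show string.
            sb.append("BT /F1 10 Tf 75 ").append(y + 2)
              .append(" Td (Row ").append(i).append(") Tj ET\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    /** Compresses bytes into zlib format, as a /FlateDecode stream stores them. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses zlib data back to the original bytes. */
    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input: stop instead of looping forever
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = sampleContentStream(1000);
        byte[] packed = deflate(raw);
        System.out.printf("uncompressed: %d bytes, deflated: %d bytes (%.1fx smaller)%n",
                raw.length, packed.length, (double) raw.length / packed.length);
        // Deflate is lossless, so the round trip must reproduce the input exactly.
        if (!Arrays.equals(inflate(packed), raw)) {
            throw new IllegalStateException("round trip failed");
        }
    }
}
```

The exact ratio depends on the content; real content streams from the attached files will compress differently from this synthetic one.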


was (Author: tilman):
java -jar pdfbox-app-2.0.0-SNAPSHOT.jar PDFSplit -endPage 2300 Default_Table_Formatting-merged.pdf

produces a result file of 36 MB :-(

The content stream is not compressed. Is there a reason why
PDDocument.importPage() does not use compression? With compression, I get a
size of 15 MB.

> Splitting a PDF creates unnecessarily large files
> ------------------------------------------------
>
>                 Key: PDFBOX-785
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-785
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 0.8.0-incubator, 1.1.0, 1.2.1
>         Environment: Windows XP, OpenOffice 3.0.0, pdfsam
>            Reporter: mathieu radiguet
>            Assignee: Andreas Lehmkühler
>             Fix For: 2.0.0
>
>         Attachments: fileSizeIssue.zip
>
>
> Using PDFBox 0.8.0 (also tried with 1.1.0 and 1.2.1) to split files results
> in parts that are bigger than the original.
> The affected files were made from OpenOffice 3.0.0 .odt documents using the
> OpenOffice PDF export, then merging several copies with pdfsam
> (http://www.pdfsam.org/).
> In the attached Eclipse project, the test file is 10 712 749 bytes for 2812
> pages, and after splitting it in two at page 2300 the result files are
> 8 812 515 bytes and 10 701 142 bytes.
> Using PDFSplit on the command line, every single result file is bigger than
> the original. An example is also attached. An error reports that the
> original file is corrupted, but we tried another file (both with and
> without pdfsam) that gave no error, and the result was similar, so I think
> the error is unrelated.
> This issue seems similar to PDFBOX-28.
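The size figures in the report can be checked with a few lines of arithmetic. This sketch assumes part 1 holds pages 1-2300 and part 2 holds pages 2301-2812 (the report only says the file was split "in two at page 2300"); the class, method, and constant names are hypothetical.

```java
/**
 * Worked arithmetic on the sizes reported in PDFBOX-785.
 * Assumption: part 1 = pages 1-2300, part 2 = pages 2301-2812.
 */
public class SplitSizeArithmetic {

    static final long ORIGINAL_BYTES = 10_712_749L;
    static final int ORIGINAL_PAGES = 2812;
    static final long PART1_BYTES = 8_812_515L;
    static final long PART2_BYTES = 10_701_142L;
    static final int SPLIT_PAGE = 2300;

    /** Combined size of both split parts. */
    static long totalPartBytes() {
        return PART1_BYTES + PART2_BYTES;
    }

    /** Average bytes per page for a file of the given size and page count. */
    static double bytesPerPage(long bytes, int pages) {
        return (double) bytes / pages;
    }

    public static void main(String[] args) {
        int part2Pages = ORIGINAL_PAGES - SPLIT_PAGE; // 512 pages
        System.out.printf("parts total %,d bytes = %.2fx the original%n",
                totalPartBytes(), (double) totalPartBytes() / ORIGINAL_BYTES);
        System.out.printf("original: %.0f bytes/page%n",
                bytesPerPage(ORIGINAL_BYTES, ORIGINAL_PAGES));
        System.out.printf("part 1:   %.0f bytes/page%n",
                bytesPerPage(PART1_BYTES, SPLIT_PAGE));
        System.out.printf("part 2:   %.0f bytes/page%n",
                bytesPerPage(PART2_BYTES, part2Pages));
    }
}
```

With the reported figures, the two parts together come to 19,513,657 bytes, about 1.82 times the original. Part 1 averages roughly 3,830 bytes per page, close to the original's roughly 3,810, while the 512-page part 2 averages about 20,900 bytes per page and is almost as large as the whole original file.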



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
