[jira] [Resolved] (PDFBOX-5881) CVE for Lucene libraries

2024-10-04 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-5881. - Resolution: Fixed > CVE for Lucene libraries

[jira] [Created] (PDFBOX-5881) CVE for Lucene libraries

2024-10-04 Thread Tilman Hausherr (Jira)
Tilman Hausherr created PDFBOX-5881: --- Summary: CVE for Lucene libraries Key: PDFBOX-5881 URL: https://issues.apache.org/jira/browse/PDFBOX-5881 Project: PDFBox Issue Type: Bug Affects V

[jira] [Comment Edited] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI

2024-10-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886737#comment-17886737 ] Tilman Hausherr edited comment on PDFBOX-4718 at 10/3/24 5:39 PM:

[jira] [Commented] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI

2024-10-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886737#comment-17886737 ] Tilman Hausherr commented on PDFBOX-4718: - Sadly some differences in rendering:

[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885635#comment-17885635 ] Tilman Hausherr commented on PDFBOX-5880: - Now it works! > PDF render blank pag

[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en

2024-09-27 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5880: Attachment: PDFBOX-1094-PDFBOX-269.pdf > PDF render blank page: The end of the stream does

[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-27 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885251#comment-17885251 ] Tilman Hausherr commented on PDFBOX-5880: - Several differences, e.g. [^PDFBOX-1

[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884739#comment-17884739 ] Tilman Hausherr commented on PDFBOX-5852: - All good now, thanks! > Hi CPU and m

[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884548#comment-17884548 ] Tilman Hausherr commented on PDFBOX-5880: - proposed change is to add {{stream.se

[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884540#comment-17884540 ] Tilman Hausherr commented on PDFBOX-5852: - E.g. with this file: [^CIB-coonsmesh.

[jira] [Updated] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5852: Attachment: CIB-coonsmesh.pdf > Hi CPU and memory usage when converting a PDF with type 4

[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884533#comment-17884533 ] Tilman Hausherr commented on PDFBOX-5852: - Lots of regressions, I need to check

[jira] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5852 ] Tilman Hausherr deleted comment on PDFBOX-5852: - was (Author: tilman): No regressions 👍 > Hi CPU and memory usage when converting a PDF with type 4 shading >

[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884531#comment-17884531 ] Tilman Hausherr commented on PDFBOX-5880: - The problem is here: {code:java}    

[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884528#comment-17884528 ] Tilman Hausherr commented on PDFBOX-5852: - No regressions 👍 > Hi CPU and memory

[jira] [Comment Edited] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expe

2024-09-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884492#comment-17884492 ] Tilman Hausherr edited comment on PDFBOX-5880 at 9/25/24 3:55 AM:

[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884492#comment-17884492 ] Tilman Hausherr commented on PDFBOX-5880: - The image has an (incorrect) length o

[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en

2024-09-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5880: Labels: regression (was: ) > PDF render blank page: The end of the stream doesn't point t

[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en

2024-09-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5880: Affects Version/s: 2.0.32 > PDF render blank page: The end of the stream doesn't point to

[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en

2024-09-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5880: Component/s: Parsing (was: Rendering) > PDF render blank page: The en

[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882327#comment-17882327 ] Tilman Hausherr commented on PDFBOX-5879: - I added a simple test for the feature

[jira] [Comment Edited] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882327#comment-17882327 ] Tilman Hausherr edited comment on PDFBOX-5879 at 9/17/24 9:08 AM:

[jira] [Resolved] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-5879. - Fix Version/s: 2.0.33, 3.0.4 PDFBox, 4.0.0 A

[jira] [Updated] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5879: Affects Version/s: 2.0.32 > Regression from PDFBOX-5841: Text extraction with rotation mag

[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882240#comment-17882240 ] Tilman Hausherr commented on PDFBOX-5852: - Wow! No regressions. > Hi CPU and m

[jira] [Updated] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5852: Description: We've observed excessive CPU and memory consumption when converting a PDF to

[jira] [Comment Edited] (PDFBOX-5878) pdf form field text gets blurred after flattening

2024-09-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879832#comment-17879832 ] Tilman Hausherr edited comment on PDFBOX-5878 at 9/6/24 10:35 AM:

[jira] [Comment Edited] (PDFBOX-5878) pdf form field text gets blurred after flattening

2024-09-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879832#comment-17879832 ] Tilman Hausherr edited comment on PDFBOX-5878 at 9/6/24 10:31 AM:

[jira] [Updated] (PDFBOX-5878) pdf form field text gets blurred after flattening

2024-09-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5878: Attachment: PDFBox5878-flattened.pdf PDFBox5878-saved.pdf > pdf form field

[jira] [Comment Edited] (PDFBOX-5878) pdf form field text gets blurred after flattening

2024-09-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879822#comment-17879822 ] Tilman Hausherr edited comment on PDFBOX-5878 at 9/6/24 9:30 AM: -

[jira] [Comment Edited] (PDFBOX-5878) pdf form field text gets blurred after flattening

2024-09-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879796#comment-17879796 ] Tilman Hausherr edited comment on PDFBOX-5878 at 9/6/24 8:00 AM: -

[jira] [Comment Edited] (PDFBOX-5878) pdf form field text gets blurred after flattening

2024-09-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879753#comment-17879753 ] Tilman Hausherr edited comment on PDFBOX-5878 at 9/6/24 4:04 AM: -

[jira] [Comment Edited] (PDFBOX-5878) pdf form field text gets blurred after flattening

2024-09-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879480#comment-17879480 ] Tilman Hausherr edited comment on PDFBOX-5878 at 9/5/24 8:16 AM: -

[jira] [Reopened] (PDFBOX-5876) This jpeg2000 takes up a lot of memory, causing overflow.

2024-09-04 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reopened PDFBOX-5876: - > This jpeg2000 takes up a lot of memory, causing overflow. > --

[jira] [Commented] (PDFBOX-5877) After flattening a form pdf, the pdf loses content

2024-09-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878964#comment-17878964 ] Tilman Hausherr commented on PDFBOX-5877: - Yeah!! There's a log message, so it m

[jira] [Comment Edited] (PDFBOX-5877) After flattening a form pdf, the pdf loses content

2024-09-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878961#comment-17878961 ] Tilman Hausherr edited comment on PDFBOX-5877 at 9/3/24 5:55 PM: -

[jira] [Commented] (PDFBOX-5877) After flattening a form pdf, the pdf loses content

2024-09-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878961#comment-17878961 ] Tilman Hausherr commented on PDFBOX-5877: - What's this? pdDocument.setResourceC

[jira] [Commented] (PDFBOX-5877) After flattening a form pdf, the pdf loses content

2024-09-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878960#comment-17878960 ] Tilman Hausherr commented on PDFBOX-5877: - Are you sure you used 3.0.3 and not 3

[jira] [Commented] (PDFBOX-5876) This jpeg2000 takes up a lot of memory, causing overflow.

2024-09-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878879#comment-17878879 ] Tilman Hausherr commented on PDFBOX-5876: - No... I used -Xmx4G for a production

[jira] [Commented] (PDFBOX-5876) This jpeg2000 takes up a lot of memory, causing overflow.

2024-09-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878846#comment-17878846 ] Tilman Hausherr commented on PDFBOX-5876: - Are you sure you are using the new ve

[jira] [Resolved] (PDFBOX-5876) This jpeg2000 takes up a lot of memory, causing overflow.

2024-09-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-5876. - Fix Version/s: 2.0.33, 3.0.4 PDFBox, 4.0.0 A

[jira] [Updated] (PDFBOX-5876) This jpeg2000 takes up a lot of memory, causing overflow.

2024-09-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5876: Affects Version/s: 2.0.32 > This jpeg2000 takes up a lot of memory, causing overflow. > --

[jira] [Updated] (PDFBOX-5876) This jpeg2000 takes up a lot of memory, causing overflow.

2024-09-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5876: Component/s: Rendering > This jpeg2000 takes up a lot of memory, causing overflow. > -

[jira] [Commented] (PDFBOX-5876) This jpeg2000 takes up a lot of memory, causing overflow.

2024-09-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878835#comment-17878835 ] Tilman Hausherr commented on PDFBOX-5876: - The JPX image in that file is 7020 x

[jira] [Updated] (PDFBOX-5875) using font data to process ligatures

2024-08-30 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5875: Fix Version/s: (was: 3.0.4 PDFBox) > using font data to process ligatures > --

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-30 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878089#comment-17878089 ] Tilman Hausherr commented on PDFBOX-5868: - Yes. But consider that Adobe didn't d

[jira] [Comment Edited] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-30 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878076#comment-17878076 ] Tilman Hausherr edited comment on PDFBOX-5868 at 8/30/24 11:50 AM: ---

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-30 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878076#comment-17878076 ] Tilman Hausherr commented on PDFBOX-5868: - Please create a new ticket for the fi

[jira] [Resolved] (PDFBOX-5874) Change Loglevel from Warn to info when rebuilding font cache

2024-08-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-5874. - Assignee: Tilman Hausherr Resolution: Fixed Thank you, you're right, there's no ne

[jira] [Updated] (PDFBOX-5874) Change Loglevel from Warn to info when rebuilding font cache

2024-08-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5874: Fix Version/s: 2.0.33, 3.0.4 PDFBox, 4.0.0 > Change Lo

[jira] [Updated] (PDFBOX-5874) Change Loglevel from Warn to info when rebuilding font cache

2024-08-28 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5874: Affects Version/s: 2.0.32 > Change Loglevel from Warn to info when rebuilding font cache >

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-26 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876692#comment-17876692 ] Tilman Hausherr commented on PDFBOX-5868: - In the files I saw /ActualText was of

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-26 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876660#comment-17876660 ] Tilman Hausherr commented on PDFBOX-5868: - I haven't resolved this ticket becaus

[jira] [Comment Edited] (PDFBOX-5657) SMaskInData not supported for JPX images

2024-08-26 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876632#comment-17876632 ] Tilman Hausherr edited comment on PDFBOX-5657 at 8/26/24 8:53 AM:

[jira] [Resolved] (PDFBOX-5657) SMaskInData not supported for JPX images

2024-08-26 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-5657. - Fix Version/s: 2.0.33, 3.0.4 PDFBox, 4.0.0 A

[jira] [Updated] (PDFBOX-5872) Support imageio-jnr / imageio-openjpeg library for JPEG2000 decoding

2024-08-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5872: Affects Version/s: 2.0.32 > Support imageio-jnr / imageio-openjpeg library for JPEG2000 de

[jira] [Resolved] (PDFBOX-5872) Support imageio-jnr / imageio-openjpeg library for JPEG2000 decoding

2024-08-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-5872. - Fix Version/s: 2.0.33, 3.0.4 PDFBox, 4.0.0 A

[jira] [Commented] (PDFBOX-5872) Support imageio-jnr / imageio-openjpeg library for JPEG2000 decoding

2024-08-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876045#comment-17876045 ] Tilman Hausherr commented on PDFBOX-5872: - {quote}However, it doesn't appear tha

[jira] [Resolved] (PDFBOX-5869) Checkstyle

2024-08-21 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-5869. - Fix Version/s: 2.0.33, 3.0.4 PDFBox, 4.0.0 A

[jira] [Updated] (PDFBOX-5869) Checkstyle

2024-08-21 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5869: Affects Version/s: 3.0.3 PDFBox, 2.0.32 > Checkstyle

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875214#comment-17875214 ] Tilman Hausherr commented on PDFBOX-5868: - Another thought I just had was to ext

[jira] [Updated] (PDFBOX-5871) Rendering never finishes

2024-08-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5871: Affects Version/s: 3.0.3 PDFBox, 2.0.32 > Rendering never finishes

[jira] [Updated] (PDFBOX-5871) Rendering never finishes

2024-08-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5871: Attachment: (was: screenshot-1.png) > Rendering never finishes > -

[jira] [Updated] (PDFBOX-5871) Rendering never finishes

2024-08-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5871: Attachment: screenshot-1.png > Rendering never finishes

[jira] [Created] (PDFBOX-5871) Rendering never finishes

2024-08-20 Thread Tilman Hausherr (Jira)
Tilman Hausherr created PDFBOX-5871: --- Summary: Rendering never finishes Key: PDFBOX-5871 URL: https://issues.apache.org/jira/browse/PDFBOX-5871 Project: PDFBox Issue Type: Bug Com

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875100#comment-17875100 ] Tilman Hausherr commented on PDFBOX-5868: - Oops, no, it's not that easy. I forgo

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874947#comment-17874947 ] Tilman Hausherr commented on PDFBOX-5868: - Yes this could be possible. All the c

[jira] [Updated] (PDFBOX-5870) [PATCH] Detect CMYK image without relying on metadata

2024-08-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5870: Affects Version/s: 3.0.3 PDFBox, 2.0.32 > [PATCH] Detect CMYK image

[jira] [Resolved] (PDFBOX-5870) [PATCH] Detect CMYK image without relying on metadata

2024-08-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-5870. - Fix Version/s: 2.0.33, 3.0.4 PDFBox, 4.0.0 A

[jira] [Updated] (PDFBOX-5870) [PATCH] Detect CMYK image without relying on metadata

2024-08-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5870: Labels: CMYK (was: ) > [PATCH] Detect CMYK image without relying on metadata > --

[jira] [Updated] (PDFBOX-5870) [PATCH] Detect CMYK image without relying on metadata

2024-08-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5870: Component/s: Rendering > [PATCH] Detect CMYK image without relying on metadata > -

[jira] [Commented] (PDFBOX-5870) [PATCH] Detect CMYK image without relying on metadata

2024-08-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874891#comment-17874891 ] Tilman Hausherr commented on PDFBOX-5870: - Could you attach a PDF where this hap

[jira] [Comment Edited] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874801#comment-17874801 ] Tilman Hausherr edited comment on PDFBOX-5868 at 8/19/24 7:40 AM:

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874801#comment-17874801 ] Tilman Hausherr commented on PDFBOX-5868: - Here's the excel file with the differ

[jira] [Updated] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5868: Attachment: content_diffs_with_exceptions-ActualText.xlsx > PDFBox not extracting text of

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-18 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874700#comment-17874700 ] Tilman Hausherr commented on PDFBOX-5868: - I ran a comparison on several 10

[jira] [Updated] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-18 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5868: Attachment: PDFBOX-5868-SI5K4X4Z55SQAUPLAUP6QRRWT3UD3LAA-EmptyActualText_reduced.pdf

[jira] [Commented] (PDFBOX-5869) Checkstyle

2024-08-18 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874647#comment-17874647 ] Tilman Hausherr commented on PDFBOX-5869: - It should now work for the trunk, bot

[jira] [Comment Edited] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874501#comment-17874501 ] Tilman Hausherr edited comment on PDFBOX-5868 at 8/17/24 12:57 PM: ---

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874501#comment-17874501 ] Tilman Hausherr commented on PDFBOX-5868: - It's already done elsewhere: {code} i

[jira] [Closed] (PDFBOX-2740) Text extraction failed on Korean PDF

2024-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed PDFBOX-2740. --- Resolution: Not A Problem The /ActualText problem was fixed in PDFBOX-5868. However extracti

[jira] [Reopened] (PDFBOX-2740) Text extraction failed on Korean PDF

2024-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reopened PDFBOX-2740: - > Text extraction failed on Korean PDF

[jira] [Updated] (PDFBOX-2740) Text extraction failed on Korean PDF

2024-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2740: Labels: ActualText (was: ) > Text extraction failed on Korean PDF > -

[jira] [Closed] (PDFBOX-4532) PDFTextStripper replacing the decimal with white space

2024-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed PDFBOX-4532. --- Resolution: Duplicate Fixed in PDFBOX-5868 > PDFTextStripper replacing the decimal with whi

[jira] [Updated] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5868: Fix Version/s: 2.0.33 3.0.4 PDFBox 4.0.0 > PDFBox no

[jira] [Assigned] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reassigned PDFBOX-5868: --- Assignee: Tilman Hausherr > PDFBox not extracting text of non-latin languages(tamil

[jira] [Updated] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5868: Affects Version/s: 2.0.32 > PDFBox not extracting text of non-latin languages(tamil, benga

[jira] [Updated] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5868: Labels: ActualText (was: ) > PDFBox not extracting text of non-latin languages(tamil, ben

[jira] [Updated] (PDFBOX-3248) Unwanted spaces in text extraction (2)

2024-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-3248: Labels: ActualText (was: ) > Unwanted spaces in text extraction (2) > ---

[jira] [Closed] (PDFBOX-3248) Unwanted spaces in text extraction (2)

2024-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed PDFBOX-3248. --- Resolution: Duplicate Fixed in PDFBOX-5868 > Unwanted spaces in text extraction (2) > -

[jira] [Comment Edited] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1787#comment-1787 ] Tilman Hausherr edited comment on PDFBOX-5868 at 8/17/24 8:09 AM:

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1787#comment-1787 ] Tilman Hausherr commented on PDFBOX-5868: - {quote}why why there is another boole

[jira] [Comment Edited] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874369#comment-17874369 ] Tilman Hausherr edited comment on PDFBOX-5868 at 8/17/24 6:30 AM:

[jira] [Comment Edited] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874369#comment-17874369 ] Tilman Hausherr edited comment on PDFBOX-5868 at 8/17/24 4:12 AM:

[jira] [Comment Edited] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874369#comment-17874369 ] Tilman Hausherr edited comment on PDFBOX-5868 at 8/16/24 7:06 PM:

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874369#comment-17874369 ] Tilman Hausherr commented on PDFBOX-5868: - There is a problem that I didn't noti

[jira] [Updated] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5868: Attachment: screenshot-2.png > PDFBox not extracting text of non-latin languages(tamil, be

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874268#comment-17874268 ] Tilman Hausherr commented on PDFBOX-5868: - First I'll need to make the changes I

[jira] [Comment Edited] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874189#comment-17874189 ] Tilman Hausherr edited comment on PDFBOX-5868 at 8/16/24 12:31 PM: ---

[jira] [Commented] (PDFBOX-5868) PDFBox not extracting text of non-latin languages(tamil, bengali) properly but adobe reader's save as text does

2024-08-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874222#comment-17874222 ] Tilman Hausherr commented on PDFBOX-5868: - The code has one flaw, that it doesn'
