[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-12-30 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075169#comment-15075169 ] Tilman Hausherr commented on PDFBOX-3062: - This commit also improves the height v

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-12-30 Thread ASF subversion and git services (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075164#comment-15075164 ] ASF subversion and git services commented on PDFBOX-3062: - Commit

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-12-11 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053474#comment-15053474 ] John Hewson commented on PDFBOX-3062: - To wrap this one up, revisiting my questions a

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-12-11 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053468#comment-15053468 ] John Hewson commented on PDFBOX-3062: - {quote} Since PDF was not designed for text ex

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-12-03 Thread Maruan Sahyoun (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037704#comment-15037704 ] Maruan Sahyoun commented on PDFBOX-3062: +1 > Text extraction and height differe

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-12-03 Thread Timo Boehme (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037591#comment-15037591 ] Timo Boehme commented on PDFBOX-3062: - I would like to second the proposal of Tilman

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-12-02 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036261#comment-15036261 ] Tilman Hausherr commented on PDFBOX-3062: - Re reliability, see the test files and

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-12-02 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036215#comment-15036215 ] John Hewson commented on PDFBOX-3062: - {quote} How is it not reliable? {quote} Why w

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-12-01 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034552#comment-15034552 ] Tilman Hausherr commented on PDFBOX-3062: - {quote} BBox + CapHeight isn't reliabl

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-12-01 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034507#comment-15034507 ] John Hewson commented on PDFBOX-3062: - BBox + CapHeight isn't reliable either. Either

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-12-01 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034207#comment-15034207 ] Tilman Hausherr commented on PDFBOX-3062: - This isn't really about metrics, this

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-12-01 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034170#comment-15034170 ] John Hewson commented on PDFBOX-3062: - Then the next JIRA issue will be "Text extract

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-11-26 Thread Maruan Sahyoun (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029257#comment-15029257 ] Maruan Sahyoun commented on PDFBOX-3062: Can't we use the current approach for 2.

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-11-26 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029202#comment-15029202 ] John Hewson commented on PDFBOX-3062: - Technically, the bbox is always correct - it's

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-11-26 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029195#comment-15029195 ] Tilman Hausherr commented on PDFBOX-3062: - If we use the visual bounds to get the

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-11-26 Thread ASF subversion and git services (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029088#comment-15029088 ] ASF subversion and git services commented on PDFBOX-3062: - Commit

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-11-26 Thread ASF subversion and git services (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029081#comment-15029081 ] ASF subversion and git services commented on PDFBOX-3062: - Commit

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-11-26 Thread ASF subversion and git services (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029078#comment-15029078 ] ASF subversion and git services commented on PDFBOX-3062: - Commit

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-11-26 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029077#comment-15029077 ] Tilman Hausherr commented on PDFBOX-3062: - I ran a test on about 2500 files. The

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-11-25 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027434#comment-15027434 ] Tilman Hausherr commented on PDFBOX-3062: - I tried using CapHeight when available

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-11-19 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013930#comment-15013930 ] Tilman Hausherr commented on PDFBOX-3062: - PDFBOX-3062-H6NIYQXHLPGD3GI6SNIYINRAZB

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-11-09 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997389#comment-14997389 ] John Hewson commented on PDFBOX-3062: - That's good news. > Text extraction and heigh

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-11-09 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996376#comment-14996376 ] Tilman Hausherr commented on PDFBOX-3062: - They are already good anyway, probably

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-11-09 Thread Maruan Sahyoun (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996365#comment-14996365 ] Maruan Sahyoun commented on PDFBOX-3062: {quote} At the same time, I've noticed y

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-11-09 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996179#comment-14996179 ] Tilman Hausherr commented on PDFBOX-3062: - But the BBox height is huge... the res

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-11-08 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996114#comment-14996114 ] John Hewson commented on PDFBOX-3062: - Those BBox values are pretty reasonable though

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-11-07 Thread ASF subversion and git services (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995206#comment-14995206 ] ASF subversion and git services commented on PDFBOX-3062: - Commit

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-11-07 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995200#comment-14995200 ] Tilman Hausherr commented on PDFBOX-3062: - After the latest changes (probably PDF

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-10-28 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979870#comment-14979870 ] John Hewson commented on PDFBOX-3062: - With my previous comment in mind, I've depreca

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-10-28 Thread ASF subversion and git services (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979869#comment-14979869 ] ASF subversion and git services commented on PDFBOX-3062: - Commit

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-10-28 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979318#comment-14979318 ] Tilman Hausherr commented on PDFBOX-3062: - PDFBOX-3062-N2MOQ7YZICIYGTPLQJAWJ4HLN6

[jira] [Commented] (PDFBOX-3062) Text extraction and height different in 2.0

2015-10-27 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977531#comment-14977531 ] John Hewson commented on PDFBOX-3062: - Height isn't calculated in a meaningful way in