Here's what I got with the ExtractText command-line application:
______
______ 03-09 3,411.69
ELECTRONIC DEPOSIT FDMS-SETTLEMENT DEPOSIT 376249462999
03-10 1,645.22 ELECTRONIC DEPOSIT FDMS-SETTLEMENT
DEPOSIT 376249462999
However, I think I understand the cause of your problem, because there is
output like this:
String[461.20358,340.904 fs=1.0 xscale=1.0 height=4.44 space=4.7999997 width=4.799988]6
String[461.20428,340.904 fs=1.0 xscale=1.0 height=6.48 space=7.2 width=7.200012]
i.e. a space and a character at (almost) the same position. See this content stream:
BT
0 0 0 rg
/F0 1 Tf
1 0 0 1 29.204 460.096 Tm
( ______ ) Tj
1 0 0 1 29.204 451.096 Tm
( ______ ) Tj
/F1 1 Tf
1 0 0 1 29.204 451.096 Tm
( 03-09 3,411.69 ELECTRONIC DEPOSIT FDMS-SETTLEMENT
DEPOSIT 376249462999 ) Tj
1 0 0 1 29.204 442.096 Tm
( 03-10 1,645.22 ELECTRONIC DEPOSIT FDMS-SETTLEMENT
DEPOSIT 376249462999 ) Tj
ET
Two lines start at the same position (29.204, 451.096): one with blanks,
one with text. That is a bug in the software that created the file.
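
To check other files for the same defect, something like the following may help. It is only a sketch in Python, not PDFBox code: the function name duplicate_text_origins and the SAMPLE constant are made up for illustration, and it assumes you already have the decompressed content stream as text. It scans for Tm operators naively (it does not parse string literals, so a "Tm" inside a string could confuse it) and reports every origin that is set more than once.

```python
import re
from collections import Counter

# Six numeric operands followed by the Tm (set text matrix) operator.
TM_RE = re.compile(r"((?:-?\d*\.?\d+\s+){6})Tm\b")

# Shortened copy of the content stream quoted above (illustration only).
SAMPLE = """BT
0 0 0 rg
/F0 1 Tf
1 0 0 1 29.204 460.096 Tm
( ______ ) Tj
1 0 0 1 29.204 451.096 Tm
( ______ ) Tj
/F1 1 Tf
1 0 0 1 29.204 451.096 Tm
( 03-09 3,411.69 ELECTRONIC DEPOSIT ) Tj
1 0 0 1 29.204 442.096 Tm
( 03-10 1,645.22 ELECTRONIC DEPOSIT ) Tj
ET
"""


def duplicate_text_origins(content_stream, digits=3):
    """Return the (x, y) origins that more than one Tm operator sets.

    Coordinates are rounded to `digits` decimals before comparison,
    so tiny floating-point differences do not hide a duplicate.
    """
    origins = Counter()
    for match in TM_RE.finditer(content_stream):
        numbers = [float(n) for n in match.group(1).split()]
        # Operands are a b c d e f; e and f are the translation (x, y).
        x, y = numbers[4], numbers[5]
        origins[(round(x, digits), round(y, digits))] += 1
    return [origin for origin, count in origins.items() if count > 1]


if __name__ == "__main__":
    print(duplicate_text_origins(SAMPLE))  # [(29.204, 451.096)]
```

A duplicate origin is not always a defect (some producers overprint text on purpose for a bold effect), so treat a hit as a hint to inspect that page rather than as proof.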
Tilman
On 29.03.2016 at 18:48, Joel Hirsh wrote:
I thought it was attached to the first email, but it is also available at
https://www.dropbox.com/s/btqwaxfsubt3rwx/extra%20spaces.pdf?dl=0
On Tue, Mar 29, 2016 at 9:13 AM, Tilman Hausherr <[email protected]>
wrote:
Please upload that file somewhere.
Tilman
On 29.03.2016 at 17:24, Joel Hirsh wrote:
I have a couple of PDF files that have this problem. These are
multi-page PDF files, and on one page (the first) there are a few lines
that get extra spaces between almost every character as seen from
PrintTextLocations.
Attached is a snippet from one of those files; the first line has the
problem, the second line does not.
In this file, the first line gets a string that is
0 3- 09 3 ,4 1 1. 6 9 EL E CT R ON I C D EP O SI T
F DM S -S E TT L EM E NT D E PO S IT 37 6 24 9 46 2 99 9
The second line, by contrast, gets its text without any extra spaces.
The two lines also have different spacing values as reported by
PrintTextLocations. In the full file, all the good lines have one value,
the bad lines a different value.
I cannot see any difference between the lines in Acrobat, when copying
and pasting, or when editing in Nitro.
This problem shows up in 2.0.0 and the latest 2.0.1 snapshot, as well as
in some older versions I tried (i.e. I don't think it is any kind of
regression).
Thanks
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]