Trailing Space and Final CRLF Added

flywire Wed, 16 Mar 2022 05:28:25 -0700

Can text be extracted without a trailing space being added to each line?

*test.txt*
def hello_world():
    print("Hello World!")


hello_world()
*File ends on the line above, with no final CRLF*

java -jar pdfbox-app-2.0.25.jar TextToPDF -standardFont Courier test.pdf test.txt
java -jar pdfbox-app-2.0.25.jar ExtractText test.pdf test1.txt

The output file (test1.txt) has a space appended to each line, and a CRLF
appended to the last line.
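
As a possible workaround, the extracted file can be cleaned up afterwards.
This is a minimal Python sketch, not a PDFBox feature. It assumes that no
line of the original text ends in meaningful whitespace, and the function
name strip_extraction_padding is hypothetical:

```python
def strip_extraction_padding(text: str, newline: str = "\r\n") -> str:
    """Strip trailing spaces/tabs from every line and drop the final line ending.

    Hypothetical helper for post-processing extracted text; it is not part
    of PDFBox and does not change how extraction itself works.
    """
    # Normalise CRLF and lone CR to LF so a single split handles all cases.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    # Remove the padding at the end of each line; leading indent is kept.
    cleaned = [line.rstrip(" \t") for line in lines]
    # Empty entries at the end come from the trailing line ending(s).
    while cleaned and cleaned[-1] == "":
        cleaned.pop()
    # Re-join with the chosen line ending (CRLF by default, as on Windows).
    return newline.join(cleaned)
```

When applying this to test1.txt, the file would need to be read and written
with newline="" in open() so that Python does not translate the line endings
itself.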

Using test1.txt as the input instead gives output that matches it.

Using Win10.

java -jar pdfbox-app-2.0.25.jar WriteDecodedDoc test.pdf test-decoded.txt

%PDF-1.4
%צה
1 0 obj
<<
/Type /Catalog
/Version /1.4
/Pages 2 0 R
>>
endobj
2 0 obj
<<
/Type /Pages
/Kids [3 0 R]
/Count 1
>>
endobj
3 0 obj
<<
/Type /Page
/MediaBox [0.0 0.0 612.0 792.0]
/Parent 2 0 R
/Contents 4 0 R
/Resources 5 0 R
>>
endobj
4 0 obj
<<
/Length 178
>>
stream
/F1 10 Tf
BT
40 763.07751 Td
0 -11.0775 Td
(def hello_world\(\): ) Tj
0 -11.0775 Td
(    print\("Hello World!"\) ) Tj
0 -11.0775 Td
( ) Tj
0 -11.0775 Td
(hello_world\(\) ) Tj
ET

endstream
endobj
5 0 obj
<<
/Font 6 0 R
>>
endobj
6 0 obj
<<
/F1 7 0 R
>>
endobj
7 0 obj
<<
/Type /Font
/Subtype /Type1
/BaseFont /Courier
/Encoding /WinAnsiEncoding
>>
endobj
xref
0 8
0000000000 65535 f
0000000015 00000 n
0000000078 00000 n
0000000135 00000 n
0000000247 00000 n
0000000478 00000 n
0000000511 00000 n
0000000542 00000 n
trailer
<<
/Root 1 0 R
/ID [<2B2F22A234DF5483D5614CAB282ED31B> <2B2F22A234DF5483D5614CAB282ED31B>]
/Size 8
>>
startxref
637
%%EOF
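
Note that in the decoded content stream above, each Tj string already ends
in a space, e.g. (hello_world\(\) ) Tj, which suggests the space is already
present in test.pdf before ExtractText runs. The check below is a rough
Python sketch. It assumes the one-string-per-line layout shown in this dump,
the regular expression is not a general PDF parser, and the name tj_strings
is hypothetical:

```python
import re

# Matches a single "(...) Tj" line; the body may contain backslash escapes
# such as \( and \). Nested unescaped parentheses are not handled.
TJ_PATTERN = re.compile(r"^\((?P<body>(?:\\.|[^\\()])*)\)\s*Tj\s*$")


def tj_strings(stream_text):
    """Return the literal strings shown by Tj operators, one per matching line."""
    found = []
    for line in stream_text.splitlines():
        match = TJ_PATTERN.match(line)
        if match:
            # Simplified unescape: drops the backslash and keeps the next
            # character. Enough for \( \) \\ but not for \n or octal escapes.
            found.append(re.sub(r"\\(.)", r"\1", match.group("body")))
    return found


# Content stream copied from the dump in this message.
STREAM = r"""/F1 10 Tf
BT
40 763.07751 Td
0 -11.0775 Td
(def hello_world\(\): ) Tj
0 -11.0775 Td
(    print\("Hello World!"\) ) Tj
0 -11.0775 Td
( ) Tj
0 -11.0775 Td
(hello_world\(\) ) Tj
ET"""

for shown in tj_strings(STREAM):
    # repr() makes the trailing space visible in the printed output.
    print(repr(shown), "ends with space:", shown.endswith(" "))
```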
