Vitalie Bureanu created PDFBOX-1542:
---------------------------------------

             Summary: Whitespaces between words are not created
                 Key: PDFBOX-1542
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1542
             Project: PDFBox
          Issue Type: Wish
          Components: Text extraction
    Affects Versions: 1.7.1
            Reporter: Vitalie Bureanu
            Priority: Minor


Hello, I use PDFBox to extract text from PDF files. I noticed that text 
extraction from some PDF files is not as good as expected. I have a series of 
PDF invoices from which I extract the text with coordinates, and the result is 
pretty good, but I noticed a very strange thing: the words are extracted 
without whitespace between them. Example: if I try to extract "Total 
Amount", the result is "TotalAmount".
But if I open the invoice in Adobe Reader and copy and paste the text into 
Notepad, I get "Total Amount" with the whitespace!
I think the whitespace is not present in the original PDF document, but 
Adobe Reader somehow inserts whitespace between words when it shows the 
content of the PDF.
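
To illustrate the guess above, here is a minimal sketch of how a viewer might infer a space from glyph positions alone: if the horizontal gap between two neighbouring glyphs is larger than some fraction of the average glyph width, a space is emitted. The class GapSpaceSketch, the Glyph type, the helper names and the tolerance values are all hypothetical, chosen for illustration; this is not PDFBox's or Adobe Reader's actual algorithm. PDFTextStripper does have related settings (setSpacingTolerance and setAverageCharTolerance), but whether tuning them helps with this invoice is untested.

```java
import java.util.ArrayList;
import java.util.List;

public class GapSpaceSketch {

    /** Hypothetical glyph: a character, its left x coordinate, and its width. */
    static final class Glyph {
        final char ch;
        final float x;
        final float width;

        Glyph(char ch, float x, float width) {
            this.ch = ch;
            this.x = x;
            this.width = width;
        }
    }

    /**
     * Joins the glyphs of one text line (assumed sorted left to right) and
     * inserts a space wherever the gap between two glyphs exceeds
     * tolerance * average glyph width.
     */
    static String joinWithInferredSpaces(List<Glyph> line, float tolerance) {
        if (line.isEmpty()) {
            return "";
        }
        float totalWidth = 0f;
        for (Glyph g : line) {
            totalWidth += g.width;
        }
        float threshold = tolerance * (totalWidth / line.size());

        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : line) {
            if (prev != null) {
                // Distance from the right edge of the previous glyph.
                float gap = g.x - (prev.x + prev.width);
                if (gap > threshold) {
                    sb.append(' ');
                }
            }
            sb.append(g.ch);
            prev = g;
        }
        return sb.toString();
    }

    /** Demo helper: lays out a word with fixed-width glyphs, returns the next x. */
    static float place(String word, float startX, float charWidth, List<Glyph> out) {
        float x = startX;
        for (char c : word.toCharArray()) {
            out.add(new Glyph(c, x, charWidth));
            x += charWidth;
        }
        return x;
    }

    public static void main(String[] args) {
        // Two words positioned 4 units apart, with no space glyph in between.
        List<Glyph> line = new ArrayList<>();
        float next = place("Total", 0f, 5f, line);
        place("Amount", next + 4f, 5f, line);

        System.out.println(joinWithInferredSpaces(line, 0.5f)); // gap 4 > 2.5
        System.out.println(joinWithInferredSpaces(line, 1.0f)); // gap 4 <= 5
    }
}
```

With the lower tolerance the gap counts as a word break; with the higher one the two words run together, which matches the "TotalAmount" symptom.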
 
Could you please suggest how I can get the strings with spaces after 
parsing? 

See example of invoice here: http://www.cloudforpeople.com/Invoice1.pdf

PS: I want to try version 1.8.0 of PDFBox. How can I download it?

Many thanks,
Vitalie

