I am analysing running text by capturing the
org.apache.pdfbox.util.TextPosition objects that PDFBox emits, using a
subclass of org.apache.pdfbox.pdfviewer.PageDrawer. I notice that some of
them are explicit space characters (char 32). Sometimes there are repeated
spaces, and even a "paragraph" consisting only of a single space. I was
unaware that PDF supported explicit spaces. Are these coming from the
original document, or does PDFBox generate them from its calculations of
character spacing and width?
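
To make the observation concrete, here is a minimal sketch of the kind of check that reveals the pattern. It is not my actual code: it assumes the captured characters have already been concatenated into one plain string per paragraph, it does not touch the PDFBox API at all, and the class and method names (SpaceRunCheck, longestSpaceRun, isSpaceOnly) and the sample strings are invented for illustration.

```java
public class SpaceRunCheck {

    /** Length of the longest run of consecutive space characters (char 32). */
    static int longestSpaceRun(String text) {
        int longest = 0;
        int current = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == ' ') {
                current++;
                longest = Math.max(longest, current);
            } else {
                current = 0; // any non-space character ends the run
            }
        }
        return longest;
    }

    /** True if the paragraph is non-empty and contains nothing but spaces. */
    static boolean isSpaceOnly(String paragraph) {
        if (paragraph.isEmpty()) {
            return false;
        }
        for (int i = 0; i < paragraph.length(); i++) {
            if (paragraph.charAt(i) != ' ') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up sample paragraphs, standing in for text captured per paragraph.
        String[] paragraphs = { "Some  running text", " ", "No repeats here" };
        for (String p : paragraphs) {
            System.out.println("[" + p + "] longestRun=" + longestSpaceRun(p)
                    + " spaceOnly=" + isSpaceOnly(p));
        }
    }
}
```

A longest run greater than 1 corresponds to the repeated spaces, and isSpaceOnly flags the space-only "paragraph". Neither check says where the spaces originate, which is the point of my question.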

TIA for help.

P.

-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
