> The problem I am having is that some of them have multiple columns and multiple
> word boxes. Does the xpdf patch extract different columns and word boxes?
It tells you where each word is. Columns you have to do for yourself.
Bill
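
For anyone wondering what "doing the columns yourself" might look like, here is a minimal sketch in Python. It assumes you already have one bounding box per word; the Word tuple, the min_gutter and line_tolerance thresholds, and the function names are all hypothetical and are not the output format of the xpdf patch. It also assumes y grows downward and that columns are separated by a clear vertical gutter, which will not hold for every layout.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    """Hypothetical word record: text plus its bounding box."""
    text: str
    x0: float  # left edge
    y0: float  # top edge (y grows downward in this sketch)
    x1: float  # right edge
    y1: float  # bottom edge


def split_into_columns(words: List[Word], min_gutter: float = 20.0) -> List[List[Word]]:
    """Group words into columns, left to right.

    Words are swept in order of their left edge. A new column starts
    whenever a word begins more than min_gutter to the right of
    everything seen so far in the current column.
    """
    columns: List[List[Word]] = []
    right_edge = None
    for w in sorted(words, key=lambda w: w.x0):
        if right_edge is None or w.x0 - right_edge > min_gutter:
            columns.append([])
            right_edge = w.x1
        else:
            right_edge = max(right_edge, w.x1)
        columns[-1].append(w)
    return columns


def column_text(column: List[Word], line_tolerance: float = 3.0) -> str:
    """Rebuild the text of one column, top to bottom.

    Words whose top edges lie within line_tolerance of the first word
    of a line are treated as belonging to that line.
    """
    lines: List[List[Word]] = []
    for w in sorted(column, key=lambda w: (w.y0, w.x0)):
        if lines and abs(w.y0 - lines[-1][0].y0) <= line_tolerance:
            lines[-1].append(w)
        else:
            lines.append([w])
    return "\n".join(
        " ".join(w.text for w in sorted(line, key=lambda w: w.x0))
        for line in lines
    )
```

Calling split_into_columns on the words of a page and then column_text on each resulting column gives the text one column at a time instead of interleaved across the page. Both thresholds would need tuning per document, and headings that span several columns would have to be handled separately.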
Hello Bill,
The problem I am having is that some of them have multiple columns and multiple
word boxes. Does the xpdf patch extract different columns and word boxes?
Best,
-C.B.
On Wed, May 14, 2008 at 6:35 PM, Bill Janssen <[EMAIL PROTECTED]> wrote:
> > The Unix program pdf2text can convert while keeping the text positions, but I wanted
> > to ask you guys if you know something better,
>
> AFAIK, PDFBox has a lower-level API that allows you to get hold of text
> positions.
In UpLib, I use xpdf-3.02pl2 with a patch which gives me position and
font in
Cam Bazz wrote:
Hello All,
Any suggestions for extracting text from PDF? I have tried PDFBox, and it
works nicely; however, if the PDF is structured, it won't provide good results.
For example, consider this PDF:
P1 Lorem Ipsum Bla bla P3 Lorem2 Ipsum2
P1 bla bla
P2 bla bla bla
P