Re: Fwd: Trouble reading IEEE pdf

2017-02-02 Thread Pulkit Kapur
Thank you all. You do a great service. I am up and running. Thanks, Pulkit On Thu, Feb 2, 2017 at 3:19 PM, Tilman Hausherr wrote: > On 02.02.2017 at 21:12, Pulkit Kapur wrote: > >> I am getting just the headers: >> "2016 IEEE/RSJ International Conference on Intelligent Robots and Systems >> (

Re: Fwd: Trouble reading IEEE pdf

2017-02-02 Thread Tilman Hausherr
On 02.02.2017 at 21:12, Pulkit Kapur wrote: I am getting just the headers: "2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) Daejeon Convention Center October 9-14, 2016, Daejeon, Korea 978-1-5090-3761-2/16/$31.00 ©2016 IEEE 5324 5325 5326 5327 5328 5329 5330 5331

Re: Fwd: Trouble reading IEEE pdf

2017-02-02 Thread Pulkit Kapur
I am getting just the headers: "2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) Daejeon Convention Center October 9-14, 2016, Daejeon, Korea 978-1-5090-3761-2/16/$31.00 ©2016 IEEE 5324 5325 5326 5327 5328 5329 5330 5331 " Did use the new file path: javaaddpath('C:\Us

Re: Fwd: Trouble reading IEEE pdf

2017-02-02 Thread Tilman Hausherr
On 02.02.2017 at 20:26, Pulkit Kapur wrote: Thanks. That's what I would expect to read. Also, thanks for pointing to the latest version. I pointed to the pdfbox-app-2.0.4.jar and the fontbox-2.0.4.jar files. Since I want to read over 1000 PDF documents programmatically in MATLAB, I am not using t

Re: Fwd: Trouble reading IEEE pdf

2017-02-02 Thread Pulkit Kapur
Thanks. That's what I would expect to read. Also, thanks for pointing to the latest version. I pointed to the pdfbox-app-2.0.4.jar and the fontbox-2.0.4.jar files. Since I want to read over 1000 PDF documents programmatically in MATLAB, I am not using the command line, but using the Java library in

Re: Fwd: Trouble reading IEEE pdf

2017-02-02 Thread Tilman Hausherr
On 02.02.2017 at 19:59, Pulkit Kapur wrote: My apologies. This was very careless of me. I did not realize Scribd would want you to register to download. I have uploaded the document here: http://www.filedropper.com/0024iros2016 My code is in MATLAB (not the command-line interface) and I am usi

Re: Fwd: Trouble reading IEEE pdf

2017-02-02 Thread Pulkit Kapur
My apologies. This was very careless of me. I did not realize Scribd would want you to register to download. I have uploaded the document here: http://www.filedropper.com/0024iros2016 My code is in MATLAB (not the command-line interface) and I am using *PDFBox-0.7.3.jar* and *FontBox-0.1.0.jar* I

Re: Fwd: Trouble reading IEEE pdf

2017-02-02 Thread Tilman Hausherr
On 02.02.2017 at 16:10, Pulkit Kapur wrote: Hi, I have uploaded the PDF here: https://www.scribd.com/document/338221804/0024-iros-2016 Hello Pulkit, This site requires registration. This is a "don't" from the list: https://pdfbox.apache.org/support.html I don't want to register. Please find

RE: Fwd: Trouble reading IEEE pdf

2017-02-02 Thread Allison, Timothy B.
On Behalf Of Pulkit Kapur Sent: Thursday, February 2, 2017 10:34 AM To: users@pdfbox.apache.org Subject: Re: Fwd: Trouble reading IEEE pdf Thanks, Karl, for the reply. That's helpful. What confuses me is this: "very likely because usually such an XObject would just be an image" -

Re: Fwd: Trouble reading IEEE pdf

2017-02-02 Thread Pulkit Kapur
Karl, Got it. I understand the point about XObjects and how PDFBox might be missing the XObject because typically they are images. I am hoping someone here might have had luck making PDFBox get data from XObject elements that contain text. Thanks, Pulkit On Thu, Feb 2, 2017 at 10:36 AM, Karl He

Re: Fwd: Trouble reading IEEE pdf

2017-02-02 Thread Karl Heinz Kremer
Pulkit, I did not say that in your document the XObjects are images, I said that they usually are just images. When you analyze 100 random PDF documents, chances are that most of them only use the XObject construct for images and vector graphics, not for elements that contain text. Your docume

Re: Fwd: Trouble reading IEEE pdf

2017-02-02 Thread Pulkit Kapur
Thanks, Karl, for the reply. That's helpful. What confuses me is this: "very likely because usually such an XObject would just be an image" -> I am able to select the underlying text in the XObject using Acrobat and copy/paste it. That's why I am confused about why PDFBox cannot access the XObject. Perhaps

Re: Fwd: Trouble reading IEEE pdf

2017-02-02 Thread Karl Heinz Kremer
The document does not contain layers (or optional content groups as they are called in PDF), the problem seems to be that the actual text of the document is in an XObject - something that is completely legal in a PDF file. I suspect that the text was created in one application, and then a second ap

Re: Fwd: Trouble reading IEEE pdf

2017-02-02 Thread Pulkit Kapur
Hi, I have uploaded the PDF here: https://www.scribd.com/document/338221804/0024-iros-2016 I did some more diagnosis last night and it seems that there are two layers in the PDF: one with the content and the other with headers and footers. PDFBox is only reading the headers and footers. I sus

Re: Fwd: Trouble reading IEEE pdf

2017-02-01 Thread Tilman Hausherr
On 02.02.2017 at 05:55, Pulkit Kapur wrote: Hi, I am trying to read some past years' IEEE conference proceedings I have. I can read the PDF using Acrobat and select the text. But when I try to read the text using the readText function from the PDFBox library, I only get the headers and footers in t