Re: Unable to convert valid pdf to html

2011-03-16 Thread varun bhansaly
Hi Thomas, It opens fine with Acrobat X on Win7 x86_64, Acrobat 9 on Ubuntu x86_64, and Evince on Ubuntu x86_64. I was actually more surprised by the error message in the trace: "java.io.IOException: Expected='null' actual='nullnullnull'". Anyway, thanks for looking into it. On Wed, Mar 16, 2011 at 1:10 PM,

Re: Unable to convert valid pdf to html

2011-03-16 Thread Thomas Fischer
Hi Varun, whatever it is, there is something wrong with this file. On my Mac, Acrobat Reader 6, Preview and Skim can't open the file, and Acrobat Reader 9 starts with the message "The document is damaged but will be repaired." (translated from German) JHOVE claims it is well-formed and valid. But

Re: Unable to convert valid pdf to html

2011-03-15 Thread varun bhansaly
Hi Thomas, Thanks for the reply. I have created a JIRA issue: https://issues.apache.org/jira/browse/PDFBOX-982 On Wed, Mar 16, 2011 at 3:38 AM, Thomas Fischer wrote: > Hello Varun, > > I can't tell you much about the error, just want to note that > > > The file in this case is "team21_devel.pdf", p

Re: Unable to convert valid pdf to html

2011-03-15 Thread Thomas Fischer
Hello Varun, I can't tell you much about the error, just want to note that > The file in this case is "team21_devel.pdf", please note this is a valid PDF > as it gets opened in adobe reader. definitely doesn't guarantee that this is a valid PDF file (as in "conforming to given standards"). Sinc

Unable to convert valid pdf to html

2011-03-15 Thread varun bhansaly
Hi, I encountered an exception while converting a PDF to HTML/text using pdfbox-app-1.5.0. The file in this case is "team21_devel.pdf", please note this is a valid PDF as it gets opened in adobe reader. I used the command-line utility as follows: java -jar pdfbox-app-1.5.0.jar ExtractText -html team21_