Hi. I quite often run into the "org.pdfbox.cos.COSArray cannot be cast to org.pdfbox.cos.COSDictionary" exception when parsing certain PDFs with Tika. I noticed that it's been fixed in the trunk of PDFBox (0.8.0):
https://issues.apache.org/jira/browse/PDFBOX-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638409#action_12638409

Unfortunately this version of PDFBox is not a drop-in replacement since they shuffled things around and it now exists under the org.apache.pdfbox package instead of org.pdfbox.

Is there a timeline for upgrading to PDFBox 0.8.0? Perhaps the upgrade could be done in a branch that could be merged once 0.8.0 is released? If it's a simple matter of replacing "org.pdfbox" with "org.apache.pdfbox" I could volunteer for that, but if the upgrade is more complicated it may very well be beyond my meager Java skills.

Thanks,
Phil
http://technomancy.us
