[ https://issues.apache.org/jira/browse/PDFBOX-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr closed PDFBOX-1091. ----------------------------------- Resolution: Cannot Reproduce Closing this one as no sample file was provided; the bug may have been fixed long ago, the getEncodingFromFont() call no longer exists in PDFont, and type1 fonts are handled differently now, even in the 1.8 version. > NPE in PDFont.getEncodingFromFont > --------------------------------- > > Key: PDFBOX-1091 > URL: https://issues.apache.org/jira/browse/PDFBOX-1091 > Project: PDFBox > Issue Type: Bug > Components: PDModel > Reporter: Robert Russo > > I'm using solr version 3.3 and am trying to index pdf documents. I keep > getting the following error: > org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read > content Processing Document # 199 > at > org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) > at > org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:130) > at > org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238) > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591) > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:617) > at > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267) > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186) > at > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359) > at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427) > at > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408) > Caused by: org.apache.tika.exception.TikaException: Unexpected > RuntimeException from org.apache.tika.parser.ParserDecorator$1@6f340905 > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199) > at > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) > at > org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128) > ... 8 more > Caused by: java.lang.NullPointerException > at > org.apache.pdfbox.pdmodel.font.PDFont.getEncodingFromFont(PDFont.java:832) > at > org.apache.pdfbox.pdmodel.font.PDFont.determineEncoding(PDFont.java:293) > at org.apache.pdfbox.pdmodel.font.PDFont.<init>(PDFont.java:178) > at > org.apache.pdfbox.pdmodel.font.PDSimpleFont.<init>(PDSimpleFont.java:79) > at > org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:139) > at > org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:109) > at > org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:76) > at > org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115) > at > org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:243) > at > org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225) > at > org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:441) > at > org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:365) > at > org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:321) > at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56) > at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:89) > at > org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91) > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197) > ... 10 more -- This message was sent by Atlassian JIRA (v6.2#6252)