Thank you, Tilman!

-----Original Message-----
From: Tilman Hausherr [mailto:thaush...@t-online.de]
Sent: Monday, February 27, 2017 9:38 AM
To: us...@pdfbox.apache.org
Cc: user@tika.apache.org
Subject: Re: Extracting vector graphics from pdf

http://stackoverflow.com/a/38933039/535646

This lets you collect the lines. However, it won't output an image.

Tilman

On 27.02.2017 at 13:20, Allison, Timothy B. wrote:
> PDFBox Colleagues,
>   Any recommendations?
>
>   Best,
>
>     Tim
>
> -----Original Message-----
> From: Andisa Dewi [mailto:theknight...@yahoo.com]
> Sent: Monday, February 27, 2017 5:32 AM
> To: user@tika.apache.org
> Subject: Extracting vector graphics from pdf
>
> Hello guys,
>
> I'm currently extracting images from a whole lot of pdf files, however some
> of the images (or figures) are somehow not extracted. I'm thinking it might
> have to do with the fact that those images are vector graphics (as is usually
> the case in a lot of scientific papers). My question is, is it possible to
> extract vector graphics from pdfs using Tika?
>
> I attached an example of the pdf (here for example, all images are extracted
> except Figure 2).
>
> The way I'm extracting the images is the same as in the example code:
>
> Parser parser = new AutoDetectParser();
> Metadata m = new Metadata();
> ParseContext c = new ParseContext();
> ContentHandler h = new BodyContentHandler(-1);
> PDFParserConfig pdfConfig = new PDFParserConfig();
> pdfConfig.setExtractInlineImages(true);
> c.set(PDFParserConfig.class, pdfConfig);
> c.set(Parser.class, parser);
> EmbeddedDocumentExtractor ex = new MyEmbeddedDocumentExtractor(c);
> c.set(EmbeddedDocumentExtractor.class, ex);
> parser.parse(inputstream, h, m, c);
>
>
> Thanks!
>
> Regards,
>
> Eli
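
For readers wondering why image extraction skips such figures: a vector figure is drawn with path operators in the page content stream rather than stored as an embedded image, so there is no image object to pull out. The toy sketch below illustrates what "collecting the lines" can mean. It is not the PDFBox or Tika API and not the code from the linked answer; the class name PathCollector and the method collectSegments are hypothetical, and it assumes an already decoded, whitespace-separated content stream that uses only the m, l, h and re operators.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only. A real content stream also contains curves, strings,
// arrays and transformation matrices, and should be handled by a real parser.
class PathCollector {

    // Returns every straight segment as {x1, y1, x2, y2} in content-stream units.
    static List<double[]> collectSegments(String content) {
        List<double[]> segments = new ArrayList<>();
        List<Double> operands = new ArrayList<>();
        double curX = 0, curY = 0;      // current point
        double startX = 0, startY = 0;  // first point of the current subpath
        boolean hasPoint = false;

        for (String token : content.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            try {
                // Numbers are operands; they wait for the operator that follows.
                operands.add(Double.parseDouble(token));
                continue;
            } catch (NumberFormatException e) {
                // Not a number, so treat the token as an operator.
            }
            int n = operands.size();
            if (token.equals("m") && n >= 2) {                    // moveto
                curX = startX = operands.get(n - 2);
                curY = startY = operands.get(n - 1);
                hasPoint = true;
            } else if (token.equals("l") && n >= 2 && hasPoint) { // lineto
                double x = operands.get(n - 2);
                double y = operands.get(n - 1);
                segments.add(new double[] {curX, curY, x, y});
                curX = x;
                curY = y;
            } else if (token.equals("h") && hasPoint) {           // closepath
                if (curX != startX || curY != startY) {
                    segments.add(new double[] {curX, curY, startX, startY});
                }
                curX = startX;
                curY = startY;
            } else if (token.equals("re") && n >= 4) {            // rectangle
                double x = operands.get(n - 4);
                double y = operands.get(n - 3);
                double w = operands.get(n - 2);
                double h = operands.get(n - 1);
                segments.add(new double[] {x, y, x + w, y});
                segments.add(new double[] {x + w, y, x + w, y + h});
                segments.add(new double[] {x + w, y + h, x, y + h});
                segments.add(new double[] {x, y + h, x, y});
                curX = startX = x;
                curY = startY = y;
                hasPoint = true;
            }
            // Every operator consumes its operands, including the ignored ones.
            operands.clear();
        }
        return segments;
    }

    public static void main(String[] args) {
        List<double[]> segs = collectSegments("10 10 m 20 10 l 20 20 l S");
        System.out.println(segs.size() + " segments");
    }
}
```

As Tilman notes, a list of segments is not an image. Getting a picture of Figure 2 would still require rendering the page (or the region around the collected lines) as a separate step.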