On Mon, Apr 2, 2012 at 2:58 PM, Peter Murray-Rust <[email protected]> wrote:
>
> On Mon, Apr 2, 2012 at 2:51 PM, Andrey Kuznetsov <[email protected]> wrote:
>
>> Peter, you have to pass your own Graphics2D object (with some overridden
>> methods) to pdfbox.
>>

I am making good progress in capturing graphics primitives by using Apache
Batik. I have managed to intercept the Graphics2D by generating a Batik
SVGGraphics2D:

    org.w3c.dom.DOMImplementation domImpl =
        org.apache.batik.dom.GenericDOMImplementation.getDOMImplementation();
    String svgNS = "http://www.w3.org/2000/svg";
    org.w3c.dom.Document document = domImpl.createDocument(svgNS, "svg", null);
    SVGGraphics2D svgGraphics2D = new org.apache.batik.svggen.SVGGraphics2D(document);

I then pass this into PDFReader and use

    drawer.drawPage( svgGraphics2D, page, drawDimension );

and

    Writer svgwriter = new StringWriter();
    svgGraphics2D.stream(svgwriter, useCSS);
    svgwriter.close();

and then analyse the SVG DOM in svgwriter.toString().

This works, but with problems. The first implementation created outline fonts
in Batik (i.e. closed polylines for glyphs). I have then tried to clean up the
code and it now creates <text> objects with characters, but without a font and
without a font-size.

Do you have suggestions as to how I can best capture the text info reliably?
I don't mind dealing with outline fonts, as I have to do that for user-created
graphics anyway and have a good store of fonts. But I'd like to know what
switches need to be set. And I have a worry that I am failing to load fonts
somewhere.

Any help much appreciated, but thanks anyway for progress so far.

-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

