On Sat, Nov 17, 2012 at 7:10 PM, Duane Nickull <[email protected]> wrote:

> Very cool project! I did not see any EULA on this declaring a GPL or
> similar style license.

Apache 2. I intended to include a LICENSE file but probably left it out by mistake. I will add the LICENSE today. We don't use GPL because we want AMI2 to be usable in any sort of application.

> What license are you using? I would like to
> introduce this work to some people.
>

We work in a completely Open manner. Some of us are connected with the Open Knowledge Foundation and its work (http://okfn.org), and we shall be using AMI2 for open Content Mining (though of course it can be used for any purpose). The AMI2 project as a whole will probably be coordinated (in a loose sense) there, as we are interested in the SciTechMed applications.

We'd like to hear about people's experiences, and obviously we'd welcome contributions. Information on fonts and glyphs is probably the most obviously generic thing anyone could contribute at present.

Known issues:

* Bitmap images are bypassed, just to save time and space in testing. Adding them should take an hour or two.
* Some publishers use fonts in unusual colour maps, and these have a serious impact on performance (ten times slower). That probably needs a small filter in PDFBox.
* Output is verbose (each character has a clip path). We can normalize this. There is also quite a lot of debug output (e.g. <svg:title>, which allows mouseover of characters for debugging).

One issue is where we normalize characters. Authors and readers of STM documents are not well versed in typesetting, so we would immediately replace INCREMENT (U+2206) with GREEK CAPITAL LETTER DELTA (U+0394). Similarly, we will expand ligatures (such as "ffl"), which most people don't even know exist!

P.

>
> Thank you for sharing!
>

So people can share in return.

> Duane Nickull
> ***********************************
> Technoracle Advanced Systems Inc.
> Consulting and Contracting; Proven Results!
> i. Neo4J, PDF, Java, LiveCycle ES, Flex, AIR, CQ5 & Mobile
> b. http://technoracle.blogspot.com
> t. @duanechaos
> "Don't fear the Graph! Embrace Neo4J"

--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
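For illustration, here is a minimal Python sketch of the character normalization described in the message (INCREMENT to Greek DELTA, ligature expansion). It assumes normalization runs as a pass over already-extracted Unicode text. The `LOOKALIKES` table and the `normalize_chars` function are hypothetical and are not part of AMI2. INCREMENT needs an explicit mapping because U+2206 has no Unicode decomposition. Ligatures are expanded with the standard library's NFKC, but only inside the Latin ligature block U+FB00 to U+FB06. Applying NFKC to the whole string would also fold superscripts (x² becomes x2) and other compatibility characters that matter in STM text.

```python
import unicodedata

# Hypothetical lookalike table: symbols authors almost always mean as a
# Greek letter. Only INCREMENT -> DELTA comes from the message; any other
# entries would have to be chosen case by case.
LOOKALIKES = {
    "\u2206": "\u0394",  # INCREMENT -> GREEK CAPITAL LETTER DELTA
}

# Latin ligatures in the Alphabetic Presentation Forms block (ff, fi, fl, ffi, ffl, long-s t, st).
LIGATURE_FIRST, LIGATURE_LAST = "\ufb00", "\ufb06"


def normalize_chars(text: str) -> str:
    """Replace lookalike symbols and expand Latin ligatures, leaving other characters untouched."""
    out = []
    for ch in text:
        # Explicit mapping first: NFKC alone would leave U+2206 unchanged.
        ch = LOOKALIKES.get(ch, ch)
        if LIGATURE_FIRST <= ch <= LIGATURE_LAST:
            # NFKC decomposes e.g. U+FB04 LATIN SMALL LIGATURE FFL into "ffl".
            ch = unicodedata.normalize("NFKC", ch)
        out.append(ch)
    return "".join(out)
```

For example, `normalize_chars("ba\ufb04e")` gives `"baffle"`, while `"x\u00b2"` (x²) passes through unchanged. Restricting NFKC to one block is a deliberately conservative design choice. Whether other symbol-for-letter substitutions belong in the table is exactly the "where do we normalize" question raised above.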

