On Sat, Nov 17, 2012 at 7:10 PM, Duane Nickull <
[email protected]> wrote:

> Very cool project!  I did not see any EULA on this declaring a GPL or
> similar style license.


Apache 2. I intended to include a LICENSE file but have probably missed it
by mistake. I will add LICENSE today. We don't use GPL as we want AMI2 to be
usable in any sort of application.

> What license are you using?  I would like to
> introduce this work to some people.
>

We work in a completely Open manner. Some of us are connected with the Open
Knowledge Foundation and its work (http://okfn.org) and we shall be using
AMI2 for open Content Mining (though of course it can be used for any
purpose). The AMI2 project as a whole will probably be coordinated (in a
loose sense) there, as we are interested in the SciTechMed applications.

We'd like to hear about people's experiences, and if anyone can contribute
(say) information on fonts and glyphs, that's probably the most generically
useful thing at present.

Known issues:
* bitmap images are bypassed just to save time and space in testing. Should
be an hour or two to add them.
* some publishers use fonts in unusual colour maps and these have a serious
impact on performance (ten times slower). That probably needs a small filter
in PDFBox.
* output is verbose (each character has a clip path). We can normalize
this. There's also quite a lot of debug output (e.g. <svg:title>, which
allows mouseover of characters for debugging).
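
As a rough illustration of trimming the debug output mentioned in the last
item, here is a minimal Python sketch that removes title elements from an SVG
string. It is not AMI2 code: the function name strip_titles is hypothetical,
and it assumes the debug titles are ordinary elements in the standard SVG
namespace.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
TITLE_TAG = "{" + SVG_NS + "}title"

# Keep the "svg:" prefix when serializing (assumption: the output uses it).
ET.register_namespace("svg", SVG_NS)


def strip_titles(svg_text: str) -> str:
    """Return the SVG with every title element removed (hypothetical helper)."""
    root = ET.fromstring(svg_text)
    for parent in list(root.iter()):
        prev = None
        for child in list(parent):
            if child.tag == TITLE_TAG:
                # Text following the title belongs to the parent's content,
                # so move it to the previous sibling or the parent itself.
                tail = child.tail or ""
                if prev is None:
                    parent.text = (parent.text or "") + tail
                else:
                    prev.tail = (prev.tail or "") + tail
                parent.remove(child)
            else:
                prev = child
    return ET.tostring(root, encoding="unicode")
```

The character data that follows a title is kept, so only the mouseover text
is dropped.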

One issue is where we normalize characters. Authors and readers of STM
documents are generally not well versed in typesetting, so INCREMENT (U+2206)
would be replaced immediately by GREEK CAPITAL LETTER DELTA (U+0394).
Similarly we will expand ligatures (e.g. "ffl"), which most people don't even
know exist!
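
The kind of normalization described above could be sketched in Python as
follows. This is only an illustration under stated assumptions, not the AMI2
implementation: the names SUBSTITUTIONS, LIGATURES and normalize_text are
hypothetical, and the ligature range is limited to the Latin presentation
forms U+FB00 to U+FB06. Note that U+2206 has no Unicode decomposition, so it
needs an explicit rule, whereas the ligatures can be expanded with standard
NFKC normalization.

```python
import unicodedata

# Explicit substitutions that Unicode normalization does not perform.
SUBSTITUTIONS = {
    "\u2206": "\u0394",  # INCREMENT -> GREEK CAPITAL LETTER DELTA
}

# Latin ligatures U+FB00..U+FB06 (ff, fi, fl, ffi, ffl, long-s t, st).
LIGATURES = {chr(cp) for cp in range(0xFB00, 0xFB07)}


def normalize_text(text: str) -> str:
    """Replace look-alike symbols and expand ligatures (hypothetical helper)."""
    out = []
    for ch in text:
        if ch in SUBSTITUTIONS:
            out.append(SUBSTITUTIONS[ch])
        elif ch in LIGATURES:
            # NFKC maps each ligature to its component letters.
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)
```

Applying NFKC only to the listed ligatures, rather than to the whole string,
leaves other characters (superscripts, for example) untouched.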

P.

>
> Thank you for sharing!
>
> So people can share in return.


> Duane Nickull
> ***********************************
> Technoracle Advanced Systems Inc.
> Consulting and Contracting; Proven Results!
> i.  Neo4J, PDF, Java, LiveCycle ES, Flex, AIR, CQ5 & Mobile
> b. http://technoracle.blogspot.com
> t.  @duanechaos
> "Don't fear the Graph!  Embrace Neo4J"
>
-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
