On 10.03.2008 11:03:07 Jukka Zitting wrote:
> Hi,
>
> On Mon, Mar 10, 2008 at 11:33 AM, Jeremias Maerki
> <[EMAIL PROTECTED]> wrote:
> > *g* Sounds a lot like what I built in XML Graphics Commons with the XMP
> > support:
>
> XMP is a valid option. I briefly looked at the Adobe XMP library and
> JempBox as options, but I'm a bit worried about the complexity of the
> API and the fact that there is little guidance on what metadata
> properties to use for which purposes.

Take a look at the XMP specification [2]. It contains documentation for
a number of metadata schemas.

[1] http://www.adobe.com/products/xmp/index.html
[2] http://www.adobe.com/devnet/xmp/pdfs/xmp_specification.pdf

Of course, some properties that Tika needs might be missing. But Tika
can define them in its own schema, and you can provide your own adapter
class for easy, type-safe access.

> I agree that using a standard metadata representation is very useful,
> but is it worth the extra complexity? At least we should find a way to
> cover requirements 4, 6, and 8 on top of XMP.

That's why I added the link to:
http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/examples/java/xmp/MetadataFromScratch.java?view=markup

See also:
http://svn.apache.org/repos/asf/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/xmp/schemas/

You can see how easy it is to access the individual values in a
type-safe way while still offering generic access to the properties.
The documentation (your no. 6) can be done through Javadocs on the
adapter classes and, if necessary, a separate XML file containing the
schema, from which you can generate tables like those found in the XMP
specification. The PDF/A standard even contains a schema expressed in
XMP that can describe XMP schemas (not that this is very legible;
something simpler is probably better).

I'm pretty sure that things such as thumbnails can also be mapped. When
serialized to an XMP packet, a thumbnail would simply be converted into
an RFC 2397 data URL.

HTH
Jeremias Maerki
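
For illustration, the adapter idea described above can be sketched in
plain Java without any XMP library. Everything in this sketch is
hypothetical: the class names, the namespace URI, and the property names
are assumptions made for the example, not the XML Graphics Commons API
or anything Tika defines. It shows type-safe accessors layered over
generic property access, and a thumbnail turned into an RFC 2397 data
URL using java.util.Base64.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

public class XmpAdapterSketch {

    /** Generic property bag: any value is reachable by namespace + name. */
    public static class GenericMetadata {
        private final Map<String, Object> props = new LinkedHashMap<>();

        public void setProperty(String ns, String name, Object value) {
            props.put(ns + name, value);
        }

        public Object getProperty(String ns, String name) {
            return props.get(ns + name);
        }
    }

    /** Hypothetical schema adapter: typed accessors over the generic bag. */
    public static class TikaSchemaAdapter {
        // Assumed namespace URI, for illustration only.
        public static final String NS = "http://example.org/tika/ns/1.0/";

        private final GenericMetadata meta;

        public TikaSchemaAdapter(GenericMetadata meta) {
            this.meta = meta;
        }

        public void setPageCount(int count) {
            meta.setProperty(NS, "pageCount", Integer.valueOf(count));
        }

        /** Returns the page count, or null if the property is not set. */
        public Integer getPageCount() {
            return (Integer) meta.getProperty(NS, "pageCount");
        }

        public void setThumbnail(String mimeType, byte[] data) {
            meta.setProperty(NS, "thumbnailMimeType", mimeType);
            meta.setProperty(NS, "thumbnail", data.clone());
        }

        /** Returns the thumbnail as a data URL, or null if none is set. */
        public String getThumbnailAsDataUrl() {
            byte[] data = (byte[]) meta.getProperty(NS, "thumbnail");
            String mimeType = (String) meta.getProperty(NS, "thumbnailMimeType");
            return data == null ? null : toDataUrl(mimeType, data);
        }
    }

    /** Builds an RFC 2397 data URL with a base64-encoded payload. */
    public static String toDataUrl(String mimeType, byte[] data) {
        return "data:" + mimeType + ";base64,"
                + Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        GenericMetadata meta = new GenericMetadata();
        TikaSchemaAdapter tika = new TikaSchemaAdapter(meta);

        tika.setPageCount(12);
        // Placeholder bytes stand in for real image data.
        tika.setThumbnail("image/png", "hi".getBytes(StandardCharsets.US_ASCII));

        // Type-safe access through the adapter ...
        System.out.println(tika.getPageCount());
        // ... and generic access to the same property.
        System.out.println(meta.getProperty(TikaSchemaAdapter.NS, "pageCount"));
        System.out.println(tika.getThumbnailAsDataUrl());
    }
}
```

The adapter keeps no state of its own, so generic consumers and typed
consumers always see the same values, and the Javadocs on the adapter
methods are a natural place for the per-property documentation.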