On 10.03.2008 11:03:07 Jukka Zitting wrote:
> Hi,
> 
> On Mon, Mar 10, 2008 at 11:33 AM, Jeremias Maerki
> <[EMAIL PROTECTED]> wrote:
> > *g* Sounds a lot like what I built in XML Graphics Commons with the XMP
> >  support:
> 
> XMP is a valid option. I briefly looked at the Adobe XMP library and
> JempBox as options, but I'm a bit worried about the complexity of the
> API and the fact that there is little guidance on what metadata
> properties to use for which purposes.

Take a look at the XMP specification [2]. It contains documentation for
a number of metadata schemas.

[1] http://www.adobe.com/products/xmp/index.html
[2] http://www.adobe.com/devnet/xmp/pdfs/xmp_specification.pdf

Of course, Tika might need some properties that are missing from those
schemas. Tika can define them in its own schema and provide its own
adapter class for easy, type-safe access.
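
To make the adapter idea concrete, here is a minimal sketch. It does not
use the XML Graphics Commons API; GenericMetadata, TikaSchemaAdapter, the
namespace URI and the pageCount property are all hypothetical names made
up for illustration. The point is only that the adapter adds typed
accessors while the values still live in a generic property container:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical generic container: properties keyed by namespace URI + local name. */
class GenericMetadata {
    private final Map<String, Object> properties = new LinkedHashMap<>();

    void setProperty(String namespace, String name, Object value) {
        properties.put(namespace + name, value);
    }

    Object getProperty(String namespace, String name) {
        return properties.get(namespace + name);
    }
}

/** Hypothetical type-safe adapter for a schema that Tika would define itself. */
class TikaSchemaAdapter {
    // Assumed namespace URI, for illustration only.
    static final String NAMESPACE = "http://example.org/tika/ns/1.0/";

    private final GenericMetadata meta;

    TikaSchemaAdapter(GenericMetadata meta) {
        this.meta = meta;
    }

    /** Typed setter: stores the value in the underlying generic container. */
    void setPageCount(int count) {
        meta.setProperty(NAMESPACE, "pageCount", Integer.valueOf(count));
    }

    /** Typed getter: returns null if the property is absent or not an Integer. */
    Integer getPageCount() {
        Object value = meta.getProperty(NAMESPACE, "pageCount");
        return (value instanceof Integer) ? (Integer) value : null;
    }
}
```

Client code would normally go through the adapter, but anything that does
not know the schema can still read the same value generically with
getProperty(TikaSchemaAdapter.NAMESPACE, "pageCount").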

> I agree that using a standard metadata representation is very useful,
> but is it worth the extra complexity? At least we should find a way to
> cover requirements 4, 6, and 8 on top of XMP.

That's why I added the link to:
http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/examples/java/xmp/MetadataFromScratch.java?view=markup
See also:
http://svn.apache.org/repos/asf/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/xmp/schemas/

You can see how easy it is to access the individual values (type-safe) while
still offering generic access to the properties. The documentation (your
no. 6) can be done through Javadocs on the adapter classes and, if
necessary, a separate XML file containing the schema, from which you can
generate tables like those found in the XMP specification. The PDF/A
standard even contains a schema expressed in XMP that can be used to
describe XMP schemas (not that this is very legible; something simpler is
probably better).

I'm pretty sure that things such as thumbnails can also be mapped. When
serialized to an XMP packet, a thumbnail would simply be converted into
an RFC 2397 data URL.
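
As a rough sketch of that conversion (the DataUrls class and its method
names are hypothetical, and it only handles the base64 form of RFC 2397,
not the percent-encoded one):

```java
import java.util.Base64;

/** Hypothetical helper for the base64 form of RFC 2397 data URLs. */
class DataUrls {

    /** Builds "data:<mediatype>;base64,<data>" from raw bytes. */
    static String toDataUrl(String mediaType, byte[] data) {
        return "data:" + mediaType + ";base64,"
                + Base64.getEncoder().encodeToString(data);
    }

    /** Decodes the payload of a data URL produced by toDataUrl. */
    static byte[] fromDataUrl(String url) {
        int comma = url.indexOf(',');
        if (!url.startsWith("data:") || comma < 0
                || !url.substring(0, comma).endsWith(";base64")) {
            throw new IllegalArgumentException("Not a base64 data URL");
        }
        return Base64.getDecoder().decode(url.substring(comma + 1));
    }
}
```

For example, DataUrls.toDataUrl("image/png", new byte[] {1, 2, 3}) yields
"data:image/png;base64,AQID", which can be stored as a plain string value
in the packet.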


HTH
Jeremias Maerki
