> > I think that Metadata should be of the form:
> >
> > [key]=>1...n [value]
> >
> > Where [key]'s are modifiable (as are [value]'s)
> >
> > This is what you are expressing, correct?
>
> Yes, though I'm not sure where the 1...n requirement for metadata
> values comes from.
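The `[key]=>1...n [value]` form quoted above can be sketched in Java as a map from each key to an ordered list of one or more values. This is a minimal, hypothetical illustration: the class `MultiValuedMetadata` and its methods are invented for this example and are not Tika's or Nutch's API. The repeated `Cache-Control` header stands in for the multivalued HTTP headers that motivate the 1...n requirement.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the "[key] => 1...n [value]" model: each key maps
 * to an ordered list of one or more string values. Not Tika's actual API.
 */
public class MultiValuedMetadata {

    // Insertion-ordered map, so keys are kept in the order they were first added.
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    /** Appends a value for the key, keeping any values already stored. */
    public void add(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Replaces all values for the key with a single value ([value]'s are modifiable). */
    public void set(String key, String value) {
        List<String> list = new ArrayList<>();
        list.add(value);
        values.put(key, list);
    }

    /** Returns all values for the key (read-only), or an empty list if the key is absent. */
    public List<String> getValues(String key) {
        List<String> list = values.get(key);
        return list == null ? Collections.emptyList() : Collections.unmodifiableList(list);
    }

    /** Returns the first value for the key, or null if the key is absent. */
    public String get(String key) {
        List<String> list = values.get(key);
        return list == null ? null : list.get(0);
    }

    /** Renames a key ([key]'s are modifiable), appending its values to any under newKey. */
    public boolean renameKey(String oldKey, String newKey) {
        List<String> moved = values.remove(oldKey);
        if (moved == null) {
            return false; // nothing to rename
        }
        values.computeIfAbsent(newKey, k -> new ArrayList<>()).addAll(moved);
        return true;
    }

    public static void main(String[] args) {
        MultiValuedMetadata md = new MultiValuedMetadata();
        // An HTTP header such as Cache-Control may legitimately appear more than once.
        md.add("Cache-Control", "no-cache");
        md.add("Cache-Control", "no-store");
        md.set("Content-Type", "text/html");
        System.out.println(md.getValues("Cache-Control")); // prints [no-cache, no-store]
        System.out.println(md.get("Content-Type"));        // prints text/html
    }
}
```

Storing a `List<String>` per key is the "List of values" alternative to repeating the key; a structured-value design, as proposed in the thread, would replace `String` with a richer value type.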
The 1...n requirement comes from Nutch, and more generally from HTTP, where a header (a piece of metadata) can have multiple values.

> In my proposal I'd handle such needs more
> generally by allowing structured metadata values, not just strings.

If I understand correctly, instead of storing several values under a given key, I would store a List of values?

> In the Nutch case you mentioned, I would ask Nutch to understand the
> difference between the various forms of HTTP headers and to normalize
> that metadata before feeding it to Tika. After all, there's nothing
> HTTP-specific in Tika, whereas Nutch knows much more about the
> relevant details and actual reality out there.

+1

> parsing operation. Since Tika doesn't need to worry about serializing
> the metadata, we should IMHO opt for structured data types instead of
> strings where appropriate.

+1, but it adds a level of complexity to metadata handling for Tika users: they need to know the type associated with a specific metadata key, no? (I agree that this is more or less already the case for serialized date or URL values.)

> > > 7) No two distinct metadata keys should be used for the same metadata
> > > semantics.
> >
> > Could you elaborate on this with an explicit example?
>
> For example, we currently have both DublinCore.FORMAT and
> HttpHeaders.CONTENT_TYPE whose semantics are largely overlapping. Each
> such case adds ambiguity and makes automatic metadata processing
> harder.

-1. HttpHeaders.CONTENT_TYPE and DublinCore.FORMAT have the same semantics, but they don't come from the same level of information: HTTP headers are low-level information, while Dublin Core is high-level. Tika clients should have access to both, and then decide which is more reliable in their case.

Best Regards
Jérôme

--
Jérôme Charron
Technical Director @ WebPulse
Tel: +33673716743 - [EMAIL PROTECTED]
http://blog.shopreflex.com/
http://www.shopreflex.com/
http://www.staragora.com/