RE: [metadata] roadmap proposal available on the wiki

2012-04-27 Thread Joerg Ehrlich
Hi Antoni, The roadmap doesn't give much detail about the intended vocabularies. Dublin Core is great, but what else? Joerg? What other kinds of metadata information would you like to extract with Tika, and what vocabularies would you like to use to express them? At Adobe, you'll likely

RE: [metadata] roadmap proposal available on the wiki

2012-04-27 Thread Joerg Ehrlich
+1 This does indeed look like a good combination. Jörg -----Original Message----- From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Friday, 27 April 2012 01:33 To: dev@tika.apache.org Subject: Re: [metadata] roadmap proposal available on the wiki Hi Antoni

RE: [metadata] roadmap proposal available on the wiki

2012-04-26 Thread Joerg Ehrlich
-----Original Message----- From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Wednesday, 25 April 2012 22:40 To: dev@tika.apache.org Subject: Re: [metadata] roadmap proposal available on the wiki Hi Jörg, On Apr 25, 2012, at 10:27 AM, Joerg Ehrlich wrote: I am not strongly

Re: [metadata] roadmap proposal available on the wiki

2012-04-26 Thread Mattmann, Chris A (388J)
Hi Jörg, Thanks for your email, comments below: On Apr 26, 2012, at 3:35 AM, Joerg Ehrlich wrote: Hi Chris, Those are all valid points and I agree that you could do everything with a HashMap. Having the parsers fill the Metadata class and its HashMap with all needed information which

Re: [metadata] roadmap proposal available on the wiki

2012-04-26 Thread Ray Gauss II
I think besides the namespaces, one of the issues Jörg is trying to tackle is the structured metadata and the extra time and effort referred to is dealing with serialization of structured data to and from a hashmap. For example I may have metadata similar to: Contact1 |-- First Name |-- Last
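To make the serialization cost Ray mentions concrete, here is a minimal sketch of what pushing a structured entry through a flat string-to-string map might look like. It uses a plain `java.util.Map`, not Tika's `Metadata` class, and the class name `FlatContactMetadata`, the `/` separator, the `Email` field, and the sample values are all assumptions made up for illustration; none of them come from the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlatContactMetadata {
    // Hypothetical separator between the structure name and the field name.
    static final String SEP = "/";

    /** Serializes one structured entry into flat "prefix/field" keys. */
    public static Map<String, String> flatten(String prefix, Map<String, String> fields) {
        Map<String, String> flat = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            flat.put(prefix + SEP + e.getKey(), e.getValue());
        }
        return flat;
    }

    /** Rebuilds the structured entry by scanning every flat key for the prefix. */
    public static Map<String, String> unflatten(String prefix, Map<String, String> flat) {
        Map<String, String> fields = new LinkedHashMap<>();
        String start = prefix + SEP;
        for (Map.Entry<String, String> e : flat.entrySet()) {
            if (e.getKey().startsWith(start)) {
                fields.put(e.getKey().substring(start.length()), e.getValue());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Placeholder contact; field names and values are illustrative only.
        Map<String, String> contact = new LinkedHashMap<>();
        contact.put("First Name", "Ada");
        contact.put("Email", "ada@example.org");

        Map<String, String> flat = flatten("Contact1", contact);
        System.out.println(flat);
        System.out.println(unflatten("Contact1", flat));
    }
}
```

Every consumer of the flat map has to know the key convention and repeat the prefix scan, which is roughly the extra effort being described; a structured model would carry the grouping itself.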

Re: [metadata] roadmap proposal available on the wiki

2012-04-26 Thread Mattmann, Chris A (388J)
Message- From: Ray Gauss II [mailto:ray.ga...@alfresco.com] Sent: Thursday, 26 April 2012 18:03 To: dev@tika.apache.org Subject: Re: [metadata] roadmap proposal available on the wiki I think besides the namespaces, one of the issues Jörg is trying to tackle is the structured metadata

Re: [metadata] roadmap proposal available on the wiki

2012-04-26 Thread Antoni Mylka
2012/04/25 Joerg Ehrlich wrote: Hi, I have put a proposal of a roadmap for the metadata features in Tika on the wiki: http://wiki.apache.org/tika/MetadataRoadmap The proposal is based on a discussion around this topic I have had with Jukka. Please review and feel free to edit the wiki

Re: [metadata] roadmap proposal available on the wiki

2012-04-26 Thread Antoni Mylka
2012/04/26 Mattmann, Chris A (388J) wrote: Hi Guys, One comment RE: the below too -- this is precisely where I see Any23 coming into play and why there is a strong relationship between it and Tika: http://incubator.apache.org/any23/ I'm the current Champion for the project and the

Re: [metadata] roadmap proposal available on the wiki

2012-04-26 Thread Mattmann, Chris A (388J)
Hi Antoni, Precisely! :) That would be awesome, huh. And my goal there is also to turn Any23 parsers into Tika parsers, as I think they could be one and the same (with an RDF, XMP, or RSS ContentHandler transforming the Tika intermediate SAX output in the same way). Cheers, Chris On Apr 26, 2012,

Re: [metadata] roadmap proposal available on the wiki

2012-04-25 Thread Mattmann, Chris A (388J)
Hi Jörg, First off, thanks for taking the time to put your thoughts down on the Wiki. I will try to leverage that for helping push these ideas forward. I am +1 on most of the things you proposed. Regarding: {quote} Use XMP instead of Hashmap in Metadata class The idea is to have just one

RE: [metadata] roadmap proposal available on the wiki

2012-04-25 Thread Joerg Ehrlich
Hi Chris, Thanks for your comments, I am not strongly supportive of changing out the HashMap internal representation in Metadata. A couple of things I like about the HashMap: * It's simple. * It doesn't require dependency on any external libraries and helps keep tika-core minimal.

Re: [metadata] roadmap proposal available on the wiki

2012-04-25 Thread Mattmann, Chris A (388J)
Hi Jörg, On Apr 25, 2012, at 10:27 AM, Joerg Ehrlich wrote: I am not strongly supportive of changing out the HashMap internal representation in Metadata. A couple of things I like about the HashMap: * It's simple. * It doesn't require dependency on any external libraries and helps