Metadata situation and XMP support in Tika

2012-04-05 Thread Joerg Ehrlich
Hi everyone, I am an engineer in the XMP/Metadata team at Adobe and we would like to leverage Tika in current projects for metadata extraction (and mimetype detection). Our current systems primarily use the XMP data model to manage and interact with metadata. As far as I can see, the support fo

RE: Metadata situation and XMP support in Tika

2012-04-13 Thread Joerg Ehrlich
suggested. Looking forward to your participation! Cheers, Chris On Apr 5, 2012, at 5:58 AM, Joerg Ehrlich wrote: > Hi everyone, > > I am an engineer in the XMP/Metadata team at Adobe and we would like to > leverage Tika in current projects for metadata extraction (and mimetype >

RE: Metadata situation and XMP support in Tika

2012-04-13 Thread Joerg Ehrlich
things. I'd go so far as to say the Tika Metadata interface itself should cherry pick properties from other standards using that same aliasing approach rather than attempting to include the entire standard via implements which can obviously lead to name conflicts without prefixing the pro

RE: Metadata situation and XMP support in Tika

2012-04-13 Thread Joerg Ehrlich
ally is at: https://github.com/Alfresco/tika-exiftool/blob/master/src/main/java/org/apache/tika/parser/exiftool/ExiftoolTikaIptcMapper.java I'm more than happy to coordinate with you on the XMP stuff going forward if you'd like. Ray Gauss II DAM Architect, Alfresco On Apr 5, 2012, at 8:

RE: Metadata situation and XMP support in Tika

2012-04-24 Thread Joerg Ehrlich
er standard to alias keywords from than MSOffice, but I'm just sticking to the current mappings for this example. Ray On Apr 24, 2012, at 7:43 AM, Nick Burch wrote: > On Fri, 13 Apr 2012, Joerg Ehrlich wrote: >> I think it would be more clear if parsers/clients would use the

RE: Metadata situation and XMP support in Tika

2012-04-24 Thread Joerg Ehrlich
e.MS_KEYWORDS; ... we're back to the intended "give me the metadata that best fits the idea of Keywords, as defined by Tika". In this case, DublinCore.DC_SUBJECT is probably a much better standard to alias keywords from than MSOffice, but I'm just sticking to the current mappings
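
The aliasing idea discussed in this message can be illustrated with a minimal sketch. It is not Tika's actual Metadata API: the class name AliasedMetadata, the key strings, and the alias() method are all hypothetical, chosen only to show how a short Tika-level name such as "Keywords" could resolve to a namespaced property from a standard such as Dublin Core.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of property aliasing; not Tika's actual Metadata class.
final class AliasedMetadata {
    // Assumed namespaced keys standing in for standard-specific constants.
    static final String DC_SUBJECT = "dc:subject";
    static final String MS_KEYWORDS = "msoffice:Keywords";
    // The short, Tika-level name that clients would ask for.
    static final String KEYWORDS = "Keywords";

    private final Map<String, String> aliases = new HashMap<>();
    private final Map<String, String> values = new HashMap<>();

    // Register a short name that points at a namespaced property.
    void alias(String alias, String target) {
        aliases.put(alias, target);
    }

    // Follow an alias if one is registered, otherwise use the key unchanged.
    private String resolve(String key) {
        return aliases.getOrDefault(key, key);
    }

    void set(String key, String value) {
        values.put(resolve(key), value);
    }

    String get(String key) {
        return values.get(resolve(key));
    }
}
```

With such a setup, a parser could write to the namespaced key while a client reads through the alias; switching the alias target from the MSOffice key to the Dublin Core key would then be a one-line change in the mapping rather than a change in every client.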

[metadata] roadmap proposal available on the wiki

2012-04-25 Thread Joerg Ehrlich
Hi, I have put a proposal of a roadmap for the metadata features in Tika on the wiki: http://wiki.apache.org/tika/MetadataRoadmap The proposal is based on a discussion around this topic I have had with Jukka. Please review and feel free to edit the wiki for the discussion. I will also update th

RE: [metadata] roadmap proposal available on the wiki

2012-04-25 Thread Joerg Ehrlich
Hi Chris, Thanks for your comments, >I am not strongly supportive of changing the HashMap internal >representation in Metadata out. >A couple of things I like about the HashMap: > >* It's simple. >* It doesn't require dependency on any external libraries and helps keep >tika-core minimal. >
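
The two points quoted above (simplicity, no external dependencies) can be made concrete with a small sketch of a HashMap-backed store for multi-valued properties. This is an illustration only, assuming a map from property names to string arrays; the class SimpleMetadataStore and its methods are hypothetical and are not copied from tika-core.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a HashMap-backed metadata store; not tika-core code.
final class SimpleMetadataStore {
    // Each property name maps to all values recorded for it, in insertion order.
    private final Map<String, String[]> store = new HashMap<>();

    // Append a value, keeping any values already stored under the same name.
    void add(String name, String value) {
        String[] old = store.get(name);
        if (old == null) {
            store.put(name, new String[] { value });
            return;
        }
        String[] grown = Arrays.copyOf(old, old.length + 1);
        grown[old.length] = value;
        store.put(name, grown);
    }

    // First value for the name, or null if the name is unknown.
    String get(String name) {
        String[] v = store.get(name);
        return v == null ? null : v[0];
    }

    // Defensive copy of all values; an empty array if the name is unknown.
    String[] getValues(String name) {
        String[] v = store.get(name);
        return v == null ? new String[0] : v.clone();
    }
}
```

Only java.util is needed here, which is the appeal described in the quoted message. What a flat map like this does not give directly is nesting, so structured properties would need some additional convention on top of it.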

RE: [metadata] roadmap proposal available on the wiki

2012-04-26 Thread Joerg Ehrlich
06360 -Original Message- From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Mittwoch, 25. April 2012 22:40 To: Subject: Re: [metadata] roadmap proposal available on the wiki Hi Jörg, On Apr 25, 2012, at 10:27 AM, Joerg Ehrlich wrote: > >> I am not strongly su

RE: [metadata] roadmap proposal available on the wiki

2012-04-26 Thread Joerg Ehrlich
ris A (388J) wrote: > Hi Jörg, > > Thanks for your email, comments below: > > On Apr 26, 2012, at 3:35 AM, Joerg Ehrlich wrote: > >> Hi Chris, >> >> Those are all valid points and I agree that you could do everything with a >> Hashmap. >> Having the

RE: [metadata] roadmap proposal available on the wiki

2012-04-27 Thread Joerg Ehrlich
Hi Antoni, > The roadmap doesn't give much detail about the intended vocabularies. > Dublin core is great, but what else? Joerg? What other kinds of metadata > information would you like to extract with Tika, and what vocabularies would > you like to use to express them? > > At Adobe, you'll li

RE: [metadata] roadmap proposal available on the wiki

2012-04-27 Thread Joerg Ehrlich
+1 This does indeed look like a good combination. Jörg -Original Message- From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Freitag, 27. April 2012 01:33 To: Subject: Re: [metadata] roadmap proposal available on the wiki Hi Antoni, Precisely! :) That would be

[metadata] Input on reorganization of Metadata interfaces

2012-05-04 Thread Joerg Ehrlich
Hi, I wanted to start submitting patches for the following and would like your input on that: Create one "Core Properties" interface for the Metadata class which contains just the keys for the properties which should be directly addressable through the Metadata class in the future. Those are a

RE: [metadata] Input on reorganization of Metadata interfaces

2012-05-04 Thread Joerg Ehrlich
On Fri, 4 May 2012, Joerg Ehrlich wrote: >> Create one "Core Properties" interface for the Metadata class which >> contains just the keys for the properties which should be directly >> addressable through the Metadata class in the future. Those are all >> Dubl

RE: [metadata] Input on reorganization of Metadata interfaces

2012-05-08 Thread Joerg Ehrlich
-Original Message- From: Nick Burch [mailto:nick.bu...@alfresco.com] Sent: Freitag, 4. Mai 2012 23:34 To: dev@tika.apache.org Subject: RE: [metadata] Input on reorganization of Metadata interfaces On Fri, 4 May 2012, Joerg Ehrlich wrote: >>>> The keys will always link to p

RE: [metadata] Input on reorganization of Metadata interfaces

2012-05-08 Thread Joerg Ehrlich
Hi Chris, >I'm OK with the code-level implications of that, but I will just have to scope >out the patch and so forth. >Thanks for pushing this. I really appreciate your help here. Sorry, I am not a native speaker: Does that mean you would like to see a patch of the proposed ideas and make a decisio

RE: A plan to improve the metadata property definitions

2012-05-22 Thread Joerg Ehrlich
Hi Nick and Ray, +1 Thanks, this looks like a great step forward. It definitely helps to clean up the current metadata usage. But I still have no real idea how to represent structured properties with the current Property/Metadata setup going forward. I have done a quick review and have already a

RE: A plan to improve the metadata property definitions

2012-05-23 Thread Joerg Ehrlich
Hi Nick, On Tue, 22 May 2012, Joerg Ehrlich wrote: >> Thanks, this looks like a great step forward. It definitely helps to >> clean up the current metadata usage. But I still have no real idea how >> to represent structured properties with the current Property/Metadata >

RE: [DISCUSS] Apache Tika 1.2 RC?

2012-05-29 Thread Joerg Ehrlich
I second Ray's comment. The core properties should be set up properly for the next release. Jörg -Original Message- From: Ray Gauss II [mailto:ray.ga...@alfresco.com] Sent: Montag, 28. Mai 2012 15:45 To: dev@tika.apache.org Subject: Re: [DISCUSS] Apache Tika 1.2 RC? It would be nic

XMP conversion module for Tika

2012-06-28 Thread Joerg Ehrlich
Hi, As discussed in earlier threads, I have created a new Tika module ("tika-xmp") which offers conversion of Tika Metadata to the XMP data model and I have added the patch to TIKA-756. The patch also contains integration with Tika-app, hooking the converter up with the "-y" output option and t

RE: XMP conversion module for Tika

2012-06-29 Thread Joerg Ehrlich
FYI: Version 5.1.1 of the XMPCore library which is compatible with JDK 1.5/1.6 is available on Maven Central now. -Original Message- From: Joerg Ehrlich [mailto:jehrl...@adobe.com] Sent: Donnerstag, 28. Juni 2012 12:15 To: dev@tika.apache.org Subject: XMP conversion module for Tika Hi

RE: JAX-RS overhead in tika-server

2012-07-02 Thread Joerg Ehrlich
I would also say that TIKA-930 should be resolved before a release is cut. Please see my last comment in that issue. And it would be great if the tika-xmp module would make it in 1.2 as well :) But it does not have any impact on the ability to release like TIKA-930, of course, as it only adds a

RE: Build failed in Jenkins: Tika-trunk #888

2012-07-03 Thread Joerg Ehrlich
A new version of XMPCore compiled for JDK 1.5 has been uploaded to Maven Central: 5.1.2 Regards Jörg -Original Message- From: Jukka Zitting [mailto:jukka.zitt...@gmail.com] Sent: Dienstag, 3. Juli 2012 00:49 To: dev@tika.apache.org Subject: Re: Build failed in Jenkins: Tika-trunk #888

RE: [VOTE] Apache Tika 1.2 release rc #1

2012-07-11 Thread Joerg Ehrlich
+1 --- Jörg Ehrlich | Computer Scientist | XMP Technology | Adobe Systems | joerg.ehrl...@adobe.com | work: +49(40)306360 -Original Message- From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Dienstag, 10. Juli 2012 22:30 To: dev@tika.apache.org Cc: u...@tika.a

RE: Can't build javadocs for 1.2 API site docs

2012-07-17 Thread Joerg Ehrlich
Hi, Unfortunately I am currently in a week-long workshop. I will try to have a look at it as soon as possible. Regards jörg -Original Message- From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Dienstag, 17. Juli 2012 07:39 To: dev@tika.apache.org Subject: Can

RE: [ANNOUNCE] Welcome Jörg Ehrlich as new Tika PMC member and committer

2012-07-31 Thread Joerg Ehrlich
Hi everyone, First of all thank you very much. I am really looking forward to working with all of you on this interesting project! I am an engineer at Adobe located in Hamburg, Germany and I am working in a larger team which provides components and solutions for metadata management and automat