Re: Packages and attributes

2010-08-02 Mattmann, Chris A (388J)
Thanks Paul!

On 8/2/10 1:18 PM, "Paul Jakubik" wrote:
> I have added Jukka Zitting's recursive metadata example to the Tika wiki at http://wiki.apache.org/tika/RecursiveMetadata. I also added some notes on what I did so I could get the metadata for a nested document along with the text for that do

Re: Packages and attributes

2010-08-02 Paul Jakubik
I have added Jukka Zitting's recursive metadata example to the Tika wiki at http://wiki.apache.org/tika/RecursiveMetadata. I also added some notes on what I did so I could get the metadata for a nested document along with the text for that document. Finally, I modified the http://wiki.apache.org/ti

Re: Packages and attributes

2010-07-16 Mattmann, Chris A (388J)
Hi Paul,

Sure. Feel free to sign up for an account (it's free and pretty simple) and then you can just copy/paste and start a wiki page on your own. We welcome your contribution!

Cheers,
Chris

On 7/16/10 8:29 AM, "Paul Jakubik" wrote:
> Thank you for this example! Is there any chance this exa

Re: Packages and attributes

2010-07-16 Paul Jakubik
Thank you for this example! Is there any chance this example could be added to the Tika wiki?

On Fri, Jul 16, 2010 at 1:30 AM, Jukka Zitting wrote:
> Hi,
>
> On Fri, Jul 16, 2010 at 2:43 AM, Paul Jakubik wrote:
> > On Thu, Jul 15, 2010 at 6:43 AM, Jukka Zitting wrote:
> >> The way I recommen

Re: Packages and attributes

2010-07-16 Jukka Zitting
Hi,

On Fri, Jul 16, 2010 at 2:43 AM, Paul Jakubik wrote:
> On Thu, Jul 15, 2010 at 6:43 AM, Jukka Zitting wrote:
>> The way I recommend is to pass a custom Parser implementation through
>> the ParseContext. This gives you detailed access to each component
>> document.
>
> I looked at the code a l

Re: Packages and attributes

2010-07-15 Paul Jakubik
On Thu, Jul 15, 2010 at 6:43 AM, Jukka Zitting wrote:
> The way I recommend is to pass a custom Parser implementation through
> the ParseContext. This gives you detailed access to each component
> document.

I looked at the code a little further, and I don't see exactly how I can do this. I am

Re: Packages and attributes

2010-07-15 Paul Jakubik
On Thu, Jul 15, 2010 at 6:30 AM, Nick Burch wrote:
> Having looked through your proposed solutions, I can't see easy ways to
> implement these use cases:
> * enumerate all the Metadata objects at this depth
> eg top level has one Metadata object (for the parent file), 1 level
> down may have

Re: Packages and attributes

2010-07-15 Paul Jakubik
On Thu, Jul 15, 2010 at 6:43 AM, Jukka Zitting wrote:
> The way I recommend is to pass a custom Parser implementation through
> the ParseContext. This gives you detailed access to each component
> document.
>
> You noted that this approach wouldn't work for recursive metadata. Why?

I didn't th

Re: Packages and attributes

2010-07-15 Jukka Zitting
Hi,

On Thu, Jul 15, 2010 at 1:14 AM, Paul Jakubik wrote:
> I'm hoping that the developers can quickly reach a consensus on how to
> change the metadata handling so users can get to metadata for nested
> documents.

The way I recommend is to pass a custom Parser implementation through the ParseCon
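Jukka's suggestion, a custom Parser passed through the ParseContext, can be illustrated with a small self-contained sketch. It deliberately avoids Tika so that it runs on a plain JDK (9 or later, for `readAllBytes`): `DocParser`, `ZipContainerParser` and `RecordingParser` are hypothetical stand-ins, and a `Map<Class<?>, Object>` plays the role of the ParseContext. What it shows is the shape of the approach: the container parser looks up the parser for embedded documents in the context, so a parser supplied by the caller is invoked once per component document, each time with a fresh metadata object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of "pass a custom parser through the parse context".
// None of these types are Tika classes; they only mirror the idea.
public class ContextDelegationSketch {

    // Hypothetical parser contract: read a stream and fill in a metadata map.
    interface DocParser {
        void parse(InputStream in, Map<String, String> metadata,
                   Map<Class<?>, Object> context) throws IOException;
    }

    // Container parser: hands each embedded entry to whatever DocParser the
    // caller placed in the context, with a fresh metadata map per entry.
    static class ZipContainerParser implements DocParser {
        @Override
        public void parse(InputStream in, Map<String, String> metadata,
                          Map<Class<?>, Object> context) throws IOException {
            metadata.put("Content-Type", "application/zip");
            DocParser embedded = (DocParser) context.get(DocParser.class);
            ZipInputStream zip = new ZipInputStream(in);
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.isDirectory() || embedded == null) {
                    continue; // directory entries have no content; no delegate means nothing to do
                }
                Map<String, String> child = new LinkedHashMap<>();
                child.put("resourceName", e.getName());
                // ZipInputStream reports end-of-stream at the end of each entry,
                // so the delegate only sees this entry's bytes.
                embedded.parse(zip, child, context);
            }
        }
    }

    // Caller-supplied parser: records the metadata of every component document.
    static class RecordingParser implements DocParser {
        final List<Map<String, String>> seen = new ArrayList<>();

        @Override
        public void parse(InputStream in, Map<String, String> metadata,
                          Map<Class<?>, Object> context) throws IOException {
            byte[] body = in.readAllBytes(); // do not close: the stream is shared
            metadata.put("length", Integer.toString(body.length));
            seen.add(metadata);
        }
    }

    // Builds a two-entry zip in memory and returns what the recorder saw.
    public static List<Map<String, String>> demo() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream out = new ZipOutputStream(bytes)) {
                out.putNextEntry(new ZipEntry("a.txt"));
                out.write("hello".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
                out.putNextEntry(new ZipEntry("docs/b.txt"));
                out.write("tika".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
            RecordingParser recorder = new RecordingParser();
            Map<Class<?>, Object> context = new HashMap<>();
            context.put(DocParser.class, recorder);
            new ZipContainerParser().parse(
                    new ByteArrayInputStream(bytes.toByteArray()), new HashMap<>(), context);
            return recorder.seen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (Map<String, String> m : demo()) {
            System.out.println(m);
        }
    }
}
```

The container never needs to know what the caller wants from the nested documents; it only forwards each entry, and the caller's parser decides what to keep.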

Re: Packages and attributes

2010-07-15 Nick Burch
On Wed, 14 Jul 2010, Paul Jakubik wrote:
> I created a wiki page for this discussion (http://wiki.apache.org/tika/MetadataDiscussion). I don't know if that is what you were thinking of.

Looks good to me!

Having looked through your proposed solutions, I can't see easy ways to implement these us
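Nick's first use case, enumerating every Metadata object at a given depth, maps naturally onto a tree with one metadata object per document. The sketch below is a hypothetical model, not a Tika class: `MetadataTreeSketch.Node` holds one metadata map plus the nodes of its embedded documents, and `atDepth` walks the tree level by level, with depth 0 being the container itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tree of per-document metadata; not a Tika class.
public class MetadataTreeSketch {

    public static final class Node {
        final Map<String, String> metadata = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        public Node(String name) {
            metadata.put("resourceName", name);
        }

        // Adds an embedded document and returns this node, so trees can be chained.
        public Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Returns the metadata of every document exactly `depth` levels below root
    // (depth 0 is the container itself), in left-to-right order.
    public static List<Map<String, String>> atDepth(Node root, int depth) {
        if (depth < 0) {
            return Collections.emptyList();
        }
        List<Node> level = new ArrayList<>();
        level.add(root);
        for (int d = 0; d < depth; d++) {
            List<Node> next = new ArrayList<>();
            for (Node n : level) {
                next.addAll(n.children);
            }
            level = next;
        }
        List<Map<String, String>> result = new ArrayList<>();
        for (Node n : level) {
            result.add(n.metadata);
        }
        return result;
    }

    public static void main(String[] args) {
        Node root = new Node("outer.zip")
                .add(new Node("report.doc"))
                .add(new Node("inner.tar").add(new Node("a.txt")).add(new Node("b.txt")));
        System.out.println(atDepth(root, 1));
        System.out.println(atDepth(root, 2));
    }
}
```

For the sample in `main`, depth 1 yields the metadata for `report.doc` and `inner.tar`, and depth 2 the two files inside the tar.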

Re: Packages and attributes

2010-07-14 Paul Jakubik
On Mon, Jul 12, 2010 at 10:37 AM, Nick Burch wrote:
> Assuming I've got all of the above correct, it might be worth creating a
> wiki page for this (probably + referencing jira entry), and start trying to
> work up a proposed solution that'll handle all the above problems and use
> cases.

I cre

Re: Packages and attributes

2010-07-12 Paul Jakubik
On Mon, Jul 12, 2010 at 12:59 PM, Alex Ott wrote:
> May be it worth to separate metadata of top-level objects from metadata of
> embedded objects? And allow to traverse through hierarchy of embedded
> objects? And provide several implementations, something like: collector of
> metadata for all

Re: Packages and attributes

2010-07-12 Alex Ott
Re Paul Jakubik at "Mon, 12 Jul 2010 11:26:16 -0500" wrote:
PJ> On Mon, Jul 12, 2010 at 10:37 AM, Nick Burch wrote:
PJ> I've tried to summarize the various use cases mentioned in your email.
PJ> Please let me know if I have correctly captured everything.
PJ> - *Containers that are conceptua

Re: Packages and attributes

2010-07-12 Nick Burch
On Mon, 12 Jul 2010, Paul Jakubik wrote:
> I've tried to summarize the various use cases mentioned in your email. Please let me know if I have correctly captured everything.

You seem to have got all the cases I can think of, but it's quite possible that someone else will think up another one :)

Re: Packages and attributes

2010-07-12 Paul Jakubik
On Mon, Jul 12, 2010 at 10:37 AM, Nick Burch wrote:
> On Mon, 12 Jul 2010, Paul Jakubik wrote:
>
>> I'm using tika to parse packages (zip, tar.gz, tar.bz2, etc.) and I'd like
>> to get access to the metadata for the individual files inside of the
>> package.
>
> I believe there are two differen

Re: Packages and attributes

2010-07-12 Nick Burch
On Mon, 12 Jul 2010, Paul Jakubik wrote:
> I'm using tika to parse packages (zip, tar.gz, tar.bz2, etc.) and I'd like to get access to the metadata for the individual files inside of the package.

I believe there are two different tika enhancements for container formats needed. The first is fo

Packages and attributes

2010-07-12 Paul Jakubik
Hi,

I'm using tika to parse packages (zip, tar.gz, tar.bz2, etc.) and I'd like to get access to the metadata for the individual files inside of the package. It looks like there has been some discussion about how to provide the metadata, and from looking at the code I don't think any of the propos