Re: Customizing Metadata Keys

2014-10-09 Thread Nick Burch
On Wed, 8 Oct 2014, Can Duruk wrote: My question is regarding setting the metadata keys coming from the parsers to my own keys. For my application, I am using Tika to extract the metadata for a bunch of files. I am using the embedded HTTP server which I modified for my needs to return instead of

Formatted Content Extraction and Title Detection

2014-10-09 Thread imyuka
Hi all, here is my problem: I have extracted plain text from a series of doc(x) documents, and their titles via the "dc:title" metadata key, but I'm not sure this is the right way to obtain a document's title. In many cases, a title inside a document could be of the largest font-s

Re: Formatted Content Extraction and Title Detection

2014-10-09 Thread Nick Burch
On Thu, 9 Oct 2014, imyuka wrote: Here is my problem: I have extracted plain text from a series of doc(x) documents, and their titles via the "dc:title" metadata key, but I'm not sure this is the right way to obtain a document's title. In many cases, a title inside a document could be

Re:Re: Formatted Content Extraction and Title Detection

2014-10-09 Thread imyuka
Thanks Nick, I really appreciate it. In this case, does that mean formatted content extraction can only be done by producing a corresponding XHTML file as output? I skimmed the book and found instructions for transforming a document to an XHTML file from the command line, w

Re:Re: Formatted Content Extraction and Title Detection

2014-10-09 Thread Nick Burch
On Thu, 9 Oct 2014, imyuka wrote: I skimmed the book and found instructions for transforming a document to an XHTML file from the command line, but I have no idea how to implement this in Java. Are there any instructions or tutorials I can refer to? We have quite a few ex
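For readers following the title-detection question, here is a minimal sketch of one way to pull a candidate title out of XHTML once it has been produced. It uses only the JDK's SAX parser, not Tika itself, and it assumes the XHTML is already available as a string and that the document's top-level heading comes out as an h1 element. The class name HeadingTitleFinder and the method firstHeading are hypothetical names made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: collects the text of the first h1 element in an XHTML string. */
class HeadingTitleFinder extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inHeading = false;
    private String title = null;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Only the first h1 is of interest; later headings are ignored.
        if (title == null && "h1".equals(qName)) {
            inHeading = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inHeading) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inHeading && "h1".equals(qName)) {
            inHeading = false;
            title = text.toString().trim();
        }
    }

    /** Returns the text of the first h1, or null if the input has none. */
    static String firstHeading(String xhtml) {
        try {
            // The default factory is not namespace-aware, so qName is the plain tag name.
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            HeadingTitleFinder finder = new HeadingTitleFinder();
            parser.parse(new InputSource(new StringReader(xhtml)), finder);
            return finder.title;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XHTML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample input, standing in for real extracted XHTML.
        String sample = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<h1>Annual Report</h1><p>Body text.</p></body></html>";
        System.out.println(firstHeading(sample));
    }
}
```

Falling back to the "dc:title" metadata value when no heading is found would be a natural extension, but whether headings or metadata are more reliable depends on how the source documents were authored.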

Re: Customizing Metadata Keys

2014-10-09 Thread Chris Mattmann
Perhaps a re-mapping downstream ContentHandler that takes in the Metadata object and will reformat the Date: Thursday, October 9, 2014 at 12:32 PM Subject: Re: Customizing Metadata Keys > On Wed, 8 Oct 2014, Can Duruk wrote: >> My question is regarding setting the metadata keys com

Re: Customizing Metadata Keys

2014-10-09 Thread Can Duruk
> I'd suggest you do the mapping from Tika keys to your keys in the server. All the parsers should return consistent keys, so the "output" side is the best place to map. That seems to be the now-obvious solution, thanks for the suggestion. > Perhaps a re-mapping downstream ContentHandler > that
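The output-side mapping suggested above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain JDK collections rather than Tika's Metadata class, so it assumes the parsed metadata has already been copied into a map of key to values. The class name MetadataKeyMapper and the target key names "title" and "mime" are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: renames metadata keys after parsing has finished. */
class MetadataKeyMapper {
    private final Map<String, String> renames;

    MetadataKeyMapper(Map<String, String> renames) {
        this.renames = new HashMap<>(renames);
    }

    /** Returns a copy with keys renamed; keys without a mapping pass through unchanged. */
    Map<String, List<String>> remap(Map<String, List<String>> parsed) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : parsed.entrySet()) {
            String target = renames.getOrDefault(entry.getKey(), entry.getKey());
            // If two source keys map to the same target, their values are merged.
            out.computeIfAbsent(target, k -> new ArrayList<>()).addAll(entry.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> renames = new HashMap<>();
        renames.put("dc:title", "title");      // "title" is an application-specific name
        renames.put("Content-Type", "mime");   // "mime" is an application-specific name

        // Made-up sample values standing in for parser output.
        Map<String, List<String>> parsed = new LinkedHashMap<>();
        parsed.put("dc:title", Arrays.asList("Quarterly Report"));
        parsed.put("Content-Type", Arrays.asList("application/pdf"));

        System.out.println(new MetadataKeyMapper(renames).remap(parsed));
    }
}
```

Keeping the rename table in one place on the server side means the parsers stay untouched and the table can be changed without re-parsing anything.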

RE: Customizing Metadata Keys

2014-10-09 Thread Allison, Timothy B.
I agree with Nick’s recommendation on post-parsing key mapping, and I’d like to put in a plug for the RecursiveParserWrapper, which may be of use for you. I’ve been intending to add that to the app commandline and to server…how are you handling embedded document metadata? Would the wrapper be

Re: Customizing Metadata Keys

2014-10-09 Thread Can Duruk
> I agree with Nick’s recommendation on post-parsing key mapping, and I’d like to put in a plug for the RecursiveParserWrapper, which may be of use for you. I’ve been intending to add that to the app commandline and to server…how are you handling embedded document metadata? Would the wrapper be o