Re: ... all major file formats

2012-01-02 Thread Nick Burch
On Mon, 2 Jan 2012, Albretch Mueller wrote: How can someone know that the heading for a PDF file corresponds to the heading of an MS Word and/or RTF file, or the title on an HTML file corresponds to the title of a media file? They can't - both formats allow you to make something look like a head

Re: ... all major file formats

2012-01-02 Thread Alex Ott
>  End-of-central-directory signature not found.  Either this file is not >  a zipfile, or it constitutes one disk of a multi-part archive.  In the >  latter case the central directory and zipfile comment will be found on >  the last disk(s) of this archive. > note:  apache-tika-1.0-

Re: ... all major file formats

2012-01-01 Thread Albretch Mueller
Archive: apache-tika-1.0-src.zip > End-of-central-directory signature not found. Either this file is not > a zipfile, or it constitutes one disk of a multi-part archive. In the > latter case the central directory and zipfile comment will be found on > the last disk(s) o
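
For readers hitting the same unzip message: a ZIP file normally ends with an end-of-central-directory record that begins with the bytes "PK\005\006", and a truncated or incomplete download usually lacks it. The following is a minimal sketch of checking for that record, not something posted in this thread. The class name EocdCheck and the method findEocdOffset are hypothetical names chosen for illustration, and the sketch reads the whole file into memory, which is only reasonable for small archives.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not part of Tika or unzip): looks for the ZIP
// end-of-central-directory (EOCD) record near the end of a file.
public class EocdCheck {
    // The EOCD record starts with the bytes "PK\005\006".
    private static final byte[] EOCD_SIGNATURE = {0x50, 0x4B, 0x05, 0x06};
    // Fixed part of the EOCD record is 22 bytes long.
    private static final int EOCD_MIN_SIZE = 22;
    // The trailing archive comment can be at most 65535 bytes.
    private static final int MAX_COMMENT_SIZE = 0xFFFF;

    /** Returns the offset of the last EOCD signature, or -1 if none is found. */
    public static int findEocdOffset(byte[] data) {
        // Scan backwards from the last position where a full record could start.
        int start = data.length - EOCD_MIN_SIZE;
        int stop = Math.max(0, start - MAX_COMMENT_SIZE);
        for (int i = start; i >= stop; i--) {
            if (data[i] == EOCD_SIGNATURE[0]
                    && data[i + 1] == EOCD_SIGNATURE[1]
                    && data[i + 2] == EOCD_SIGNATURE[2]
                    && data[i + 3] == EOCD_SIGNATURE[3]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java EocdCheck <file>");
            return;
        }
        // Assumption: the file is small enough to load fully into memory.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int offset = findEocdOffset(data);
        if (offset >= 0) {
            System.out.println("EOCD signature found at offset " + offset);
        } else {
            System.out.println("EOCD signature not found; the file may be truncated or not a ZIP");
        }
    }
}
```

If the signature is missing, re-downloading the archive and comparing its size or checksum against the published one is usually the next step. Finding the signature does not prove the archive is intact; it only rules out this particular failure.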

Re: ... all major file formats

2012-01-01 Thread Albretch Mueller
ache-tika-1.0-src.zip.ZIP, period. ~ On 1/1/12, Nick Burch wrote: > On Sat, 31 Dec 2011, Albretch Mueller wrote: >> I think "all major file formats" should be somehow functionally >> specified through something like >> ~ >> core.tika.formatHandlers.get

Re: ... all major file formats

2011-12-31 Thread Nick Burch
On Sat, 31 Dec 2011, Albretch Mueller wrote: I think "all major file formats" should be somehow functionally specified through something like ~ core.tika.formatHandlers.getAll[DefinedFormat]Handlers In code: TikaConfig config = TikaConfig.getDefaultConfig(); Set

... all major file formats

2011-12-31 Thread Albretch Mueller
~ http://projects.apache.org/projects/tika.html ~ http://tika.apache.org/1.0/formats.html ~ say that it: " ... easily detect(s) and extract(s) metadata and content from all major file formats" ~ I think "all major file formats" should be somehow functionally specified th