On Mon, 2 Jan 2012, Albretch Mueller wrote:
How can someone know that the heading of a PDF file corresponds to the
heading of an MS Word and/or RTF file, or that the title of an HTML file
corresponds to the title of a media file?
They can't - both formats allow you to make something look like a head
> Archive: apache-tika-1.0-src.zip
> End-of-central-directory signature not found. Either this file is not
> a zipfile, or it constitutes one disk of a multi-part archive. In the
> latter case the central directory and zipfile comment will be found on
> the last disk(s) of this archive.
> note: apache-tika-1.0-
> apache-tika-1.0-src.zip.ZIP, period.
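For context on the unzip message quoted above: the end-of-central-directory (EOCD) record is the trailer that a complete ZIP archive ends with, so a download that lacks it is usually truncated or not a ZIP at all (for example, an HTML page saved under a .zip name). Below is a minimal Java sketch of that check, assuming only the standard library. The class and method names (ZipEocdCheck, tailHasEocd, fileHasEocd) are hypothetical and are not part of unzip or Tika, and finding the signature is only a heuristic, not a full validation of the archive.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not part of unzip or Tika: looks for the ZIP
// end-of-central-directory (EOCD) signature near the end of a file.
public class ZipEocdCheck {
    // EOCD signature 0x06054b50, stored little-endian as 'P' 'K' 0x05 0x06.
    private static final byte[] EOCD = {0x50, 0x4B, 0x05, 0x06};
    // Fixed part of the EOCD record is 22 bytes; the trailing comment
    // can add at most 65535 bytes.
    private static final int EOCD_MIN_SIZE = 22;
    private static final int MAX_COMMENT = 65535;

    /** Scans backwards through the given tail bytes for the EOCD signature. */
    public static boolean tailHasEocd(byte[] tail) {
        for (int i = tail.length - EOCD_MIN_SIZE; i >= 0; i--) {
            if (tail[i] == EOCD[0] && tail[i + 1] == EOCD[1]
                    && tail[i + 2] == EOCD[2] && tail[i + 3] == EOCD[3]) {
                return true;
            }
        }
        return false; // also reached when the input is shorter than 22 bytes
    }

    /** Reads only the last (22 + 65535) bytes of the file and scans them. */
    public static boolean fileHasEocd(String path) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            long len = f.length();
            int n = (int) Math.min(len, (long) EOCD_MIN_SIZE + MAX_COMMENT);
            byte[] tail = new byte[n];
            f.seek(len - n);
            f.readFully(tail);
            return tailHasEocd(tail);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String p : args) {
            System.out.println(p + ": "
                    + (fileHasEocd(p) ? "EOCD signature found"
                                      : "no EOCD signature, not a complete zip"));
        }
    }
}
```

Running it against the downloaded file (for instance `java ZipEocdCheck apache-tika-1.0-src.zip`) would indicate whether the file is worth re-downloading before looking for any other cause.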
~
On 1/1/12, Nick Burch wrote:
On Sat, 31 Dec 2011, Albretch Mueller wrote:
I think "all major file formats" should be somehow functionally
specified through something like
~
core.tika.formatHandlers.getAll[DefinedFormat]Handlers
In code:
TikaConfig config = TikaConfig.getDefaultConfig();
Set
~
http://projects.apache.org/projects/tika.html
~
http://tika.apache.org/1.0/formats.html
~
say that it: " ... easily detect(s) and extract(s) metadata and
content from all major file formats"