The mime type is added by the index-more plugin. By default it creates multiple
values, e.g. text/html, text and html for an HTML page. It can also be configured
to output only the full text/html value (see nutch-default.xml for an example).
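
To illustrate the difference between the two modes, here is a rough sketch in
Java. It is not the plugin's actual code: the class, method and parameter names
are made up for this example, and the boolean simply stands in for the
configuration switch (in nutch-default.xml I believe this is the
moreIndexingFilter.indexMimeTypeParts property, but check your version).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from the index-more plugin source.
class MimeTypeParts {

    // Returns the values that would end up in the type field.
    // splitParts = true  -> full type plus primary type and subtype
    // splitParts = false -> only the full type
    static List<String> typeValues(String mimeType, boolean splitParts) {
        List<String> values = new ArrayList<>();
        values.add(mimeType);
        if (splitParts) {
            int slash = mimeType.indexOf('/');
            // Only split well-formed "primary/sub" types.
            if (slash > 0 && slash < mimeType.length() - 1) {
                values.add(mimeType.substring(0, slash));
                values.add(mimeType.substring(slash + 1));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(typeValues("text/html", true));   // [text/html, text, html]
        System.out.println(typeValues("text/html", false));  // [text/html]
    }
}
```

With splitting enabled the field has to be multi-valued in your index schema,
since one document gets three values.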

I've never indexed multimedia data so I can't help there, but what exactly is
not working in Tika? I know Tika will do mp3 and jpeg but not videos (except
Flash). I haven't seen ogg support around either.

Nutch passes any mime type that has no explicit parser mapping on to Tika.

> Hi everyone,
> 
> I'm trying to index images (jpeg, exif data), videos and audio (mp3,
> ogg, id3 data) but Tika is not working.
> 
> How can I index those files and create the respective fields?
> Also, I haven't found how to store the mime type of the indexed files.
> 
> Basically I need to index sites with multimedia.
> 
> Thanks,
