The MIME type is added by the index-more plugin. By default it indexes multiple values, e.g. text/html, text and html for an HTML page. It can also be configured to output only the full type, text/html (see nutch-default for an example).
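To make the difference between the two modes concrete, here is a minimal Python sketch of the splitting described above. It is not Nutch's code, only an illustration; the function name `mime_type_values` and the `split_parts` flag are made up for this example.

```python
def mime_type_values(mime_type, split_parts=True):
    """Return the values that would be indexed for a MIME type.

    Hypothetical helper for illustration only, not part of Nutch.
    """
    if not split_parts:
        # Only the full type is kept, e.g. "text/html".
        return [mime_type]
    primary, _, sub = mime_type.partition("/")
    # Full type first, then its two halves (empty halves are dropped).
    return [mime_type] + [part for part in (primary, sub) if part]


print(mime_type_values("text/html"))
print(mime_type_values("text/html", split_parts=False))
```

With splitting enabled the field receiving these values has to be multi-valued, since one document yields three entries.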
I've never indexed multimedia data, so I can't help there, but what exactly is not working in Tika? I know Tika will do mp3 and jpeg but not videos (except Flash). I haven't seen ogg around either. Nutch passes unmapped MIME types to Tika.

> Hi everyone,
>
> I'm trying to index images (jpeg, exif data), videos and audio (mp3,
> ogg, id3 data) but tika is not working.
>
> How can I index those files and create the respective fields?
> Also, I haven't found how to store the mime type of the indexed files.
>
> Basically I need to index sites with multimedia.
>
> Thanks,

