Hi Markus,

I've read that Tika does not parse mp3 because of copyright issues. Is there currently a patch to parse mp3?
Regards,

On Tue, 2011-05-10 at 00:01 +0200, Markus Jelsma wrote:
> Mime-type is added via the index-more plugin. By default it creates multiple
> values, e.g. text/html, text and html for an HTML page. It can also be
> configured to output only the text/html pair (see nutch-default for an example).
>
> I've never indexed multimedia data so I can't help there, but what's not
> working in Tika? I know Tika will do mp3 and jpeg but not videos (except
> Flash). Haven't seen ogg around either.
>
> Nutch passes unmapped mime types to Tika.
>
> > Hi everyone,
> >
> > I'm trying to index images (jpeg, exif data), videos and audio (mp3,
> > ogg, id3 data) but Tika is not working.
> >
> > How can I index those files and create the respective fields?
> > Also, I couldn't find how to store the mime type of the indexed files.
> >
> > Basically I need to index sites with multimedia.
> >
> > Thanks,

