Thanks for your response. So, for text/plain being guessed as application/octet-stream, I suppose the problem comes from couchdb itself, since file identifies the attachment as text:

t...@blackberry:~/couchdb-lucene$ file test
test: UTF-8 Unicode text
On the other hand, for "text/x-patch" and "text/whatever", Metadata.CONTENT_TYPE could be set to "text/plain" via a matching table before the Tika calls. Just an idea... :)

Robert Newson wrote:
couchdb-lucene uses the content-type stored in couchdb when parsing attachments. couchdb-lucene then uses Apache Tika to parse the attachments, and it is there that support for new MIME types should be requested. A list of currently supported MIME types is available at: http://github.com/rnewson/couchdb-lucene

B.
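
To make the matching-table idea a bit more concrete, here is a minimal sketch, not taken from couchdb-lucene itself. The class name ContentTypeAliases, the normalize method, and the table entries are all hypothetical; the sketch only assumes that the stored content-type is available as a string before it is handed to Tika.

```java
import java.util.Locale;
import java.util.Map;

public class ContentTypeAliases {
    // Hypothetical matching table: content-types stored in couchdb that
    // should be treated as plain text. The entries are examples only.
    private static final Map<String, String> ALIASES = Map.of(
            "text/x-patch", "text/plain",
            "text/x-diff", "text/plain");

    // Returns the aliased type if the stored type is in the table,
    // otherwise returns the stored type unchanged.
    static String normalize(String storedType) {
        if (storedType == null) {
            return null;
        }
        String key = storedType.trim().toLowerCase(Locale.ROOT);
        return ALIASES.getOrDefault(key, storedType);
    }

    public static void main(String[] args) {
        // The result could then be passed on to Tika, for example with
        // metadata.set(Metadata.CONTENT_TYPE, normalize(storedType));
        // that call is left out here so the sketch has no dependencies.
        System.out.println(normalize("text/x-patch"));
        System.out.println(normalize("application/pdf"));
    }
}
```

Types that are not in the table pass through untouched, so attachments Tika already handles would be unaffected.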
