Hi, I am playing around with the solrj mode of the Solr output connector, to avoid running Tika extraction in Solr.
My problem is that the ingestion of web pages gets rejected with the message:

"Solr connector rejected document due to mime type restrictions: (text/html; charset=UTF-8)"

My pipeline looks like this:

1) Webcrawler Connector (Repository Connection)
2) Tika Extractor (Transformation)
3) Solr Connector (Output Connection)

The webserver returns content type "text/html; charset=UTF-8" for the pages. The "Use extracting request handler" option is disabled in the Solr output connection.

The mime type inclusions in the Solr output connector are:

text/plain;charset=utf-8
text/html
text/html; charset=UTF-8

I think the ingestion gets rejected by the HttpPoster, because it performs a hard check that the mime type has to be a "text/plain*" type (see acceptableMimeTypes in HttpPoster). The TikaExtractor asks whether the downstream pipeline accepts "text/plain;charset=utf-8", as this is the result of the extraction. But the RepositoryDocument that is actually sent still carries the original mime type from before the extraction, so the check fails.

Is this a bug, or am I missing something?

Many thanks in advance,
Markus
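
P.S. To make the suspicion concrete, here is a minimal Java sketch of the kind of check I think is being applied. It is not the actual HttpPoster code: the class name, the method names and the contents of the accepted set are my own assumptions, for illustration only.

```java
import java.util.Locale;
import java.util.Set;

public class MimeCheckSketch {
    // Assumption: with the extracting request handler disabled,
    // only plain-text types are accepted, whatever the configured inclusions are.
    static final Set<String> ACCEPTABLE = Set.of("text/plain", "text/plain;charset=utf-8");

    // Lower-case the value and drop spaces so that
    // "text/plain; charset=UTF-8" and "text/plain;charset=utf-8" compare equal.
    static String normalize(String mimeType) {
        return mimeType.toLowerCase(Locale.ROOT).replace(" ", "");
    }

    static boolean isAcceptable(String mimeType) {
        return ACCEPTABLE.contains(normalize(mimeType));
    }

    public static void main(String[] args) {
        String original = "text/html; charset=UTF-8";   // what the webserver returned
        String extracted = "text/plain;charset=utf-8";  // what the Tika extractor produces
        System.out.println(original + " -> " + isAcceptable(original));
        System.out.println(extracted + " -> " + isAcceptable(extracted));
    }
}
```

If the document still carries the original mime type when it reaches such a check, it would be rejected even though its content is already plain text after the extraction.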