[
https://issues.apache.org/jira/browse/SOLR-887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651839#action_12651839
]
Shalin Shekhar Mangar commented on SOLR-887:
--------------------------------------------
bq. There is another use case where the data may come directly from an
HttpDataSource/FileDataSource. How can we directly ingest that data?
Do you mean directly reading from the Reader given by HttpDataSource and
FileDataSource and stripping off HTML from it without needing to create an
in-memory Map?
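
To make the question concrete, here is a minimal sketch of what "stripping
directly from the Reader" could look like. It is not the code in the attached
patches and it does not use org.apache.solr.analysis.HTMLStripReader; the
class name TagStrippingReader and the helper stripAll are hypothetical, and
the tag handling is deliberately naive (no entity decoding, no comment or
script handling). The point is only that the text is filtered as it streams
through, without first building a Map of field values.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/**
 * Hypothetical, simplified stand-in for a streaming HTML stripper.
 * It only drops everything between '<' and '>'. It does not decode
 * entities or handle comments, scripts, or '>' inside attribute values.
 */
class TagStrippingReader extends FilterReader {
    // True while the underlying stream is positioned inside a tag.
    private boolean inTag = false;

    TagStrippingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (inTag) {
                if (c == '>') {
                    inTag = false; // end of tag, resume emitting text
                }
            } else if (c == '<') {
                inTag = true;      // start of tag, suppress until '>'
            } else {
                return c;          // ordinary text character
            }
        }
        return -1;                 // underlying reader is exhausted
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    /** Drains a reader through the stripper in fixed-size chunks. */
    static String stripAll(Reader source) {
        StringBuilder out = new StringBuilder();
        char[] chunk = new char[1024];
        try (Reader r = new TagStrippingReader(source)) {
            int n;
            while ((n = r.read(chunk, 0, chunk.length)) != -1) {
                out.append(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

For example, TagStrippingReader.stripAll(new java.io.StringReader("<p>Hello
<b>world</b></p>")) returns "Hello world". Because the filter pulls one
character at a time from the wrapped reader, the source should be buffered
(for instance with a BufferedReader) when it is backed by a file or an HTTP
connection.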
> HTMLStripTransformer for DIH
> ----------------------------
>
> Key: SOLR-887
> URL: https://issues.apache.org/jira/browse/SOLR-887
> Project: Solr
> Issue Type: New Feature
> Components: contrib - DataImportHandler
> Affects Versions: 1.3
> Reporter: Ahmed Hammad
> Assignee: Shalin Shekhar Mangar
> Priority: Minor
> Fix For: 1.4
>
> Attachments: patch-887.patch, SOLR-887.patch
>
>
> A Transformer implementation for DIH that strips HTML tags using the Solr
> class org.apache.solr.analysis.HTMLStripReader.
> This is useful when you do not need the HTML tags.