[jira] Commented: (SOLR-887) HTMLStripTransformer for DIH

Shalin Shekhar Mangar (JIRA) Sun, 30 Nov 2008 07:40:05 -0800

    [ https://issues.apache.org/jira/browse/SOLR-887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651839#action_12651839 ]


Shalin Shekhar Mangar commented on SOLR-887:
--------------------------------------------

bq. There is another use case where the data may come directly from an HttpDataSource/FileDataSource. How can we directly ingest that data?

Do you mean reading directly from the Reader returned by HttpDataSource or 
FileDataSource and stripping the HTML from it, without first creating an 
in-memory Map?
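
To make the question concrete, here is a minimal sketch of what stripping 
directly from a Reader could look like. TagStrippingReader and its strip() 
helper are hypothetical names written only for illustration; this is not 
Solr's HTMLStripReader, and it assumes well-formed tags while ignoring 
entities, comments, and script blocks that a real stripper has to handle.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/**
 * Hypothetical stand-in for a streaming HTML stripper (NOT Solr's
 * HTMLStripReader). It drops everything between '<' and '>' while
 * reading, so no intermediate Map or full-document String is needed.
 */
class TagStrippingReader extends FilterReader {
    private boolean inTag = false;

    TagStrippingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (c == '<') {
                inTag = true;           // start skipping at a tag opener
            } else if (c == '>' && inTag) {
                inTag = false;          // tag closed, resume emitting text
            } else if (!inTag) {
                return c;               // ordinary text character
            }
        }
        return -1;                      // end of the underlying stream
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    /** Convenience helper for trying the reader on a small String. */
    static String strip(String html) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new TagStrippingReader(new StringReader(html))) {
            int c;
            while ((c = r.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

In this sketch the StringReader would be replaced by whatever Reader the 
data source hands back, so the text is filtered as it is consumed rather 
than after it has been copied into a row.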

> HTMLStripTransformer for DIH
> ----------------------------
>
>                 Key: SOLR-887
>                 URL: https://issues.apache.org/jira/browse/SOLR-887
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.3
>            Reporter: Ahmed Hammad
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: patch-887.patch, SOLR-887.patch
>
>
> A Transformer implementation for DIH which strips HTML tags using the Solr 
> class org.apache.solr.analysis.HTMLStripReader.
> This is useful when you don't need the HTML tags anyway.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
