Hi, yes. I don't have HTML documents; I have data saved in an SQL database in HTML format, and I want to index it in Solr. I don't want to index the complete string with its tags, only the actual text in it, i.e., with the tags stripped off.

regards
Rohan

On Wed, Feb 20, 2013 at 6:40 PM, Gora Mohanty <g...@mimirtech.com> wrote:
> On 20 February 2013 18:31, Rohan Thakur <rohan.i...@gmail.com> wrote:
> > hi all
> >
> > I have data stored in HTML format in a column in sql database and want to
> > index the data from that field to solr how can I do that any one has idea
> > please help. right now i am treating it as a string which is indexing
> > complete HTML with tags as one string to solr.
>
> How do you want to process the HTML? If you simply want to
> strip HTML tags, please take a look at the HTMLStripTransformer
> http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer
>
> Your title implies that you want to parse the HTML in some
> fashion. If so, you will need to do that on your own, e.g., by
> using a transformer.
>
> Regards,
> Gora
>
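
For reference, the HTMLStripTransformer suggested above could be wired into a DataImportHandler data-config.xml roughly as follows. This is only a minimal sketch: the JDBC driver, connection URL, credentials, table name (articles), column names (id, body_html), and Solr field name (body_text) are all made-up placeholders, not details from this thread.

```xml
<dataConfig>
  <!-- Assumed MySQL connection; replace driver, url, user, and password
       with the values for your own database. -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="db_user"
              password="db_password"/>
  <document>
    <!-- transformer="HTMLStripTransformer" enables the transformer for
         this entity; it only acts on fields marked stripHTML="true". -->
    <entity name="article"
            query="SELECT id, body_html FROM articles"
            transformer="HTMLStripTransformer">
      <field column="id" name="id"/>
      <!-- The tags are removed from body_html before the value is
           written to the (hypothetical) Solr field body_text. -->
      <field column="body_html" name="body_text" stripHTML="true"/>
    </entity>
  </document>
</dataConfig>
```

With a setup along these lines, the target field would receive only the text content, while the SQL column itself stays unchanged. The target field still has to be declared in schema.xml.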