Hi,

I can't seem to get any of the transformers to work. I am using the DataImportHandler to pull in data from a WordPress instance (see below), and neither the RegexTransformer nor the HTMLStripTransformer seems to do anything to my content.
Do I have to include a separate jar with the transformers? Are the transformers included in 1.4 (particularly the HTMLStripTransformer)?

James

On Wed, Mar 10, 2010 at 10:47 PM, James Ostheimer <james.osthei...@gmail.com> wrote:

> Hi,
>
> I am working on a contract to index some WordPress data. The posts of
> course have HTML in the content column, and I'd like to strip it out.
> Here is my data importer config:
>
> <dataConfig>
>   <dataSource driver="com.mysql.jdbc.Driver"
>       url="jdbc:mysql://localhost:3306/econetsm" user="*******" password="*******" />
>   <document>
>     <entity name="post" transformer="HTMLStripTransformer"
>         query="SELECT id, post_content, post_title FROM elinstmkting_posts e"
>         onError="abort"
>         deltaQuery="SELECT * FROM elinstmkting_posts e where post_modified_gmt > '${dataimporter.last_index_time}'">
>       <field column="POST_TITLE" name="post_title" stripHTML="false"/>
>       <field column="POST_CONTENT" name="post_content" stripHTML="true" />
>     </entity>
>   </document>
> </dataConfig>
>
> This looks correct according to the wiki docs, but the HTML is still
> matched when I search for "strong" (the <strong> tag), and HTML is
> returned in the field.
>
> I assume I am doing something wrong. I am using the latest stable
> Solr (1.4.0).
>
> Does it matter that the post data is not a complete HTML document (it
> doesn't have an <html> start tag or a <body> tag)?
>
> James
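
One variant that may be worth trying is sketched below. It rests on an assumption, not a confirmed diagnosis: the transformers read each value from the row using the `column` attribute exactly as written, and that lookup may be case-sensitive. If so, `POST_CONTENT` would not match the lowercase `post_content` returned by the query, and the transformer would silently skip the field. The only change from the config quoted above is the case of the `column` attributes. The connection details are the placeholders from the original.

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost:3306/econetsm" user="*******" password="*******" />
  <document>
    <entity name="post" transformer="HTMLStripTransformer"
        query="SELECT id, post_content, post_title FROM elinstmkting_posts e"
        onError="abort"
        deltaQuery="SELECT * FROM elinstmkting_posts e where post_modified_gmt &gt; '${dataimporter.last_index_time}'">
      <!-- Assumption: column names are matched case-sensitively, so they are
           written here in the same case as in the SELECT list. -->
      <field column="post_title" name="post_title" stripHTML="false" />
      <field column="post_content" name="post_content" stripHTML="true" />
    </entity>
  </document>
</dataConfig>
```

After changing the config, a fresh full-import followed by the same search for "strong" should show whether the column-name case was the cause.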