DIH has special handling for upper & lower case field names. It is possible your config is running afoul of this.
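
In the config quoted below, the query selects lower-case columns (post_content, post_title), the column attributes are upper case (POST_CONTENT, POST_TITLE), and each Solr field name is the same as its database column apart from case. That combination might be what the case handling trips over. A rough sketch of an entity that avoids it, assuming hypothetical Solr fields "title" and "content" are defined in schema.xml (those two names are made up for illustration, and I have not verified this against 1.4):

```xml
<entity name="post" transformer="HTMLStripTransformer"
        query="SELECT id, post_content, post_title FROM elinstmkting_posts e"
        onError="abort">
  <!-- column uses the same case as the SELECT list;
       name is a hypothetical Solr field that differs from the column name -->
  <field column="post_title" name="title" stripHTML="false"/>
  <field column="post_content" name="content" stripHTML="true"/>
</entity>
```

The deltaQuery is left out only to keep the sketch short. If the HTML is gone from "content" after a full import, the name clash was probably the cause.
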
Try using different names for the Solr fields than the database fields.

On 3/11/10, James Ostheimer <james.osthei...@gmail.com> wrote:
> Hi-
>
> I can't seem to make any of the transfomers work, I am using the
> DataImporter to pull in data from a wordpress instance (see below). Neither
> REGEX or HTMLStrip seems to do anything to my content.
>
> Do I have to include a separate jar with the transformers? Are the
> transformers in 1.4 (particularly the HTMLStrip)?
>
> James
>
> On Wed, Mar 10, 2010 at 10:47 PM, James Ostheimer <james.osthei...@gmail.com> wrote:
>
>> HI-
>>
>> I am working a contract to index some wordpress data. For the posts I of
>> course have html in the content of the column, I'd like to strip it out.
>> Here is my data importer config
>>
>> <dataConfig>
>>     <dataSource driver="com.mysql.jdbc.Driver"
>>         url="jdbc:mysql://localhost:3306/econetsm" user="*******"
>>         password="*******"
>>     />
>>     <document>
>>         <entity name="post" transformer="HTMLStripTransformer"
>>             query="SELECT id, post_content, post_title FROM elinstmkting_posts e"
>>             onError="abort"
>>             deltaQuery="SELECT * FROM elinstmkting_posts e where
>>                 post_modified_gmt > '${dataimporter.last_index_time}'">
>>             <field column="POST_TITLE" name="post_title"
>>                 stripHTML="false"/>
>>             <field column="POST_CONTENT" name="post_content"
>>                 stripHTML="true" />
>>         </entity>
>>     </document>
>> </dataConfig>
>>
>> Looks perfect according to the wiki docs, but the html is found when I
>> search for "strong" (<strong> tag) and html is returned in the field.
>>
>> I assume I am doing something stupid wrong, I am using the latest stable
>> solr (1.4.0).
>>
>> Does it matter that the post data is not a complete html document (it
>> doesn't have a <html> start tag or a <body> tag)?
>>
>> James
>>
>

--
Lance Norskog
goks...@gmail.com