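DIH has special handling for upper- and lower-case field names, and it is
possible your config is running afoul of this: the query selects post_title
and post_content in lower case, while the field elements name the columns
POST_TITLE and POST_CONTENT.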

Try using different names for the Solr fields than the database fields.
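
A minimal sketch of that suggestion, assuming the MySQL driver reports SQL
aliases as the column names DIH sees. Each database column is aliased in the
query (raw_title and raw_content are made-up names for illustration), and the
column attribute refers to the alias in exactly the same case. The deltaQuery
is left out only to keep the example short.

```xml
<dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost:3306/econetsm"
                user="*******" password="*******"/>
    <document>
        <!-- raw_title and raw_content are hypothetical aliases, chosen so the
             database-side names differ from the Solr field names. -->
        <entity name="post" transformer="HTMLStripTransformer"
                query="SELECT id, post_content AS raw_content, post_title AS raw_title FROM elinstmkting_posts e"
                onError="abort">
            <!-- column matches the alias from the query, same case -->
            <field column="raw_title" name="post_title"/>
            <field column="raw_content" name="post_content" stripHTML="true"/>
        </entity>
    </document>
</dataConfig>
```

I have not run this against your data. If the transformer is being applied,
a search for "strong" after a full-import should no longer match the markup.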


On 3/11/10, James Ostheimer <james.osthei...@gmail.com> wrote:
> Hi-
>
> I can't seem to make any of the transformers work. I am using the
> DataImporter to pull in data from a WordPress instance (see below). Neither
> REGEX nor HTMLStrip seems to do anything to my content.
>
> Do I have to include a separate jar with the transformers?  Are the
> transformers in 1.4 (particularly the HTMLStrip)?
>
> James
>
> On Wed, Mar 10, 2010 at 10:47 PM, James Ostheimer <james.osthei...@gmail.com> wrote:
>
>> Hi-
>>
>> I am working on a contract to index some WordPress data. The post content
>> column of course contains HTML, which I'd like to strip out.
>> Here is my data importer config:
>>
>> <dataConfig>
>>     <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/econetsm" user="*******" password="*******"/>
>>     <document>
>>         <entity name="post" transformer="HTMLStripTransformer"
>>                 query="SELECT id, post_content, post_title FROM elinstmkting_posts e"
>>                 onError="abort"
>>                 deltaQuery="SELECT * FROM elinstmkting_posts e where post_modified_gmt > '${dataimporter.last_index_time}'">
>>             <field column="POST_TITLE" name="post_title" stripHTML="false"/>
>>             <field column="POST_CONTENT" name="post_content" stripHTML="true"/>
>>         </entity>
>>     </document>
>> </dataConfig>
>>
>> This looks correct according to the wiki docs, but a search for "strong"
>> (the <strong> tag) still matches, and HTML is returned in the field.
>>
>> I assume I am doing something stupid. I am using the latest stable
>> Solr (1.4.0).
>>
>> Does it matter that the post data is not a complete HTML document (it
>> doesn't have an <html> start tag or a <body> tag)?
>>
>> James
>>
>


-- 
Lance Norskog
goks...@gmail.com
