Thanks for the quick reply.  Unfortunately, I don't believe my company
would want me sharing our exact production schema in a public forum,
although I realize it makes it harder to diagnose the problem.  The
sub-entity is a multi-valued field that indeed does have a relationship to
the outer entity.  I just left off the 'where' clause from the sub-entity,
as I didn't believe it was helpful in the context of this problem.  We use
the convention of...

SELECT dbColumnName AS solrFieldName

...so that we can relate the database column name to what we want it to be
named in the Solr index.

I don't think any of this helps you identify my problem, but I tried to
address your questions.
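
To make the convention and the relationship concrete, a minimal sketch of
the sub-entity might look like the following.  It is not our production
schema: the "br" table alias and the "block_id" join column are stand-ins
invented for illustration.  It also assumes that the field's column
attribute should name the alias (replyContent) rather than the underlying
database column, as point (a) in the quoted reply suggests.

```xml
<!-- Sketch only; nested inside the outer "blocks" entity.
     Hypothetical names: the "br" alias and the "block_id" column.
     Assumption: "column" refers to the aliased name in the result set. -->
<entity name="blockReplies" dataSource="database"
        transformer="HTMLStripTransformer" query="
    SELECT
      br.other_content AS replyContent
    FROM block_reply br
    WHERE br.block_id = '${blocks.blockId}'
    ">
  <!-- Matches the alias from the SELECT, not the database column name -->
  <field column="replyContent" stripHTML="true" />
</entity>
```

The ${blocks.blockId} placeholder is how the inner query would pick up the
aliased id of the current row of the outer entity.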

Thanks,
Andy

On Tue, Jul 2, 2013 at 9:14 AM, Gora Mohanty <g...@mimirtech.com> wrote:

> On 2 July 2013 20:29, Andy Pickler <andy.pick...@gmail.com> wrote:
> > Solr 4.1.0
> >
> > We've been using the DIH to pull data in from a MySQL database for quite
> > some time now.  We're now wanting to strip all the HTML content out of many
> > fields using the HTMLStripTransformer (
> > http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer).
> > Unfortunately, while it seems to be working fine for "top-level" entities,
> > we can't seem to get it to work for sub-entities:
> >
> > (not exact schema, reduced for example purposes)
>
> Please do not do that. This DIH configuration file does
> not make sense (please see comments below), and we
> are left guessing in the dark. If the file is too large,
> you can share it on something like pastebin.com
>
> > <entity name="blocks" dataSource="database"
> > transformer="HTMLStripTransformer" query="
> >   SELECT
> >     id as blockId,
> >     name as blockTitle,
> >     content as content
> >   FROM engagement_block
> >   ">
> >   <field column="content" stripHTML="true" />  *THIS WORKS!*
> >   <entity name="blockReplies" dataSource="database"
> > transformer="HTMLStripTransformer" query="
> >     SELECT
> >       br.other_content AS replyContent
> >     FROM block_reply
> >     ">
> >     <field column="other_content" stripHTML="true" /> *THIS DOESN'T WORK!*
> [...]
>
> (a) You SELECT replyContent, but the column attribute
>      in the field is named "other_content". Nothing should
>      be getting indexed into the field.
> (b) Why are your entities nested if the inner entity has no
>      relationship to the outer one?
>
> Regards,
> Gora
>
