Re: XSLT transform before update?

Shalin Shekhar Mangar Fri, 18 Apr 2008 12:16:43 -0700

Concatenation of values is also quite common. We should have a way of
doing this without forcing everybody to write code.
I think we should add a ConcatenateTransformer in DataImportHandler itself
which can take care of basic use-cases. A syntax like this may be good
enough:


<field column="myField" concatenate="field1, field2, field3,..."
separateBy=" " />
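
As a rough illustration of what such a transformer might do for each row, here is
a minimal sketch in plain Java. The class name ConcatenateSketch, the method
concatenate and its parameters are hypothetical; they mirror the proposed
concatenate and separateBy attributes and are not an existing DataImportHandler
API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of DataImportHandler.
class ConcatenateSketch {

    // Joins the values of the comma-separated source columns with the given
    // separator and stores the result under the target column.
    // Columns that are missing (null) in the row are skipped.
    static Map<String, Object> concatenate(Map<String, Object> row, String target,
                                           String sourceColumns, String separateBy) {
        List<String> parts = new ArrayList<>();
        for (String column : sourceColumns.split(",")) {
            Object value = row.get(column.trim());
            if (value != null) {
                parts.add(value.toString());
            }
        }
        row.put(target, String.join(separateBy, parts));
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("field1", "John");
        row.put("field2", "Smith");
        concatenate(row, "myField", "field1, field2", " ");
        System.out.println(row.get("myField")); // prints "John Smith"
    }
}
```

Skipping null columns is only one possible choice; a real implementation could
just as well emit an empty string or fail on a missing column.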

What do you think?

On Sat, Apr 19, 2008 at 12:41 AM, Shalin Shekhar Mangar <
[EMAIL PROTECTED]> wrote:

> Hi David,
> Actually you can concatenate values, however you'll have to write a bit of
> code. You can write this in JavaScript (if you're using Java 6) or in Java.
>
> Basically, you need to write a Transformer to do it. Look at
> http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9
>
> For example, let's say you get fields first-name and last-name in the XML.
> But in the schema.xml you have a field called "name" in which you need to
> concatenate the values of first-name and last-name (with a space in
> between). Create a Java class:
>
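> import java.util.Map;
>
> public class ConcatenateTransformer {
>   // Called once for each row.
>   public Object transformRow(Map<String, Object> row) {
>     // row.get returns Object, so the values must be cast.
>     String firstName = (String) row.get("first-name");
>     String lastName = (String) row.get("last-name");
>     row.put("name", firstName + " " + lastName);
>     return row;
>   }
> }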
>
> Add this class to Solr's classpath by putting its jar in solr/WEB-INF/lib.
>
> The data-config.xml should look like this:
> <entity name="myEntity" processor="XPathEntityProcessor"
>         url="http://myurl/example.xml"
>         transformer="com.yourpackage.ConcatenateTransformer">
>   <field column="first-name" xpath="/record/first-name" />
>   <field column="last-name" xpath="/record/last-name" />
>   <field column="name" />
> </entity>
>
> This will call the ConcatenateTransformer.transformRow method for each row,
> and you can concatenate any field with any field (or constant). Note that the
> Solr document will keep only those fields that are in schema.xml; the rest
> are thrown away.
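
The per-row flow described above can be simulated in a few lines of plain Java.
This is only a sketch of the idea, not DataImportHandler's actual code: the
class RowPipelineSketch, the method process and the schemaFields set are
hypothetical names standing in for the import loop and for the fields declared
in schema.xml.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Simulation only; this is not DataImportHandler's actual code.
class RowPipelineSketch {

    // Applies the transformer to a copy of one row, then keeps only the
    // fields that the (assumed) schema declares.
    static Map<String, Object> process(Map<String, Object> row,
                                       UnaryOperator<Map<String, Object>> transformer,
                                       Set<String> schemaFields) {
        Map<String, Object> transformed = transformer.apply(new HashMap<>(row));
        transformed.keySet().retainAll(schemaFields);
        return transformed;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("first-name", "John");
        row.put("last-name", "Smith");

        // The lambda plays the role of transformRow.
        Map<String, Object> doc = process(row, r -> {
            r.put("name", r.get("first-name") + " " + r.get("last-name"));
            return r;
        }, Collections.singleton("name"));

        System.out.println(doc); // prints {name=John Smith}
    }
}
```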
>
> If you don't want to write this in Java, you can use JavaScript with
> the built-in ScriptTransformer. For an example, look at
> http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
>
> However, I'm beginning to realize that XSLT is a common need. Let me see
> how best we can accommodate it in DataImportHandler. Which XSLT processor
> would you prefer?
>
> On Sat, Apr 19, 2008 at 12:13 AM, David Smiley @MITRE.org <
> [EMAIL PROTECTED]> wrote:
>
> >
> > I'm in the same situation as you, Daniel.  The DataImportHandler is pretty
> > awesome, but I'd also prefer it had the power of XSLT.  The XPath support in
> > it doesn't suffice for me, and I can't do very basic things like
> > concatenating one value with another, or even with a constant.  It's too bad
> > there isn't a mode in which XSLT can transform a file without building the
> > whole thing in memory.  I've been looking into this and have turned up
> > nothing.  It would be neat if there were a StAX to multi-document adapter, at
> > which point XSLT could be applied to the smaller fixed-size documents
> > instead of the entire data stream.  I haven't found anything like this, so
> > it'd need to be built.  For now my documents aren't too big to XSLT
> > in-memory.
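
One way such an adapter might look, using only the JDK's javax.xml.stream and
javax.xml.transform packages, is sketched below. It is a hypothetical
illustration: the class StaxSplitSketch, the method transformRecords, the
record element name and the small stylesheet are all assumptions. The input is
read as a stream, each record subtree is copied into its own small buffer, and
the stylesheet is applied to that buffer alone.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch; this adapter is not an existing Solr component.
class StaxSplitSketch {

    // Example stylesheet: joins first-name and last-name of one <record>.
    static final String CONCAT_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/record'>"
      + "<xsl:value-of select=\"concat(first-name, ' ', last-name)\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Results are collected in a list only to keep the demo short; a real
    // adapter would hand each result on as soon as it is produced.
    static List<String> transformRecords(Reader input, String recordName, String stylesheet)
            throws XMLStreamException, TransformerException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(input);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        Transformer xslt = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        List<String> results = new ArrayList<>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (!event.isStartElement()
                    || !event.asStartElement().getName().getLocalPart().equals(recordName)) {
                continue; // skip everything outside a record
            }
            // Copy exactly one record subtree into a small buffer.
            StringWriter fragment = new StringWriter();
            XMLEventWriter writer = outputFactory.createXMLEventWriter(fragment);
            writer.add(event);
            for (int depth = 1; depth > 0; ) {
                XMLEvent inner = reader.nextEvent();
                if (inner.isStartElement()) depth++;
                if (inner.isEndElement()) depth--;
                writer.add(inner);
            }
            writer.close();
            // Run the stylesheet on the small fragment only.
            StringWriter out = new StringWriter();
            xslt.transform(new StreamSource(new StringReader(fragment.toString())),
                           new StreamResult(out));
            results.add(out.toString());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records>"
            + "<record><first-name>John</first-name><last-name>Smith</last-name></record>"
            + "<record><first-name>Jane</first-name><last-name>Doe</last-name></record>"
            + "</records>";
        System.out.println(transformRecords(new StringReader(xml), "record", CONCAT_XSLT));
    }
}
```

Memory use is then bounded by the largest single record rather than by the
whole file, at the cost of the stylesheet never seeing anything outside the
current record.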
> >
> > ~ David
> >
> >
> > Daniel Papasian wrote:
> > >
> > > Shalin Shekhar Mangar wrote:
> > >> Hi Daniel,
> > >>
> > >> Maybe if you can give us a sample of what your XML looks like, we can
> > >> suggest how to use SOLR-469 (Data Import Handler) to index it. Most of the
> > >> use-cases we have encountered so far are solvable using the
> > >> XPathEntityProcessor in DataImportHandler without using XSLT. For details,
> > >> look at
> > >> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
> > >
> > > > I think even if it is possible to use SOLR-469 for my needs, I'd still
> > > > prefer the XSLT approach, because it's going to be a bit of
> > > > configuration either way, and I'd rather it be an XSLT stylesheet than
> > > > solrconfig.xml.  In addition, I haven't yet decided whether I want to
> > > > apply any patches to the version that we will deploy.  If I do go
> > > > down the route of the XSLT transform patch and end up having to back
> > > > it out, moving the transform to the XML source would take a negligible
> > > > amount of work, whereas going from using the DataImportHandler to not
> > > > using it at all would be quite a bit of work.
> > >
> > > Because both the solr instance and the XML source are in house, I have
> > > the ability to apply the XSLT at the source instead of at solr.
> > > > However, there are different teams of people that control the XML source
> > > > and solr, so it would require a bit more office coordination to do it on
> > > > the backend.
> > >
> > > The data is a filemaker XML export (DTD fmresultset) and it looks
> > > roughly like this:
> > > <fmresultset>
> > >    <resultset>
> > >      <field name="ID"><data>125</data></field>
> > >      <field name="organization"><data>Ford Foundation</data></field>
> > >      ...
> > >      <relatedset table="Employees">
> > >        <record>
> > >          <field name="ID"><data>Y5-A</data></field>
> > >          <field name="Name"><data>John Smith</data></field>
> > >        </record>
> > >        <record>
> > >          <field name="ID"><data>Y5-B</data></field>
> > >          <field name="Name"><data>Jane Doe</data></field>
> > >        </record>
> > > >      </relatedset>
> > > >    </resultset>
> > > > </fmresultset>
> > >
> > > I'm taking the product of the resultset and the relatedset, using both
> > > IDs concatenated as a unique identifier, like so:
> > >
> > > <doc>
> > > <field name="ID">125Y5-A</field>
> > > <field name="organization">Ford Foundation</field>
> > > <field name="Name">John Smith</field>
> > > </doc>
> > > <doc>
> > > <field name="ID">125Y5-B</field>
> > > <field name="organization">Ford Foundation</field>
> > > <field name="Name">Jane Doe</field>
> > > </doc>
> > >
> > > I can do the transform pretty simply with XSLT.  I suppose it is
> > > possible to get the DataImportHandler to do this, but I'm not yet
> > > convinced that it's easier.
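
For comparison, the product described above (one output document per related
record, keyed by the two IDs concatenated) can be sketched in plain Java. The
class CrossProductSketch and the method flatten are hypothetical names, and the
maps stand in for the already-parsed resultset and relatedset fields; this is
neither the XSLT stylesheet nor a DataImportHandler configuration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch operating on already-parsed field maps.
class CrossProductSketch {

    // Emits one document per related record: the parent fields are copied,
    // the related fields are added, and the unique key becomes
    // parent ID + related ID.
    static List<Map<String, String>> flatten(Map<String, String> parent,
                                             List<Map<String, String>> related) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Map<String, String> record : related) {
            Map<String, String> doc = new LinkedHashMap<>(parent);
            doc.putAll(record); // temporarily overwrites "ID" with the related ID
            doc.put("ID", parent.get("ID") + record.get("ID"));
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, String> organization = new LinkedHashMap<>();
        organization.put("ID", "125");
        organization.put("organization", "Ford Foundation");

        Map<String, String> employee = new LinkedHashMap<>();
        employee.put("ID", "Y5-A");
        employee.put("Name", "John Smith");

        List<Map<String, String>> docs =
            flatten(organization, Collections.singletonList(employee));
        System.out.println(docs.get(0));
        // prints {ID=125Y5-A, organization=Ford Foundation, Name=John Smith}
    }
}
```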
> > >
> > > Daniel
> > >
> > >
> >
> > --
> > View this message in context:
> > http://www.nabble.com/XSLT-transform-before-update--tp16738227p16764009.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Regards,
Shalin Shekhar Mangar.
