Re: XSLT transform before update?

David Smiley @MITRE.org Mon, 21 Apr 2008 05:28:21 -0700

Cool.  So you're saying that this XSLT file will operate on the entire XML
document that was fetched from the URL and just pass it on to Solr?  Thanks
for supporting this.  The XML files coming from my data source are big, but
not so big that they risk an out-of-memory error.  And I've found XSLT to
perform fast for me.  I like your proposed TemplateTransformer too... I'm
tempted to use that in place of XSLT.  Great job Paul.
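
To check that I understand the template idea, here is a minimal sketch of how
a template such as "${e.field1} ${e.field2}" could be resolved against a row.
The names TemplateSketch and resolve are my own and hypothetical; this is not
DataImportHandler's actual implementation.  I'm also assuming that a missing
column becomes an empty string and that placeholders for other entities are
left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateSketch {
    // Matches placeholders of the form ${entity.column}
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^}.]+)\\.([^}]+)\\}");

    // Hypothetical helper: replaces each ${entityName.column} with the row's value.
    // Assumption: placeholders for other entities are kept as is, and a missing
    // or null column is replaced by the empty string.
    public static String resolve(String template, String entityName, Map<String, Object> row) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            if (m.group(1).equals(entityName)) {
                Object value = row.get(m.group(2));
                replacement = value == null ? "" : value.toString();
            } else {
                replacement = m.group(0);
            }
            // quoteReplacement keeps '$' and '\' in values from being misread
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<String, Object>();
        row.put("field1", "John");
        row.put("field2", "Smith");
        System.out.println(resolve("${e.field1} ${e.field2}", "e", row));
    }
}
```

If the real transformer works anything like this, it would cover my
concatenation case without any custom Java on my side.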


It'd be neat to have an XSLT transformer for your framework that operates on
a single entity (that addresses the memory usage problem).  I know your
entities are HashMap based instead of XML, however.
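
Roughly what I have in mind, as a minimal sketch using only the standard Java
APIs (StAX plus javax.xml.transform): stream through the input, copy each
record element into its own small document, and run the XSLT on that small
document only.  The class name PerRecordXslt, the method transformEach, and
the sample XML and stylesheet are all hypothetical and only for illustration.
It hands back strings rather than your HashMap rows, so it is not a drop-in
transformer for the framework.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PerRecordXslt {
    // Hypothetical sample data, only for the demo in main().
    static final String SAMPLE_XML =
            "<records>"
            + "<record><first-name>John</first-name><last-name>Smith</last-name></record>"
            + "<record><first-name>Jane</first-name><last-name>Doe</last-name></record>"
            + "</records>";
    static final String SAMPLE_XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/record'>"
            + "<xsl:value-of select=\"concat(first-name, ' ', last-name)\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Applies the stylesheet to every element named recordName, one at a time,
    // so only a single record is materialized for the transform.
    public static List<String> transformEach(String xml, String xsl, String recordName)
            throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        XMLEventReader events = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        List<String> results = new ArrayList<String>();
        while (events.hasNext()) {
            XMLEvent event = events.nextEvent();
            if (!event.isStartElement()
                    || !event.asStartElement().getName().getLocalPart().equals(recordName)) {
                continue; // skip everything outside a record
            }
            // Copy this record's events into a small standalone document.
            StringWriter record = new StringWriter();
            XMLEventWriter writer = outputFactory.createXMLEventWriter(record);
            int depth = 0;
            while (true) {
                writer.add(event);
                if (event.isStartElement()) depth++;
                if (event.isEndElement() && --depth == 0) break;
                event = events.nextEvent();
            }
            writer.flush();
            writer.close();
            // Run the XSLT on the small document only.
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(record.toString())),
                    new StreamResult(out));
            results.add(out.toString());
        }
        events.close();
        return results;
    }

    public static void main(String[] args) throws Exception {
        for (String result : transformEach(SAMPLE_XML, SAMPLE_XSL, "record")) {
            System.out.println(result);
        }
    }
}
```

The stylesheet then has to be written against a single record rather than the
whole file, which is the price for keeping memory usage flat.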

~ David


Noble Paul നോബിള്‍ नोब्ळ् wrote:
> 
> We are planning to incorporate both your requests in the next patch.
> The implementation is going to be as follows. Mention the XSL file
> location like this:
> <entity processor="XPathEntityProcessor" xslt="file:/c:/my-own.xsl">
> ....
> </entity>
> So the processing will be done after the XSL transformation. If your
> XSL transformation produces a valid 'add' document, not even the
> fields are necessary. Otherwise you will need to write all the fields
> and their xpaths like for any other XML.
> 
> <entity processor="XPathEntityProcessor" xslt="file:/c:/my-own.xsl"
> useSolrAddXml="true"/>
> 
> So it will assume that the schema is the same as that of the add XML and
> do the needful.
> 
> Another feature is going to be a TemplateTransformer  which takes in a
> Template as follows
> 
> <entity name="e" transformer="TemplateTransformer" ....>
> <field column="field1_2"  template="${e.field1} ${e.field2}"/>
> </entity>
> 
> Please let us know what you think about this.
> 
> And keep giving us these great use-cases so that we can make the tool
> better.
> --Noble
> 
> 
> 
> On Mon, Apr 21, 2008 at 12:07 AM, David Smiley @MITRE.org
> <[EMAIL PROTECTED]> wrote:
>>
>>  Thanks Shalin.
>>
>>  The particular XSLT processor used is not relevant; it's a spec.  Just
>>  use the standard Java APIs.  If I want a particular processor, then I
>>  can get that to happen by using a system property and/or you could
>>  offer a configuration input for the standard factory class
>>  implementation for a processor of my choice.
>>
>>  ~ David
>>
>>
>>
>>
>>  Shalin Shekhar Mangar wrote:
>>  >
>>  > Hi David,
>>  > Actually you can concatenate values, however you'll have to write a
>>  > bit of code. You can write this in JavaScript (if you're using Java 6)
>>  > or in Java.
>>  >
>>  > Basically, you need to write a Transformer to do it. Look at
>>  > http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9
>>  >
>>  > For example, let's say you get fields first-name and last-name in the
>>  > XML. But in the schema.xml you have a field called "name" in which you
>>  > need to concatenate the values of first-name and last-name (with a
>>  > space in between). Create a Java class:
>>  >
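>>  > package com.yourpackage;
>>  >
>>  > import java.util.Map;
>>  >
>>  > public class ConcatenateTransformer {
>>  >   // Called once per row; the row maps column names to values
>>  >   public Object transformRow(Map<String, Object> row) {
>>  >     // Map values are Objects, so cast before concatenating
>>  >     String firstName = (String) row.get("first-name");
>>  >     String lastName = (String) row.get("last-name");
>>  >     row.put("name", firstName + " " + lastName);
>>  >     return row;
>>  >   }
>>  > }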
>>  >
>>  > Add this class to solr's classpath by putting its jar in
>>  > solr/WEB-INF/lib
>>  >
>>  > The data-config.xml should look like this:
>>  > <entity name="myEntity" processor="XPathEntityProcessor"
>>  >         url="http://myurl/example.xml"
>>  >         transformer="com.yourpackage.ConcatenateTransformer">
>>  >   <field column="first-name" xpath="/record/first-name" />
>>  >   <field column="last-name" xpath="/record/last-name" />
>>  >   <field column="name" />
>>  > </entity>
>>  >
>>  > This will call the ConcatenateTransformer.transformRow method for each
>>  > row and you can concatenate any field with any field (or constant).
>>  > Note that the solr document will keep only those fields which are in
>>  > the schema.xml; the rest are thrown away.
>>  >
>>  > If you don't want to write this in Java, you can use JavaScript by
>>  > using the built-in ScriptTransformer; for an example look at
>>  > http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
>>  >
>>  > However, I'm beginning to realize that XSLT is a common need; let me
>>  > see how best we can accommodate it in DataImportHandler. Which XSLT
>>  > processor would you prefer?
>>  >
>>  > On Sat, Apr 19, 2008 at 12:13 AM, David Smiley @MITRE.org
>>  > <[EMAIL PROTECTED]>
>>  > wrote:
>>  >
>>  >>
>>  >> I'm in the same situation as you Daniel.  The DataImportHandler is
>>  >> pretty awesome but I'd also prefer it had the power of XSLT.  The
>>  >> XPath support in it doesn't suffice for me.  And I can't do very basic
>>  >> things like concatenate one value with another, say a constant even.
>>  >> It's too bad there isn't a mode that XSLT can be put into so that it
>>  >> doesn't build the whole file in memory to do the transform.  I've been
>>  >> looking into this and have turned up nothing.  It would be neat if
>>  >> there was a StAX to multi-document adapter, at which point XSLT could
>>  >> be applied to the smaller fixed-size documents instead of the entire
>>  >> data stream.  I haven't found anything like this so it'd need to be
>>  >> built.  For now my documents aren't too big to XSLT in-memory.
>>  >>
>>  >> ~ David
>>  >>
>>  >>
>>  >> Daniel Papasian wrote:
>>  >> >
>>  >> > Shalin Shekhar Mangar wrote:
>>  >> >> Hi Daniel,
>>  >> >>
>>  >> >> Maybe if you can give us a sample of what your XML looks like, we
>>  >> >> can suggest how to use SOLR-469 (Data Import Handler) to index it.
>>  >> >> Most of the use-cases we have encountered so far are solvable using
>>  >> >> the XPathEntityProcessor in DataImportHandler without using XSLT;
>>  >> >> for details look at
>>  >> >> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
>>  >> >
>>  >> > I think even if it is possible to use SOLR-469 for my needs, I'd
>>  >> > still prefer the XSLT approach, because it's going to be a bit of
>>  >> > configuration either way, and I'd rather it be an XSLT stylesheet
>>  >> > than solrconfig.xml.  In addition, I haven't yet decided whether I
>>  >> > want to apply any patches to the version that we will deploy, but if
>>  >> > I do go down the route of the XSLT transform patch and end up having
>>  >> > to back it out, the amount of work it would take me to do the
>>  >> > transform at the XML source would be negligible, whereas it would be
>>  >> > quite a bit of work to go from using the DataImportHandler to not
>>  >> > using it at all.
>>  >> >
>>  >> > Because both the solr instance and the XML source are in house, I
>>  >> > have the ability to apply the XSLT at the source instead of at solr.
>>  >> > However, there are different teams of people that control the XML
>>  >> > source and solr, so it would require a bit more office coordination
>>  >> > to do it on the backend.
>>  >> >
>>  >> > The data is a filemaker XML export (DTD fmresultset) and it looks
>>  >> > roughly like this:
>>  >> > <fmresultset>
>>  >> >    <resultset>
>>  >> >      <field name="ID"><data>125</data></field>
>>  >> >      <field name="organization"><data>Ford Foundation</data></field>
>>  >> >      ...
>>  >> >      <relatedset table="Employees">
>>  >> >        <record>
>>  >> >          <field name="ID"><data>Y5-A</data></field>
>>  >> >          <field name="Name"><data>John Smith</data></field>
>>  >> >        </record>
>>  >> >        <record>
>>  >> >          <field name="ID"><data>Y5-B</data></field>
>>  >> >          <field name="Name"><data>Jane Doe</data></field>
>>  >> >        </record>
>>  >> >      </relatedset>
>>  >> >    </resultset>
>>  >> > </fmresultset>
>>  >> >
>>  >> > I'm taking the product of the resultset and the relatedset, using
>>  >> > both IDs concatenated as a unique identifier, like so:
>>  >> >
>>  >> > <doc>
>>  >> > <field name="ID">125Y5-A</field>
>>  >> > <field name="organization">Ford Foundation</field>
>>  >> > <field name="Name">John Smith</field>
>>  >> > </doc>
>>  >> > <doc>
>>  >> > <field name="ID">125Y5-B</field>
>>  >> > <field name="organization">Ford Foundation</field>
>>  >> > <field name="Name">Jane Doe</field>
>>  >> > </doc>
>>  >> >
>>  >> > I can do the transform pretty simply with XSLT.  I suppose it is
>>  >> > possible to get the DataImportHandler to do this, but I'm not yet
>>  >> > convinced that it's easier.
>>  >> >
>>  >> > Daniel
>>  >> >
>>  >> >
>>  >>
>>  >> --
>>  >> View this message in context:
>>  >>
>> http://www.nabble.com/XSLT-transform-before-update--tp16738227p16764009.html
>>  >> Sent from the Solr - User mailing list archive at Nabble.com.
>>  >>
>>  >>
>>  >
>>  >
>>  > --
>>  > Regards,
>>  > Shalin Shekhar Mangar.
>>  >
>>  >
>>
>>  --
>>  View this message in context:
>> http://www.nabble.com/XSLT-transform-before-update--tp16738227p16796900.html
>>
>>
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/XSLT-transform-before-update--tp16738227p16807488.html
Sent from the Solr - User mailing list archive at Nabble.com.
