Hi,

There is a new patch that implements these features. I shall update the wiki with the documentation.
I guess we do not need to be too worried about memory consumption. A few MB
of memory should be fine (unless you are using a file that is tens of MB in
size). Consider using XPathEntityProcessor if possible; it uses StAX and is
pretty efficient.

Thanks for your support.
--Noble

On Mon, Apr 21, 2008 at 5:57 PM, David Smiley @MITRE.org <[EMAIL PROTECTED]> wrote:
>
> Cool. So you're saying that this xslt file will operate on the entire XML
> document that was fetched from the URL and just pass it on to solr? Thanks
> for supporting this. The XML files I have coming from my data source
> are big but not too big to risk an out-of-memory error. And I've found
> xslt to perform fast for me. I like your proposed TemplateTransformer
> too... I'm tempted to use that in place of XSLT. Great job Paul.
>
> It'd be neat to have an XSLT transformer for your framework that operates on
> a single entity (that addresses the memory usage problem). I know your
> entities are HashMap based instead of XML, however.
>
> ~ David
>
>
> Noble Paul നോബിള് नोब्ळ् wrote:
> >
> > We are planning to incorporate both your requests in the next patch.
> > The implementation is going to be as follows. Mention the xsl file
> > location like this:
> > <entity processor="XPathEntityProcessor" xslt="file:/c:/my-own.xsl">
> > ....
> > </entity>
> > So the processing will be done after the XSL transformation. If your
> > XSL transformation produces a valid 'add' document, not even the
> > fields are necessary. Otherwise you will need to write all the fields
> > and their xpaths like any other xml.
> >
> > <entity processor="XPathEntityProcessor" xslt="file:/c:/my-own.xsl"
> > useSolrAddXml="true"/>
> >
> > So it will assume that the schema is the same as that of the add xml and
> > does the needful.
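To make "processing after the XSL transformation" concrete, here is a minimal sketch of running a stylesheet over a whole document with the standard javax.xml.transform API. It is only an illustration, not the patch's implementation: the class name XsltPreprocessSketch, the sample record, and the stylesheet are all made up for this example. Note that the entire input is held in memory, which is the trade-off discussed in this thread.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration; not code from the DataImportHandler patch.
class XsltPreprocessSketch {

    // Made-up stylesheet: turns one <record> into a Solr-style <add><doc>,
    // concatenating first-name and last-name into a single "name" field.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/record'>"
      + "<add><doc><field name='name'>"
      + "<xsl:value-of select=\"concat(first-name, ' ', last-name)\"/>"
      + "</field></doc></add>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Made-up input document.
    static final String XML =
        "<record><first-name>Jane</first-name><last-name>Doe</last-name></record>";

    // Applies the stylesheet to the whole input document, entirely in memory.
    static String transform(String xml, String xsl) throws TransformerException {
        // The factory picks whichever XSLT processor the JVM is configured with.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(XML, XSL));
    }
}
```

Because the processor is obtained through TransformerFactory.newInstance(), a different XSLT implementation can be selected with the javax.xml.transform.TransformerFactory system property without changing the code.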
> >
> > Another feature is going to be a TemplateTransformer which takes in a
> > template as follows:
> >
> > <entity name="e" transformer="TemplateTransformer" ....>
> > <field column="field1_2" template="${e.field1} ${e.field2}"/>
> > </entity>
> >
> > Please let us know what you think about this.
> >
> > And keep giving us these great use-cases so that we can make the tool
> > better.
> > --Noble
> >
> >
> > On Mon, Apr 21, 2008 at 12:07 AM, David Smiley @MITRE.org
> > <[EMAIL PROTECTED]> wrote:
> >>
> >> Thanks Shalin.
> >>
> >> The particular XSLT processor used is not relevant; it's a spec. Just use
> >> the standard Java APIs. If I want a particular processor, then I can get
> >> that to happen by using a system property and/or you could offer a
> >> configuration input for the standard factory class implementation for a
> >> processor of my choice.
> >>
> >> ~ David
> >>
> >>
> >> Shalin Shekhar Mangar wrote:
> >> >
> >> > Hi David,
> >> > Actually you can concatenate values, however you'll have to write a bit of
> >> > code. You can write this in JavaScript (if you're using Java 6) or in
> >> > Java.
> >> >
> >> > Basically, you need to write a Transformer to do it. Look at
> >> >
> >> > http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9
> >> >
> >> > For example, let's say you get fields first-name and last-name in the XML.
> >> > But in the schema.xml you have a field called "name" in which you need to
> >> > concatenate the values of first-name and last-name (with a space in
> >> > between).
> >> > Create a Java class:
> >> >
> >> > import java.util.Map;
> >> >
> >> > public class ConcatenateTransformer {
> >> >     // Called once per row; adds a "name" field built from the two parts.
> >> >     public Object transformRow(Map<String, Object> row) {
> >> >         // row.get returns Object, so the values must be cast to String.
> >> >         String firstName = (String) row.get("first-name");
> >> >         String lastName = (String) row.get("last-name");
> >> >         row.put("name", firstName + " " + lastName);
> >> >         return row;
> >> >     }
> >> > }
> >> >
> >> > Add this class to solr's classpath by putting its jar in solr/WEB-INF/lib
> >> >
> >> > The data-config.xml should look like this:
> >> > <entity name="myEntity" processor="XPathEntityProcessor"
> >> >         url="http://myurl/example.xml"
> >> >         transformer="com.yourpackage.ConcatenateTransformer">
> >> >   <field column="first-name" xpath="/record/first-name" />
> >> >   <field column="last-name" xpath="/record/last-name" />
> >> >   <field column="name" />
> >> > </entity>
> >> >
> >> > This will call the ConcatenateTransformer.transformRow method for each row
> >> > and you can concatenate any field with any field (or constant). Note that
> >> > the solr document will keep only those fields which are in the schema.xml;
> >> > the rest are thrown away.
> >> >
> >> > If you don't want to write this in Java, you can use JavaScript by using
> >> > the built-in ScriptTransformer. For an example look at
> >> >
> >> > http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
> >> >
> >> > However, I'm beginning to realize that XSLT is a common need, let me see
> >> > how best we can accommodate it in DataImportHandler. Which XSLT processor
> >> > will you prefer?
> >> >
> >> > On Sat, Apr 19, 2008 at 12:13 AM, David Smiley @MITRE.org
> >> > <[EMAIL PROTECTED]>
> >> > wrote:
> >> >
> >> >>
> >> >> I'm in the same situation as you Daniel. The DataImportHandler is pretty
> >> >> awesome but I'd also prefer it had the power of XSLT. The XPath support
> >> >> in it doesn't suffice for me. And I can't do very basic things like
> >> >> concatenate one value with another, say a constant even.
> >> >> It's too bad there isn't a mode that XSLT can be put into so that it
> >> >> does not build the whole file in memory to do the transform. I've been
> >> >> looking into this and have turned up nothing. It would be neat if there
> >> >> was a StAX to multi-document adapter, at which point XSLT could be
> >> >> applied to the smaller fixed-size documents instead of the entire data
> >> >> stream. I haven't found anything like this so it'd need to be built.
> >> >> For now my documents aren't too big to XSLT in-memory.
> >> >>
> >> >> ~ David
> >> >>
> >> >>
> >> >> Daniel Papasian wrote:
> >> >> >
> >> >> > Shalin Shekhar Mangar wrote:
> >> >> >> Hi Daniel,
> >> >> >>
> >> >> >> Maybe if you can give us a sample of what your XML looks like, we can
> >> >> >> suggest how to use SOLR-469 (Data Import Handler) to index it. Most of
> >> >> >> the use-cases we have encountered so far are solvable using the
> >> >> >> XPathEntityProcessor in DataImportHandler without using XSLT. For
> >> >> >> details look at
> >> >> >>
> >> >> >> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
> >> >> >
> >> >> > I think even if it is possible to use SOLR-469 for my needs, I'd still
> >> >> > prefer the XSLT approach, because it's going to be a bit of
> >> >> > configuration either way, and I'd rather it be an XSLT stylesheet than
> >> >> > solrconfig.xml. In addition, I haven't yet decided whether I want to
> >> >> > apply any patches to the version that we will deploy, but if I do go
> >> >> > down the route of the XSLT transform patch and end up having to back
> >> >> > it out, the amount of work that it would be for me to do the transform
> >> >> > at the XML source would be negligible, whereas it would be quite a bit
> >> >> > of work ahead of me to go from using the DataImportHandler to not
> >> >> > using it at all.
> >> >> >
> >> >> > Because both the solr instance and the XML source are in house, I have
> >> >> > the ability to apply the XSLT at the source instead of at solr.
> >> >> > However, there are different teams of people that control the XML
> >> >> > source and solr, so it would require a bit more office coordination to
> >> >> > do it on the backend.
> >> >> >
> >> >> > The data is a filemaker XML export (DTD fmresultset) and it looks
> >> >> > roughly like this:
> >> >> > <fmresultset>
> >> >> >   <resultset>
> >> >> >     <field name="ID"><data>125</data></field>
> >> >> >     <field name="organization"><data>Ford Foundation</data></field>
> >> >> >     ...
> >> >> >     <relatedset table="Employees">
> >> >> >       <record>
> >> >> >         <field name="ID"><data>Y5-A</data></field>
> >> >> >         <field name="Name"><data>John Smith</data></field>
> >> >> >       </record>
> >> >> >       <record>
> >> >> >         <field name="ID"><data>Y5-B</data></field>
> >> >> >         <field name="Name"><data>Jane Doe</data></field>
> >> >> >       </record>
> >> >> >     </relatedset>
> >> >> >   </resultset>
> >> >> > </fmresultset>
> >> >> >
> >> >> > I'm taking the product of the resultset and the relatedset, using both
> >> >> > IDs concatenated as a unique identifier, like so:
> >> >> >
> >> >> > <doc>
> >> >> >   <field name="ID">125Y5-A</field>
> >> >> >   <field name="organization">Ford Foundation</field>
> >> >> >   <field name="Name">John Smith</field>
> >> >> > </doc>
> >> >> > <doc>
> >> >> >   <field name="ID">125Y5-B</field>
> >> >> >   <field name="organization">Ford Foundation</field>
> >> >> >   <field name="Name">Jane Doe</field>
> >> >> > </doc>
> >> >> >
> >> >> > I can do the transform pretty simply with XSLT. I suppose it is
> >> >> > possible to get the DataImportHandler to do this, but I'm not yet
> >> >> > convinced that it's easier.
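For comparison with the XSLT route, the same product of parent fields and related records can be sketched in plain Java using the JDK's DOM and XPath APIs. This is a rough sketch under stated assumptions, not DataImportHandler code: the class name FmResultsetFlattenSketch and the SAMPLE constant are hypothetical, the input is a well-formed version of the export above with relatedset nested inside resultset and the elided fields left out, and only a single parent row is handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration; not part of DataImportHandler.
class FmResultsetFlattenSketch {

    // Assumed well-formed version of the sample export from the thread.
    static final String SAMPLE =
        "<fmresultset><resultset>"
      + "<field name='ID'><data>125</data></field>"
      + "<field name='organization'><data>Ford Foundation</data></field>"
      + "<relatedset table='Employees'>"
      + "<record><field name='ID'><data>Y5-A</data></field>"
      + "<field name='Name'><data>John Smith</data></field></record>"
      + "<record><field name='ID'><data>Y5-B</data></field>"
      + "<field name='Name'><data>Jane Doe</data></field></record>"
      + "</relatedset></resultset></fmresultset>";

    // Builds one row per related record, copying the parent fields into each
    // row and concatenating the parent and child IDs into the unique key.
    static List<Map<String, String>> flatten(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        String parentId = xpath.evaluate(
            "/fmresultset/resultset/field[@name='ID']/data", doc);
        String organization = xpath.evaluate(
            "/fmresultset/resultset/field[@name='organization']/data", doc);
        NodeList records = (NodeList) xpath.evaluate(
            "/fmresultset/resultset/relatedset[@table='Employees']/record",
            doc, XPathConstants.NODESET);

        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < records.getLength(); i++) {
            // Relative paths are evaluated against the current <record> node.
            String childId = xpath.evaluate("field[@name='ID']/data", records.item(i));
            String name = xpath.evaluate("field[@name='Name']/data", records.item(i));
            Map<String, String> row = new LinkedHashMap<>();
            row.put("ID", parentId + childId);
            row.put("organization", organization);
            row.put("Name", name);
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        for (Map<String, String> row : flatten(SAMPLE)) {
            System.out.println(row);
        }
    }
}
```

Like a whole-document XSLT transform, this loads the full export into memory, so it only suits exports of modest size.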
> >> >> >
> >> >> > Daniel
> >> >> >
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >> http://www.nabble.com/XSLT-transform-before-update--tp16738227p16764009.html
> >> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >> >>
> >> >
> >> > --
> >> > Regards,
> >> > Shalin Shekhar Mangar.
> >> >
> >>
> >
>

--
--Noble Paul