On Fri, Jan 7, 2011 at 12:30 PM, Bernd Fehling
<bernd.fehl...@uni-bielefeld.de> wrote:
> Hello list,
>
> is it possible to load only selected documents with XPathEntityProcessor?
> While loading docs I want to drop/skip/ignore documents with missing URL.
>
> Example:
> <documents>
>   <document>
>     <title>first title</title>
>     <id>identifier_01</id>
>     <link>http://www.foo.com/path/bar.html</link>
>   </document>
>   <document>
>     <title>second title</title>
>     <id>identifier_02</id>
>     <link></link>
>   </document>
> </documents>
>
> The first document should be loaded, the second document should be ignored
> because it has an empty link (should also work for missing link field).
[...]

You can use a ScriptTransformer, along with $skipRow/$skipDoc. E.g.,
something like this for your data import configuration file:

<dataConfig>
  <script><![CDATA[
    function skipRow(row) {
      var link = row.get( 'link' );
      // Skip the row when the link is missing (null) or empty.
      if( link == null || link == '' ) {
        row.put( '$skipRow', 'true' );
      }
      return row;
    }
  ]]></script>
  <dataSource type="FileDataSource" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="/home/gora/test" fileName=".*xml"
            newerThan="'NOW-3DAYS'" recursive="true"
            rootEntity="false" dataSource="null">
      <entity name="top" processor="XPathEntityProcessor"
              forEach="/documents/document"
              url="${f.fileAbsolutePath}"
              transformer="script:skipRow">
        <field column="link" xpath="/documents/document/link"/>
        <field column="title" xpath="/documents/document/title"/>
        <field column="id" xpath="/documents/document/id"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Regards,
Gora
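
P.S. If you would like to check the skip logic outside Solr before running
an import, something along these lines might help. It is only a rough
sketch in plain JavaScript: makeRow is a hypothetical stand-in for the
java.util.Map that the DataImportHandler passes to the script, not part of
Solr. This variant also treats a whitespace-only link as empty, which is
an assumption about what you want and goes slightly beyond the config above.

```javascript
// Hypothetical mock of the row map (get/put only); not a Solr API.
function makeRow(fields) {
  var data = {};
  for (var key in fields) {
    if (Object.prototype.hasOwnProperty.call(fields, key)) {
      data[key] = fields[key];
    }
  }
  return {
    // Like java.util.Map.get, return null for an absent key.
    get: function (key) {
      return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null;
    },
    put: function (key, value) {
      data[key] = value;
    }
  };
}

// Same idea as the transformer function in the config, with one
// assumed extension: a link made up only of whitespace is also skipped.
function skipRow(row) {
  var link = row.get('link');
  if (link == null || String(link).replace(/^\s+|\s+$/g, '') === '') {
    row.put('$skipRow', 'true');
  }
  return row;
}
```

Calling skipRow(makeRow({ link: '' })) and then reading '$skipRow' from the
returned row should show the flag set, while a row with a real URL should
come back without it.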