This needs to be reverted. There was data loss.

On Wed, Dec 9, 2009 at 8:46 PM, Apache Wiki <wikidi...@apache.org> wrote:
> Dear Wiki user,
>
> You have subscribed to a wiki page or wiki category on "Solr Wiki" for change
> notification.
>
> The "DataImportHandler" page has been changed by DNaber.
> http://wiki.apache.org/solr/DataImportHandler?action=diff&rev1=220&rev2=221
>
> --------------------------------------------------
>
> <dataConfig>
> <dataSource type="FileDataSource" />
> <document>
> + <entity name="f" processor="FileListEntityProcessor"
> baseDir="/some/path/tongle implicit field called 'plainText'. The content is
> not parsed in any way, however you may add transformers to manipulate the
> data within 'plainText' as needed or to create other additional fields.
> - <entity name="f" processor="FileListEntityProcessor"
> baseDir="/some/path/to/files" fileName=".*xml" newerThan="'NOW-3DAYS'"
> recursive="true" rootEntity="false" dataSource="null">
> - <entity name="x" processor="XPathEntityProcessor"
> forEach="/the/record/xpath" url="${f.fileAbsolutePath}">
> - <field column="full_name" xpath="/field/xpath"/>
> - </entity>
> - </entity>
> - </document>
> - </dataConfig>
> - }}}
> - Do not miss the `rootEntity` attribute. The implicit fields generated by
> the !FileListEntityProcessor are `fileAbsolutePath, fileSize,
> fileLastModified, fileName` and these are available for use within the entity
> X as shown above. It should be noted that !FileListEntityProcessor returns a
> list of pathnames and that the subsequent entity must use the !FileDataSource
> to fetch the files content.
>
> + example:
> - === CachedSqlEntityProcessor ===
> - <<Anchor(cached)>>
> -
> - This is an extension of the !SqlEntityProcessor. This !EntityProcessor
> helps reduce the no: of DB queries executed by caching the rows. It does not
> help to use it in the root most entity because only one sql is run for the
> entity.
> -
> - Example 1.
> {{{
> - <entity name="x" query="select * from x">
> - <entity name="y" query="select * from y where xid=${x.id}"
> processor="CachedSqlEntityProcessor">
> - </entity>
> + <entity processor="PlainTextEntityProcessor" name="x"
> url="http://abc.com/a.txt" dataSource="data-source-name">
> + <!-- copies the text to a field called 'text' in Solr-->
> + <field column="plainText" name="text"/>
> - <entity>
> + </entity>
> }}}
>
> - The usage is exactly same as the other one. When a query is run the results
> are stored and if the same query is run again it is fetched from the cache
> and returned
> + Ensure that the dataSource is of type !DataSource<Reader> (!FileDataSource,
> URL!DataSource)
>
> - Example 2:
> - {{{
> - <entity name="x" query="select * from x">
> - <entity name="y" query="select * from y"
> processor="CachedSqlEntityProcessor" where="xid=x.id">
> - </entity>
> - <entity>
> - }}}
> -
> - The difference with the previous one is the 'where' attribute. In this case
> the query fetches all the rows from the table and stores all the rows in the
> cache. The magic is in the 'where' value. The cache stores the values with
> the 'xid' value in 'y' as the key. The value for 'x.id' is evaluated every
> time the entity has to be run and the value is looked up in the cache an the
> rows are returned.
> -
> - In the where the lhs (the part before '=') is the column in y and the rhs
> (the part after '=') is the value to be computed for looking up the cache.
> -
> - === PlainTextEntityProcessor ===
> + === LineEntityProcessor ===
> - <<Anchor(plaintext)>>
> + <<Anchor(LineEntityProcessor)>>
> <!> [[Solr1.4]]
>
> - This !EntityProcessor reads all content from the data source into an single
> implicit field called 'plainText'. The content is not parsed in any way,
> however you may add transformers to manipulate the data within 'plainText' as
> needed or to create other additional fields.
> + This !EntityProcessor reads all content from the data source on a line by
> line basis, a field called 'rawLine' is returned for each line read. The
> content is not parsed in any way, however you may add transformers to
> manipulate the data within 'rawLine' or to create other additional fields.
> +
> + The lines read can be filtered by two regular expressions
> '''acceptLineRegex''' and '''omitLineRegex'''.
> + This entities additional attributes are:
> + * '''`url`''' : a required attribute that specifies the location of the
> input file in a way that is compatible with the configured datasource. If
> this value is relative and you are using !FileDataSource or URL!DataSource,
> it assumed to be relative to '''baseLoc'''.
> + * '''`acceptLineRegex`''' :an optional attribute that if present discards
> any line which does not match the regExp.
> + * '''`omitLineRegex`''' : an optional attribute that is applied after any
> acceptLineRegex and discards any line which matches this regExp.
> + example:
> + {{{
> + <entity name="jc"
> + processor="LineEntityProcessor"
> + acceptLineRegex="^.*\.xml$"
> + omitLineRegex="/obsolete"
> + url="file:///Volumes/ts/files.lis"
> + rootEntity="false"
> + dataSource="myURIreader1"
> + transformer="RegexTransformer,DateFormatTransformer"
> + >
> + ...
> + }}}
> + While there are use cases where you might need to create a solr document
> per line read from a file, it is expected that in most cases that the lines
> read will consist of a pathname which is in turn consumed by another
> !EntityProcessor
> + such as X!PathEntityProcessor.
> +
> + == DataSource ==
> + <<Anchor(datasource)>>
> + A class can extend `org.apache.solr.handler.dataimport.DataSource` .
> [[http:/%ngle implicit field called 'plainText'. The content is not parsed in
> any way, however you may add transformers to manipulate the data within
> 'plainText' as needed or to create other additional fields.
>
> example:
> {{{
> @@ -1026, +1026 @@
>
> {{attachment:interactive-dev-dataimporthandler.PNG}}
>
> = Where to find it? =
> - DataImportHandler is a new addition to Solr. You can either:
> + DataImportHandler was added to Solr in Solr 1.3. You can either:
> - * Download a nightly build of Solr from
> [[http://lucene.apache.org/solr/|Solr website]], or
> + * Download a build of Solr from [[http://lucene.apache.org/solr/|Solr
> website]], or
> * Use the steps given in Full Import Example to try it out.
>
> For a history of development discussion related to DataImportHandler, please
> see [[http://issues.apache.org/jira/browse/SOLR-469|SOLR-469]] in the Solr
> JIRA.
>

--
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com