[ https://issues.apache.org/jira/browse/SOLR-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908084#comment-13908084 ]
Manjunath commented on SOLR-2943:
---------------------------------

Nice feature to have :-). Waiting to use it :-)

> DIHCacheWriter & DIHCacheProcessor (entity processor)
> -----------------------------------------------------
>
>                 Key: SOLR-2943
>                 URL: https://issues.apache.org/jira/browse/SOLR-2943
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>   Affects Versions: 4.0-ALPHA
>            Reporter: James Dyer
>            Priority: Minor
>             Fix For: 4.7
>
>         Attachments: SOLR-2943.patch, SOLR-2943.patch, SOLR-2943.patch
>
>
> This is a spin-off of SOLR-2382.
> Currently DIH requires users to retrieve, join and index all data for a full
> or delta update in one big step. This issue is to allow us to break this
> into individual steps. The idea is to have multiple "data-config.xml" files,
> some of which retrieve and cache data while others join and index data.
> This is useful when Solr Records are a conglomeration of several data
> sources. With this feature, each data source can be retrieved and cached
> separately. Once all data sources have been retrieved, they can be joined
> and indexed in a final step. When doing a delta update, only the data
> sources that change need to have their caches updated (or frequently-changing
> data can remain un-cached while caching the more static data). This is
> particularly useful in light of the fact that Lucene/Solr cannot do a true
> "update" operation. DIH Caches also provide a handy way to archive source
> data for which there is no stable system-of-record.
> Implementation Details:
> - The DIHCacheWriter allows us to write the final (root entity) DIH output to
> a DIHCache rather than to Solr. Caches can be created from scratch
> ("full-update") or existing caches can be modified ("delta-update").
> - The DIHCacheProcessor is an Entity Processor that reads a DIHCache. This
> Entity Processor can be used for both Root Entities and Child Entities.
> Cached data can be read back, joined to other Entities and indexed.
> - Both DIHCacheWriter and DIHCacheProcessor support partitioning.
> DIHCacheWriter can write to a partitioned cache while DIHCacheProcessor can
> read back a particular partition. This can be handy when indexing to
> multiple shards.
> - This patch is 100% stand-alone from the rest of DIH, so while users can
> patch and rebuild the DIH .jar file to include these classes, it is
> unnecessary. To use this functionality, simply include the code here in the
> classpath. (ex: in SOLR_HOME/lib)
> - In addition to this patch, a persistent cache implementation is required.
>   - See SOLR-2948 for a DIH Cache Implementation built on Lucene (no
> additional dependencies).
>   - See SOLR-2613 for a DIH Cache Implementation backed with BDB-JE (we use
> this in Production).
>   - Other Cache Implementations (hopefully) will be developed in the future
> and become available for general use.
> - This patch includes extensive unit tests. A MockDIHCache that supports
> persistence and delta updates facilitates the tests. Do not attempt to use
> MockDIHCache for anything other than testing or as a reference for developing
> your own DIHCache implementations.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
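
For readers who want a concrete picture of the cache, partition and join flow described in the quoted issue, here is a minimal Java sketch. It is not the patch's code: `PartitionedCacheSketch`, `row`, `partitionFor`, `write`, `readPartition`, `lookup` and `joinPartition` are all hypothetical names, the in-memory maps stand in for a persistent cache, and routing rows by key hash is an assumption rather than something the issue specifies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a partitioned cache. None of these names come from the patch.
class PartitionedCacheSketch {
    // One sorted map per partition: join key -> cached row.
    private final List<Map<String, Map<String, String>>> partitions = new ArrayList<>();

    PartitionedCacheSketch(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new TreeMap<String, Map<String, String>>());
        }
    }

    // Small helper to build a row from alternating field names and values.
    static Map<String, String> row(String... kv) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // Assumption: a row is routed to a partition by the hash of its key.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // "Write" step: add a row, or overwrite an existing one as a delta update would.
    void write(String key, Map<String, String> row) {
        partitions.get(partitionFor(key)).put(key, row);
    }

    // "Read" step: return only the rows held by one partition.
    Map<String, Map<String, String>> readPartition(int partition) {
        return partitions.get(partition);
    }

    // Look up a single cached row by key, or null if it was never cached.
    Map<String, String> lookup(String key) {
        return partitions.get(partitionFor(key)).get(key);
    }

    // Final step: join one partition of parent rows to child rows on the shared key.
    static List<Map<String, String>> joinPartition(
            PartitionedCacheSketch parents, PartitionedCacheSketch children, int partition) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : parents.readPartition(partition).entrySet()) {
            Map<String, String> doc = new LinkedHashMap<>(e.getValue());
            Map<String, String> child = children.lookup(e.getKey());
            if (child != null) {
                doc.putAll(child);
            }
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        PartitionedCacheSketch products = new PartitionedCacheSketch(2);
        PartitionedCacheSketch prices = new PartitionedCacheSketch(2);

        // Each source is cached separately.
        products.write("p1", row("id", "p1", "name", "widget"));
        products.write("p2", row("id", "p2", "name", "gadget"));
        prices.write("p1", row("price", "9.99"));

        // A later delta touches only the cache whose source changed.
        prices.write("p1", row("price", "8.49"));

        // Each partition can be joined and indexed on its own, e.g. one per shard.
        for (int p = 0; p < 2; p++) {
            System.out.println("partition " + p + ": " + joinPartition(products, prices, p));
        }
    }
}
```

The sketch keeps the "retrieve and cache" and "join and index" steps as separate method calls to mirror the multiple data-config files the issue proposes; a real DIHCache implementation would persist the rows between those steps.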