[
https://issues.apache.org/jira/browse/STANBOL-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287496#comment-13287496
]
Suat Gonul commented on STANBOL-498:
------------------------------------
With my last commit, I have checked in the file-based implementation of the
Store interface. That commit also includes an Apache Derby based revision
management mechanism and a serialization/deserialization mechanism for content
parts. Comments on the implementation are very welcome.
From now on, I'm planning to write some tests and afterwards continue with the
indexing part.
> Contenthub: Enhanced ContentItem Store
> --------------------------------------
>
> Key: STANBOL-498
> URL: https://issues.apache.org/jira/browse/STANBOL-498
> Project: Stanbol
> Issue Type: Sub-task
> Components: Content Hub
> Reporter: Rupert Westenthaler
>
> Simple Storage interface for enhanced ContentItems.
> This Store is used to
> 1. save the ContentItems after they are enhanced by the Enhancer
> * The Blobs (original content and transcoded versions)
> * The Metadata (Enhancement Results)
> 2. retrieve ContentItems during semantic indexing
> * Iterator over the IDs
> * Get ContentItem by ID
> This store is NOT intended to be used for search! It is only used for ID
> based lookup.
> Implementations:
> -----------------------
> * CMS Adapter: An implementation based on the CMS Adapter provides the
> possibility to store the Enhancement Results directly within the CMS.
> Typically this will be the CMS also sending the request to the Contenthub,
> but this is not a requirement.
> * Clerezza based implementation: Clerezza - as RDF based CMS - provides the
> required functionality to store both the content AND the metadata of the
> contentItem
> * File based: Simple file based storage without any external dependencies.
> This could be used as default and for testing
> Interface:
> -------------
> The interface will be based-on/replace the
> [Store](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/contenthub/servicesapi/src/main/java/org/apache/stanbol/contenthub/servicesapi/store/Store.java)
> interface already present in the Contenthub. However, the suggestion is to
> remove "getEnhancementGraph()", as this is not required by the use cases
> (1) and (2) mentioned above. In addition, the store interface should be
> extended with a remove method to allow manual deletion of ContentItems.
> /** stores the passed ContentItem */
> + put(ContentItem ci) : UriRef
> /** Getter for the ContentItem with the passed ID */
> + get(UriRef id) : ContentItem
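The interface described above could be sketched in Java roughly as follows. This is an illustration only, not the committed implementation: `UriRef` and `ContentItem` are replaced by simplified stand-ins (a `String` id and a small `SimpleContentItem` class), `InMemoryStore` is a hypothetical implementation, and the `boolean` return type of `remove` is an assumption, since the description only asks for a remove method.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for an enhanced ContentItem (assumption: id plus raw content only). */
final class SimpleContentItem {
    final String id;
    final String content;

    SimpleContentItem(String id, String content) {
        this.id = id;
        this.content = content;
    }
}

/** Sketch of the Store interface; String ids stand in for UriRef. */
interface SimpleStore {
    /** Stores the passed ContentItem and returns its id. */
    String put(SimpleContentItem ci);

    /** Returns the ContentItem with the passed id, or null if it is not present. */
    SimpleContentItem get(String id);

    /** Removes the ContentItem with the passed id; returns true if it was present. */
    boolean remove(String id);
}

/** Hypothetical in-memory implementation, only meant to show ID-based lookup. */
final class InMemoryStore implements SimpleStore {
    private final Map<String, SimpleContentItem> items = new HashMap<>();

    @Override
    public String put(SimpleContentItem ci) {
        items.put(ci.id, ci);
        return ci.id;
    }

    @Override
    public SimpleContentItem get(String id) {
        return items.get(id);
    }

    @Override
    public boolean remove(String id) {
        return items.remove(id) != null;
    }
}
```

Note that the sketch offers no search method at all, in line with the statement that the store is only used for ID based lookup.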
> ### Revisions
> Revisions are used to re-synchronize semantic indexes with the enhanced
> ContentItems managed by this store. Every time the ContentHub indexes
> enhanced ContentItems - as managed by this store - to a SemanticIndex, it
> provides the highest revision. SemanticIndexes MUST persist such revisions
> and MUST ensure they are still available after a restart, because this number
> will later be used by the ContentHub to apply changes to enhanced
> ContentItems.
> In detail, a revision is defined as a change (add, update, removal) to one or
> more ContentItems managed by the Store. Every such change MUST result in
> an increase of the revision. Revisions MUST only use positive numbers.
> Implementers might use <code>System.currentTimeMillis()</code> as revision,
> but this is not a requirement.
> The store interface provides a method that returns an Iterator over all
> ContentItems that were changed (added, updated, removed) since a
> given revision.
> /** Iterator over all contentItems added/removed after revision */
> + changes(long revision, int offset, int batchSize) : ChangeSet
> class ChangeSet {
> /** the lowest included revision */
> + from() : long
> /** the id of changed ContentItems */
> + changed() : Map<UriRef>
> /** the highest included revision */
> + to() : long
> }
> Calls to changes(..) MUST return only changes with a higher revision than the
> provided number. ChangeSets with the passed revision number MUST be excluded.
> Note that ChangeSet does not provide information about the type of the
> change. This will only be available after a call to Store#get(..).
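The contract of changes(..) could be sketched as below. This is only an illustration under several assumptions: ids are plain Strings instead of `UriRef`, `changed()` returns a `Set` (the description writes `Map<UriRef>`), and `RevisionTable` is a hypothetical class name. The paging semantics (order by revision, skip `offset` entries, return at most `batchSize`) and the behaviour for an empty result (`from()` and `to()` both return the passed revision) are guesses, as the description does not define them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Sketch of the ChangeSet described above; String ids stand in for UriRef. */
final class SimpleChangeSet {
    private final long from;
    private final long to;
    private final Set<String> changed;

    SimpleChangeSet(long from, long to, Set<String> changed) {
        this.from = from;
        this.to = to;
        this.changed = Collections.unmodifiableSet(changed);
    }

    /** The lowest included revision. */
    long from() { return from; }

    /** The highest included revision. */
    long to() { return to; }

    /** The ids of the changed ContentItems. */
    Set<String> changed() { return changed; }
}

/** Hypothetical table holding only the latest revision of each ContentItem id. */
final class RevisionTable {
    private final Map<String, Long> latest = new TreeMap<>();

    /** Records a change; an earlier revision of the same id is overwritten. */
    void record(String id, long revision) {
        if (revision <= 0) {
            throw new IllegalArgumentException("revisions must be positive");
        }
        latest.put(id, revision);
    }

    /** Returns the ids whose latest revision is strictly higher than the passed one. */
    SimpleChangeSet changes(long revision, int offset, int batchSize) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Long> e : latest.entrySet()) {
            if (e.getValue() > revision) {   // the passed revision itself is excluded
                ids.add(e.getKey());
            }
        }
        // Stable sort: ordered by revision, ties keep the id order of the TreeMap.
        ids.sort(Comparator.comparingLong((String id) -> latest.get(id)));

        int start = Math.min(Math.max(offset, 0), ids.size());
        int end = (int) Math.min((long) start + Math.max(batchSize, 0), ids.size());
        List<String> page = ids.subList(start, end);
        if (page.isEmpty()) {
            return new SimpleChangeSet(revision, revision, new LinkedHashSet<String>());
        }
        long from = latest.get(page.get(0));
        long to = latest.get(page.get(page.size() - 1));
        return new SimpleChangeSet(from, to, new LinkedHashSet<>(page));
    }
}
```

A caller would then pass the `to()` of the last processed ChangeSet as the revision of the next call, and use Store#get(..) on each returned id to find out what kind of change happened.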
> The revisions MUST NOT keep a history of changes. Only the revision of the
> latest change MUST be kept. This ensures that rebuilding a semantic index
> (from revision -1) does only perform indexing steps corresponding to
> historical state of the Store. Note also that the revisions do not provide
> information about the type of the change. Whether a ContentItem is still
> present (added, updated) or was removed is indicated by the get(..) method of
> the store returning a ContentItem instance or <code>null</code>.
> #### Example:
> E.g. if ContentItems 1, 2 and 3 are added first, later ContentItem 2 is
> updated and 3 is deleted, and in a third step ContentItems 3 and 4 are added,
> this would result in the following revision data:
> After step 1:
> :::text
> 1 : urn:contentItem.1 //added
> 1 : urn:contentItem.2 //added
> 1 : urn:contentItem.3 //added
> After step 2:
> :::text
> 1 : urn:contentItem.1 //added
> 2 : urn:contentItem.2 //updated
> 2 : urn:contentItem.3 //removed
> After step 3:
> :::text
> 1 : urn:contentItem.1 //added
> 2 : urn:contentItem.2 //updated
> 3 : urn:contentItem.3 //added
> 3 : urn:contentItem.4 //added
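The three steps of the example can be replayed with a small sketch. All names here (`RevisionedStoreSketch`, `RevisionExample`, `beginRevision`) are hypothetical, ids and content are plain Strings, and a simple counter is used as revision; the description would equally allow `System.currentTimeMillis()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical store keeping content plus the latest revision of every id it has seen. */
final class RevisionedStoreSketch {
    private final Map<String, String> content = new HashMap<>();
    private final Map<String, Long> revisions = new TreeMap<>();
    private long current = 0;

    /** Starts a new revision; all following changes are recorded under it. */
    long beginRevision() {
        return ++current;
    }

    void put(String id, String text) {
        requireRevision();
        content.put(id, text);
        revisions.put(id, current);   // overwrites any earlier revision of this id
    }

    void remove(String id) {
        requireRevision();
        content.remove(id);
        revisions.put(id, current);   // the removal itself counts as a change
    }

    /** Returns the content, or null if the item was removed or never stored. */
    String get(String id) {
        return content.get(id);
    }

    /** Copy of the revision data: id to revision of its latest change. */
    Map<String, Long> revisionData() {
        return new TreeMap<>(revisions);
    }

    private void requireRevision() {
        if (current == 0) {
            throw new IllegalStateException("call beginRevision() first");
        }
    }
}

final class RevisionExample {
    /** Replays the first 'steps' steps of the example above. */
    static RevisionedStoreSketch run(int steps) {
        RevisionedStoreSketch store = new RevisionedStoreSketch();
        if (steps >= 1) {
            store.beginRevision();
            store.put("urn:contentItem.1", "one");
            store.put("urn:contentItem.2", "two");
            store.put("urn:contentItem.3", "three");
        }
        if (steps >= 2) {
            store.beginRevision();
            store.put("urn:contentItem.2", "two, updated");
            store.remove("urn:contentItem.3");
        }
        if (steps >= 3) {
            store.beginRevision();
            store.put("urn:contentItem.3", "three again");
            store.put("urn:contentItem.4", "four");
        }
        return store;
    }

    public static void main(String[] args) {
        RevisionedStoreSketch store = run(3);
        for (Map.Entry<String, Long> e : store.revisionData().entrySet()) {
            String state = store.get(e.getKey()) == null ? "removed" : "present";
            System.out.println(e.getValue() + " : " + e.getKey() + " //" + state);
        }
    }
}
```

After two steps, urn:contentItem.3 is listed under revision 2 while get(..) returns null for it, which is how an index would recognize the removal. After three steps the revision data matches the last listing above, and no trace of the intermediate removal is kept.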