[ 
https://issues.apache.org/jira/browse/STANBOL-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rupert Westenthaler updated STANBOL-499:
----------------------------------------

    Summary: Contenthub: Semantic Indexes  (was: Semantic Intexes)
    
> Contenthub: Semantic Indexes
> ----------------------------
>
>                 Key: STANBOL-499
>                 URL: https://issues.apache.org/jira/browse/STANBOL-499
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Content Hub
>            Reporter: Rupert Westenthaler
>
> The SemanticIndex is the interface used by the ContentHub to semantically 
> index ContentItems (2nd level store). It is anticipated that a ContentHub 
> will manage multiple semantic indexes of possibly different implementations.
> Expected Implementations of this Interface include
> * The current Solr/LDPath based semantic index component
> * The current Contenthub default index (also Solr based)
> * A SPARQL based variant implemented by a Triple Store
> The remaining Specification includes the definition of the SemanticIndex 
> interface as well as the SemanticIndexManager.
> SemanticIndex
> --------------------
> The Java interface for semantic indexes as used by the Apache Stanbol 
> Contenthub
> ### Identification
>     :::java
>     /** The name of the Index */
>     + getName()
>     /** An optional free text description */
>     + getDescription()
> The name of the semantic index is intended to be used for simple lookups as 
> well as for relative paths within the RESTful interfaces. However, it MUST NOT 
> be considered unique. See section [Semantic Index 
> Management](#Semantic_Index_Management) for details on how to resolve name 
> conflicts.
> ### Indexing
> First, the interface defines methods for adding documents to and removing 
> them from the semantic index:
>     :::java
>     /** Indexes the passed ContentItem */
>     + index(ContentItem ci) : boolean
>     /** Deletes the ContentItem with the passed id */
>     + remove(String ciUri)
>     /** Ensures that changes to the index are persisted */
>     + persist(long revision)
>     /** Getter for the highest successfully persisted revision */
>     + getRevision() : long
> The boolean returned by the index method indicates whether the passed 
> ContentItem was actually included in the semantic index. A semantic index may 
> define filters on the content items to be included.
> The persist method is intended to signal to the semantic index that 
> indexing has finished. This allows the semantic index to form batches 
> over multiple calls to index(..) and remove(..), which may improve 
> performance when indexing multiple ContentItems.
> In addition, it is used to pass the highest revision of an indexed content 
> item. If no revision has yet been announced to a semantic index (that is, 
> persist(..) was never called), then getRevision() shall return a negative 
> number.
> The revision will be used by the ContentHub to re-synchronize the contents of 
> a semantic index with the enhanced ContentItems present in the 
> [Store](store.html) when it becomes active. Usually the long value will 
> represent the time in milliseconds as returned by 
> <code>System.currentTimeMillis()</code>, but this is not a requirement. It is 
> only important that each change to the Store results in an increase of this 
> number.
> All of the above methods may throw a SemanticIndexingException. This is a 
> subclass of ContenthubException.
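As a rough illustration of the indexing contract described above, the following sketch models index(..), remove(..), persist(..) and getRevision() with an in-memory implementation. The `ContentItem` stand-in, the `InMemoryIndex` class, its `size()` helper and the "text/" media type filter are assumptions made for this example only; they are not part of the Stanbol API, and exception handling is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the indexing contract; not the actual Stanbol API. */
final class IndexingSketch {

    /** Minimal stand-in for a ContentItem (assumption for this example). */
    static final class ContentItem {
        final String uri;
        final String mimeType;

        ContentItem(String uri, String mimeType) {
            this.uri = uri;
            this.mimeType = mimeType;
        }
    }

    /** In-memory index that batches changes until persist(..) is called. */
    static final class InMemoryIndex {
        private final Map<String, ContentItem> persisted = new HashMap<>();
        private final Map<String, ContentItem> pendingAdds = new HashMap<>();
        private final Set<String> pendingRemoves = new HashSet<>();
        private long revision = -1; // negative until persist(..) is first called

        /** Returns false if the item is filtered out (here: non-text content). */
        boolean index(ContentItem ci) {
            if (!ci.mimeType.startsWith("text/")) {
                return false;
            }
            pendingRemoves.remove(ci.uri);
            pendingAdds.put(ci.uri, ci);
            return true;
        }

        /** Schedules the item with the given URI for removal. */
        void remove(String ciUri) {
            pendingAdds.remove(ciUri);
            pendingRemoves.add(ciUri);
        }

        /** Applies the batched changes and records the announced revision. */
        void persist(long newRevision) {
            persisted.putAll(pendingAdds);
            persisted.keySet().removeAll(pendingRemoves);
            pendingAdds.clear();
            pendingRemoves.clear();
            revision = Math.max(revision, newRevision);
        }

        /** Highest revision announced so far, or a negative number. */
        long getRevision() {
            return revision;
        }

        /** Number of persisted items (helper for this example only). */
        int size() {
            return persisted.size();
        }
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        index.index(new ContentItem("urn:ci:1", "text/plain"));
        index.index(new ContentItem("urn:ci:2", "image/png")); // filtered out
        index.persist(System.currentTimeMillis());
        System.out.println(index.size() + " item(s), revision " + index.getRevision());
    }
}
```

Nothing becomes visible in the sketch until persist(..) is called, which is one possible way to realise the batching mentioned above.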
> ### Index State
> Semantic indexes provide the following state information:
>     :::java
>     /** The state of the semantic index */
>     + getState() : IndexState
> The IndexState is a simple Java enum that defines the following states:
> * <code>UNINIT</code>: The index was defined, the configuration is ok, but 
> the contents are not yet indexed and the indexing has not yet started. 
> (Intended to be used as default state after creation)
> * <code>INDEXING</code>: The (initial) indexing of content items is currently 
> in progress. This indicates that the index is currently NOT active.
> * <code>ACTIVE</code>: The semantic index is available and in sync
> * <code>REINDEXING</code>: The (re)-indexing of content items is currently in 
> progress. This indicates that the configuration of the semantic index was 
> changed in a way that requires rebuilding the whole semantic index. This 
> still requires the index to be active, meaning that searches can be performed 
> normally, but recent updates/changes to ContentItems might not be reflected. 
> This also indicates that the index will be replaced by a different version 
> (maybe with changed fields) in the near future.
> Note that there are no states for INACTIVE and ERROR. This is because such 
> states are already covered by the normal OSGi component life cycle. 
> All the above IndexStates require the SemanticIndex component to be active.
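The states listed above could be written down roughly as follows. The enum constants come from the description; the two helper methods `isSearchable()` and `isInSync()` are assumptions added only to make the distinction between ACTIVE and REINDEXING explicit.

```java
/** Hypothetical sketch of the IndexState enum described above. */
final class IndexStateSketch {

    enum IndexState {
        UNINIT, INDEXING, ACTIVE, REINDEXING;

        /** Searches can be served while ACTIVE or REINDEXING (assumed helper). */
        boolean isSearchable() {
            return this == ACTIVE || this == REINDEXING;
        }

        /** Only ACTIVE means the index is available and in sync (assumed helper). */
        boolean isInSync() {
            return this == ACTIVE;
        }
    }

    public static void main(String[] args) {
        for (IndexState state : IndexState.values()) {
            System.out.println(state + " searchable=" + state.isSearchable()
                    + " inSync=" + state.isInSync());
        }
    }
}
```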
> ### Index Inspection
> The semantic index interface provides a very simple API to inspect the 
> configuration of the semantic index. This part of the interface is considered 
> to be optional. Implementations that cannot provide such information shall 
> return <code>null</code> to calls of the methods below.
>     :::java
>     /** The names of all fields defined by this Index */
>     + getFieldsNames() : List<String>
>     /** Getter for the field properties */
>     + getFieldProperties(String name) : Map<String,Object>
> Keys for well known properties shall be defined by the services API of the 
> ContentHub. This includes the following:
>     :::java
>     /** The xsd:dataType for the values of a field */
>     DATATYPE
> Implementation-specific keys shall be defined by the implementations of the 
> semantic index interface. Here is a possible key for an LDPath-based semantic 
> index implementation:
>     :::java
>     /** The LDPath rule used for a field */
>     LDPATH
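Because the inspection methods are optional and may return <code>null</code>, callers have to guard their lookups. The sketch below shows one way a client might read the DATATYPE property safely. The `FieldInspector` interface, the `datatypeOf` helper and the string value of the `DATATYPE` constant are assumptions for this example, not part of the specification.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical client-side helper for the optional inspection API. */
final class InspectionSketch {

    /** Assumed value for the well-known datatype key. */
    static final String DATATYPE = "DATATYPE";

    /** The inspection part of the interface, reduced to one method. */
    interface FieldInspector {
        Map<String, Object> getFieldProperties(String name);
    }

    /** Returns the datatype of a field, or empty if the index does not report it. */
    static Optional<String> datatypeOf(FieldInspector index, String field) {
        Map<String, Object> props = index.getFieldProperties(field);
        if (props == null) {
            return Optional.empty(); // inspection not supported by this index
        }
        return Optional.ofNullable(props.get(DATATYPE)).map(Object::toString);
    }
}
```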
> ### Search
> The semantic index does NOT define methods to search its contents, as the 
> intention is to directly use the search APIs of the technologies/frameworks 
> used to hold the semantic index, such as
> * [Apache Solr](http://lucene.apache.org/solr) RESTful API
> * SPARQL in case a TripleStore is used as Semantic index.
> * Contenthub featured search interface
> However, the semantic index should return the type and the URL of each 
> endpoint:
>     :::java
>     /** Getter for all supported search endpoints */
>     + getSearchEndpoints() : Map<String,String>
> This method returns a map with the type of each search endpoint as key and 
> the URL of the RESTful service endpoint as value.
> For example, the values for the semantic index with the name "default" 
> supporting Solr and Contenthub featured search:
>     :::text
>     "CONTENTHUB" : "http://localhost:8080/contenthub/search/featured";
>     "SOLR" : "http://localhost:8080/solr/contenthub/default";
> Another example, for an index with the name "knowledgebase" that supports a 
> SPARQL endpoint:
>     :::text
>     "SPARQL" : "http://localhost:8080/sparql/contenthub/knowledgebase";
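A client that can work with more than one search technology might use the endpoint map along these lines. This is only a sketch: the `pickEndpoint` helper is hypothetical, and the map contents simply repeat the "default" example from above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical client-side helper for choosing a search endpoint. */
final class EndpointSketch {

    /** Returns the URL of the first preferred endpoint type the index offers. */
    static Optional<String> pickEndpoint(Map<String, String> endpoints,
                                         String... preferredTypes) {
        for (String type : preferredTypes) {
            String url = endpoints.get(type);
            if (url != null) {
                return Optional.of(url);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Endpoint map as it might be returned by getSearchEndpoints()
        Map<String, String> endpoints = new LinkedHashMap<>();
        endpoints.put("CONTENTHUB", "http://localhost:8080/contenthub/search/featured");
        endpoints.put("SOLR", "http://localhost:8080/solr/contenthub/default");

        // Prefer SPARQL, fall back to Solr
        System.out.println(pickEndpoint(endpoints, "SPARQL", "SOLR").orElse("none"));
    }
}
```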
> Semantic Index Management
> -------------------------
> Semantic indexes are registered as OSGi components implementing the 
> "SemanticIndex" interface as described above. All active semantic indexes are 
> managed by the SemanticIndexManager component as follows:
> ### Interface
> Provides a Java API that allows the lookup of all active semantic indexes. 
> This includes indexes in the UNINIT, INDEXING, ACTIVE and REINDEXING state.
> Lookup of semantic indexes is supported based on name and search endpoint type.
>     :::java
>     + getIndex(String name) : SemanticIndex
>     + getIndexes(String name) : List<SemanticIndex>
>     + getIndex(String endpointType) : SemanticIndex
>     + getIndexes(String endpointType) : List<SemanticIndex>
>     + getIndex(String name, String endpointType) : SemanticIndex
>     + getIndexes(String name, String endpointType) : List<SemanticIndex>
> A typical query would be for an index with the name "simple" with the "SOLR" 
> endpoint.
>     :::java
>     SemanticIndexManager indexManager;
>     SemanticIndex index = indexManager.getIndex("simple", EndpointType.SOLR);
>     String solrEndpoint = index.getSearchEndpoints().get(EndpointType.SOLR);
> The methods returning a single index need to resolve cases with multiple 
> matches by returning the SemanticIndex service
> 1. with the highest "service.ranking" and
> 2. the lowest "service.id"
> This ensures that the behavior is consistent with the typical rules for 
> service selection as defined by the OSGi specification.
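The selection rule above (highest "service.ranking" first, ties broken by the lowest "service.id") can be expressed as a comparator. In the sketch below, `RegisteredIndex` is a hypothetical holder for the two service properties and `select` is an illustrative helper; a real SemanticIndexManager would read these values from the OSGi service references.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the single-index selection rule. */
final class RankingSketch {

    /** Assumed holder for the OSGi service properties of a registered index. */
    static final class RegisteredIndex {
        final String name;
        final int serviceRanking; // value of "service.ranking"
        final long serviceId;     // value of "service.id"

        RegisteredIndex(String name, int serviceRanking, long serviceId) {
            this.name = name;
            this.serviceRanking = serviceRanking;
            this.serviceId = serviceId;
        }
    }

    /** Highest service.ranking first; ties broken by the lowest service.id. */
    static final Comparator<RegisteredIndex> OSGI_ORDER =
            Comparator.comparingInt((RegisteredIndex r) -> r.serviceRanking)
                    .reversed()
                    .thenComparingLong(r -> r.serviceId);

    /** Picks the preferred index among all candidates with the given name. */
    static Optional<RegisteredIndex> select(List<RegisteredIndex> candidates, String name) {
        return candidates.stream()
                .filter(c -> c.name.equals(name))
                .min(OSGI_ORDER);
    }

    public static void main(String[] args) {
        List<RegisteredIndex> all = Arrays.asList(
                new RegisteredIndex("simple", 0, 12L),
                new RegisteredIndex("simple", 10, 40L),
                new RegisteredIndex("simple", 10, 31L));
        select(all, "simple").ifPresent(best ->
                System.out.println("ranking=" + best.serviceRanking + " id=" + best.serviceId));
    }
}
```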


        
