Hi Fabian,

You are right, there are a bit too many changes at once. So let me
summarize the new structure, which was introduced earlier this year.
From now on we can follow an incremental approach, as we are still in
the early stages.

As introduced in STANBOL-471, Contenthub will be composed of two layers:
Store and Index. The Store layer is intended to keep the Content Items
as they are (not to index them). The Index layer, on the other hand, is
expected to track the changes in the Store layer and update the
underlying index according to index-scoped configurations. A diagram of
this structure can be found in [1].

To better follow the explanations below, I suggest reading STANBOL-498
and STANBOL-499 first.

File Based Store Implementation (contenthub/store/file)
=======================================================
The initial implementation of the Store layer is a file based one. It
serializes a Content Item into a zip file in which the main part, the
enhancements and each content part are stored as separate entries.
However, currently only Blob and TripleCollection typed parts are
supported. When a Content Item is stored or deleted, its revision number
is set to the current system time. So, given a revision, it is possible
to get the URIs of the Content Items that have changed since that
revision. The revisions of the Content Items are stored in a Derby
database.
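
To make the idea concrete, here is a rough sketch of the two mechanisms
described above. It is not the code from the branch (which is Java); it
is written in Python only for brevity. All names (serialize_content_item,
RevisionTable, changed_since) are made up for illustration, an in-memory
dict stands in for the Derby table, and "changed" is assumed to mean "has
a revision strictly greater than the given one".

```python
import io
import time
import zipfile


def serialize_content_item(parts):
    """Pack each part of a content item into its own zip entry.

    `parts` maps a (hypothetical) entry name to the raw bytes of that part.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for entry_name, payload in parts.items():
            archive.writestr(entry_name, payload)
    return buffer.getvalue()


def deserialize_content_item(data):
    """Read every zip entry back into an entry-name -> bytes mapping."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


class RevisionTable:
    """In-memory stand-in for the table that maps URIs to revisions."""

    def __init__(self, clock=lambda: time.time_ns() // 1_000_000):
        # By default the revision is the system time in milliseconds.
        self._clock = clock
        self._revisions = {}

    def touch(self, uri):
        """Record a store or delete of `uri` and return its new revision."""
        revision = self._clock()
        self._revisions[uri] = revision
        return revision

    def changed_since(self, revision):
        """Return the URIs whose revision is greater than `revision`."""
        return sorted(
            uri for uri, rev in self._revisions.items() if rev > revision
        )
```

An index would remember the last revision it processed and ask for
changed_since(last_revision) on its next run.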

LDPath Based Index Implementation (contenthub/index)
====================================================
This was already described in the first mail of this thread (quoted
below).

Again, let me remind you that this work is being carried out on the
"contenthub-two-layered-structure" branch. Sorry for the bulk update.

Best,
Suat

[1]
https://issues.apache.org/jira/secure/attachment/12512102/contenthub-2layered-storage.jpg

On 07/20/2012 11:01 AM, Fabian Christ wrote:
> Hi Suat,
>
> while I truly appreciate the new developments it would have been nice
> to have some more information on what you guys are doing on this list.
> Maybe I missed something. The community has to keep informed and get a
> chance to follow what is happening. Next time I would suggest to try a
> more incremental approach instead of submitting a big patch with tons
> of changes at once. This is just about the process not about the great
> contributions you did :)
>
> I will also have a closer look next week.
>
> Best,
>  - Fabian
>
> 2012/7/19 Rupert Westenthaler <[email protected]>:
>> Hi Suat,
>>
>> Great news! I will have a detailed look next week.
>>
>> best
>> Rupert
>>
>> On Thu, Jul 19, 2012 at 4:15 PM, Suat Gonul <[email protected]> wrote:
>>> By the way, STANBOL-471 is the initial issue dedicated to this structure.
>>>
>>>
>>> On 07/19/2012 05:12 PM, Suat Gonul wrote:
>>>> Hi everyone,
>>>>
>>>> I have just committed the initial implementation of the index part of
>>>> the 2-layered structure of Contenthub. So, we have initial
>>>> implementations for both Store and Index layers now. Currently, this
>>>> work is carried on under the "contenthub-two-layered-structure" branch.
>>>> So, to try out this new structure, contenthub module under this branch
>>>> should be built.
>>>>
>>>> I would be very glad to hear your feedbacks. Below, you can see the logs
>>>> from the commit:
>>>>
>>>> Best,
>>>> Suat
>>>>
>>>> Logs:
>>>> Initial version of the default implementation of the SemanticIndex
>>>> interface which is defined in STANBOL-499.
>>>>
>>>> SemanticIndex is one part of the 2-layered structure of Contenthub. The
>>>> other part is the Store which is defined in STANBOL-498.
>>>>
>>>> Default implementation of the SemanticIndex interface
>>>> (LDPathSemanticIndex) is based on the LDPath language. A new
>>>> LDPathSemanticIndex can be created by providing name, description and
>>>> LDPath values. In the scope of LDPathSemanticIndex the provided LDPath
>>>> program is used in two ways which will be explained later in this log.
>>>>
>>>> Each instance of this implementation checks the changes in the Store at
>>>> regular intervals in a separate thread and the interval length is
>>>> configurable. After processing the changes in the Store, the last
>>>> revision is stored persistently. In this way, when the index is
>>>> restarted it will check the changes as of the latest persisted
>>>> revision. However, when the LDPath is changed the LDPathSemanticIndex
>>>> will index the ContentItems from scratch. During this period the index
>>>> will be in the REINDEXING state and does not allow other
>>>> index or remove operations. After reindexing is completed, the state of
>>>> the index will be ACTIVE.
>>>>
>>>> LDPath usages in LDPathSemanticIndex
>>>> ====================================
>>>> a) It is used to configure the underlying Solr core. With an LDPath the
>>>> index fields are determined and Solr specific properties such as
>>>> "multiValued", "termVectors" can be configured.
>>>>
>>>> b) When indexing of a ContentItem is in progress, each named entity
>>>> contained in the enhancements of the ContentItem will be queried through
>>>> the Entityhub. Then, the values obtained from Entityhub will be indexed
>>>> along with the actual content as additional metadata. And the additional
>>>> metadata will be completely compatible with the underlying Solr core.
>>>>
>>>> This ability to create customized indexes allows compatibility with
>>>> different domains or use-cases.
>>>>
>>>> Creating,Retrieving LDPathSemanticIndex instances
>>>> =================================================
>>>> {stanbol_host}/index endpoint can be used to retrieve already registered
>>>> SemanticIndexes. An LDPathSemanticIndex can be created through the
>>>> RESTful service, i.e. {stanbol_host}/index/ldpath, or through the Felix
>>>> Web Console by configuring an "Apache Stanbol Contenthub LDPath Based
>>>> Semantic Index".
>>>>
>>>> Each instance of LDPathSemanticIndex is registered as an OSGi component.
>>>> So, they can be obtained through ServiceTracker/@Reference.
>>>> Name(Semantic-Index-Name) and description(Semantic-Index-Name)
>>>> properties can be used to retrieve specific instances of
>>>> LDPathSemanticIndex from OSGi environment. Also, the
>>>> SemanticIndexManager service, provides retrieval of indexes according to
>>>> their names and EndpointTypes.
>>>>
>>>> Search over the LDPathSemanticIndex
>>>> ===================================
>>>> The previous search functionality of the Contenthub has not changed.
>>>> It is wrapped under two types of endpoints: 1) RESTful endpoints and 2)
>>>> OSGi based Java endpoints. There are two RESTful endpoints: SOLR and
>>>> CONTENTHUB. The SOLR endpoint can be used to query the actual
>>>> underlying Solr core. The CONTENTHUB endpoint offers a search whose
>>>> results contain additional information besides the resultant
>>>> documents: facets regarding those documents and keywords related to
>>>> the original query term. This endpoint is more experimental and is
>>>> open to changes.
>>
>>
>> --
>> | Rupert Westenthaler             [email protected]
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>
>
