Re: New features for Contenthub

Fabian Christ Thu, 26 Jan 2012 10:34:12 -0800

Hi Suat,

This is a really impressive list of changes and features. Do you have
plans regarding documentation, demos, or tutorials?


Best,
 - Fabian

On 26 January 2012 16:46, Ali Anil Sinaci <[email protected]> wrote:
> Dear Stanbolers,
>
> I have committed major changes related to Contenthub. Below, you can find
> some explanations about the changes. I have grouped them under two major
> issues in Jira (STANBOL-469 and STANBOL-470) although there are several
> sub-issues. Later improvements will be issued under their specific topics.
>
> Contenthub includes two main parts: store and search. Solr is the back-end
> for all store and retrieve operations of content items (SolrContentItem
> extends ContentItem). Major improvements are as follows:
>
> - Store maintains a default Solr core (called "contenthub") through the
> EmbeddedSolrServer. This default core indexes several semantic properties of
> entities in case they are retrieved from the referenced sites. (Current
> dbpedia index does not include most of these properties. We have a larger
> index for this)
>
> - LDPath has been integrated into Contenthub.
>    * Several Solr cores can be managed through LDProgramManager of
> Contenthub.
>    * Each LDPath program corresponds to a unique Solr core. LDPath programs
> (hence Solr cores) are uniquely identified through their names.
> LDProgramManager and SolrCoreManager provide the required synchronization
> between Solr cores and LDPath programs.
>    * Submitted LDPath programs are saved into separate files and accessed
> via a simple cache mechanism.
>    * CRD operations for LDPath programs are provided through
> LDProgramManager
>    * ClerezzaBackend is implemented as an LDPath backend.
>    * LDProgramManager has a special method (executeProgram) to execute the
> LDPath programs on Clerezza MGraphs.
>    * REST services are ready for LDProgramManager functionalities.
>    * Contenthub Store and Search parts (all interfaces and REST APIs) are
> adjusted so that they can operate with LDPath programs.
>
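
The rule in the LDPath items above (one program name, one Solr core, with create/retrieve/delete operations) can be pictured with a minimal Java sketch. Everything in it is an assumption made for illustration: `ProgramRegistry`, its method names, and the sample program text are hypothetical and are not the LDProgramManager API, and no Solr core is actually created.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration, NOT the Stanbol LDProgramManager API.
final class ProgramRegistry {
    // Program name -> LDPath program text. The name doubles as the Solr core name.
    private final Map<String, String> programs = new LinkedHashMap<>();

    // Create: a duplicate name is rejected so each name maps to exactly one core.
    void submit(String name, String programText) {
        if (name == null || name.isEmpty() || programText == null) {
            throw new IllegalArgumentException("name and program text are required");
        }
        if (programs.putIfAbsent(name, programText) != null) {
            throw new IllegalStateException("program already exists: " + name);
        }
        // A real manager would also create the matching Solr core at this point.
    }

    // Retrieve: returns the program text, or null if the name is unknown.
    String get(String name) {
        return programs.get(name);
    }

    // Delete: returns true if a program (and, conceptually, its core) was removed.
    boolean delete(String name) {
        return programs.remove(name) != null;
    }

    // Names of all registered programs, i.e. the conceptual Solr core names.
    Set<String> coreNames() {
        return Collections.unmodifiableSet(programs.keySet());
    }
}

final class ProgramRegistryDemo {
    public static void main(String[] args) {
        ProgramRegistry registry = new ProgramRegistry();
        // Illustrative program text only; it is stored here, not parsed.
        registry.submit("people", "name = rdfs:label :: xsd:string ;");
        System.out.println(registry.coreNames());
        System.out.println(registry.delete("people"));
    }
}
```

Rejecting a second submission under the same name is one way to keep names unique; the actual manager may instead overwrite or version programs.
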
> - Web GUI of Contenthub only operates on the default Solr index
> ("contenthub"). Enabling other cores (generated through LDPath programs) is
> in the TODO list.
>
> - Search logic has been implemented from scratch.
>    * Search engine pattern has been removed for document search.
>    * Content items are indexed through Solr cores. Therefore, all searches on
> the content items are performed through Solr indexes.
>    * Search interface has been split into three different interfaces:
> SolrSearch, RelatedKeywordSearch and FeaturedSearch.
>    * SolrSearch is compatible with SolrJ. That is, clients who have already
> been using SolrJ can easily switch to SolrSearch API of Contenthub. As a
> result of LDPath integration, additional methods exist in this interface to
> accept LDPath program names (Solr core names). There is a single
> implementation of this interface in Contenthub.
>    * RelatedKeywordSearch exposes a "search engine" pattern, but only to
> search for related keywords. RelatedKeywordSearchManager is the manager to
> handle several implementations of this interface (engines).
>    * In addition to the search results retrieved from SolrSearch, users can
> now send their search keywords (query terms) to RelatedKeywordSearchManager
> to retrieve related keywords from different sources. This can be performed
> as a separate process from SolrSearch.
>    * RelatedKeywordSearch has been implemented by WordnetSearch,
> OntologyResourceSearch and ReferencedSiteSearch. As their names indicate,
> they look for related keywords within their resources. (WordnetSearch can be
> excluded until the license issue is resolved or a new client library is
> used)
>    * FeaturedSearch combines the capabilities of SolrSearch and
> RelatedKeywordSearch in case a client wants to retrieve all results (content
> items and related keywords) from Contenthub search.
>    * FeaturedSearch provides a similar interface to SolrSearch with
> additional methods. However, the behaviour is different: it is "featured" in
> this implementation.
>    * FeaturedSearch provides a special method: tokenizeEntities. This method
> takes a query string and finds out whether any entities exist in the
> query or not. Based on the discovered entities, FeaturedSearch prepares Solr
> queries in special formats to boost the results related to the entities.
> However, this method should be improved to cover a massive number of
> possible cases which can occur during keyword searches.
>    * FeaturedSearch provides special methods to ease the faceted search. Web
> GUI of Contenthub makes use of this interface to enable faceted search.
>
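
The "search engine" pattern described above for related keywords (several engines behind one manager, queried independently of the document search) can be sketched roughly as follows. The names `KeywordEngine`, `TableEngine` and `KeywordEngineManager` are hypothetical, as are the sample keywords; the real RelatedKeywordSearch and RelatedKeywordSearchManager interfaces may look quite different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical engine contract; the real RelatedKeywordSearch interface may differ.
interface KeywordEngine {
    String sourceName();
    List<String> relatedTo(String keyword);
}

// Toy engine backed by a fixed in-memory table (made-up data, lowercase keys).
final class TableEngine implements KeywordEngine {
    private final String sourceName;
    private final Map<String, List<String>> table;

    TableEngine(String sourceName, Map<String, List<String>> table) {
        this.sourceName = sourceName;
        this.table = table;
    }

    public String sourceName() {
        return sourceName;
    }

    public List<String> relatedTo(String keyword) {
        List<String> hits = table.get(keyword.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<String>emptyList() : hits;
    }
}

// Hypothetical manager: asks every registered engine and groups results by source.
final class KeywordEngineManager {
    private final List<KeywordEngine> engines = new ArrayList<>();

    void register(KeywordEngine engine) {
        engines.add(engine);
    }

    Map<String, List<String>> search(String keyword) {
        Map<String, List<String>> bySource = new LinkedHashMap<>();
        for (KeywordEngine engine : engines) {
            bySource.put(engine.sourceName(), engine.relatedTo(keyword));
        }
        return bySource;
    }
}

final class KeywordSearchDemo {
    public static void main(String[] args) {
        KeywordEngineManager manager = new KeywordEngineManager();
        manager.register(new TableEngine("wordnet",
                Collections.singletonMap("car", Arrays.asList("automobile", "vehicle"))));
        manager.register(new TableEngine("ontology",
                Collections.singletonMap("car", Arrays.asList("Engine", "Wheel"))));
        System.out.println(manager.search("Car"));
    }
}
```

Keeping the results grouped per source means an engine can be dropped (for example the Wordnet one) without touching the others, and the document search can still run as a separate step.
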
> Some minor improvements are as follows:
>
> - Web resources of Contenthub have been adjusted according to the latest
> improvements.
>
> - Contenthub/core bundle has been removed. Refactoring Contenthub has led
> to a more efficient use of several classes, hence currently there is no need
> for a separate core bundle.
>
> - Contenthub parent pom has been adjusted. All dependencies have been moved
> into Stanbol parent.
>
> - helper/cnn-importer repackaged under crawler/cnn
>
> - api repackaged under servicesapi
>
> - Sling based unit and integration tests are on the way.
>
> All the best,
> Anil.



-- 
Fabian
http://twitter.com/fctwitt
