Hi Suat,

this is a really impressive list of changes and features. Do you have plans regarding documentation, demos, or tutorials?

Best,
- Fabian

On 26 January 2012 at 16:46, Ali Anil Sinaci <[email protected]> wrote:
> Dear Stanbolers,
>
> I have committed major changes related to Contenthub. Below, you can find
> some explanations about the changes. I have grouped them under two major
> issues in Jira (STANBOL-469 and STANBOL-470), although there are several
> sub-issues. Later improvements will be filed under their specific topics.
>
> Contenthub includes two main parts: store and search. Solr is the back-end
> for all store and retrieve operations on content items (SolrContentItem
> extends ContentItem). Major improvements are as follows:
>
> - Store maintains a default Solr core (called "contenthub") through the
>   EmbeddedSolrServer. This default core indexes several semantic
>   properties of entities when they are retrieved from the referenced
>   sites. (The current DBpedia index does not include most of these
>   properties. We have a larger index for this.)
>
> - LDPath has been integrated into Contenthub.
>   * Several Solr cores can be managed through LDProgramManager of
>     Contenthub.
>   * Each LDPath program corresponds to a unique Solr core. LDPath programs
>     (hence Solr cores) are uniquely identified through their names.
>     LDProgramManager and SolrCoreManager provide the required
>     synchronization between Solr cores and LDPath programs.
>   * Submitted LDPath programs are saved into separate files and accessed
>     via a simple cache mechanism.
>   * CRD operations for LDPath programs are provided through
>     LDProgramManager.
>   * ClerezzaBackend is implemented as an LDPath backend.
>   * LDProgramManager has a special method (executeProgram) to execute the
>     LDPath programs on Clerezza MGraphs.
>   * REST services are ready for LDProgramManager functionalities.
>   * Contenthub Store and Search parts (all interfaces and REST APIs) are
>     adjusted so that they can operate with LDPath programs.
>
> - Web GUI of Contenthub only operates on the default Solr index
>   ("contenthub"). Enabling other cores (generated through LDPath programs)
>   is on the TODO list.
>
> - Search logic has been implemented from scratch.
>   * The search engine pattern has been removed for document search.
>   * Content items are indexed through Solr cores. Therefore all searches
>     on the content items are performed through Solr indexes.
>   * The search interface has been split into three different interfaces:
>     SolrSearch, RelatedKeywordSearch and FeaturedSearch.
>   * SolrSearch is compatible with SolrJ. That is, clients who have already
>     been using SolrJ can easily switch to the SolrSearch API of
>     Contenthub. As a result of the LDPath integration, additional methods
>     exist in this interface to accept LDPath program names (Solr core
>     names). There is a single implementation of this interface in
>     Contenthub.
>   * RelatedKeywordSearch exposes a "search engine" pattern, but only to
>     search for related keywords. RelatedKeywordSearchManager is the
>     manager that handles the several implementations of this interface
>     (engines).
>   * In addition to the search results retrieved from SolrSearch, users can
>     now send their search keywords (query terms) to
>     RelatedKeywordSearchManager to retrieve related keywords from
>     different sources. This can be performed as a separate process from
>     SolrSearch.
>   * RelatedKeywordSearch has been implemented by WordnetSearch,
>     OntologyResourceSearch and ReferencedSiteSearch. As their names
>     indicate, they look for related keywords within their resources.
>     (WordnetSearch can be excluded until the license issue is resolved or
>     a new client library is used.)
>   * FeaturedSearch combines the capabilities of SolrSearch and
>     RelatedKeywordSearch in case a client wants to retrieve all results
>     (content items and related keywords) from Contenthub search.
>   * FeaturedSearch provides a similar interface to SolrSearch with
>     additional methods. However, the behaviour is different: it is
>     "featured" in this implementation.
>   * FeaturedSearch provides a special method: tokenizeEntities. This
>     method takes a query string and finds out whether the query contains
>     any entities. Based on the discovered entities, FeaturedSearch
>     prepares Solr queries in special formats to boost the results related
>     to the entities. However, this method should be improved to cover the
>     massive number of possible cases which can occur during keyword
>     searches.
>   * FeaturedSearch provides special methods to ease faceted search. Web
>     GUI of Contenthub makes use of this interface to enable faceted
>     search.
>
> Some minor improvements are as follows:
>
> - Web resources of Contenthub have been adjusted according to the latest
>   improvements.
>
> - Contenthub/core bundle has been removed. Refactoring Contenthub has led
>   to a more efficient use of several classes, hence currently there is no
>   need for a separate core bundle.
>
> - Contenthub parent pom has been adjusted. All dependencies have been
>   moved into the Stanbol parent.
>
> - helper/cnn-importer repackaged under crawler/cnn
>
> - api repackaged under servicesapi
>
> - Sling-based unit and integration tests are on the way.
>
> All the best,
> Anil.

-- 
Fabian
http://twitter.com/fctwitt
