Author: sinaci
Date: Tue Feb  7 09:47:22 2012
New Revision: 1241396

URL: http://svn.apache.org/viewvc?rev=1241396&view=rev
Log:
contenthub 5 minutes tutorial

Modified:
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/contenthub/contenthub5min.mdtext

Modified: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/contenthub/contenthub5min.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/contenthub/contenthub5min.mdtext?rev=1241396&r1=1241395&r2=1241396&view=diff
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/contenthub/contenthub5min.mdtext
 (original)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/contenthub/contenthub5min.mdtext
 Tue Feb  7 09:47:22 2012
@@ -0,0 +1,119 @@
+Contenthub (5-minute tutorial)
+The Apache Stanbol Contenthub is an Apache Solr-based document repository 
that stores text-based documents and provides customizable semantic search 
facilities. Contenthub exposes a Java API together with the corresponding 
RESTful services. 
+
+A document within Contenthub is referred to as a "content item". A content 
item consists of the text-based content of a document together with its 
metadata. Contenthub has two main subcomponents, Store and Search. As their 
names indicate, Store is responsible for the persistent storage of content 
items, and Search provides semantic search facilities on top of the stored 
content items.
+
+Contenthub Store
+----------------
+
+Store is the part of Contenthub that persistently stores documents and their 
metadata. In the current implementation, only text/plain documents are 
supported.
+
+Store provides basic methods such as create, put, get and delete. When a 
document is submitted, Store delegates its textual content to the Stanbol 
Enhancer to retrieve its enhancements. (In Contenthub terminology, the 
enhancements of a content item are called its metadata.) When submitting a 
document, it is also possible to specify external metadata as field:value 
pairs, in addition to the enhancements retrieved from the Enhancer.
+
+The document itself and all of its metadata are indexed in an embedded Apache 
Solr core (index) that is created specifically for Contenthub. Each document 
is assigned a unique ID during indexing, and this ID can be used to retrieve 
the document from Contenthub or to delete it. Contenthub provides an HTML 
interface for its functionality at the following endpoint, which is available 
after starting the full launcher of Apache Stanbol:
+
+    http://localhost:8080/contenthub
+
+Apache Solr can manage several cores (indexes) within the same running 
instance, and Contenthub makes use of this facility to maintain several cores 
side by side. These cores are managed through LDPath programs[1].
+
+LDPath is a simple path-based query language, similar to XPath or SPARQL 
Property Paths, that is particularly well suited for querying and retrieving 
resources from the Linked Data Cloud by following RDF links between resources 
and servers. For example, the following path query selects the names of the 
people who are known by the context resource (the resource on which the path 
is executed):
+
+    foaf:knows / foaf:name 
+
+An LDPath program is a collection of path queries, and it can be executed on 
any semantic collection of resources to extract specific information. For 
example, the following LDPath program can be executed on the resources that 
Stanbol Enhancer returns as the result of the enhancement process:
+
+    @prefix rdf : <http://www.w3.org/1999/02/22-rdf-syntax-ns#>;
+    @prefix rdfs : <http://www.w3.org/2000/01/rdf-schema#>;
+    @prefix db-ont : <http://dbpedia.org/ontology/>;
+    title = rdfs:label :: xsd:string;
+    dbpediatype = rdf:type :: xsd:anyURI;
+    population = db-ont:populationTotal :: xsd:int;
+
+Given an LDPath program, Contenthub can create a corresponding Solr core and 
index content items through that core. When you submit a document to 
Contenthub Store and specify an LDPath program, the content item (the document 
content and its metadata/enhancements) is indexed according to the fields 
defined by that program. For instance, the example LDPath program above leads 
to a Solr core that includes the following fields (in addition to the default 
configuration and several default fields):
+
+    <field name="title" type="string" stored="true" indexed="true" multiValued="true"/>
+    <field name="dbpediatype" type="uri" stored="true" indexed="true" multiValued="true"/>
+    <field name="population" type="int" stored="true" indexed="true" multiValued="true"/>
+
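+The correspondence between LDPath field rules and Solr fields can be 
illustrated with a small Python sketch. It is not Stanbol's implementation: 
the helper name `solr_fields` is hypothetical, the type mapping is inferred 
only from the three fields shown above, and the regular expression handles 
only simple one-line rules of the form `name = path :: xsd:type;`.
+
```python
import re
from typing import List

# Assumption: mapping inferred only from the three example fields above;
# other XSD types are not covered by this sketch.
XSD_TO_SOLR = {"xsd:string": "string", "xsd:anyURI": "uri", "xsd:int": "int"}

# Matches simple one-line field rules: name = path :: xsd:type;
FIELD_RULE = re.compile(r"^\s*(\w+)\s*=\s*(.+?)\s*::\s*(xsd:\w+)\s*;\s*$")


def solr_fields(program: str) -> List[str]:
    """Return one Solr <field> declaration per field rule in the program."""
    fields = []
    for line in program.splitlines():
        match = FIELD_RULE.match(line)
        if match is None:  # @prefix lines and blank lines are skipped
            continue
        name, _path, xsd_type = match.groups()
        solr_type = XSD_TO_SOLR[xsd_type]
        fields.append(
            f'<field name="{name}" type="{solr_type}" stored="true" '
            f'indexed="true" multiValued="true"/>'
        )
    return fields


PROGRAM = """\
@prefix rdf : <http://www.w3.org/1999/02/22-rdf-syntax-ns#>;
@prefix rdfs : <http://www.w3.org/2000/01/rdf-schema#>;
@prefix db-ont : <http://dbpedia.org/ontology/>;
title = rdfs:label :: xsd:string;
dbpediatype = rdf:type :: xsd:anyURI;
population = db-ont:populationTotal :: xsd:int;
"""

for field in solr_fields(PROGRAM):
    print(field)
```
+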
+To submit an LDPath program, you can use the following command through the 
REST API of Contenthub:
+
+    curl -i -X POST -d "name=myindex&program=@prefix rdf : <http://www.w3.org/1999/02/22-rdf-syntax-ns#>; @prefix rdfs : <http://www.w3.org/2000/01/rdf-schema#>; @prefix db-ont : <http://dbpedia.org/ontology/>; title = rdfs:label :: xsd:string; dbpediatype = rdf:type :: xsd:anyURI; population = db-ont:populationTotal :: xsd:int;" http://localhost:8080/contenthub/ldpath/program
+
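+When the program is submitted from code rather than from the command line, 
the form body can be percent-encoded explicitly. The following is a minimal 
sketch using only the Python standard library; the function name 
`ldpath_program_request` is hypothetical, the endpoint and parameter names 
are the ones from the curl command above, and the request is built but not 
sent.
+
```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the curl command above.
PROGRAM_ENDPOINT = "http://localhost:8080/contenthub/ldpath/program"


def ldpath_program_request(name: str, program: str) -> Request:
    """Build (but do not send) the POST request that submits a program."""
    # urlencode percent-encodes characters such as '<', '#', ';' and '='
    # and turns spaces into '+', as expected for form-encoded bodies.
    body = urlencode({"name": name, "program": program}).encode("utf-8")
    return Request(
        PROGRAM_ENDPOINT,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


program = (
    "@prefix rdfs : <http://www.w3.org/2000/01/rdf-schema#>; "
    "title = rdfs:label :: xsd:string;"
)
request = ldpath_program_request("myindex", program)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it to a running Stanbol instance, one could use:
#     from urllib.request import urlopen
#     with urlopen(request) as response:
#         print(response.status)
```
+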
+You can retrieve the list of managed LDPath programs in JSON format with the 
following command. This list also corresponds to the available Solr cores 
(except the default Solr core):
+
+    curl -i -X GET http://localhost:8080/contenthub/ldpath
+
+LDPath-related management is performed through the SemanticIndexManager of 
Contenthub. To take advantage of semantic indexes while storing content items, 
you need to specify the name of the index in the path of the URL when 
submitting the document. The default index of Contenthub is named 
"contenthub". Hence, the following command submits a document to the default 
index:
+
+    curl -i -X POST -H "Content-Type:application/x-www-form-urlencoded" -d "title=about me&content=I live in Istanbul.&" http://localhost:8080/contenthub/contenthub/store
+
+The following command stores the content item in the Solr core named 
"myindex". The content item is therefore indexed according to the fields 
defined by the LDPath program named "myindex".
+
+    curl -i -X POST -H "Content-Type:application/x-www-form-urlencoded" -d "title=about me&content=I live in Istanbul.&" http://localhost:8080/contenthub/myindex/store
+
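+Since the two store commands differ only in the index name that appears in 
the URL path, a client can derive the store URL from the index name. The 
sketch below shows one way to do that with the Python standard library; 
`store_request` is a hypothetical helper, the `title` and `content` field 
names come from the commands above, and the requests are only built, not 
sent.
+
```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080/contenthub"  # from the commands above


def store_request(index_name: str, title: str, content: str) -> Request:
    """Build (but do not send) a request storing a document in an index."""
    # The index name is a single path segment, so it is percent-encoded.
    url = f"{BASE_URL}/{quote(index_name, safe='')}/store"
    body = urlencode({"title": title, "content": content}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


default_req = store_request("contenthub", "about me", "I live in Istanbul.")
custom_req = store_request("myindex", "about me", "I live in Istanbul.")
print(default_req.full_url)
print(custom_req.full_url)
```
+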
+Contenthub Search
+-----------------
+
+Contenthub provides three search interfaces so that users can access the 
capabilities of Stanbol at different levels of complexity. These interfaces 
are:
+
+  - **SolrSearch**: Provides a native Solr interface to the outside world
+    and retrieves the resulting content items (documents) from the Solr
+    backend. SolrJ users can easily make use of this interface. The search
+    is performed on the corresponding Solr index and results are
+    returned in "org.apache.solr.client.solrj.response.QueryResponse"
+    format.
+  - **RelatedKeywordSearch**: Provides supporting functionality for the
+    search facilities. Given a keyword, the services of this interface
+    find related keywords from several sources: Wordnet, domain
+    ontologies and referenced sites.
+  - **FeaturedSearch**: Combines the services of SolrSearch and
+    RelatedKeywordSearch for users who want all the results of a query
+    term in one interface. Given a query term, it returns the resulting
+    documents from the queried Solr core/index together with related
+    keywords retrieved from the various sources (if those sources are
+    available within the running Stanbol instance).
+
+       
+The following request retrieves all documents from the default Solr index 
(named "contenthub"):
+
+    http://localhost:8080/solr/default/contenthub/select?q=*:*
+
+The following request retrieves all documents from the Solr index named 
"myindex":
+
+    http://localhost:8080/solr/default/myindex/select?q=*:*
+
+RelatedKeywordSearch is performed by three independent search engines within 
Stanbol:
+
+  - **OntologyResourceSearch**: If an ontology is already registered with
+    Stanbol (e.g. a domain ontology), it can be used to look for keywords
+    related to a given keyword. A SPARQL query based on a LARQ index is
+    executed on the specified ontology to find individual and class
+    resources related to the keyword.
+  - **ReferencedSiteSearch**: Referenced sites are used to retrieve the
+    enhancements of a content item; Stanbol Enhancer handles all
+    enhancement operations through them. This interface makes use of
+    the referenced sites to look for keywords related to a given
+    keyword.
+  - **WordnetSearch**: Available if a Wordnet database is registered with
+    the system (through the OSGi console). It looks up several relations
+    among keywords (such as synonyms and hyponyms) and retrieves a list
+    of related keywords from the Wordnet database.
+
+       
+The following command retrieves keywords related to "turkey" from referenced 
sites and Wordnet (ReferencedSiteSearch and WordnetSearch). Since no ontology 
is specified, OntologyResourceSearch is not executed.
+
+    curl -i -X GET -H "Accept: application/json" http://localhost:8080/contenthub/contenthub/search/related?keyword=turkey
+
+If the URI of an ontology is specified along with the keyword, the result 
also includes related keywords found through that ontology, in addition to 
those from referenced sites and Wordnet. The following command adds the 
keywords related to "turkey" that are retrieved from the ontology identified 
by "uri-dummy" to the result of the related keyword service:
+
+    curl -i -X GET -H "Accept: application/json" "http://localhost:8080/contenthub/contenthub/search/related?keyword=turkey&ontologyURI=uri-dummy"
+
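+The two related-keyword requests differ only in the optional `ontologyURI` 
parameter. As a sketch of how a client might build either URL, the 
hypothetical helper `related_search_url` below uses the endpoint and 
parameter names from the commands above and percent-encodes the values, 
which matters once a real ontology URI (containing `:`, `/` or `#`) replaces 
the "uri-dummy" placeholder.
+
```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint taken from the curl commands above.
RELATED_ENDPOINT = "http://localhost:8080/contenthub/contenthub/search/related"


def related_search_url(keyword: str, ontology_uri: Optional[str] = None) -> str:
    """Return the related-keyword search URL for a keyword."""
    params = {"keyword": keyword}
    if ontology_uri is not None:
        # Only added when an ontology should take part in the search.
        params["ontologyURI"] = ontology_uri
    return f"{RELATED_ENDPOINT}?{urlencode(params)}"


print(related_search_url("turkey"))
print(related_search_url("turkey", "uri-dummy"))
```
+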
+Lastly, Contenthub provides a featured search interface which combines the 
services of SolrSearch and RelatedKeywordSearch. The results of the 
FeaturedSearch services include both the resulting documents and the related 
keywords for the given query term. The following query retrieves the documents 
whose indexed fields include the term "turkey", together with keywords related 
to "turkey" from several sources:
+
+    curl -i -X GET -H "Accept: application/json" -H "Content-Type:text/plain" http://localhost:8080/contenthub/contenthub/search/featured?queryTerm=turkey
+
+  [1]: http://code.google.com/p/ldpath/
+  [2]: http://code.google.com/p/ldpath/wiki/PathLanguage
\ No newline at end of file

