Author: buildbot
Date: Tue Feb 7 09:47:29 2012
New Revision: 804087
Log:
Staging update by buildbot for stanbol
Modified:
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/contenthub/contenthub5min.html
==============================================================================
---
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/contenthub/contenthub5min.html
(original)
+++
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/contenthub/contenthub5min.html
Tue Feb 7 09:47:29 2012
@@ -57,7 +57,124 @@
<div id="content">
<h1 class="title"></h1>
-
+ <p>Contenthub (5 minutes tutorial)
+The Apache Stanbol Contenthub is an Apache Solr-based document repository
that stores text-based documents and offers customizable semantic search
facilities. Contenthub exposes a Java API together with the
corresponding RESTful services. </p>
+<p>Contenthub is basically a document repository. A document within Contenthub
is referred to as a "Content Item". A content item consists of the text-based
content of the document together with the metadata of the document. Contenthub has
two main subcomponents, namely Store and Search. As their names indicate, Store
is responsible for persistent storage of content items, and Search
provides semantic search facilities on top of the content items.</p>
+<h2 id="contenthub-store">Contenthub Store</h2>
+<p>The Store is the part of Contenthub which actually stores the documents and their
metadata persistently. In the current implementation only text/plain documents are
supported.</p>
+<p>The storage part of Contenthub provides basic methods such as create,
put, get and delete. When a document is submitted, the Store delegates the textual
content to Stanbol Enhancer to retrieve its enhancements. (In Contenthub
terminology, the enhancements of a content item are called its metadata.) While submitting
the document, it is also possible to specify external metadata (in addition to
the enhancements retrieved from Enhancer) as field:value pairs along with the
document.</p>
+<p>The document itself and all metadata are indexed through an embedded Apache
Solr core/index which is created specifically for Contenthub. Each document
is given a unique ID while indexing, and this ID can be used to
retrieve or delete the document from Contenthub. Contenthub provides an HTML interface for
its functionalities under the following endpoint, which is available after
running the full launcher of Apache Stanbol:</p>
+<div class="codehilite"><pre><span class="n">http:</span><span
class="sr">//</span><span class="n">localhost:8080</span><span
class="o">/</span><span class="n">contenthub</span>
+</pre></div>
+
+
+<p>Apache Solr can manage several cores (indexes) within the same running
instance, and Contenthub makes use of this facility to manage different
cores. This management is performed through LDPath programs[1].</p>
+<p>LDPath is a simple path-based query language similar to XPath or SPARQL
Property Paths that is particularly well-suited for querying and retrieving
resources from the Linked Data Cloud by following RDF links between resources
and servers. For example, the following path query would select the names of
the people known by the context resource (the resource on which
this path is being executed):<br />
+</p>
+<div class="codehilite"><pre><span class="err">foaf:knows</span> <span
class="err">/</span> <span class="err">foaf:name</span>
+</pre></div>
+
+
+<p>An LDPath program is a collection of path queries. For example, the following
LDPath program can be executed on the resources which can be retrieved from
Stanbol Enhancer as a result of the enhancement process. An LDPath program can
be executed on any semantic collection of resources to extract specific
information.</p>
+<div class="codehilite"><pre><span class="nv">@prefix</span> <span
class="n">rdf</span> <span class="p">:</span> <span
class="sr"><http://www.w3.org/1999/02/22-rdf-syntax-ns#></span><span
class="p">;</span>
+<span class="nv">@prefix</span> <span class="n">rdfs</span> <span
class="p">:</span> <span
class="sr"><http://www.w3.org/2000/01/rdf-schema#></span><span
class="p">;</span>
+<span class="nv">@prefix</span> <span class="n">db</span><span
class="o">-</span><span class="n">ont</span> <span class="p">:</span> <span
class="sr"><http://dbpedia.org/ontology/></span><span class="p">;</span>
+<span class="n">title</span> <span class="o">=</span> <span
class="n">rdfs:label</span> <span class="o">::</span> <span
class="n">xsd:string</span><span class="p">;</span>
+<span class="n">dbpediatype</span> <span class="o">=</span> <span
class="n">rdf:type</span> <span class="o">::</span> <span
class="n">xsd:anyURI</span><span class="p">;</span>
+<span class="n">population</span> <span class="o">=</span> <span
class="n">db</span><span class="o">-</span><span
class="n">ont:populationTotal</span> <span class="o">::</span> <span
class="n">xsd:int</span><span class="p">;</span>
+</pre></div>
+
+
+<p>Given an LDPath program, Contenthub can create a corresponding Solr core to
index the content items through that core. When you submit a document to
Contenthub Store by providing an LDPath program, this means the content item
(the document content and its metadata/enhancements) will be indexed according
to the fields determined by the LDPath program. For instance, the example
LDPath program above will lead to a Solr core including the following fields
(in addition to the default configuration and several default fields):</p>
+<div class="codehilite"><pre><span class="o"><</span><span
class="n">field</span> <span class="n">name</span><span class="o">=</span><span
class="s">"title"</span> <span class="n">type</span><span
class="o">=</span><span class="s">"string"</span> <span
class="n">stored</span><span class="o">=</span><span
class="s">"true"</span> <span class="n">indexed</span><span
class="o">=</span><span class="s">"true"</span> <span
class="n">multiValued</span><span class="o">=</span><span
class="s">"true"</span><span class="o">/></span>
+<span class="o"><</span><span class="n">field</span> <span
class="n">name</span><span class="o">=</span><span
class="s">"dbpediatype"</span> <span class="n">type</span><span
class="o">=</span><span class="s">"uri"</span> <span
class="n">stored</span><span class="o">=</span><span
class="s">"true"</span> <span class="n">indexed</span><span
class="o">=</span><span class="s">"true"</span> <span
class="n">multiValued</span><span class="o">=</span><span
class="s">"true"</span><span class="o">/></span>
+<span class="o"><</span><span class="n">field</span> <span
class="n">name</span><span class="o">=</span><span
class="s">"population"</span> <span class="n">type</span><span
class="o">=</span><span class="s">"int"</span> <span
class="n">stored</span><span class="o">=</span><span
class="s">"true"</span> <span class="n">indexed</span><span
class="o">=</span><span class="s">"true"</span> <span
class="n">multiValued</span><span class="o">=</span><span
class="s">"true"</span><span class="o">/></span>
+</pre></div>
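+<p>The correspondence between the example LDPath program and the Solr fields
above can be sketched in a few lines of Python. This is only an illustration,
not Contenthub's actual implementation: the function name
<code>ldpath_to_solr_fields</code> and the type table are hypothetical, and the
table covers only the three type mappings visible in the example
(xsd:string, xsd:anyURI, xsd:int).</p>

```python
import re
from typing import List
from xml.sax.saxutils import quoteattr

# Assumed mapping, inferred only from the three example fields in this tutorial.
XSD_TO_SOLR = {"xsd:string": "string", "xsd:anyURI": "uri", "xsd:int": "int"}

# Matches simple field rules of the form:  name = path :: xsd:type;
FIELD_RE = re.compile(r"^\s*(\w+)\s*=\s*(.+?)\s*::\s*([\w:]+)\s*;\s*$")

PROGRAM = """\
@prefix rdf : <http://www.w3.org/1999/02/22-rdf-syntax-ns#>;
@prefix rdfs : <http://www.w3.org/2000/01/rdf-schema#>;
@prefix db-ont : <http://dbpedia.org/ontology/>;
title = rdfs:label :: xsd:string;
dbpediatype = rdf:type :: xsd:anyURI;
population = db-ont:populationTotal :: xsd:int;
"""


def ldpath_to_solr_fields(program: str) -> List[str]:
    """Return one Solr <field/> declaration per simple LDPath field rule."""
    fields = []
    for line in program.splitlines():
        match = FIELD_RE.match(line)
        if match is None:
            continue  # @prefix lines and anything else are skipped
        name, _path, xsd_type = match.groups()
        solr_type = XSD_TO_SOLR[xsd_type]  # KeyError for unmapped types
        fields.append(
            f"<field name={quoteattr(name)} type={quoteattr(solr_type)} "
            'stored="true" indexed="true" multiValued="true"/>'
        )
    return fields


if __name__ == "__main__":
    for field in ldpath_to_solr_fields(PROGRAM):
        print(field)
```

+<p>Running the script prints one field declaration per rule of the program;
the path expressions themselves are ignored because only the field name and
the declared type determine the Solr field definition in this sketch.</p>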
+
+
+<p>To submit an LDPath program, you can use the following command through the
REST API of Contenthub:</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span
class="o">-</span><span class="n">i</span> <span class="o">-</span><span
class="n">X</span> <span class="n">POST</span> <span class="o">-</span><span
class="n">d</span> <span class="s">"name=myindex&program=@prefix rdf :
<http://www.w3.org/1999/02/22-rdf-syntax-ns#>; @prefix rdfs :
<http://www.w3.org/2000/01/rdf-schema#>; @prefix db-ont :
<http://dbpedia.org/ontology/>; title = rdfs:label :: xsd:string;
dbpediatype = rdf:type :: xsd:anyURI; population = db-ont:populationTotal ::
xsd:int;"</span> <span class="n">http:</span><span
class="sr">//</span><span class="n">localhost:8080</span><span
class="sr">/contenthub/</span><span class="n">ldpath</span><span
class="o">/</span><span class="n">program</span>
+</pre></div>
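+<p>An LDPath program contains characters such as <code>#</code>,
<code>&lt;</code>, <code>;</code> and spaces, so when the request is issued
from code rather than from the shell it may be safer to URL-encode the form
body. The following Python sketch builds the same POST request with the
standard library only. The helper name
<code>build_ldpath_submit_request</code> is hypothetical; the endpoint and the
<code>name</code> and <code>program</code> parameters are taken from the curl
command above.</p>

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080/contenthub"  # endpoint used in this tutorial

PROGRAM = (
    "@prefix rdf : <http://www.w3.org/1999/02/22-rdf-syntax-ns#>; "
    "@prefix rdfs : <http://www.w3.org/2000/01/rdf-schema#>; "
    "@prefix db-ont : <http://dbpedia.org/ontology/>; "
    "title = rdfs:label :: xsd:string; "
    "dbpediatype = rdf:type :: xsd:anyURI; "
    "population = db-ont:populationTotal :: xsd:int;"
)


def build_ldpath_submit_request(name, program, base_url=BASE_URL):
    """Build (but do not send) the POST request that submits an LDPath program."""
    # urlencode percent-encodes '#', '<', '>', ';' and spaces in the program text.
    body = urlencode({"name": name, "program": program}).encode("utf-8")
    return Request(
        f"{base_url}/ldpath/program",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_ldpath_submit_request("myindex", PROGRAM)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it against a running Stanbol instance:
    #   from urllib.request import urlopen
    #   with urlopen(request) as response:
    #       print(response.status)
```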
+
+
+<p>You can retrieve the list of managed LDPath programs in JSON format with
the following command. This list also corresponds to the available Solr cores (except
the default Solr core):</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span
class="o">-</span><span class="n">i</span> <span class="o">-</span><span
class="n">X</span> <span class="n">GET</span> <span class="n">http:</span><span
class="sr">//</span><span class="n">localhost:8080</span><span
class="sr">/contenthub/</span><span class="n">ldpath</span>
+</pre></div>
+
+
+<p>LDPath-related management is performed through the SemanticIndexManager of
Contenthub. To take advantage of semantic indexes while storing content items,
you need to specify the name of the index in the path of the URL while
submitting the document. The default index of Contenthub is named "contenthub".
Hence, the following command submits a document to the default index:</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span
class="o">-</span><span class="n">i</span> <span class="o">-</span><span
class="n">X</span> <span class="n">POST</span> <span class="o">-</span><span
class="n">H</span> <span
class="s">"Content-Type:application/x-www-form-urlencoded"</span>
<span class="o">-</span><span class="n">d</span> <span
class="s">"title=about me&content=I live in
Istanbul.&"</span> <span class="n">http:</span><span
class="sr">//</span><span class="n">localhost:8080</span><span
class="sr">/contenthub/co</span><span class="n">ntenthub</span><span
class="o">/</span><span class="n">store</span>
+</pre></div>
+
+
+<p>The following command will store the content item in the Solr core named
"myindex". Therefore, indexing will be performed according to the field
properties defined by the LDPath program named "myindex".</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span
class="o">-</span><span class="n">i</span> <span class="o">-</span><span
class="n">X</span> <span class="n">POST</span> <span class="o">-</span><span
class="n">H</span> <span
class="s">"Content-Type:application/x-www-form-urlencoded"</span>
<span class="o">-</span><span class="n">d</span> <span
class="s">"title=about me&content=I live in
Istanbul.&"</span> <span class="n">http:</span><span
class="sr">//</span><span class="n">localhost:8080</span><span
class="sr">/contenthub/m</span><span class="n">yindex</span><span
class="o">/</span><span class="n">store</span>
+</pre></div>
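+<p>The two store commands above differ only in the index name that appears in
the URL path. A small Python sketch of that convention follows; the helper
names <code>store_url</code> and <code>store_body</code> are hypothetical, and
only the <code>title</code> and <code>content</code> form fields shown in the
curl commands are assumed to exist.</p>

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8080/contenthub"  # endpoint used in this tutorial
DEFAULT_INDEX = "contenthub"                   # name of the default index


def store_url(index_name=DEFAULT_INDEX, base_url=BASE_URL):
    """Return the store endpoint of the given index: <base>/<index>/store."""
    # quote(..., safe="") keeps an unusual index name from breaking the path.
    return f"{base_url}/{quote(index_name, safe='')}/store"


def store_body(title, content):
    """Return the URL-encoded form body for a plain-text content item."""
    return urlencode({"title": title, "content": content})


if __name__ == "__main__":
    print(store_url())            # default index
    print(store_url("myindex"))   # index created from the "myindex" LDPath program
    print(store_body("about me", "I live in Istanbul."))
```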
+
+
+<h2 id="contenthub-search">Contenthub Search</h2>
+<p>Contenthub provides three search interfaces so that users can access the
capabilities of Stanbol at different levels of complexity. These
interfaces are:</p>
+<ul>
+<li><strong>SolrSearch</strong>: provides a native Solr interface to the outside
world.
+ It retrieves the resulting content items (documents) from the Solr
+ backend. SolrJ users can easily make use of this interface. Search
+ is performed on the corresponding Solr index and results are
+ returned in "org.apache.solr.client.solrj.response.QueryResponse"
+ format.</li>
+<li><strong>RelatedKeywordSearch</strong>: provides supporting functionalities
for search
+ facilities. Given a keyword, the services of this interface find other
+ related keywords from several sources. Wordnet, domain ontologies
+ and referenced sites are the data sources for these services to
+ retrieve the related keywords.</li>
+<li><strong>FeaturedSearch</strong>: combines the services of SolrSearch and
+ RelatedKeywordSearch for users who want the results of a query
+ term all in one interface. Given a query term, it returns the
+ resulting documents from the queried Solr core/index together with
+ related keywords retrieved from various sources (if those sources
+ are available within the running Stanbol instance).</li>
+</ul>
+<p>The following request retrieves all documents from the default index (whose
name is "contenthub") of Solr:</p>
+<div class="codehilite"><pre><span class="n">http:</span><span
class="sr">//</span><span class="n">localhost:8080</span><span
class="sr">/solr/</span><span class="n">default</span><span
class="sr">/contenthub/s</span><span class="n">elect</span><span
class="p">?</span><span class="n">q</span><span class="o">=*</span><span
class="p">:</span><span class="o">*</span>
+</pre></div>
+
+
+<p>The following request retrieves all documents from the Solr index named
"myindex":</p>
+<div class="codehilite"><pre><span class="n">http:</span><span
class="sr">//</span><span class="n">localhost:8080</span><span
class="sr">/solr/</span><span class="n">default</span><span
class="sr">/myindex/s</span><span class="n">elect</span><span
class="p">?</span><span class="n">q</span><span class="o">=*</span><span
class="p">:</span><span class="o">*</span>
+</pre></div>
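+<p>Both select URLs above follow the same pattern, with only the index name
changing. A minimal Python sketch of that pattern is given below. The helper
name <code>solr_select_url</code> is hypothetical, and the
<code>title:Istanbul</code> query in the usage lines is an assumed example of
standard Solr field:value syntax applied to the "title" field of "myindex".</p>

```python
from urllib.parse import urlencode

HOST = "http://localhost:8080"  # host and port used in this tutorial


def solr_select_url(index_name="contenthub", query="*:*", host=HOST):
    """Return the Solr select URL for an index managed by Contenthub."""
    # safe="*:" leaves the match-all query readable instead of percent-encoding it.
    return f"{host}/solr/default/{index_name}/select?" + urlencode(
        {"q": query}, safe="*:"
    )


if __name__ == "__main__":
    print(solr_select_url())                             # all documents, default index
    print(solr_select_url("myindex"))                    # all documents, "myindex"
    print(solr_select_url("myindex", "title:Istanbul"))  # assumed field query
```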
+
+
+<p>RelatedKeywordSearch is performed by three independent search engines
within the Stanbol system, namely: </p>
+<ul>
+<li><strong>OntologyResourceSearch</strong>: If an ontology is already
registered with
+ Stanbol (e.g. a domain ontology), it can be used to look for similar
+ keywords, given a keyword. A SPARQL query based on a LARQ index is
+ executed on the specified ontology to find individual and class
+ resources related to the keyword.</li>
+<li><strong>ReferencedSiteSearch</strong>: Referenced sites are used to
retrieve the
+ enhancements of a content item. Stanbol Enhancer handles all
+ enhancement operations through the referenced sites. This interface
+ makes use of the referenced sites to look for similar keywords,
+ given a keyword.</li>
+<li><strong>WordnetSearch</strong>: If a Wordnet database is registered with the
system
+ (through the OSGi console), this service is ready for use. It looks for
+ several relations among keywords (such as synonyms and hyponyms)
+ and retrieves a list of related keywords from the Wordnet database.</li>
+</ul>
+<p>The following command will retrieve related keywords about "turkey" from
referenced sites and Wordnet (ReferencedSiteSearch and WordnetSearch). Since no
ontology is specified, OntologyResourceSearch will not be executed.</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span
class="o">-</span><span class="n">i</span> <span class="o">-</span><span
class="n">X</span> <span class="n">GET</span> <span class="o">-</span><span
class="n">H</span> <span class="s">"Accept: application/json"</span>
<span class="n">http:</span><span class="sr">//</span><span
class="n">localhost:8080</span><span class="sr">/contenthub/co</span><span
class="n">ntenthub</span><span class="sr">/search/</span><span
class="n">related</span><span class="p">?</span><span
class="n">keyword</span><span class="o">=</span><span class="n">turkey</span>
+</pre></div>
+
+
+<p>If the URI of an ontology is also specified with the keyword as follows, the result
of the service will include related keywords found through the specified
ontology in addition to referenced site and Wordnet data. The following command
will add the related keywords of "turkey" retrieved from the ontology
identified by "uri-dummy" to the result of the related keyword service.</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span
class="o">-</span><span class="n">i</span> <span class="o">-</span><span
class="n">X</span> <span class="n">GET</span> <span class="o">-</span><span
class="n">H</span> <span class="s">"Accept: application/json"</span>
<span class="n">http:</span><span class="sr">//</span><span
class="n">localhost:8080</span><span class="sr">/contenthub/co</span><span
class="n">ntenthub</span><span class="sr">/search/</span><span
class="n">related</span><span class="p">?</span><span
class="n">keyword</span><span class="o">=</span><span
class="n">turkey</span><span class="o">&</span><span
class="n">ontologyURI</span><span class="o">=</span><span
class="n">uri</span><span class="o">-</span><span class="n">dummy</span>
+</pre></div>
+
+
+<p>Lastly, Contenthub provides a featured search interface which combines the
services of SolrSearch and RelatedKeywordSearch. Results of the services of the
FeaturedSearch interface include the resulting documents and related keywords for
the given query term. The following query will retrieve the documents whose indexed
fields include the term "turkey" and related keywords from several sources
about "turkey".</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span
class="o">-</span><span class="n">i</span> <span class="o">-</span><span
class="n">X</span> <span class="n">GET</span> <span class="o">-</span><span
class="n">H</span> <span class="s">"Accept: application/json"</span>
<span class="o">-</span><span class="n">H</span> <span
class="s">"Content-Type:text/plain"</span> <span
class="n">http:</span><span class="sr">//</span><span
class="n">localhost:8080</span><span class="sr">/contenthub/co</span><span
class="n">ntenthub</span><span class="sr">/search/</span><span
class="n">featured</span><span class="p">?</span><span
class="n">queryTerm</span><span class="o">=</span><span class="n">turkey</span>
+</pre></div>
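+<p>The related-keyword and featured search requests shown in this section can
also be assembled programmatically. The Python sketch below only builds the
requests and does not send them. The function names are hypothetical; the
paths and the <code>keyword</code>, <code>ontologyURI</code> and
<code>queryTerm</code> parameters are copied from the curl commands in this
section.</p>

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080/contenthub"  # endpoint used in this tutorial
JSON_HEADERS = {"Accept": "application/json"}


def related_keywords_request(keyword, ontology_uri=None,
                             index_name="contenthub", base_url=BASE_URL):
    """Build a GET request for the related-keyword search service."""
    params = {"keyword": keyword}
    if ontology_uri is not None:
        # Only sent when an ontology should also be searched.
        params["ontologyURI"] = ontology_uri
    url = f"{base_url}/{index_name}/search/related?" + urlencode(params)
    return Request(url, headers=JSON_HEADERS)


def featured_search_request(query_term, index_name="contenthub",
                            base_url=BASE_URL):
    """Build a GET request for the featured search service."""
    query = urlencode({"queryTerm": query_term})
    url = f"{base_url}/{index_name}/search/featured?" + query
    return Request(url, headers=JSON_HEADERS)


if __name__ == "__main__":
    print(related_keywords_request("turkey").full_url)
    print(related_keywords_request("turkey", "uri-dummy").full_url)
    print(featured_search_request("turkey").full_url)
    # To send a request against a running Stanbol instance:
    #   from urllib.request import urlopen
    #   with urlopen(featured_search_request("turkey")) as response:
    #       print(response.read().decode("utf-8"))
```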
</div>
<div id="footer">