contenthub5min.html

buildbot Tue, 07 Feb 2012 01:47:59 -0800

Author: buildbot
Date: Tue Feb  7 09:47:29 2012
New Revision: 804087

Log:
Staging update by buildbot for stanbol


Modified:
    websites/staging/stanbol/trunk/content/stanbol/docs/trunk/contenthub/contenthub5min.html

Modified: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/contenthub/contenthub5min.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/contenthub/contenthub5min.html (original)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/contenthub/contenthub5min.html	Tue Feb  7 09:47:29 2012
@@ -57,7 +57,124 @@
   
   <div id="content">
     <h1 class="title"></h1>
-    
+    <p>Contenthub (5-minute tutorial)</p>
+<p>The Apache Stanbol Contenthub is an Apache Solr-based document repository that stores text-based documents and provides customizable semantic search facilities. Contenthub exposes an efficient Java API together with the corresponding RESTful services.</p>
+<p>Contenthub is essentially a document repository. A document within Contenthub is referred to as a "Content Item". A content item consists of the text-based content of the document together with the metadata of the document. Contenthub has two main subcomponents, namely Store and Search. As their names indicate, Store is responsible for the persistent storage of content items, and Search provides semantic search facilities on top of those content items.</p>
+<h2 id="contenthub-store">Contenthub Store</h2>
+<p>Store is the part of Contenthub that persistently stores the documents and their metadata. The current implementation supports only text/plain documents.</p>
+<p>The storage part of Contenthub provides basic methods such as create, put, get and delete. When a document is submitted, Store delegates its textual content to Stanbol Enhancer to retrieve its enhancements (in Contenthub terminology, the enhancements of a content item are called its metadata). While submitting a document, it is also possible to specify external metadata as field:value pairs, in addition to the enhancements retrieved from the Enhancer.</p>
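+<p>The store behaviour described above (a unique ID per item, delegation to the Enhancer, optional external field:value metadata) can be pictured with the minimal in-memory sketch below. It is not the Contenthub Java API: ContentItem, ContentItemStore and fake_enhancer are hypothetical names, the Enhancer is replaced by a caller-supplied function, and only create, get and delete are modelled.</p>

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ContentItem:
    """Hypothetical content item: the text plus its metadata."""
    item_id: str
    content: str
    metadata: Dict[str, List[str]] = field(default_factory=dict)


class ContentItemStore:
    """Hypothetical in-memory stand-in for Contenthub Store (not the real API)."""

    def __init__(self, enhance: Callable[[str], Dict[str, List[str]]]):
        # `enhance` stands in for the call to Stanbol Enhancer (assumption).
        self._enhance = enhance
        self._items: Dict[str, ContentItem] = {}

    def create(self, content: str,
               external: Optional[Dict[str, str]] = None) -> str:
        # Enhancements retrieved for the text become the item's metadata.
        metadata = {k: list(v) for k, v in self._enhance(content).items()}
        # External field:value pairs are stored next to the enhancements.
        for name, value in (external or {}).items():
            metadata.setdefault(name, []).append(value)
        item_id = uuid.uuid4().hex  # unique ID assigned when the item is indexed
        self._items[item_id] = ContentItem(item_id, content, metadata)
        return item_id

    def get(self, item_id: str) -> Optional[ContentItem]:
        return self._items.get(item_id)

    def delete(self, item_id: str) -> bool:
        return self._items.pop(item_id, None) is not None


def fake_enhancer(text: str) -> Dict[str, List[str]]:
    # Hypothetical enhancement result standing in for Stanbol Enhancer output.
    return {"place": ["Istanbul"]} if "Istanbul" in text else {}


store = ContentItemStore(fake_enhancer)
new_id = store.create("I live in Istanbul.", {"title": "about me"})
print(store.get(new_id).metadata)
# → {'place': ['Istanbul'], 'title': ['about me']}
```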
+<p>The document itself and all of its metadata are indexed through an embedded Apache Solr core (index) that is created specifically for Contenthub. Each document is assigned a unique ID when it is indexed, and this ID can later be used to retrieve the document from Contenthub or to delete it. Contenthub provides an HTML interface for these functionalities at the following endpoint, which is available after running the full launcher of Apache Stanbol:</p>
+<div class="codehilite"><pre><span class="n">http:</span><span class="sr">//</span><span class="n">localhost:8080</span><span class="o">/</span><span class="n">contenthub</span>
+</pre></div>
+
+
+<p>Apache Solr can manage several cores (indexes) within the same running instance, and Contenthub uses this facility to manage multiple cores. These cores are managed through LDPath programs[1].</p>
+<p>LDPath is a simple path-based query language, similar to XPath or SPARQL Property Paths, that is particularly well suited for querying and retrieving resources from the Linked Data Cloud by following RDF links between resources and servers. For example, the following path query selects the names of the people known by the context resource (the resource on which the path is executed):</p>
+<div class="codehilite"><pre><span class="err">foaf:knows</span> <span class="err">/</span> <span class="err">foaf:name</span>
+</pre></div>
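+<p>To make the semantics of this path concrete, the sketch below evaluates foaf:knows / foaf:name over a tiny in-memory set of triples: each step follows one property from the current set of nodes. The triples and the function evaluate_path are made up for illustration; a real LDPath engine also supports filters, functions, full URIs and remote dereferencing, none of which this sketch handles.</p>

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical example data: alice knows bob and carol.
TRIPLES: List[Triple] = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:carol", "foaf:name", "Carol"),
    ("ex:dave", "foaf:name", "Dave"),
]


def evaluate_path(triples: Iterable[Triple], context: str, path: str) -> Set[str]:
    """Follow each prefixed property of a 'p1 / p2 / ...' path from the context."""
    triples = list(triples)
    nodes = {context}
    for prop in (step.strip() for step in path.split("/")):
        # One step: collect the objects of `prop` for every current node.
        nodes = {o for (s, p, o) in triples if s in nodes and p == prop}
    return nodes


print(sorted(evaluate_path(TRIPLES, "ex:alice", "foaf:knows / foaf:name")))
# → ['Bob', 'Carol']
```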
+
+
+<p>An LDPath program is a collection of path queries, and it can be executed on any collection of RDF resources to extract specific information. For example, the following LDPath program can be executed on the resources retrieved from Stanbol Enhancer as a result of the enhancement process:</p>
+<div class="codehilite"><pre><span class="nv">@prefix</span> <span class="n">rdf</span> <span class="p">:</span> <span class="sr">&lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;</span><span class="p">;</span>
+<span class="nv">@prefix</span> <span class="n">rdfs</span> <span class="p">:</span> <span class="sr">&lt;http://www.w3.org/2000/01/rdf-schema#&gt;</span><span class="p">;</span>
+<span class="nv">@prefix</span> <span class="n">db</span><span class="o">-</span><span class="n">ont</span> <span class="p">:</span> <span class="sr">&lt;http://dbpedia.org/ontology/&gt;</span><span class="p">;</span>
+<span class="n">title</span> <span class="o">=</span> <span class="n">rdfs:label</span> <span class="o">::</span> <span class="n">xsd:string</span><span class="p">;</span>
+<span class="n">dbpediatype</span> <span class="o">=</span> <span class="n">rdf:type</span> <span class="o">::</span> <span class="n">xsd:anyURI</span><span class="p">;</span>
+<span class="n">population</span> <span class="o">=</span> <span class="n">db</span><span class="o">-</span><span class="n">ont:populationTotal</span> <span class="o">::</span> <span class="n">xsd:int</span><span class="p">;</span>
+</pre></div>
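+<p>Each non-prefix statement of the program above has the shape field = path :: type. The sketch below splits such a program into prefix declarations and field definitions. It is a simplified, hypothetical parser (parse_program and FieldDef are illustrative names) written only for programs shaped like this example; it does not implement the full LDPath grammar, so filters, functions and semicolons inside URIs are not supported.</p>

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class FieldDef(NamedTuple):
    name: str
    path: str
    xsd_type: str


PREFIX_RE = re.compile(r"@prefix\s+([\w-]+)\s*:\s*<([^>]*)>")
FIELD_RE = re.compile(r"([\w-]+)\s*=\s*(.+?)\s*::\s*(\S+)")


def parse_program(program: str) -> Tuple[Dict[str, str], List[FieldDef]]:
    """Split a simple LDPath program into prefixes and field definitions."""
    prefixes: Dict[str, str] = {}
    fields: List[FieldDef] = []
    for statement in program.split(";"):
        statement = statement.strip()
        if not statement:
            continue  # e.g. the empty tail after the final ';'
        m = PREFIX_RE.fullmatch(statement)
        if m:
            prefixes[m.group(1)] = m.group(2)
            continue
        m = FIELD_RE.fullmatch(statement)
        if m is None:
            raise ValueError(f"unsupported statement: {statement!r}")
        fields.append(FieldDef(*m.groups()))
    return prefixes, fields


EXAMPLE = """
@prefix rdf : <http://www.w3.org/1999/02/22-rdf-syntax-ns#>;
@prefix rdfs : <http://www.w3.org/2000/01/rdf-schema#>;
@prefix db-ont : <http://dbpedia.org/ontology/>;
title = rdfs:label :: xsd:string;
dbpediatype = rdf:type :: xsd:anyURI;
population = db-ont:populationTotal :: xsd:int;
"""

_, parsed_fields = parse_program(EXAMPLE)
for f in parsed_fields:
    print(f.name, f.path, f.xsd_type)
```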
+
+
+<p>Given an LDPath program, Contenthub can create a corresponding Solr core and index content items through that core. When you submit a document to Contenthub Store by specifying an LDPath program, the content item (the document content and its metadata/enhancements) is indexed according to the fields defined by that program. For instance, the example LDPath program above leads to a Solr core that includes the following fields (in addition to the default configuration and several default fields):</p>
+<div class="codehilite"><pre><span class="o">&lt;</span><span class="n">field</span> <span class="n">name</span><span class="o">=</span><span class="s">&quot;title&quot;</span> <span class="n">type</span><span class="o">=</span><span class="s">&quot;string&quot;</span> <span class="n">stored</span><span class="o">=</span><span class="s">&quot;true&quot;</span> <span class="n">indexed</span><span class="o">=</span><span class="s">&quot;true&quot;</span> <span class="n">multiValued</span><span class="o">=</span><span class="s">&quot;true&quot;</span><span class="o">/&gt;</span>
+<span class="o">&lt;</span><span class="n">field</span> <span class="n">name</span><span class="o">=</span><span class="s">&quot;dbpediatype&quot;</span> <span class="n">type</span><span class="o">=</span><span class="s">&quot;uri&quot;</span> <span class="n">stored</span><span class="o">=</span><span class="s">&quot;true&quot;</span> <span class="n">indexed</span><span class="o">=</span><span class="s">&quot;true&quot;</span> <span class="n">multiValued</span><span class="o">=</span><span class="s">&quot;true&quot;</span><span class="o">/&gt;</span>
+<span class="o">&lt;</span><span class="n">field</span> <span class="n">name</span><span class="o">=</span><span class="s">&quot;population&quot;</span> <span class="n">type</span><span class="o">=</span><span class="s">&quot;int&quot;</span> <span class="n">stored</span><span class="o">=</span><span class="s">&quot;true&quot;</span> <span class="n">indexed</span><span class="o">=</span><span class="s">&quot;true&quot;</span> <span class="n">multiValued</span><span class="o">=</span><span class="s">&quot;true&quot;</span><span class="o">/&gt;</span>
+</pre></div>
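+<p>The correspondence between the LDPath types and the generated Solr fields can be sketched as a small lookup table. The mapping below covers only the three types used in this example and is inferred from the field list above, not taken from the Contenthub source; XSD_TO_SOLR and to_solr_field are hypothetical names.</p>

```python
from xml.sax.saxutils import quoteattr

# Assumed mapping, inferred from the example schema fragment only.
XSD_TO_SOLR = {
    "xsd:string": "string",
    "xsd:anyURI": "uri",
    "xsd:int": "int",
}


def to_solr_field(name: str, xsd_type: str) -> str:
    """Render one <field/> element shaped like the example schema fragment."""
    solr_type = XSD_TO_SOLR[xsd_type]  # KeyError for types not covered here
    attrs = {
        "name": name,
        "type": solr_type,
        "stored": "true",
        "indexed": "true",
        "multiValued": "true",
    }
    rendered = " ".join(f"{k}={quoteattr(v)}" for k, v in attrs.items())
    return f"<field {rendered}/>"


for field_name, field_type in [("title", "xsd:string"),
                               ("dbpediatype", "xsd:anyURI"),
                               ("population", "xsd:int")]:
    print(to_solr_field(field_name, field_type))
```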
+
+
+<p>To submit an LDPath program through the REST API of Contenthub, you can use the following command:</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span class="o">-</span><span class="n">i</span> <span class="o">-</span><span class="n">X</span> <span class="n">POST</span> <span class="o">-</span><span class="n">d</span> <span class="s">&quot;name=myindex&amp;program=@prefix rdf : &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;; @prefix rdfs : &lt;http://www.w3.org/2000/01/rdf-schema#&gt;; @prefix db-ont : &lt;http://dbpedia.org/ontology/&gt;; title = rdfs:label :: xsd:string; dbpediatype = rdf:type :: xsd:anyURI; population = db-ont:populationTotal :: xsd:int;&quot;</span> <span class="n">http:</span><span class="sr">//</span><span class="n">localhost:8080</span><span class="sr">/contenthub/</span><span class="n">ldpath</span><span class="o">/</span><span class="n">program</span>
+</pre></div>
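+<p>The same request can be prepared with the Python standard library. The sketch below only builds the request and does not send it; urlencode percent-encodes characters that are significant in form data (such as = and spaces inside the program text), which the raw curl -d string passes through unchanged. The endpoint and the name and program parameters come from the command above; build_submit_program_request is a hypothetical helper name.</p>

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://localhost:8080/contenthub"  # full launcher address used in this tutorial

PROGRAM = (
    "@prefix rdf : <http://www.w3.org/1999/02/22-rdf-syntax-ns#>; "
    "@prefix rdfs : <http://www.w3.org/2000/01/rdf-schema#>; "
    "@prefix db-ont : <http://dbpedia.org/ontology/>; "
    "title = rdfs:label :: xsd:string; "
    "dbpediatype = rdf:type :: xsd:anyURI; "
    "population = db-ont:populationTotal :: xsd:int;"
)


def build_submit_program_request(name: str, program: str) -> Request:
    """Prepare (but do not send) the POST that registers an LDPath program."""
    body = urlencode({"name": name, "program": program}).encode("utf-8")
    return Request(
        f"{BASE}/ldpath/program",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_submit_program_request("myindex", PROGRAM)
print(req.get_method(), req.full_url)
# To send it to a running Stanbol instance:
#   from urllib.request import urlopen
#   with urlopen(req) as resp:
#       print(resp.status)
```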
+
+
+<p>You can retrieve the list of managed LDPath programs in JSON format with the following command. This is also the list of available Solr cores (except the default Solr core):</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span class="o">-</span><span class="n">i</span> <span class="o">-</span><span class="n">X</span> <span class="n">GET</span> <span class="n">http:</span><span class="sr">//</span><span class="n">localhost:8080</span><span class="sr">/contenthub/</span><span class="n">ldpath</span>
+</pre></div>
+
+
+<p>LDPath-related management is performed through the SemanticIndexManager of Contenthub. To take advantage of semantic indexes while storing content items, you need to specify the name of the index in the path of the URL when submitting the document. The default index of Contenthub is named "contenthub". Hence, the following command submits a document to the default index:</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span class="o">-</span><span class="n">i</span> <span class="o">-</span><span class="n">X</span> <span class="n">POST</span> <span class="o">-</span><span class="n">H</span> <span class="s">&quot;Content-Type:application/x-www-form-urlencoded&quot;</span> <span class="o">-</span><span class="n">d</span> <span class="s">&quot;title=about me&amp;content=I live in Istanbul.&amp;&quot;</span> <span class="n">http:</span><span class="sr">//</span><span class="n">localhost:8080</span><span class="sr">/contenthub/co</span><span class="n">ntenthub</span><span class="o">/</span><span class="n">store</span>
+</pre></div>
+
+
+<p>The following command stores the content item in the Solr core named "myindex". Therefore, indexing is performed according to the field properties defined by the LDPath program named "myindex".</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span class="o">-</span><span class="n">i</span> <span class="o">-</span><span class="n">X</span> <span class="n">POST</span> <span class="o">-</span><span class="n">H</span> <span class="s">&quot;Content-Type:application/x-www-form-urlencoded&quot;</span> <span class="o">-</span><span class="n">d</span> <span class="s">&quot;title=about me&amp;content=I live in Istanbul.&amp;&quot;</span> <span class="n">http:</span><span class="sr">//</span><span class="n">localhost:8080</span><span class="sr">/contenthub/m</span><span class="n">yindex</span><span class="o">/</span><span class="n">store</span>
+</pre></div>
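+<p>Both store commands differ only in the index segment of the URL. The sketch below builds the corresponding requests with the standard library, using the title and content form fields from the commands above; build_store_request is a hypothetical helper, and percent-encoding the index name is a defensive choice of this sketch rather than something Contenthub is known to require.</p>

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "http://localhost:8080/contenthub"


def build_store_request(index: str, title: str, content: str) -> Request:
    """Prepare (but do not send) a POST to /contenthub/{index}/store."""
    body = urlencode({"title": title, "content": content}).encode("utf-8")
    return Request(
        f"{BASE}/{quote(index, safe='')}/store",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


for index in ("contenthub", "myindex"):  # default index, then the LDPath-based one
    req = build_store_request(index, "about me", "I live in Istanbul.")
    print(req.full_url, req.data.decode("utf-8"))
```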
+
+
+<h2 id="contenthub-search">Contenthub Search</h2>
+<p>Contenthub provides three search interfaces so that users can adopt the capabilities of Stanbol at different levels of complexity. These interfaces are:</p>
+<ul>
+<li><strong>SolrSearch</strong>: provides the native Solr interface to the outside world.
+    It retrieves the resulting content items (documents) from the Solr
+    backend, so SolrJ users can easily make use of this interface. Search
+    is performed on the corresponding Solr index, and results are
+    returned in "org.apache.solr.client.solrj.response.QueryResponse"
+    format.</li>
+<li><strong>RelatedKeywordSearch</strong>: provides supporting functionalities for search
+    facilities. Given a keyword, the services of this interface find other
+    related keywords from several sources. WordNet, domain ontologies
+    and referenced sites are the data sources these services use to
+    retrieve the related keywords.</li>
+<li><strong>FeaturedSearch</strong>: combines the services of SolrSearch and
+    RelatedKeywordSearch for users who want all the results for a query
+    term from a single interface. Given a query term, FeaturedSearch returns
+    not only the resulting documents from the queried Solr core/index but also
+    related keywords retrieved from various sources (if those sources are
+    available within the running Stanbol instance).</li>
+</ul>
+<p>The following request retrieves all documents from the default Solr index (named "contenthub"):</p>
+<div class="codehilite"><pre><span class="n">http:</span><span class="sr">//</span><span class="n">localhost:8080</span><span class="sr">/solr/</span><span class="n">default</span><span class="sr">/contenthub/s</span><span class="n">elect</span><span class="p">?</span><span class="n">q</span><span class="o">=*</span><span class="p">:</span><span class="o">*</span>
+</pre></div>
+
+
+<p>The following request retrieves all documents from the Solr index named "myindex":</p>
+<div class="codehilite"><pre><span class="n">http:</span><span class="sr">//</span><span class="n">localhost:8080</span><span class="sr">/solr/</span><span class="n">default</span><span class="sr">/myindex/s</span><span class="n">elect</span><span class="p">?</span><span class="n">q</span><span class="o">=*</span><span class="p">:</span><span class="o">*</span>
+</pre></div>
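+<p>The two Solr requests above share one URL pattern, with the core name as the only variable part. The sketch below builds such select URLs; q is taken from the requests above, while the optional wt=json parameter is a standard Solr response-format parameter added here only as an illustration. solr_select_url is a hypothetical helper name.</p>

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode

SOLR_BASE = "http://localhost:8080/solr/default"  # as in the requests above


def solr_select_url(core: str, query: str = "*:*",
                    extra: Optional[Dict[str, str]] = None) -> str:
    """Build a Solr /select URL for a Contenthub core (default or LDPath-based)."""
    params = {"q": query}
    params.update(extra or {})
    # urlencode percent-encodes '*' and ':', which Solr decodes transparently.
    return f"{SOLR_BASE}/{quote(core, safe='')}/select?{urlencode(params)}"


print(solr_select_url("contenthub"))
print(solr_select_url("myindex", extra={"wt": "json"}))
```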
+
+
+<p>RelatedKeywordSearch is performed by three independent search engines 
within the Stanbol system, namely: </p>
+<ul>
+<li><strong>OntologyResourceSearch</strong>: If an ontology is already registered with
+    Stanbol (e.g. a domain ontology), it can be used to look for keywords
+    similar to a given keyword. A SPARQL query based on a LARQ index is
+    executed on the specified ontology to find individual and class
+    resources related to the keyword.</li>
+<li><strong>ReferencedSiteSearch</strong>: Referenced sites are used to retrieve the
+    enhancements of a content item; Stanbol Enhancer performs its
+    enhancement operations through the referenced sites. This interface
+    uses the same referenced sites to look for keywords similar to a
+    given keyword.</li>
+<li><strong>WordnetSearch</strong>: If a WordNet database is registered with the system
+    (through the OSGi console), this service is ready for use. It looks for
+    several relations among keywords (such as synonyms and hyponyms)
+    and retrieves a list of related keywords from the WordNet database.</li>
+</ul>
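+<p>The selection rule described for RelatedKeywordSearch (each engine contributes only when its source is available, and the ontology engine runs only when an ontology is given) can be sketched as follows. The engine functions and their toy results are hypothetical stand-ins, not the Stanbol implementations, and the wordnet_available flag merely models whether a WordNet database has been registered.</p>

```python
from typing import Callable, Dict, List, Optional

# Hypothetical engine signature: (keyword, ontology_uri) -> related keywords.
Engine = Callable[[str, Optional[str]], List[str]]


def referenced_site_search(keyword: str, ontology_uri: Optional[str]) -> List[str]:
    return {"turkey": ["Ankara", "Istanbul"]}.get(keyword, [])  # toy data


def wordnet_search(keyword: str, ontology_uri: Optional[str]) -> List[str]:
    return {"turkey": ["bird", "fowl"]}.get(keyword, [])  # toy data


def ontology_resource_search(keyword: str, ontology_uri: Optional[str]) -> List[str]:
    return [f"{ontology_uri}#{keyword.capitalize()}"]  # toy data


def related_keywords(keyword: str, ontology_uri: Optional[str] = None,
                     wordnet_available: bool = True) -> Dict[str, List[str]]:
    """Collect related keywords per source, skipping sources that are unavailable."""
    engines: Dict[str, Engine] = {"ReferencedSiteSearch": referenced_site_search}
    if wordnet_available:  # models a WordNet database registered via the OSGi console
        engines["WordnetSearch"] = wordnet_search
    if ontology_uri is not None:  # without an ontology this engine does not run
        engines["OntologyResourceSearch"] = ontology_resource_search
    return {name: engine(keyword, ontology_uri) for name, engine in engines.items()}


print(sorted(related_keywords("turkey")))
print(related_keywords("turkey", ontology_uri="uri-dummy")["OntologyResourceSearch"])
```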
+<p>The following command retrieves keywords related to "turkey" from referenced sites and WordNet (ReferencedSiteSearch and WordnetSearch). Since no ontology is specified, OntologyResourceSearch is not executed.</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span class="o">-</span><span class="n">i</span> <span class="o">-</span><span class="n">X</span> <span class="n">GET</span> <span class="o">-</span><span class="n">H</span> <span class="s">&quot;Accept: application/json&quot;</span> <span class="n">http:</span><span class="sr">//</span><span class="n">localhost:8080</span><span class="sr">/contenthub/co</span><span class="n">ntenthub</span><span class="sr">/search/</span><span class="n">related</span><span class="p">?</span><span class="n">keyword</span><span class="o">=</span><span class="n">turkey</span>
+</pre></div>
+
+
+<p>If the URI of an ontology is also specified along with the keyword, the result of the service also includes related keywords found in the specified ontology, in addition to those from referenced sites and WordNet. The following command adds the keywords related to "turkey" that are retrieved from the ontology identified by "uri-dummy" to the result of the related keyword service:</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span class="o">-</span><span class="n">i</span> <span class="o">-</span><span class="n">X</span> <span class="n">GET</span> <span class="o">-</span><span class="n">H</span> <span class="s">&quot;Accept: application/json&quot;</span> <span class="s">&quot;http://localhost:8080/contenthub/contenthub/search/related?keyword=turkey&amp;ontologyURI=uri-dummy&quot;</span>
+</pre></div>
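+<p>Building the related keyword request in code avoids having to quote the ampersand separator for the shell. The sketch below reproduces the two commands above; treating the second path segment ("contenthub") as the index name is an assumption based on the store URLs, and build_related_search_request is a hypothetical helper.</p>

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "http://localhost:8080/contenthub"


def build_related_search_request(keyword: str, ontology_uri: Optional[str] = None,
                                 index: str = "contenthub") -> Request:
    """Prepare (but do not send) a GET to /contenthub/{index}/search/related."""
    params = {"keyword": keyword}
    if ontology_uri is not None:
        # Adds OntologyResourceSearch results for the given ontology.
        params["ontologyURI"] = ontology_uri
    url = f"{BASE}/{quote(index, safe='')}/search/related?{urlencode(params)}"
    return Request(url, headers={"Accept": "application/json"}, method="GET")


print(build_related_search_request("turkey").full_url)
print(build_related_search_request("turkey", ontology_uri="uri-dummy").full_url)
```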
+
+
+<p>Lastly, Contenthub provides a featured search interface that combines the services of SolrSearch and RelatedKeywordSearch. The results of the FeaturedSearch services include the resulting documents and the related keywords of the given query term. The following query retrieves the documents whose indexed fields include the term "turkey", together with keywords related to "turkey" from several sources:</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span class="o">-</span><span class="n">i</span> <span class="o">-</span><span class="n">X</span> <span class="n">GET</span> <span class="o">-</span><span class="n">H</span> <span class="s">&quot;Accept: application/json&quot;</span> <span class="o">-</span><span class="n">H</span> <span class="s">&quot;Content-Type:text/plain&quot;</span> <span class="n">http:</span><span class="sr">//</span><span class="n">localhost:8080</span><span class="sr">/contenthub/co</span><span class="n">ntenthub</span><span class="sr">/search/</span><span class="n">featured</span><span class="p">?</span><span class="n">queryTerm</span><span class="o">=</span><span class="n">turkey</span>
+</pre></div>
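+<p>The featured search request follows the same pattern with the queryTerm parameter. In the sketch below the Content-Type header from the curl command is left out because this GET request carries no body; the index path segment is again assumed to be the index name, and build_featured_search_request is a hypothetical helper.</p>

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "http://localhost:8080/contenthub"


def build_featured_search_request(query_term: str, index: str = "contenthub") -> Request:
    """Prepare (but do not send) a GET to /contenthub/{index}/search/featured."""
    query = urlencode({"queryTerm": query_term})
    url = f"{BASE}/{quote(index, safe='')}/search/featured?{query}"
    # JSON response requested, as in the curl command above.
    return Request(url, headers={"Accept": "application/json"}, method="GET")


print(build_featured_search_request("turkey").full_url)
```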
   </div>
   
   <div id="footer">

svn commit: r804087 - /websites/staging/stanbol/trunk/content/stanbol/docs/trunk/contenthub/contenthub5min.html
