Author: buildbot
Date: Fri Feb 17 10:29:52 2012
New Revision: 805166

Log:
Staging update by buildbot for stanbol

Added:
    
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/enhancerrest.html
Modified:
    websites/staging/stanbol/trunk/   (props changed)

Propchange: websites/staging/stanbol/trunk/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Fri Feb 17 10:29:52 2012
@@ -1 +1 @@
-1244923
+1245375

Added: 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/enhancerrest.html
==============================================================================
--- 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/enhancerrest.html
 (added)
+++ 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/enhancerrest.html
 Fri Feb 17 10:29:52 2012
@@ -0,0 +1,353 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
"http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Stanbol Enhancer RESTful API</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" 
href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" 
height="101" border="0" 
src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue 
Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the_asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a 
Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">Stanbol Enhancer RESTful API</h1>
+    <p>The RESTful service endpoint provided by the Stanbol Enhancer is a 
stateless interface that allows the caller to submit content and receive the 
resulting enhancements formatted as RDF in a single request, without storing 
anything on the server side. More advanced options also allow callers to pass 
pre-existing metadata, to pass and request alternate content versions, and to 
request additional metadata created by the Enhancer or by specific Enhancement 
Engines.</p>
+
+<p>The RESTful interface described here is available at several endpoints:</p>
+<ul>
+<li><strong>'/enhancer':</strong> The main endpoint of the Stanbol Enhancer. 
Submitted content is enhanced using the default enhancement chain.</li>
+<li><strong>'/enhancer/chain/{chain-name}':</strong> The Stanbol Enhancer 
supports the configuration of multiple <a href="chains">Enhancement Chains</a>. 
Users can look up the active chains by sending a request to the 
'/enhancer/chain' endpoint.</li>
+<li><strong>'/engines':</strong> Behaves the same as '/enhancer'. It is kept 
for backward compatibility with older Stanbol versions.</li>
+</ul>
+</ul>
+<h2 id="enhancement_request">Enhancement Request</h2>
+<p>This section describes how to send content to the Stanbol Enhancer for 
analysis. Results are sent back in the form of a serialized RDF 
graph.</p>
+<p>The content to analyze should be sent in a POST request with the mimetype 
specified in
+the <code>Content-type</code> header. The response will hold the RDF 
enhancement serialized in the format specified in the <code>Accept</code> 
header:</p>
+<div class="codehilite"><pre>curl -X POST -H <span class="s2">&quot;Accept: 
text/turtle&quot;</span> -H <span class="s2">&quot;Content-type: 
text/plain&quot;</span> <span class="se">\</span>
+     --data <span class="s2">&quot;John Smith was born in London.&quot;</span> 
<span class="k">${</span><span class="nv">it</span><span 
class="p">.serviceUrl</span><span class="k">}</span>
+</pre></div>
+
+
+<p>The list of mimetypes accepted as input depends on the deployed engines. 
By default most Enhancement Engines can only process plain text content. 
However, Enhancement Engines like <a href="engines/metaxaengine.html">Metaxa</a> 
can be used to create 'text/plain' versions of the submitted content. This also 
makes it possible to enhance content with mime types such as HTML, PDF and MS 
Office documents (see the Metaxa documentation for details).</p>
+<p>Stanbol enhancer is able to serialize the response in the following RDF 
formats:</p>
+<div class="codehilite"><pre>application/json (JSON-LD)
+application/rdf+xml (RDF/XML)
+application/rdf+json (RDF/JSON)
+text/turtle (Turtle)
+text/rdf+nt (N-TRIPLES)
+</pre></div>
+
+
+<h3 id="additional_supported_queryparameters">Additional supported 
QueryParameters:</h3>
+<ul>
+<li><strong>uri={content-item-uri}:</strong> By default the URI of the content 
item being enhanced is a local, non-dereferenceable URI automatically built from 
a hash digest of the binary content. This parameter lets the caller provide the 
URI of the <a href="contentitem.html">ContentItem</a> to be used in 
the enhancement RDF graph instead.</li>
+<li><strong>executionmetadata=true/false:</strong> Enables the inclusion of <a 
href="executionmetadata.html">execution metadata</a> in the enhancement 
metadata of the response. This data also includes the <a 
href="chains/executionplan.html">execution plan</a> used to enhance the 
submitted content. This information is typically only useful to clients that 
want to know how the content was processed by the Enhancer. Note that the 
execution metadata can also be requested by using the multi-part content item 
API described below.</li>
+</ul>
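+<p>To illustrate the default described for the <code>uri</code> parameter, the 
following Java sketch derives a URN from a SHA-1 digest of the content bytes. 
The <code>urn:content-item-sha1-</code> prefix is taken from the example 
response shown later on this page. The class name, the method name, and the 
assumption that the digest covers exactly the raw content bytes are 
hypothetical, so the result is not guaranteed to match the URI generated by the 
Enhancer.</p>

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: illustrates a digest-based, non-dereferenceable URI.
final class ContentItemUriSketch {

    // Prefix as it appears in the example response of this page.
    static final String PREFIX = "urn:content-item-sha1-";

    /** Returns PREFIX followed by the lowercase hex SHA-1 digest of the bytes. */
    static String digestUri(byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            StringBuilder uri = new StringBuilder(PREFIX);
            for (byte b : sha1.digest(content)) {
                uri.append(String.format("%02x", b));
            }
            return uri.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] content = "John Smith was born in London."
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(digestUri(content));
    }
}
```

+<p>Because such a URI depends only on the bytes, clients that need a stable, 
meaningful identifier would typically supply their own value via 
<code>uri</code> instead.</p>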
+<p>The following example shows how to send an enhancement request with a
+custom content item URI that will include the execution metadata in the
+response.</p>
+<div class="codehilite"><pre>curl -X POST -H <span class="s2">&quot;Accept: 
text/turtle&quot;</span> -H <span class="s2">&quot;Content-type: 
text/plain&quot;</span> <span class="se">\</span>
+    --data <span class="s2">&quot;John Smith was born in London.&quot;</span> 
<span class="se">\</span>
+    <span 
class="s2">&quot;${it.serviceUrl}?uri=urn:fise-example-content-item&amp;executionmetadata=true&quot;</span>
+</pre></div>
+
+
+<h2 id="multi-part_contentitem_support">Multi-part ContentItem support</h2>
+<p>The multi-part ContentItem extensions to the RESTful API (introduced by <a 
href="https://issues.apache.org/jira/browse/STANBOL-481">STANBOL-481</a>) are 
considered an advanced usage of the Stanbol Enhancer. </p>
+<p>Users will want to use these extensions if they need to:</p>
+<ul>
+<li>pass multiple versions of the content: Most CMSs already support 
converting content to plain text. This API allows callers to pass both the 
original and multiple transcoded versions of the content to the Enhancer.</li>
+<li>pass pre-existing metadata: A CMS typically already holds some metadata 
about the content sent to the Stanbol Enhancer (e.g. user-provided tags, 
categories …). The multi-part extensions allow callers to pass such data in 
addition to the content. </li>
+<li>request transcoded versions of the submitted content: These API extensions 
allow transcoded versions (e.g. the 'text/plain' version) of the submitted 
content to be included in the response. They also allow requests that directly 
return transcoded content by omitting the extracted metadata.</li>
+<li>request additional metadata that is normally not included within the 
metadata of the Enhancement response: This can be used to request the <a 
href="executionmetadata.html">execution metadata</a> in its own RDF graph, but 
it can also be used to request metadata of specific enhancement engines (TODO: 
add example)</li>
+</ul>
+<h3 id="queryparameters">QueryParameters</h3>
+<p>The following QueryParameters are defined by the multi-part content item 
extension:</p>
+<ul>
+<li>
+<p><strong>outputContentType=[mediaType]:</strong> Specifies the 
mimetypes of content to include in the response of the Stanbol Enhancer. This 
parameter supports wildcards (e.g. '*/*' ... all, 'text/*' ... all text 
versions, 'text/plain' ... only the plain text version). This parameter can be 
used multiple times.</p>
+<p>Responses to requests with this parameter will be encoded as 
<code>multipart/form-data</code>. If the "Accept" header of the request is not 
compatible with <code>multipart/form-data</code>, the request is treated as a 
<code>400 BAD_REQUEST</code>. For details see the documentation of the <a 
href="contentitem.html#multipart_mime_serialization">Multipart MIME format for 
ContentItems</a>.</p>
+</li>
+<li>
+<p><strong>omitParsed=[true/false]:</strong> Only makes sense in combination 
with the <code>outputContentType</code> parameter. It excludes all 
content included in the request from the response. A typical combination is 
<code>outputContentType=*/*&amp;omitParsed=true</code>. The default 
value of this parameter is <code>false</code>.</p>
+</li>
+<li>
+<p><strong>outputContentPart=[uri/'*']:</strong> Explicitly includes content 
parts with a specific URI in the response. Currently 
this only supports <a href="contentitem.html#content_parts">ContentParts</a> 
that are stored as RDF graphs. </p>
+<p>Responses to requests with this parameter will be encoded as 
<code>multipart/form-data</code>. If the "Accept" header of the request is not 
compatible with <code>multipart/form-data</code>, the request is treated as a 
<code>400 BAD_REQUEST</code>. The selected content parts will be included as 
MIME parts in the returned <a 
href="contentitem.html#multipart_mime_serialization">Multipart MIME formatted 
ContentItem</a>. The URI of the part will be used as its name. Such parts will 
be added after the "metadata" and the "content" (if present).</p>
+</li>
+<li>
+<p><strong>omitMetadata=[true/false]:</strong> Enables or disables 
the inclusion of the metadata in the response. The default is 
<code>false</code>.</p>
+<p>Typically <code>omitMetadata=true</code> is used when users want to use the 
Stanbol Enhancer just to get one or more ContentParts as a response. Note that
+requests that use an <code>Accept: {mimeType}</code> header AND 
<code>omitMetadata=true</code> will directly return the content version of 
<code>{mimeType}</code> and NOT wrap the result as 
<code>multipart/form-data</code>. See also the example further down in this 
documentation.</p>
+</li>
+<li>
+<p><strong>rdfFormat=[rdfMimeType]:</strong> For requests that 
result in <code>multipart/form-data</code> encoded responses, this specifies 
the RDF serialization format to use. Supported formats and defaults are the 
same as for normal Enhancer requests. </p>
+</li>
+</ul>
+<h3 id="parsing_multiple_contentparts">Parsing multiple ContentParts</h3>
+<p>Requests to the Stanbol Enhancer with the <code>Content-Type: 
multipart/form-data</code> are considered to contain a ContentItem serialized 
as MultiPart MIME. The exact specification of the <a 
href="contentitem.html#multipart_mime_serialization">MultiPart MIME format for 
ContentItems</a> is provided by the documentation of the ContentItem.</p>
+<p>The combination of <code>multipart/form-data</code> encoded requests with 
the QueryParameters described above allows the <a 
href="contentitem.html#multipart_mime_serialization">MultiPart MIME format for 
ContentItems</a> to be used for both request and response.</p>
+<h3 
id="example_usages_of_the_multi-part_content_item_restful_api_extensions">Example
 usages of the multi-part content item RESTful API extensions</h3>
+<p>The following examples show some typical usages of the multi-part content 
item RESTful API. Note that for better readability the values of the query 
parameters are
+not URLEncoded.</p>
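+<p>Since the examples leave the query parameter values unencoded, the 
following Java sketch shows one way a client could encode them before sending 
a request. The class and method names are hypothetical, and the endpoint URL 
is the local one used by the Java examples on this page. Encoding matters in 
particular for <code>rdfFormat=application/rdf+xml</code>, because an 
unencoded <code>+</code> in a query string is typically decoded as a space. 
Note that a <code>Map</code> cannot hold a parameter that is used multiple 
times, so this sketch only covers single-valued parameters.</p>

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: appends URL-encoded query parameters to an endpoint URL.
final class EnhancerQuerySketch {

    /** Returns the endpoint followed by "?name=value&..." with encoded names and values. */
    static String withQuery(String endpoint, Map<String, String> params) {
        StringBuilder url = new StringBuilder(endpoint);
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator)
               .append(encode(param.getKey()))
               .append('=')
               .append(encode(param.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory for every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("omitParsed", "true");
        params.put("rdfFormat", "application/rdf+xml");
        System.out.println(withQuery("http://localhost:8080/enhancer", params));
    }
}
```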
+<p><strong>Example 1: Return metadata and transformed content 
versions</strong></p>
+<div class="codehilite"><pre>curl -v -X POST -H <span class="s2">&quot;Accept: 
multipart/form-data&quot;</span> <span class="se">\</span>
+    -H <span class="s2">&quot;Content-type: text/html; 
charset=UTF-8&quot;</span>  <span class="se">\</span>
+    --data <span 
class="s2">&quot;&lt;html&gt;&lt;body&gt;&lt;p&gt;John 
Smith was born in 
London.&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;&quot;</span>
 <span class="se">\</span>
+    <span 
class="s2">&quot;${it.serviceUrl}?outputContent=*/*&amp;omitParsed=true&amp;rdfFormat=application/rdf+xml&quot;</span>
+</pre></div>
+
+
+<p>This results in a response with the mime type <code>"Content-Type: 
multipart/form-data; charset=UTF-8; boundary=contentItem"</code> that contains 
the metadata as well as the plain text version of the submitted HTML document 
as content.</p>
+<div class="codehilite"><pre>--contentItem
+Content-Disposition: form-data; name=&quot;metadata&quot;; 
filename=&quot;urn:content-item-sha1-76e44d4b51c626bbed38ce88370be88702de9341&quot;
+Content-Type: application/rdf+xml; charset=UTF-8;
+Content-Transfer-Encoding: 8bit
+
+&lt;rdf:RDF
+    xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;
+[..the metadata formatted as RDF+XML..]
+&lt;/rdf:RDF&gt;
+
+--contentItem
+Content-Disposition: form-data; name=&quot;content&quot;
+Content-Type: multipart/alternate; boundary=contentParts; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+--contentParts
+Content-Disposition: form-data; 
name=&quot;urn:metaxa:plain-text:2daba9dc-21f6-7ea1-70dd-a2b0d5c6cd08&quot;
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+John Smith was born in London.
+--contentParts--
+
+--contentItem--
+</pre></div>
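+<p>To show how the nesting of such a response fits together, the following 
Java sketch splits a multipart body at its boundary delimiters and extracts 
the body of a part. It is a deliberately naive illustration: the class and 
method names are hypothetical, the matching does not check that a delimiter 
starts a line, and a real client would more likely rely on a MIME parsing 
library.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: naive splitting of a multipart body, for illustration only.
final class MultipartSketch {

    /** Returns the raw parts (headers, blank line, body) between boundary delimiters. */
    static List<String> splitParts(String body, String boundary) {
        String text = body.replace("\r\n", "\n"); // normalize line endings
        String delimiter = "--" + boundary;
        List<String> parts = new ArrayList<>();
        int start = text.indexOf(delimiter);
        while (start >= 0) {
            int partStart = start + delimiter.length();
            if (text.startsWith("--", partStart)) {
                break; // closing delimiter "--boundary--" reached
            }
            int next = text.indexOf(delimiter, partStart);
            if (next < 0) {
                break; // truncated input: no further delimiter
            }
            parts.add(text.substring(partStart, next).trim());
            start = next;
        }
        return parts;
    }

    /** Returns everything after the first blank line of a part, i.e. its body. */
    static String bodyOf(String part) {
        int blankLine = part.indexOf("\n\n");
        return blankLine < 0 ? "" : part.substring(blankLine + 2);
    }

    public static void main(String[] args) {
        String response = "--contentItem\n"
                + "Content-Disposition: form-data; name=\"content\"\n"
                + "Content-Type: multipart/alternate; boundary=contentParts\n"
                + "\n"
                + "--contentParts\n"
                + "Content-Type: text/plain; charset=UTF-8\n"
                + "\n"
                + "John Smith was born in London.\n"
                + "--contentParts--\n"
                + "\n"
                + "--contentItem--\n";
        // First split the outer container, then the nested "content" container.
        String contentPart = splitParts(response, "contentItem").get(0);
        String plainText = splitParts(bodyOf(contentPart), "contentParts").get(0);
        System.out.println(bodyOf(plainText));
    }
}
```

+<p>The outer split uses the <code>contentItem</code> boundary and the inner 
split the <code>contentParts</code> boundary, mirroring the two levels of the 
response above.</p>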
+
+
+<p><strong>Example 2: Directly return the plain text version of the submitted 
content</strong></p>
+<p>This shows how the Apache Stanbol Enhancer can be used to transcode 
submitted content.</p>
+<div class="codehilite"><pre>curl -v -X POST -H &quot;Accept: text/plain&quot; 
\
+    -H &quot;Content-type: text/html; charset=UTF-8&quot; \
+    --data &quot;<span class="ni">&lt;</span>html<span 
class="ni">&gt;&lt;</span>body<span 
class="ni">&gt;&lt;</span>p<span class="ni">&gt;</span>John Smith 
was born in London.<span class="ni">&lt;</span>/p<span 
class="ni">&gt;&lt;</span>/body<span 
class="ni">&gt;&lt;</span>/html<span class="ni">&gt;</span>&quot; \
+    &quot;<span class="cp">${</span><span class="n">it</span><span 
class="o">.</span><span class="n">serviceUrl</span><span 
class="cp">}</span>?omitMetadata=true&quot;
+</pre></div>
+
+
+<p>The response will use <code>Content-Type: text/plain</code> and contain the 
string</p>
+<div class="codehilite"><pre>John Smith was born in London.
+</pre></div>
+
+
+<p>To make this work the requested <a href="chains">Enhancement Chain</a> will 
need to include an engine (e.g. <a href="engines/metaxaengine.html">Metaxa</a>) 
that supports transcoding the submitted content. Note that because the metadata 
is omitted from responses to such requests, it is also recommended to configure 
and use a chain that does no further processing on the transcoded content. </p>
+<p><strong>Example 3: Pass multiple content versions</strong></p>
+<p>This example uses the "httpmime" module of Apache 
HttpComponents to create the Multipart MIME entity sent to the Stanbol 
Enhancer.</p>
+<div class="codehilite"><pre><span class="nt">&lt;dependency&gt;</span>
+    <span class="nt">&lt;groupId&gt;</span>org.apache.httpcomponents<span 
class="nt">&lt;/groupId&gt;</span>
+    <span class="nt">&lt;artifactId&gt;</span>httpmime<span 
class="nt">&lt;/artifactId&gt;</span>
+    <span class="nt">&lt;version&gt;</span>4.1.2<span 
class="nt">&lt;/version&gt;</span>
+<span class="nt">&lt;/dependency&gt;</span>
+</pre></div>
+
+
+<p>The created Multipart MIME content MUST follow the specifications as 
defined by the <a 
href="contentitem.html#multipart_mime_serialization">MultiPart MIME format for 
ContentItems</a>.</p>
+<div class="codehilite"><pre><span class="n">InputStream</span> <span 
class="n">wordIn</span><span class="o">;</span> <span class="c1">//The MS Word 
version of the Content</span>
+<span class="n">InputStream</span> <span class="n">plainIn</span><span 
class="o">;</span> <span class="c1">//The plain text version of the 
Content</span>
+<span class="n">HttpClient</span> <span class="n">httpClient</span><span 
class="o">;</span> <span class="c1">//The client used to execute the 
request</span>
+
+<span class="c1">//create the multipart/form-data container for the 
ContentItem</span>
+<span class="c1">//MultipartEntity also implements HttpEntity</span>
+<span class="n">MultipartEntity</span> <span class="n">contentItem</span> 
<span class="o">=</span> <span class="k">new</span> <span 
class="n">MultipartEntity</span><span class="o">(</span><span 
class="kc">null</span><span class="o">,</span> <span class="kc">null</span> 
<span class="o">,</span><span class="n">UTF8</span><span class="o">);</span>
+<span class="c1">//The multipart/alternate container for the contents</span>
+<span class="n">HttpMultipart</span> <span class="n">content</span> <span 
class="o">=</span> <span class="k">new</span> <span 
class="n">HttpMultipart</span><span class="o">(</span><span 
class="s">&quot;alternate&quot;</span><span class="o">,</span> <span 
class="n">UTF8</span> <span class="o">,</span><span 
class="s">&quot;contentParts&quot;</span><span class="o">);</span>
+
+<span class="c1">//now add the container for the content to the content item 
container</span>
+<span class="n">contentItem</span><span class="o">.</span><span 
class="na">addPart</span><span class="o">(</span>
+    <span class="s">&quot;content&quot;</span><span class="o">,</span> <span 
class="c1">//the name MUST BE &quot;content&quot;!</span>
+    <span class="k">new</span> <span 
class="nf">MultipartContentBody</span><span class="o">(</span><span 
class="n">content</span><span class="o">));</span>
+
+<span class="c1">//now add the MS word content at the first location</span>
+<span class="c1">//this will make it the &quot;original&quot; content</span>
+<span class="n">content</span><span class="o">.</span><span 
class="na">addBodyPart</span><span class="o">(</span><span class="k">new</span> 
<span class="n">FormBodyPart</span><span class="o">(</span>
+    <span 
class="s">&quot;http://www.example.com/example.docx&quot;</span><span 
class="o">,</span> <span class="c1">//the id of the content part</span>
+    <span class="k">new</span> <span class="nf">InputStreamBody</span><span 
class="o">(</span>
+        <span class="n">wordIn</span><span class="o">,</span> 
+        <span 
class="s">&quot;application/vnd.openxmlformats-officedocument.wordprocessingml.document&quot;</span><span
 class="o">,</span> 
+        <span class="s">&quot;example.docx&quot;</span><span 
class="o">)));</span>
+
+ <span class="c1">//now add the alternate plain text version</span>
+ <span class="n">content</span><span class="o">.</span><span 
class="na">addBodyPart</span><span class="o">(</span><span class="k">new</span> 
<span class="n">FormBodyPart</span><span class="o">(</span>
+    <span 
class="s">&quot;http://www.example.com/example.docx&quot;</span><span 
class="o">,</span> <span class="c1">//the id of the content part</span>
+    <span class="k">new</span> <span class="nf">StringBody</span><span 
class="o">(</span> <span class="c1">//use a StringBody to avoid binary encoding 
for text</span>
+        <span class="n">IOUtils</span><span class="o">.</span><span 
class="na">toString</span><span class="o">(</span><span 
class="n">plainIn</span><span class="o">),</span> <span class="c1">//apache 
commons IO utility</span>
+        <span class="s">&quot;text/plain&quot;</span><span class="o">,</span>
+        <span class="n">Charset</span><span class="o">.</span><span 
class="na">forName</span><span class="o">(</span><span 
class="s">&quot;UTF-8&quot;</span><span class="o">))));</span>
+
+<span class="c1">//now we are ready to create and execute the POST request to 
the</span>
+<span class="c1">//Stanbol Enhancer</span>
+<span class="n">HttpPost</span> <span class="n">request</span> <span 
class="o">=</span> <span class="k">new</span> <span 
class="n">HttpPost</span><span class="o">(</span><span 
class="s">&quot;http://localhost:8080/enhancer&quot;</span><span 
class="o">);</span>
+<span class="n">request</span><span class="o">.</span><span 
class="na">setEntity</span><span class="o">(</span><span 
class="n">contentItem</span><span class="o">);</span>
+<span class="n">request</span><span class="o">.</span><span 
class="na">setHeader</span><span class="o">(</span><span 
class="s">&quot;Accept&quot;</span><span class="o">,</span><span 
class="s">&quot;application/rdf+xml&quot;</span><span class="o">);</span>
+<span class="n">HttpResponse</span> <span class="n">response</span> <span 
class="o">=</span> <span class="n">httpClient</span><span 
class="o">.</span><span class="na">execute</span><span class="o">(</span><span 
class="n">request</span><span class="o">);</span>
+</pre></div>
+
+
+<p>Note that for such requests <a href="engines/metaxaengine.html">Metaxa</a> 
will still try to extract metadata from the submitted MS Word document, but all 
other engines will use the plain text version supplied with the request for 
processing.</p>
+<p><strong>Example 4: Pass existing free text annotations</strong></p>
+<p>This example shows how the multi-part content item API can be used to pass 
already existing tags for a piece of content to the Stanbol Enhancer. For this 
example it is important to understand that the submitted metadata needs to 
conform to the Stanbol Enhancement Structure. Because of that this example 
consists of two main steps:</p>
+<ol>
+<li>Convert user tags to TextAnnotations</li>
+<li>Send existing Metadata along with the Content to the Stanbol Enhancer</li>
+</ol>
+<p>Also note that the code snippets use utilities provided by the 
"org.apache.stanbol.enhancer.servicesapi" module. Clerezza is used as the RDF 
framework. Both dependencies are easily replaceable.</p>
+<p>First let's have a look at the required information:</p>
+<div class="codehilite"><pre><span class="n">MGraph</span> <span 
class="n">graph</span><span class="o">;</span> <span class="c1">//the RDF graph 
to store the metadata</span>
+<span class="n">UriRef</span> <span class="n">ciUri</span><span 
class="o">;</span> <span class="c1">//the URI for the contentItem</span>
+<span class="n">String</span> <span class="n">tag</span><span 
class="o">;</span> <span class="c1">// user provided tag</span>
+<span class="n">UriRef</span> <span class="n">tagType</span><span 
class="o">;</span> <span class="c1">//the type of the Tag</span>
+</pre></div>
+
+
+<p>Regarding the tag type: Stanbol natively supports the following types </p>
+<ul>
+<li><strong>Person</strong> (http://dbpedia.org/ontology/Person)</li>
+<li><strong>Organization</strong> (http://dbpedia.org/ontology/Organisation): 
NOTE the British spelling</li>
+<li><strong>Place</strong> (http://dbpedia.org/ontology/Place)</li>
+</ul>
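+<p>A CMS usually stores tag categories as plain names rather than URIs. The 
following Java sketch shows one way to map such names to the type URIs listed 
above. The class name, the method name, and the lowercase category names are 
hypothetical. An unknown category yields <code>null</code>, which corresponds 
to a tag without a type.</p>

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps CMS tag categories to the type URIs listed above.
final class TagTypeSketch {

    static final Map<String, String> TYPE_URIS;
    static {
        Map<String, String> types = new LinkedHashMap<>();
        types.put("person", "http://dbpedia.org/ontology/Person");
        // The URI uses the British spelling, so accept both spellings as input.
        types.put("organization", "http://dbpedia.org/ontology/Organisation");
        types.put("organisation", "http://dbpedia.org/ontology/Organisation");
        types.put("place", "http://dbpedia.org/ontology/Place");
        TYPE_URIS = Collections.unmodifiableMap(types);
    }

    /** Returns the type URI for a category name, or null if the category is unknown. */
    static String typeUriFor(String category) {
        if (category == null) {
            return null;
        }
        return TYPE_URIS.get(category.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(typeUriFor("Organization"));
    }
}
```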
+<p>The processing of tags that use another type or no type depends on the 
<a href="engines">enhancement engines</a> used and their configurations. In 
particular, the configuration of the <a 
href="engines/namedentitytaggingengine.html">Named Entity Tagging Engine</a>s 
used is important in that respect.</p>
+<div class="codehilite"><pre><span class="n">Resource</span> <span 
class="n">user</span><span class="o">;</span> <span class="c1">//the user that 
has created the tag (optional)</span>
+<span class="c1">//in case of a name just use a literal</span>
+<span class="n">user</span> <span class="o">=</span> <span 
class="k">new</span> <span class="n">PlainLiteralImpl</span><span 
class="o">(</span><span class="s">&quot;Rudolf Huber&quot;</span><span 
class="o">);</span>
+<span class="c1">//in case users have assigned URIs</span>
+<span class="n">user</span> <span class="o">=</span> <span 
class="k">new</span> <span class="n">UriRef</span><span class="o">(</span><span 
class="s">&quot;http://my.cms.org/users/rudolf.huber&quot;</span><span 
class="o">);</span>
+</pre></div>
+
+
+<p>Now we can convert the information to TextAnnotations</p>
+<div class="codehilite"><pre><span class="c1">//first create a URI for the 
text annotation. Here we use a random URN</span>
+<span class="c1">//If you can create a meaningful URI this would be 
better!</span>
+<span class="n">UriRef</span> <span class="n">ta</span> <span 
class="o">=</span> <span class="k">new</span> <span 
class="n">UriRef</span><span class="o">(</span><span 
class="s">&quot;urn:user-annotation:&quot;</span><span class="o">+</span><span 
class="n">EnhancementEngineHelper</span><span class="o">.</span><span 
class="na">randomUUID</span><span class="o">());</span>
+<span class="c1">//Add the &#39;rdf:type&#39;s</span>
+<span class="n">graph</span><span class="o">.</span><span 
class="na">add</span><span class="o">(</span><span class="k">new</span> <span 
class="n">TripleImpl</span><span class="o">(</span><span 
class="n">ta</span><span class="o">,</span> <span class="n">RDF</span><span 
class="o">.</span><span class="na">type</span><span class="o">,</span> <span 
class="n">TechnicalClasses</span><span class="o">.</span><span 
class="na">ENHANCER_TEXTANNOTATION</span><span class="o">));</span>
+<span class="n">graph</span><span class="o">.</span><span 
class="na">add</span><span class="o">(</span><span class="k">new</span> <span 
class="n">TripleImpl</span><span class="o">(</span><span 
class="n">ta</span><span class="o">,</span> <span class="n">RDF</span><span 
class="o">.</span><span class="na">type</span><span class="o">,</span> <span 
class="n">TechnicalClasses</span><span class="o">.</span><span 
class="na">ENHANCER_ENHANCEMENT</span><span class="o">));</span>
+
+<span class="c1">//this TextAnnotation is about the ContentItem</span>
+<span class="n">graph</span><span class="o">.</span><span 
class="na">add</span><span class="o">(</span><span class="k">new</span> <span 
class="n">TripleImpl</span><span class="o">(</span><span 
class="n">ta</span><span class="o">,</span> <span 
class="n">Properties</span><span class="o">.</span><span 
class="na">ENHANCER_EXTRACTED_FROM</span><span class="o">,</span> <span 
class="n">ciUri</span><span class="o">));</span>
+<span class="c1">//if the Tag uses a type add it</span>
+<span class="k">if</span><span class="o">(</span><span 
class="n">tagType</span> <span class="o">!=</span> <span 
class="kc">null</span><span class="o">){</span>
+    <span class="n">graph</span><span class="o">.</span><span 
class="na">add</span><span class="o">(</span><span class="k">new</span> <span 
class="n">TripleImpl</span><span class="o">(</span><span 
class="n">ta</span><span class="o">,</span> <span 
class="n">Properties</span><span class="o">.</span><span 
class="na">DC_TYPE</span><span class="o">,</span> <span 
class="n">tagType</span><span class="o">));</span>
+<span class="o">}</span>
+<span class="c1">//add the value of the tag</span>
+<span class="n">graph</span><span class="o">.</span><span 
class="na">add</span><span class="o">(</span><span class="k">new</span> <span 
class="n">TripleImpl</span><span class="o">(</span><span 
class="n">ta</span><span class="o">,</span> <span 
class="n">Properties</span><span class="o">.</span><span 
class="na">ENHANCER_SELECTED_TEXT</span><span class="o">,</span> <span 
class="k">new</span> <span class="n">PlainLiteralImpl</span><span 
class="o">(</span><span class="n">tag</span><span class="o">)));</span>
+<span class="c1">//add the user</span>
+<span class="k">if</span><span class="o">(</span><span class="n">user</span> 
<span class="o">!=</span> <span class="kc">null</span><span class="o">){</span>
+    <span class="n">graph</span><span class="o">.</span><span 
class="na">add</span><span class="o">(</span><span class="k">new</span> <span 
class="n">TripleImpl</span><span class="o">(</span><span 
class="n">ta</span><span class="o">,</span> <span 
class="n">Properties</span><span class="o">.</span><span 
class="na">DC_CREATOR</span><span class="o">,</span><span 
class="n">user</span><span class="o">));</span>
+<span class="o">}</span>
+</pre></div>
+
+
+<p>Now the 'graph' contains a valid TextAnnotation for the given user tag. 
This should be done for all tags of the current content.</p>
+<p>In the next step we need to serialize the RDF data. Again, Clerezza is used 
here as the API, but any RDF framework will provide similar functionality.</p>
+<div class="codehilite"><pre>ByteArrayOutputStream out = new ByteArrayOutputStream();
+//this tells the Serializer to create &quot;application/rdf+xml&quot;
+serializer.serialize(out, graph, SupportedFormat.RDF_XML);
+String rdfContent = new String(out.toByteArray(),UTF8);
+</pre></div>
+<p>Now we need to create the MultiPart MIME content item containing the 
metadata and the content</p>
+<div class="codehilite"><pre><span class="n">String</span> <span 
class="n">content</span><span class="o">;</span> <span class="c1">//the content 
we want to send to the Stanbol Enhancer</span>
+
+<span class="c1">//the container for the ContentItem</span>
+<span class="n">MultipartEntity</span> <span class="n">contentItem</span> 
<span class="o">=</span> <span class="k">new</span> <span 
class="n">MultipartEntity</span><span class="p">(</span><span 
class="n">null</span><span class="p">,</span> <span class="n">null</span> <span 
class="p">,</span><span class="n">UTF8</span><span class="p">);</span>
+
+<span class="sr">//</span><span class="n">The</span> <span 
class="n">Metadata</span> <span class="n">MUST</span> <span class="n">BE</span> 
<span class="n">the</span> <span class="n">first</span> <span 
class="n">element</span>
+<span class="n">contentItem</span><span class="o">.</span><span 
class="n">addPart</span><span class="p">(</span>
+    <span class="s">&quot;metadata&quot;</span><span class="p">,</span> <span 
class="sr">//</span><span class="n">the</span> <span class="n">name</span> 
<span class="n">MUST</span> <span class="n">BE</span> <span 
class="s">&quot;metadata&quot;</span> 
+    <span class="k">new</span> <span class="n">StringBody</span><span 
class="p">(</span><span class="n">rdfContent</span><span 
class="p">,</span><span class="n">SupportedFormat</span><span 
class="o">.</span><span class="n">RDF_XML</span><span class="p">,</span><span 
class="n">UTF8</span><span class="p">){</span>
+        <span class="nv">@Override</span>
+        <span class="n">public</span> <span class="n">String</span> <span 
class="n">getFilename</span><span class="p">()</span> <span class="p">{</span> 
<span class="sr">//</span><span class="n">The</span> <span 
class="n">filename</span> <span class="n">MUST</span> <span class="n">BE</span> 
<span class="n">the</span>
+            <span class="k">return</span> <span class="n">ciUri</span><span 
class="o">.</span><span class="n">getUnicodeString</span><span 
class="p">();</span> <span class="sr">//</span><span class="n">uri</span> <span 
class="n">of</span> <span class="n">the</span> <span 
class="n">ContentItem</span>
+        <span class="p">}</span>
+    <span class="p">});</span>
+</pre></div>
+
+
+<p>Note that because the StringBody class provided by the "httpmime" framework 
does not set a filename, we need to override this method and return the URI of 
the content item. This is essential, because we need to ensure that the URI of 
the ContentItem is the same as the URI (variable 'ciUri') used when creating 
the TextAnnotations for the user tags.</p>
+<p>For the following code snippet note that we can directly add the content to 
the content item container. A 'multipart/alternate' container is only required 
if we need to send multiple alternate content versions (as shown in 
'Example 3').</p>
+<div class="codehilite"><pre><span class="c1">//Add the content as second mime 
part</span>
+<span class="n">contentItem</span><span class="o">.</span><span 
class="na">addPart</span><span class="o">(</span>
+    <span class="s">&quot;content&quot;</span><span class="o">,</span> <span 
class="c1">//the name MUST BE &quot;content&quot;</span>
+    <span class="k">new</span> <span class="nf">StringBody</span><span 
class="o">(</span><span class="n">content</span><span class="o">,</span><span 
class="s">&quot;text/plain&quot;</span><span class="o">,</span><span 
class="n">UTF8</span><span class="o">));</span>
+
+<span class="c1">//now we are ready to create and execute the POST request to 
the</span>
+<span class="c1">//Stanbol Enhancer</span>
+<span class="n">HttpPost</span> <span class="n">request</span> <span 
class="o">=</span> <span class="k">new</span> <span 
class="n">HttpPost</span><span class="o">(</span><span 
class="s">&quot;http://localhost:8080/enhancer&quot;</span><span 
class="o">);</span>
+<span class="n">request</span><span class="o">.</span><span 
class="na">setEntity</span><span class="o">(</span><span 
class="n">contentItem</span><span class="o">);</span>
+<span class="n">request</span><span class="o">.</span><span 
class="na">setHeader</span><span class="o">(</span><span 
class="s">&quot;Accept&quot;</span><span class="o">,</span><span 
class="s">&quot;application/rdf+xml&quot;</span><span class="o">);</span>
+<span class="n">HttpResponse</span> <span class="n">response</span> <span 
class="o">=</span> <span class="n">httpClient</span><span 
class="o">.</span><span class="na">execute</span><span class="o">(</span><span 
class="n">request</span><span class="o">);</span>
+</pre></div>
+
+
+<p>The response of the Enhancer will now contain Entity suggestions for the 
free text user tags.</p>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache 
License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are 
trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

