Get enhancements as RDFa
------------------------

                 Key: STANBOL-51
                 URL: https://issues.apache.org/jira/browse/STANBOL-51
             Project: Stanbol
          Issue Type: Improvement
          Components: FISE
            Reporter: Fabian Christ


Reported by henri.bergius, Jul 27, 2010

If original content has been submitted to FISE as HTML5 (see #41), then FISE 
could provide the enhancements back as RDFa inside the original content.

Comment 1 by project member rupert.westenthaler, Jul 27, 2010

This would only be possible if the submitted content is HTML.
We would need to define how enhancements are represented as RDFa.

e.g.
<p>The meeting takes place in <span about="/TextAnnotation1">Helsinki
   <span property="dc:creator" content="eu.iksproject.fise.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine"></span>
   <span property="dc:type" content="http://dbpedia.org/ontology/Place"></span>
   <!-- ... add all the other properties -->
</span>.</p>
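A minimal Python sketch of how markup for this first option could be generated, assuming the text annotation comes with character offsets into the plain text of the paragraph. The function name `wrap_text_annotation` and its parameters are hypothetical, not part of FISE. Every property of the annotation becomes an empty child span with `property`/`content` attributes, and all values are HTML-escaped.

```python
from html import escape


def wrap_text_annotation(text, start, end, annotation_id, properties):
    """Return `text` as HTML with text[start:end] wrapped in an RDFa span.

    Hypothetical helper: `annotation_id` becomes the span's `about` subject and
    each (property, value) pair becomes an empty child span. The offsets are
    assumed to index into plain text, not into existing HTML.
    """
    children = "".join(
        f'<span property="{escape(prop)}" content="{escape(value)}"></span>'
        for prop, value in properties.items()
    )
    return (
        escape(text[:start])
        + f'<span about="{escape(annotation_id)}">{escape(text[start:end])}'
        + children
        + "</span>"
        + escape(text[end:])
    )


sentence = "The meeting takes place in Helsinki."
start = sentence.index("Helsinki")
annotated = wrap_text_annotation(
    sentence, start, start + len("Helsinki"), "/TextAnnotation1",
    {
        "dc:creator": "eu.iksproject.fise.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
        "dc:type": "http://dbpedia.org/ontology/Place",
    },
)
print(annotated)
```

Even this short example shows how quickly the markup grows once all properties of an annotation are added.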


Another possibility would be to define URIs for enhancements (e.g. 
http://localhost:8080/store/<contentID>/<enhancementID>) and only add such URLs 
as RDFa annotations to the content.

e.g. 
<p>The meeting takes place in <span about="http://localhost:8080/store/myContentItem/TextAnnotation1">Helsinki</span>.</p>

This has the advantage that the HTML does not get overloaded with RDFa 
annotations, because only the link to the enhancement is added. However, this is 
not very useful if FISE runs on localhost, because other clients cannot resolve 
such URLs.
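For this second option, a sketch that builds the enhancement URI following the `<contentID>/<enhancementID>` pattern and adds nothing but an `about` attribute. The helper names are hypothetical. Making the base URL a parameter is one way to address the localhost problem: a deployment that clients can reach could pass its public address instead of `http://localhost:8080/store`.

```python
from html import escape
from urllib.parse import quote


def enhancement_uri(base_url, content_id, enhancement_id):
    """Build <base_url>/<contentID>/<enhancementID>; both IDs are percent-encoded."""
    return "/".join([
        base_url.rstrip("/"),
        quote(content_id, safe=""),
        quote(enhancement_id, safe=""),
    ])


def link_mention(text, start, end, uri):
    """Wrap text[start:end] in a span whose only RDFa attribute is about=uri."""
    return (
        escape(text[:start])
        + f'<span about="{escape(uri)}">{escape(text[start:end])}</span>'
        + escape(text[end:])
    )


sentence = "The meeting takes place in Helsinki."
start = sentence.index("Helsinki")
uri = enhancement_uri("http://localhost:8080/store", "myContentItem", "TextAnnotation1")
linked = link_mention(sentence, start, start + len("Helsinki"), uri)
print(linked)
# The meeting takes place in <span about="http://localhost:8080/store/myContentItem/TextAnnotation1">Helsinki</span>.
```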

A third possibility would be to add the results of the enhancement process 
directly as RDFa annotations.

e.g. assuming that FISE has found the entity 
"http://www.dbpedia.org/page/Helsinki", that would result in the following RDFa 
annotation:
<p>The meeting takes place in <span about="http://www.dbpedia.org/page/Helsinki">Helsinki
   <span property="dc:title" content="Helsinki"></span>
   <span property="rdf:type" content="http://dbpedia.org/ontology/Place"></span>
</span>.</p>

This produces RDFa annotations that the client can use easily; however, 
all information about how these enhancements were created is lost. Maybe 
we could add an additional span with a property pointing to the ID of the 
enhancement.
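To illustrate how a client could consume such markup, here is a deliberately simplified extractor built on Python's standard `html.parser`. It only reads `about`, `property` and `content`, inherits the subject from the nearest enclosing `about`, and does no prefix (CURIE) expansion, so it is not a conforming RDFa processor; a real client would use an RDFa library.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not touch the subject stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SimpleRDFaExtractor(HTMLParser):
    """Collect (subject, property, content) triples; a simplified sketch only."""

    def __init__(self):
        super().__init__()
        self.subjects = []  # subject in effect for each open element
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        a = dict(attrs)
        parent = self.subjects[-1] if self.subjects else None
        subject = a.get("about", parent)  # an own @about overrides the inherited one
        if subject is not None and "property" in a and "content" in a:
            self.triples.append((subject, a["property"], a["content"]))
        self.subjects.append(subject)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.subjects:
            self.subjects.pop()


doc = ('<p>The meeting takes place in <span about="http://www.dbpedia.org/page/Helsinki">Helsinki'
       '<span property="dc:title" content="Helsinki"></span>'
       '<span property="rdf:type" content="http://dbpedia.org/ontology/Place"></span>'
       '</span>.</p>')
parser = SimpleRDFaExtractor()
parser.feed(doc)
parser.close()
for triple in parser.triples:
    print(triple)
```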



Comment 2 by project member rupert.westenthaler, Jul 27, 2010

Here is an example of the third option that includes links to the enhancements:

<p>The meeting takes place in <span about="http://www.dbpedia.org/page/Helsinki">Helsinki
   <span property="dc:title" content="Helsinki"></span>
   <span property="rdf:type" content="http://dbpedia.org/ontology/Place"></span>
   <!-- tells that this RDFa annotation was created by FISE (one may use a URI instead) -->
   <span property="dc:creator" content="FISE"></span>
   <!-- tells that this RDFa annotation is based on this enhancement (one may use a URI instead) -->
   <span property="dc:source" content="TextEnhancement1"></span>
</span>.</p>

This example assumes that the entity with the highest confidence was added for 
Helsinki, but there might also be other entity suggestions. Therefore it would 
make sense to implement a service that clients can use to get additional 
information based on the value of the "dc:source" property.
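A rough sketch of such a service, assuming it simply maps the ID found in `dc:source` to all entity suggestions for that text, ranked by confidence. The class names and the in-memory storage are hypothetical, and the second URI and both confidence values in the usage example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySuggestion:
    entity_uri: str
    confidence: float


class SuggestionService:
    """Hypothetical in-memory lookup: enhancement ID -> entity suggestions."""

    def __init__(self):
        self._by_enhancement = {}

    def add(self, enhancement_id, suggestion):
        self._by_enhancement.setdefault(enhancement_id, []).append(suggestion)

    def suggestions(self, enhancement_id):
        """All suggestions for the ID taken from dc:source, highest confidence first."""
        return sorted(self._by_enhancement.get(enhancement_id, []),
                      key=lambda s: s.confidence, reverse=True)

    def best(self, enhancement_id):
        """The suggestion that would be written into the RDFa, or None if unknown."""
        ranked = self.suggestions(enhancement_id)
        return ranked[0] if ranked else None


service = SuggestionService()
# Made-up data: a low-confidence alternative plus the entity used in the RDFa.
service.add("TextEnhancement1", EntitySuggestion("http://example.org/entity/alternative", 0.41))
service.add("TextEnhancement1", EntitySuggestion("http://www.dbpedia.org/page/Helsinki", 0.92))
print(service.best("TextEnhancement1").entity_uri)
for s in service.suggestions("TextEnhancement1"):
    print(s.entity_uri, s.confidence)
```

Keeping only the best suggestion in the HTML and the full ranking behind the enhancement ID keeps the markup small while still letting a client revisit FISE's other candidates.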

best
Rupert

Comment 3 by project member rupert.westenthaler, Jul 29, 2010

Changed to type-Enhancement and status to accepted

Status: Accepted
Labels: -Type-Defect Type-Enhancement
Comment 4 by project member christ.fabian, Sep 14, 2010

This issue addresses [FR-220303].
"IKS services shall annotate the content items with meta-data."

Comment 5 by [email protected], Oct 21, 2010

Will it be possible to mark HTML5 sections that should not be enhanced?

What about submitting HTML5 fragments?


Comment 6 by project member rupert.westenthaler, Oct 22, 2010

Adding support for additional media types and extracting metadata from submitted 
content is definitely on the roadmap.
See also http://wiki.iks-project.eu/index.php/Workshops/EntityLinkingWorkshop

Regarding sections that should not be enhanced: as far as I remember, 
such a feature has not been discussed yet.
Can you please provide some additional information, such as:
What would be the actual use case?
Are you thinking of annotations within the HTML document, or of additional 
information that describes which sections should be processed by FISE?

Comment 7 by [email protected], Nov 03, 2010

> Regarding sections that should not be enhanced ...
> What would be the actual Usecase?

This would enable CMS systems to (automatically) add enhancements at the very 
end of the HTML generation process, after templates etc. have already been 
applied. Complete HTML pages likely contain sections where enhancements make no 
sense or are unwanted, such as navigation sections or (perhaps) 
advertisements.

Ideally FISE would even be able to automatically determine those areas without 
requiring that they are marked before (maybe by applying semantic technology 
;-).

> Do you think of annotations within the HTML Document
> or additional information that describe what section
> should be processed by FISE?

Thinking about it, the markup denoting the irrelevant (or the relevant?) 
sections would be semantic, so RDFa would be OK if it is easy to add in 
templates.
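A sketch of how the text to enhance could be collected while skipping such sections, using Python's `html.parser`. The marker `typeof="fise:Ignore"` is purely hypothetical (no such vocabulary has been defined), and skipping `<nav>` elements is an assumed heuristic for navigation sections. The sketch also assumes well-formed markup in which every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

SKIP_TYPEOF = "fise:Ignore"  # hypothetical marker, for illustration only
SKIP_TAGS = {"nav", "script", "style"}  # assumed heuristic: always skipped
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class EnhanceableTextExtractor(HTMLParser):
    """Collect the text to enhance, ignoring marked or navigational sections."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one bool per open element: True if it starts a skipped region
        self.skip_depth = 0  # number of enclosing skipped regions
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        typeof = (dict(attrs).get("typeof") or "").split()
        skipped = tag in SKIP_TAGS or SKIP_TYPEOF in typeof
        self.stack.append(skipped)
        if skipped:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        if self.stack.pop():
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def enhanceable_text(html_doc):
    parser = EnhanceableTextExtractor()
    parser.feed(html_doc)
    parser.close()
    return " ".join(parser.chunks)


page = ('<body><nav><a href="/">Home</a></nav>'
        '<div typeof="fise:Ignore"><p>Buy now!</p></div>'
        '<article><p>The meeting takes place in Helsinki.</p></article></body>')
print(enhanceable_text(page))  # The meeting takes place in Helsinki.
```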

Comment 8 by project member rupert.westenthaler, Nov 03, 2010

During the last project meeting in Istanbul, the FISE team discussed the changes 
needed to provide better support for different content types as well as 
existing metadata (such as RDFa embedded in HTML or EXIF metadata in photos).
See http://wiki.iks-project.eu/index.php/Workshops/EntityLinkingWorkshop for 
more details.
With these additions in place, it should be easy to support using RDFa to denote 
the irrelevant and/or the relevant sections of an HTML document.



-- 
This message is automatically generated by JIRA.