Get enhancements as RDFa
------------------------
Key: STANBOL-51
URL: https://issues.apache.org/jira/browse/STANBOL-51
Project: Stanbol
Issue Type: Improvement
Components: FISE
Reporter: Fabian Christ
Reported by henri.bergius, Jul 27, 2010
If original content has been submitted to FISE as HTML5 (see #41), then FISE
could provide the enhancements back as RDFa inside the original content.
Comment 1 by project member rupert.westenthaler, Jul 27, 2010
This would be only possible if the content type of the content is HTML.
We would need to define how enhancements are represented as RDFa.
e.g.
<p> The meeting takes place in <span about="/TextAnnotation1">Helsinki
<span property="dc:creator"
content="eu.iksproject.fise.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine"></span>
<span property="dc:type" content="http://dbpedia.org/ontology/Place"></span>
<!-- ... add all the other properties -->
</span>.</p>
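A minimal sketch of how this first representation could be generated. It assumes the content is plain text and that an enhancement is available as a character range plus a list of (property, value) pairs; the function name annotate_inline and that data shape are hypothetical and not part of FISE.

```python
from html import escape


def annotate_inline(text, start, end, about, properties):
    """Wrap text[start:end] in a span that carries the enhancement as RDFa.

    `properties` is a list of (property, value) pairs; this shape is an
    assumption made for the sketch, not a FISE data structure.
    """
    # One empty child span per property, as in the example above.
    props = "".join(
        f'<span property="{escape(name)}" content="{escape(value)}"></span>'
        for name, value in properties
    )
    return (
        escape(text[:start])
        + f'<span about="{escape(about)}">{escape(text[start:end])}{props}</span>'
        + escape(text[end:])
    )


sentence = "The meeting takes place in Helsinki."
start = sentence.index("Helsinki")
markup = annotate_inline(
    sentence,
    start,
    start + len("Helsinki"),
    "/TextAnnotation1",
    [("dc:type", "http://dbpedia.org/ontology/Place")],
)
print(markup)
```

Content that is already HTML would need a real parser instead of plain string slicing, so that offsets are not placed inside existing tags.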
Another possibility would be to define URIs for enhancements (e.g.
http://localhost:8080/store/<contentID>/<enhancementID>) and add only those
URIs as RDFa annotations to the content.
e.g.
<p> The meeting takes place in <span
about="http://localhost:8080/store/myContentItem/TextAnnotation1">Helsinki</span>.</p>
This would have the advantage that the HTML does not get overloaded with RDFa
annotations, because only the link to the enhancement is added. However, this
is not very useful if FISE runs on localhost, since such URIs can only be
resolved on that machine.
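A rough sketch of the link-only variant. The /store/<contentID>/<enhancementID> layout is taken from the example above; the helper names and the configurable base_url parameter are assumptions made for illustration, the latter being one possible way to avoid hard-coding localhost.

```python
from html import escape
from urllib.parse import quote


def enhancement_uri(base_url, content_id, enhancement_id):
    """Build a URI of the form <base>/store/<contentID>/<enhancementID>."""
    return (
        f"{base_url.rstrip('/')}/store/"
        f"{quote(content_id, safe='')}/{quote(enhancement_id, safe='')}"
    )


def link_only_span(surface_text, uri):
    """Annotate the text with nothing but a reference to the enhancement."""
    return f'<span about="{escape(uri)}">{escape(surface_text)}</span>'


# base_url is a hypothetical setting; a publicly resolvable host could be
# passed here instead of localhost.
uri = enhancement_uri("http://localhost:8080", "myContentItem", "TextAnnotation1")
print(link_only_span("Helsinki", uri))
```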
A third possibility would be to add the results of the enhancement process
directly as RDFa annotations.
e.g. assuming that FISE has found the entity
"http://www.dbpedia.org/page/Helsinki", that would result in the following RDFa
annotation:
<p> The meeting takes place in <span
about="http://www.dbpedia.org/page/Helsinki">Helsinki
<span property="dc:title" content="Helsinki"></span>
<span property="rdf:type" content="http://dbpedia.org/ontology/Place"></span>
</span>.</p>
This produces RDFa annotations that could easily be used by the client;
however, all the information about how such enhancements were created is lost.
Maybe we could add an additional span with a property pointing to the ID of the
enhancement.
Comment 2 by project member rupert.westenthaler, Jul 27, 2010
Here is an example for the third option that includes links to the enhancements:
<p> The meeting takes place in <span
about="http://www.dbpedia.org/page/Helsinki">Helsinki
<span property="dc:title" content="Helsinki"></span>
<span property="rdf:type" content="http://dbpedia.org/ontology/Place"></span>
<!-- tells that this RDFa annotation was created by FISE (one may use a URI instead) -->
<span property="dc:creator" content="FISE"></span>
<!-- tells that this RDFa annotation is based on this enhancement (one may use a URI instead) -->
<span property="dc:source" content="TextEnhancement1"></span>
</span>.</p>
This example assumes that the entity with the highest confidence was added for
Helsinki, but there might also be other entity suggestions. Therefore it would
make sense to implement a service that can be used to get additional
information based on the content of the "dc:source" property.
best
Rupert
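A small sketch of the selection step described in this comment: pick the suggestion with the highest confidence and emit it together with the dc:creator and dc:source properties. The dictionary layout, the function names, the second suggestion and both confidence values are made up for illustration; only the Helsinki URI, the property names and "TextEnhancement1" come from the example.

```python
from html import escape


def best_suggestion(suggestions):
    """Return the entity suggestion with the highest confidence."""
    return max(suggestions, key=lambda s: s["confidence"])


def entity_span(surface_text, suggestion, enhancement_id, creator="FISE"):
    """Annotate the text with the entity and a pointer back to the enhancement."""
    properties = [
        ("dc:title", suggestion["label"]),
        ("rdf:type", suggestion["type"]),
        ("dc:creator", creator),
        ("dc:source", enhancement_id),  # lets a client look up other suggestions
    ]
    inner = "".join(
        f'<span property="{escape(p)}" content="{escape(v)}"></span>'
        for p, v in properties
    )
    return f'<span about="{escape(suggestion["uri"])}">{escape(surface_text)}{inner}</span>'


# Hypothetical suggestions; the confidence values are invented sample data.
suggestions = [
    {"uri": "http://www.dbpedia.org/page/Helsinki", "label": "Helsinki",
     "type": "http://dbpedia.org/ontology/Place", "confidence": 0.9},
    {"uri": "http://example.org/entity/OtherHelsinki", "label": "Other Helsinki",
     "type": "http://dbpedia.org/ontology/Place", "confidence": 0.4},
]
best = best_suggestion(suggestions)
span = entity_span("Helsinki", best, "TextEnhancement1")
print(span)
```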
Comment 3 by project member rupert.westenthaler, Jul 29, 2010
Changed to type-Enhancement and status to accepted
Status: Accepted
Labels: -Type-Defect Type-Enhancement
Comment 4 by project member christ.fabian, Sep 14, 2010
This issue addresses [FR-220303].
"IKS services shall annotate the content items with meta-data."
Comment 5 by [email protected], Oct 21, 2010
Will it be possible to mark HTML5 sections which should not be enhanced?
What about submitting HTML5 fragments?
Comment 6 by project member rupert.westenthaler, Oct 22, 2010
Adding support for additional media types and extracting metadata of submitted
content are definitely on the roadmap.
see also http://wiki.iks-project.eu/index.php/Workshops/EntityLinkingWorkshop
Regarding sections that should not be enhanced: As far as I can remember,
such a feature has not been discussed yet.
Can you please provide some additional information, such as:
What would be the actual use case?
Are you thinking of annotations within the HTML document or additional
information that describes which sections should be processed by FISE?
Comment 7 by [email protected], Nov 03, 2010
> Regarding sections that should not be enhanced ...
> What would be the actual use case?
This would enable CMS systems to (automatically) add enhancements at the very
end of the HTML-generation process, after templates etc. have already been
applied. Complete HTML pages are likely to contain sections where enhancements
do not make sense or are unwanted, such as navigation sections or (perhaps)
advertisements.
Ideally FISE would even be able to determine those areas automatically, without
requiring them to be marked beforehand (maybe by applying semantic technology
;-).
> Are you thinking of annotations within the HTML document
> or additional information that describes which sections
> should be processed by FISE?
Thinking about that... The markup to denote the irrelevant (or the relevant?)
sections would be semantic. So RDFa would be ok if that is easy to add in
templates.
Comment 8 by project member rupert.westenthaler, Nov 03, 2010
During the last project meeting in Istanbul the FISE team discussed the changes
necessary to provide better support for different content types as well as
existing metadata (such as RDFa embedded in HTML or EXIF metadata in photos).
See http://wiki.iks-project.eu/index.php/Workshops/EntityLinkingWorkshop for
more details.
With those additions in place it should be easy to support using RDFa to denote
the irrelevant and/or the relevant sections of an HTML document.
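As a rough illustration of what honouring such markers could involve, the sketch below collects only the text outside elements that carry a marker. The marker typeof="ex:NoEnhancement" is purely hypothetical; no vocabulary for this has been defined in this issue. The sketch also assumes well-nested HTML where every non-void element has an explicit end tag.

```python
from html.parser import HTMLParser

# Hypothetical marker, invented for this sketch; not defined by FISE or RDFa.
SKIP_TYPE = "ex:NoEnhancement"

# HTML void elements never have an end tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class EnhanceableText(HTMLParser):
    """Collect text that lies outside sections marked as not to be enhanced."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0  # greater than 0 while inside a marked section

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.skip_depth:
            self.skip_depth += 1  # nested element inside a marked section
        elif SKIP_TYPE in (dict(attrs).get("typeof") or "").split():
            self.skip_depth = 1  # entering a marked section

    def handle_endtag(self, tag):
        if tag not in VOID and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def enhanceable_text(markup):
    parser = EnhanceableText()
    parser.feed(markup)
    parser.close()
    return parser.chunks


page = ('<nav typeof="ex:NoEnhancement"><a href="/">Home</a></nav>'
        '<p>The meeting takes place in Helsinki.</p>')
print(enhanceable_text(page))
```

Marking the relevant sections instead of the irrelevant ones would invert the check: collect text only while inside a marked element.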