Hi Florent

On Mon, Jul 18, 2011 at 11:09 PM, florent andré wrote:
> Hi !
>
> I worked on the UIMA engine to make it more generic.
> It's now easy to add a UIMA annotator, and I tried the RegexAnnotator [1].
>
> Depending on the configured regexes, this annotator can output email addresses, ISBNs, ...
>
> AFAIK there are for now just the TextAnnotation and EntityAnnotation types, and I
> don't know if they are suitable for things like email addresses, telephone numbers, ...
>
I see several possibilities:

1) use a TextAnnotation with a custom value for dc:type

urn:123 rdf:type TextAnnotation
urn:123 dc:type <http://www.w3.org/2006/vcard/ns#Cell>
urn:123 selected-text "+43 655 290989"
urn:123 start "123"^^xsd:int
urn:123 end "137"^^xsd:int

Here I used the concept defined for cell phones by the vCard ontology.
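To make variant (1) concrete, here is a minimal Python sketch (not Stanbol or UIMA code) that finds a phone number with a regular expression, roughly as a configured RegexAnnotator would, and emits the triples above as plain tuples. The pattern PHONE_RE, the function name and the urn:ann- subject prefix are hypothetical. Offsets are assumed to be character positions with an exclusive end, which fits the example (137 - 123 = 14, the length of "+43 655 290989").

```python
import re

# Hypothetical pattern for numbers like "+43 655 290989"; a real
# RegexAnnotator configuration would define its own expressions.
PHONE_RE = re.compile(r"\+\d{1,3}(?: \d+)+")

VCARD_CELL = "<http://www.w3.org/2006/vcard/ns#Cell>"


def text_annotations_variant1(text, uri_prefix="urn:ann-"):
    """Return one list of (subject, predicate, object) tuples per phone match.

    start/end are character offsets with an exclusive end, so
    text[start:end] equals the selected text.
    """
    annotations = []
    for i, m in enumerate(PHONE_RE.finditer(text)):
        subj = f"{uri_prefix}{i}"
        annotations.append([
            (subj, "rdf:type", "TextAnnotation"),
            (subj, "dc:type", VCARD_CELL),  # variant (1): kind of text via dc:type
            (subj, "selected-text", f'"{m.group()}"'),
            (subj, "start", f'"{m.start()}"^^xsd:int'),
            (subj, "end", f'"{m.end()}"^^xsd:int'),
        ])
    return annotations


text = "Call me at +43 655 290989 tomorrow."
for triples in text_annotations_variant1(text):
    for t in triples:
        print(*t)
```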

2) use a TextAnnotation with an additional type

urn:123 rdf:type TextAnnotation
urn:123 rdf:type <http://schema.org/ContactPoint>
urn:123 selected-text "+43 655 290989"
urn:123 start "123"^^xsd:int
urn:123 end "137"^^xsd:int
urn:123 <http://schema.org/telephone> "+43655290989"

Here I used the ContactPoint type as defined by schema.org.
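Variant (2) differs only in the second rdf:type and the extra schema.org telephone value, which carries a normalized form of the matched text. A minimal sketch under the same assumptions as before; the names, the regex and the normalization rule (keep a leading '+' and the digits) are hypothetical:

```python
import re

# Hypothetical phone pattern, as in a configured RegexAnnotator.
PHONE_RE = re.compile(r"\+\d{1,3}(?: \d+)+")
CONTACT_POINT = "<http://schema.org/ContactPoint>"
TELEPHONE = "<http://schema.org/telephone>"


def normalize_phone(raw):
    """Keep a leading '+' and the digits; drop spaces and other separators."""
    digits = re.sub(r"\D", "", raw)
    return ("+" if raw.strip().startswith("+") else "") + digits


def text_annotations_variant2(text, uri_prefix="urn:ann-"):
    """TextAnnotations with an additional rdf:type and a normalized value."""
    annotations = []
    for i, m in enumerate(PHONE_RE.finditer(text)):
        subj = f"{uri_prefix}{i}"
        annotations.append([
            (subj, "rdf:type", "TextAnnotation"),
            (subj, "rdf:type", CONTACT_POINT),  # variant (2): second type
            (subj, "selected-text", f'"{m.group()}"'),
            (subj, "start", f'"{m.start()}"^^xsd:int'),
            (subj, "end", f'"{m.end()}"^^xsd:int'),
            (subj, TELEPHONE, f'"{normalize_phone(m.group())}"'),
        ])
    return annotations
```

Keeping the normalized number in its own property means a later engine can match on it without re-parsing the selected text.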

As I write this, I have a preference for variant (2). Any other
opinions or suggestions?

In any case, based on such annotations another engine could look up
persons/organizations by the recognized telephone numbers and create
the corresponding EntityAnnotations.
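Such a second engine could be sketched roughly as follows. It assumes variant (2) triples as input and a hypothetical in-memory index from normalized numbers to entity URIs (a real engine would query an entity store). The link properties dc:relation and entity-reference are assumptions chosen for illustration, not a confirmed vocabulary.

```python
TELEPHONE = "<http://schema.org/telephone>"


def entity_annotations(triples, phone_index, uri_prefix="urn:ent-"):
    """Create an EntityAnnotation for every TextAnnotation whose normalized
    telephone value is found in phone_index (number -> entity URI)."""
    results = []
    count = 0
    for subj, pred, obj in triples:
        if pred != TELEPHONE:
            continue
        entity = phone_index.get(obj.strip('"'))  # '"+43..."' -> '+43...'
        if entity is None:
            continue  # unknown number: no EntityAnnotation
        ent = f"{uri_prefix}{count}"
        count += 1
        results += [
            (ent, "rdf:type", "EntityAnnotation"),
            (ent, "dc:relation", subj),         # assumed link to the TextAnnotation
            (ent, "entity-reference", entity),  # assumed property name
        ]
    return results


# Hypothetical index and input triples.
index = {"+43655290989": "<http://example.org/person/1>"}
triples = [
    ("urn:123", "rdf:type", "TextAnnotation"),
    ("urn:123", TELEPHONE, '"+43655290989"'),
]
for t in entity_annotations(triples, index):
    print(*t)
```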

The same principle would also work for ISBN numbers.
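For ISBNs, the variant (2) pattern could be reused with schema.org's Book type and isbn property. The sketch below generalizes the annotation step over a pattern, an additional type, a value property and a normalizer. The ISBN-13 regex and all function names are hypothetical, and no checksum validation is done.

```python
import re


def annotate(text, pattern, extra_type, value_prop, normalize, uri_prefix="urn:ann-"):
    """Variant (2)-style TextAnnotations for every match of `pattern` in `text`."""
    annotations = []
    for i, m in enumerate(pattern.finditer(text)):
        subj = f"{uri_prefix}{i}"
        annotations.append([
            (subj, "rdf:type", "TextAnnotation"),
            (subj, "rdf:type", extra_type),
            (subj, "selected-text", f'"{m.group()}"'),
            (subj, "start", f'"{m.start()}"^^xsd:int'),
            (subj, "end", f'"{m.end()}"^^xsd:int'),
            (subj, value_prop, f'"{normalize(m.group())}"'),
        ])
    return annotations


# Hypothetical ISBN-13 pattern: 978/979 prefix, then 10 digits,
# each optionally preceded by '-' or ' '.
ISBN_RE = re.compile(r"\b97[89](?:[- ]?\d){10}\b")


def normalize_isbn(raw):
    """Drop hyphens and spaces, e.g. '978-3-16-148410-0' -> '9783161484100'."""
    return re.sub(r"[- ]", "", raw)


text = "See ISBN 978-3-16-148410-0 for details."
for triples in annotate(text, ISBN_RE, "<http://schema.org/Book>",
                        "<http://schema.org/isbn>", normalize_isbn):
    for t in triples:
        print(*t)
```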

best
Rupert Westenthaler

>
> [1] http://uima.apache.org/sandbox.html#regex.annotator
>



-- 
| Rupert Westenthaler
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
