Hi Florent

On Mon, Jul 18, 2011 at 11:09 PM, florent andré <[email protected]> wrote:
> Hi !
>
> I worked on the UIMA engine to make it more generic.
> It's now easy to add uima annotator, and I try the RegexAnnotator [1].
>
> Depending on the configured regex, this annotator can output email, isbn,...
>
> AFAIK there is for now just TextAnnotation and EntityAnnotation type, and I
> don't know if they are suitable for things like email, telephone number,...
>

I see several possibilities:
1) use a TextAnnotation with a custom value for dc:type

    urn:123 rdf:type TextAnnotation
    urn:123 dc:type <http://www.w3.org/2006/vcard/ns#Cell>
    urn:123 selected-text "+43 655 290989"
    urn:123 start "123"^^xsd:int
    urn:123 end "137"^^xsd:int

I used here the concept defined for cell phones by the vCard ontology.

2) use a TextAnnotation with an additional type

    urn:123 rdf:type TextAnnotation
    urn:123 rdf:type http://schema.org/ContactPoint
    urn:123 selected-text "+43 655 290989"
    urn:123 start "123"^^xsd:int
    urn:123 end "137"^^xsd:int
    urn:123 http://schema.org/telephone "+43655290989"

Here I used the ContactPoint as defined by schema.org.

As I am writing this I have a preference for variant (2). Any other
opinions or suggestions?

In any case, based on such annotations another engine could look up
persons/organizations by the recognized telephone numbers and create
the corresponding EntityAnnotations. The same principle would also work
for ISBN numbers.

best
Rupert Westenthaler

>
> [1] http://uima.apache.org/sandbox.html#regex.annotator
>

--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen
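
For illustration only, variant (2) could be assembled roughly as in the Python sketch below. Plain tuples stand in for RDF statements, the bare local names (TextAnnotation, selected-text, start, end) are copied from the example in this mail rather than from a real namespace, and `normalize_phone` and `phone_annotation` are hypothetical helpers, not part of any existing engine.

```python
import re

# Assumption: bare names for the annotation terms, as written in the mail.
RDF_TYPE = "rdf:type"
SCHEMA = "http://schema.org/"


def normalize_phone(text):
    """Keep only the digits, plus a leading '+' if the text starts with one."""
    text = text.strip()
    prefix = "+" if text.startswith("+") else ""
    return prefix + re.sub(r"\D", "", text)


def phone_annotation(subject, selected_text, start, end):
    """Return the variant (2) statements as (subject, predicate, object) tuples."""
    return [
        (subject, RDF_TYPE, "TextAnnotation"),
        (subject, RDF_TYPE, SCHEMA + "ContactPoint"),
        (subject, "selected-text", selected_text),
        (subject, "start", start),
        (subject, "end", end),
        # The normalized number is what a later engine could use for lookups.
        (subject, SCHEMA + "telephone", normalize_phone(selected_text)),
    ]


for triple in phone_annotation("urn:123", "+43 655 290989", 123, 137):
    print(triple)
```

Keeping the raw match in selected-text and the normalized number in a separate property means a follow-up engine can compare telephone numbers without repeating the regex cleanup.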
