Re: [Wikidata-l] Provenance tracking on the Web with NIF-URIs

Denny Vrandečić Fri, 22 Jun 2012 08:20:46 -0700

Here's the use case:

Every statement in Wikidata will have a URI. Every statement can have
one or more references.
In many cases, the reference might be text on a website.


Whereas it is always possible (and probably what we will do first) as
well as correct to state:

Statement1 accordingTo SlashDot .

it would be preferable to be more specific, and ideally to go all the
way down to the sentence, saying

Statement1 accordingTo X .

with X being a URI denoting the sentence that I mean in a specific
Slashdot article.

I would prefer a standard or widely adopted way of doing that, and
NIF-URIs seem to be a viable solution. We will come back to
this once we start modeling references in more detail.
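A minimal sketch of what such a sentence-level reference X could look like, assuming NIF's offset recipe (`#offset_<begin>_<end>`, with zero-based character offsets and an exclusive end, as in `offset_717_729` for the 12-character "Semantic Web" further down). The article URL, the article text and the helper `sentence_uri` are hypothetical; a real implementation would also have to decide how the text is extracted from the article's HTML.

```python
def sentence_uri(page_url: str, page_text: str, sentence: str) -> str:
    """Build a NIF-style offset URI for the first occurrence of `sentence`.

    Assumes zero-based character offsets into `page_text` and an exclusive
    end offset, so that end - begin == len(sentence).
    """
    begin = page_text.find(sentence)
    if begin < 0:
        raise ValueError("sentence not found in page text")
    end = begin + len(sentence)
    return f"{page_url}#offset_{begin}_{end}"


# Hypothetical article URL and (already extracted) article text.
article_url = "http://example.org/slashdot-article"
article_text = "Intro sentence. The sentence that I mean. Outro."

x = sentence_uri(article_url, article_text, "The sentence that I mean.")
print(f"Statement1 accordingTo <{x}> .")
# Statement1 accordingTo <http://example.org/slashdot-article#offset_16_41> .
```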

The reference could also point to a book, a video, a Mesopotamian
stone tablet, etc. (OK, I admit that the different media types will be
prioritized differently.)

I hope this helps,
Cheers,
Denny

2012/6/21 Sebastian Hellmann <hellm...@informatik.uni-leipzig.de>:
> Hello Denny,
> I was traveling for the past few weeks and can finally answer your email.
> See my comments inline.
>
> On 05/29/2012 05:25 PM, Denny Vrandečić wrote:
>
> Hello Sebastian,
>
>
> Just a few questions - as you note, it is easier if we all use the same
> standards, and so I want to ask about the relation to other related
> standards:
> * I understand that you dismiss IETF RFC 5147 because it is not stable
> enough, right?
>
> The offset scheme of NIF is built on this RFC.
> So the following would hold:
> @prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
> @prefix owl: <http://www.w3.org/2002/07/owl#> .
> ld:offset_717_729  owl:sameAs ld:char=717,729 .
>
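A small sketch of that correspondence, assuming both schemes use zero-based positions and that the second number is an end position (RFC 5147 counts positions between characters, starting at 0, so `char=717,729` covers the same 12 characters as `offset_717_729`). The function names are hypothetical, the reverse direction simply ignores RFC 5147 integrity checks, and open ranges such as `char=,10` are not handled.

```python
import re

NIF_OFFSET = re.compile(r"offset_(\d+)_(\d+)")
# char=<start>,<end>, optionally followed by ';'-separated integrity checks.
RFC5147_CHAR_RANGE = re.compile(r"char=(\d+),(\d+)(?:;.*)?")


def nif_to_rfc5147(fragment: str) -> str:
    """'offset_717_729' -> 'char=717,729' (both read as start/end positions)."""
    m = NIF_OFFSET.fullmatch(fragment)
    if m is None:
        raise ValueError(f"not a NIF offset fragment: {fragment!r}")
    return f"char={m.group(1)},{m.group(2)}"


def rfc5147_to_nif(fragment: str) -> str:
    """'char=717,729[;integrity checks]' -> 'offset_717_729'."""
    m = RFC5147_CHAR_RANGE.fullmatch(fragment)
    if m is None:
        raise ValueError(f"not a closed RFC 5147 char range: {fragment!r}")
    return f"offset_{m.group(1)}_{m.group(2)}"


print(nif_to_rfc5147("offset_717_729"))                  # char=717,729
print(rfc5147_to_nif("char=717,729;length=9876,UTF-8"))  # offset_717_729
```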
>
> We might change our syntax to reuse the RFC syntax, but that has several
> issues:
> 1. The optional integrity-check part is not easy to handle, because you
> would need to add owl:sameAs statements between all the variants:
>
> ld:char=717,729;length=9876,UTF-8 owl:sameAs ld:char=717,729;length=9876 .
> ld:char=717,729;length=9876,UTF-8 owl:sameAs ld:char=717,729 .
> ld:char=717,729;length=9876 owl:sameAs ld:char=717,729 .
>
> This is fine in theory, but annoying to implement and check.
>
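The bookkeeping described in point 1 could look roughly like this: reduce every RFC 5147 fragment to its bare `char=` part and emit one owl:sameAs triple per variant. This is only a sketch, assuming the integrity checks (`;length=...` or `;md5=...`, each optionally followed by `,charset`) never change which characters are identified; the helper names are hypothetical. Full IRIs (N-Triples style) are used because, as point 3 notes, `=` and `,` do not fit into prefixed names.

```python
OWL_SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>"


def canonical_char_fragment(fragment: str) -> str:
    """Strip RFC 5147 integrity checks:
    'char=717,729;length=9876,UTF-8' -> 'char=717,729'."""
    return fragment.split(";", 1)[0]


def same_as_triples(base: str, fragments: list[str]) -> list[str]:
    """One owl:sameAs triple per variant that differs from its canonical form."""
    triples = []
    for frag in fragments:
        canon = canonical_char_fragment(frag)
        if frag != canon:
            triples.append(f"<{base}#{frag}> {OWL_SAME_AS} <{base}#{canon}> .")
    return triples


variants = [
    "char=717,729",
    "char=717,729;length=9876",
    "char=717,729;length=9876,UTF-8",
]
for t in same_as_triples("http://www.w3.org/DesignIssues/LinkedData.html", variants):
    print(t)
```

Every consumer would have to apply the same normalization before comparing URIs, which is the extra work point 1 refers to.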
> 2. When implementing web services, NIF allows the client to choose the
> prefix:
> http://nlp2rdf.lod2.eu/demo/NIFStemmer?input-type=text&nif=true&prefix=http%3A%2F%2Fthis.is%2Fa%2Fslash%2Fprefix%2F&urirecipe=offset&input=President+Obama+is+president.
> returning URIs like <http://this.is/a/slash/prefix/offset_10_15>
> With RFC 5147, they would look like:
> <http://this.is/a/slash/prefix/char=10,15>
> <http://this.is/a/slash/prefix/char=10,15;length=29,UTF-8>
> or
> <http://this.is/a/slash/prefix?char=10,15>
> <http://this.is/a/slash/prefix?char=10,15;length=29,UTF-8>
>
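For illustration, the demo request above can be rebuilt with Python's standard `urllib.parse.urlencode`, together with the URI that the client-chosen prefix yields for "Obama" under the offset recipe. This only builds strings (the 2012 demo endpoint is not contacted), and it assumes the service simply concatenates prefix and fragment, as the returned `offset_10_15` URI suggests.

```python
from urllib.parse import urlencode

SERVICE = "http://nlp2rdf.lod2.eu/demo/NIFStemmer"
prefix = "http://this.is/a/slash/prefix/"   # chosen by the client
text = "President Obama is president."

query = urlencode({
    "input-type": "text",
    "nif": "true",
    "prefix": prefix,
    "urirecipe": "offset",
    "input": text,
})
request_url = f"{SERVICE}?{query}"
print(request_url)  # same URL as the demo request above

# Offset URI for "Obama", assuming zero-based begin and exclusive end.
begin = text.find("Obama")
end = begin + len("Obama")
obama_uri = f"{prefix}offset_{begin}_{end}"
print(obama_uri)  # http://this.is/a/slash/prefix/offset_10_15
```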
> 3. Characters like '=' and ',' prevent the use of prefixed names in Turtle;
> the following does not parse:
> echo "@prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
> @prefix owl: <http://www.w3.org/2002/07/owl#> .
> ld:offset_717_729  owl:sameAs ld:char=717,729 .
> " > test.ttl ; rapper -i turtle  test.ttl
>
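A rough way to see why: the local part of a Turtle prefixed name (the bit after `ld:`) may contain letters, digits, `_`, `-`, `.` and `:` plus percent-escapes, may not end with `.`, and has no room for an unescaped `=` or `,` (Turtle 1.1, which postdates this thread, additionally allows backslash escapes such as `\=`). The check below is a deliberately simplified, ASCII-only approximation of that rule, not the full grammar; the function name is hypothetical.

```python
import re

# Simplified, ASCII-only approximation of Turtle's PN_LOCAL production
# (backslash escapes and non-ASCII letters are deliberately left out).
PCT = r"%[0-9A-Fa-f]{2}"
FIRST = rf"(?:[A-Za-z0-9_:]|{PCT})"
MIDDLE = rf"(?:[A-Za-z0-9_\-.:]|{PCT})"
LAST = rf"(?:[A-Za-z0-9_\-:]|{PCT})"   # a local name may not end with '.'
LOCAL_NAME = re.compile(rf"(?:{FIRST}(?:{MIDDLE}*{LAST})?)?")


def usable_as_local_name(candidate: str) -> bool:
    """True if `candidate` fits the simplified rule above."""
    return LOCAL_NAME.fullmatch(candidate) is not None


for name in ["offset_717_729", "char=717,729", "Semantic%20Web"]:
    print(name, usable_as_local_name(name))
# offset_717_729 True
# char=717,729 False
# Semantic%20Web True
```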
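> 4. Implementation is a little more difficult than for NIF URIs, which can
> be parsed simply by splitting on '_':
> <?php
> $arr = explode('_', 'offset_717_729');
> switch ($arr[0]) {
>     case 'offset':
>         $begin = (int) $arr[1];
>         $end = (int) $arr[2];
>         break;
>     case 'hash':
>         $clength = (int) $arr[1];  // context length
>         $slength = (int) $arr[2];  // string length
>         $hash = $arr[3];           // MD5 digest
>         // the remaining parts may contain '_' themselves, so re-join them
>         $rest = implode('_', array_slice($arr, 4));
>         break;
> }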
>
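The same split-on-`_` parsing, sketched in Python for comparison. The field names are hypothetical labels (context length, string length, MD5 digest, first characters of the string), and the sketch assumes the trailing part is percent-decoded exactly once.

```python
from urllib.parse import unquote


def parse_nif_fragment(fragment: str) -> dict:
    """Parse a NIF fragment id such as 'offset_717_729' or 'hash_10_12_<md5>_<chars>'."""
    parts = fragment.split("_")
    recipe = parts[0]
    if recipe == "offset" and len(parts) == 3:
        return {"recipe": "offset", "begin": int(parts[1]), "end": int(parts[2])}
    if recipe == "hash" and len(parts) >= 5:
        return {
            "recipe": "hash",
            "context_length": int(parts[1]),
            "string_length": int(parts[2]),
            "md5": parts[3],
            # The tail may itself contain '_', so re-join everything after the digest.
            "first_chars": unquote("_".join(parts[4:])),
        }
    raise ValueError(f"unsupported NIF fragment: {fragment!r}")


print(parse_nif_fragment("offset_717_729"))
print(parse_nif_fragment(
    "hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web"))
```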
> 5. RFC 5147 assumes a particular MIME type, namely text/plain. NIF makes a
> broader assumption.
>
> * what is the relation to the W3C media fragment URIs? Did not find a
> pointer there.
>
> They are designed for media such as images and video, not for strings.
> The same principle could potentially be applied, but this has not yet been
> engineered or researched.
>
> * any plans of standardizing your approach?
>
> We will do NIF 2.0 as a community standard and finish it in a couple of
> months. It will be published under open licences, so anybody, e.g. W3C or
> ISO, could easily pick it up. Beyond that, several EU projects plan to use
> it (see e.g.
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0101.html),
> as does a US project, and there are already several third-party
> implementations. We would rather have it adopted on a large scale first and
> then properly standardized, i.e. by the W3C. This worked quite well for
> the FOAF project and for RDB2RDF mappers.
> I would assume the chances of fast standardization are reasonably good.
>
> We would strongly prefer to just use a standard instead of advocating
> contenders for one -- if one exists.
>
> You might want to look at:
> http://www.w3.org/community/openannotation/wiki/TextCommentOnWebPage
> and the same highlighting here:
> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
>
> NIF equivalent (4 triples instead of 14 and only one generated UUID):
> ld:hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web a str:String ;
>     oa:hasBody [
>         oa:annotator <mailto:Bob> ;
>         cnt:chars "Hey Tim, good idea that Semantic Web!"
>     ] .
>
> So you need not think of it as a "contender"; the approaches are
> complementary. NIF is simpler, and its URIs have some features that might
> be wanted (stability, uniqueness, ease of implementation).
> This is why I was asking for your *use case*.
>
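A sketch of how a context-hash fragment like `hash_10_12_..._Semantic%20Web` could be computed. The digest input is an assumption here: the MD5 is taken over `left context + "(" + string + ")" + right context`, with `context_length` characters on each side, and the fragment ends with the first 20 characters of the string, percent-encoded. This reading of the NIF 1.0 recipe has not been checked against the digest in the example, so treat `nif_hash_fragment` and its parameters as hypothetical.

```python
import hashlib
from urllib.parse import quote


def nif_hash_fragment(text: str, begin: int, end: int,
                      context_length: int = 10, prefix_chars: int = 20) -> str:
    """Sketch of a context-hash fragment id (the digest input is an ASSUMPTION)."""
    string = text[begin:end]
    left = text[max(0, begin - context_length):begin]
    right = text[end:end + context_length]
    digest = hashlib.md5(f"{left}({string}){right}".encode("utf-8")).hexdigest()
    return f"hash_{context_length}_{len(string)}_{digest}_{quote(string[:prefix_chars])}"


doc = "The Semantic Web is a web of data."
frag = nif_hash_fragment(doc, 4, 16)
print(frag)  # hash_10_12_, then a 32-digit hex digest, then _Semantic%20Web
```

Because the digest only covers a small window around the string, such a URI survives edits elsewhere in the page, whereas offsets shift whenever text is inserted before the annotated string.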
> Note that there are still some problems when annotating the DOM with URIs:
> e.g. XPointer was abandoned and never finished, and XPath has its limits and
> is also expensive (e.g. it does not work with streaming SAX processing).
> I think there is no proper solution as of now.
> All the best,
> Sebastian
>
>
> Cheers,
> Denny
>
>
>
>
> 2012/5/18 Sebastian Hellmann <hellm...@informatik.uni-leipzig.de>
>
> Hello again,
> maybe the question I asked got lost, as the text was TL;DR.
>
> I heard that it is planned to track the provenance of facts, e.g. "Berlin
> has 3,337,000 inhabitants", found here:
> http://www.worldatlas.com/citypops.htm
> Do you have a place where the use case and the requirements are documented
> for this? Or is it out of scope?
> Will it be coarse-grained, i.e. at website level? Or fine-grained, i.e. at
> text paragraph level? See e.g. how Berlin is highlighted here:
> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.worldatlas.com%2Fcitypops.htm%23hash_4_30_7449e732716c8e68842289bf2e6667d5_Berlin%2C%2520Germany%2520-%25203%2C
> in this very early prototype.
>
> Could you give me a link where I can read more about any Wikidata plans
> towards this direction?
> Sebastian
>
>
>
> On 05/16/2012 09:10 AM, Sebastian Hellmann wrote:
>
> Dear all,
> (Note: I could not find the document where your requirements regarding
> the tracking of facts on the web are written down, so I am giving a general
> introduction to NIF. Please send me a link to the document that specifies
> your need for tracking facts on the web. Thanks.)
>
> I would like to draw your attention to the URIs used in the NLP
> Interchange Format (NIF).
> NIF-URIs are quite easy to use, understand and implement. NIF has a
> one-triple-per-annotation paradigm. The latest documentation can be found
> here:
> http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
>
> The basic idea is to use URIs with hash fragment ids to annotate or mark
> pages on the web:
> An example is the first occurrence of "Semantic Web" on
> http://www.w3.org/DesignIssues/LinkedData.html
> as highlighted here:
> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
>
> Here is a NIF example for linking a part of the document to the DBpedia
> entry of the Semantic Web:
> <http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
>      a str:StringInContext ;
>      sso:oen <http://dbpedia.org/resource/Semantic_Web> .
>
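Dereferencing an offset URI like the one above amounts to fetching the page text and slicing it. A minimal sketch for the offset recipe, assuming the text has already been extracted from the page (how HTML markup is counted is left open here) and using zero-based offsets with an exclusive end; `resolve_offset_uri` and the sample page are hypothetical.

```python
import re


def resolve_offset_uri(uri: str, text: str) -> str:
    """Return the substring of `text` that an '#offset_<begin>_<end>' URI points to."""
    _, _, fragment = uri.partition("#")
    m = re.fullmatch(r"offset_(\d+)_(\d+)", fragment)
    if m is None:
        raise ValueError(f"no NIF offset fragment in {uri!r}")
    begin, end = int(m.group(1)), int(m.group(2))
    if not 0 <= begin <= end <= len(text):
        raise ValueError("offsets fall outside the text")
    return text[begin:end]


page = "Intro. The Semantic Web is a web of data."
print(resolve_offset_uri("http://example.org/page#offset_11_23", page))  # Semantic Web
```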
>
> We are currently preparing a new draft for the spec 2.0. The old one can
> be found here:
> http://nlp2rdf.org/nif-1-0/
>
> There are several EU projects that intend to use NIF. Furthermore, it is
> easier for everybody, if we standardize a Web annotation format together.
> Please give us feedback on your use cases.
> All the best,
> Sebastian
>
>
> --
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Projects: http://nlp2rdf.org , http://dbpedia.org
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org
>
>
> _______________________________________________
> Wikidata-l mailing list
> Wikidata-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>



-- 
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.
