Here's the use case: Every statement in Wikidata will have a URI. Every statement can have one or more references. In many cases, the reference might be text on a website.
While it is always possible and correct (and probably what we will do first) to state:

Statement1 accordingTo SlashDot .

it would be preferable to be a bit more specific, and ideally to go all the way down to the sentence, saying:

Statement1 accordingTo X .

with X being a URI denoting the sentence that I mean in a specific Slashdot article. I would prefer a standard or widely adopted way of doing that, and NIF-URIs seem to be a viable solution. We will come back to this once we start modeling references in more detail.

The reference could also point to a book, a video, a Mesopotamian stone tablet, etc. (OK, I admit that the different media types will be prioritized differently.)

I hope this helps,
Cheers,
Denny

2012/6/21 Sebastian Hellmann <hellm...@informatik.uni-leipzig.de>:
> Hello Denny,
> I was traveling for the past few weeks and can finally answer your email.
> See my comments inline.
>
> On 05/29/2012 05:25 PM, Denny Vrandečić wrote:
>
> Hello Sebastian,
>
> Just a few questions - as you note, it is easier if we all use the same
> standards, and so I want to ask about the relation to other related
> standards:
> * I understand that you dismiss IETF RFC 5147 because it is not stable
> enough, right?
>
> The offset scheme of NIF is built on this RFC.
> So the following would hold:
> @prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
> @prefix owl: <http://www.w3.org/2002/07/owl#> .
> ld:offset_717_729 owl:sameAs ld:char=717,12 .
>
> We might change the syntax and reuse the RFC syntax, but it has several
> issues:
> 1. The optional part is not easy to handle, because you would need to add
> owl:sameAs statements:
>
> ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12;length=12 .
> ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12 .
> ld:char=717,12;UTF-8 owl:sameAs ld:char=717,12;length=9876 .
>
> So theoretically OK, but annoying to implement and check.
>
> 2.
> When implementing web services, NIF allows the client to choose the
> prefix:
> http://nlp2rdf.lod2.eu/demo/NIFStemmer?input-type=text&nif=true&prefix=http%3A%2F%2Fthis.is%2Fa%2Fslash%2Fprefix%2F&urirecipe=offset&input=President+Obama+is+president.
> returning URIs like <http://this.is/a/slash/prefix/offset_10_15>
> So RFC 5147 would look like:
> <http://this.is/a/slash/prefix/char=717,12>
> <http://this.is/a/slash/prefix/char=717,12;UTF-8>
> or
> <http://this.is/a/slash/prefix?char=717,12>
> <http://this.is/a/slash/prefix?char=717,12;UTF-8>
>
> 3. Characters like "=" and "," prevent the use of prefixes:
> echo "@prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
> @prefix owl: <http://www.w3.org/2002/07/owl#> .
> ld:offset_717_729 owl:sameAs ld:char=717,12 .
> " > test.ttl ; rapper -i turtle test.ttl
>
> 4. Implementation is a little bit more difficult, given that NIF fragments
> can be parsed as simply as this:
> // split the fragment into its '_'-separated parts
> $arr = explode("_", "offset_717_729");
> switch ($arr[0]) {
>     case 'offset':
>         $begin = $arr[1];
>         $end = $arr[2];
>         break;
>     case 'hash':
>         $clength = $arr[1];
>         $slength = $arr[2];
>         $hash = $arr[3];
>         // merge the remaining parts with '_'
>         $rest = implode('_', array_slice($arr, 4));
>         break;
> }
>
> 5. The RFC assumes a certain MIME type, i.e. plain text. NIF makes broader
> assumptions.
>
> * what is the relation to the W3C media fragment URIs? Did not find a
> pointer there.
>
> They are designed for media such as images and video, not strings.
> Potentially, the same principle can be applied, but it is not yet
> engineered/researched.
>
> * any plans of standardizing your approach?
>
> We will do NIF 2.0 as a community standard and finish it in a couple of
> months. It will be published under open licences, so anybody (W3C or ISO)
> might pick it up easily. Other than that, there are plans by several EU
> projects (see e.g. here
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0101.html)
> and a US project to use it, and there are already several third-party
> implementations.
> We would rather have it adopted first on a large
> scale and then standardized properly, i.e. by the W3C. This worked quite well for
> the FOAF project or for RDB2RDF mappers.
> The chances of fast standardization are not so low, I would assume.
>
> We would strongly prefer to just use a standard instead of advocating
> contenders for one -- if one exists.
>
> You might want to look at:
> http://www.w3.org/community/openannotation/wiki/TextCommentOnWebPage
> and the same highlighting here:
> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
>
> NIF equivalent (4 triples instead of 14 and only one generated UUID):
> ld:hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web a str:String ;
>     oa:hasBody [
>         oa:annotator <mailto:Bob> ;
>         cnt:chars "Hey Tim, good idea that Semantic Web!"
>     ] .
>
> So you might not think in a "contender" way. The approaches are complementary.
> NIF is simpler and the URIs have some features that might be wanted
> (stability, uniqueness, easy to implement).
> This is why I was asking for your *use case*.
>
> Note that there are still some problems when annotating DOM with URIs,
> e.g. XPointer is abandoned and was never finished. XPath has its limits and
> is also expensive (i.e. SAX is not possible).
> I think there is no proper solution as of now.
> All the best,
> Sebastian
>
> Cheers,
> Denny
>
> 2012/5/18 Sebastian Hellmann <hellm...@informatik.uni-leipzig.de>
>
> Hello again,
> maybe the question I asked was lost, as the text was TL;DR.
>
> I heard that it is planned to track the provenance of facts, e.g. Berlin has
> 3,337,000 citizens, found here:
> http://www.worldatlas.com/citypops.htm
> Do you have a place where the use case and the requirements are documented
> for this? Or is it out of scope?
> Will it be coarse-grained, i.e.
> website level? Or fine-grained, i.e. text
> paragraph level? See e.g. how Berlin is highlighted here:
> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.worldatlas.com%2Fcitypops.htm%23hash_4_30_7449e732716c8e68842289bf2e6667d5_Berlin%2C%2520Germany%2520-%25203%2C
> in this very early prototype.
>
> Could you give me a link where I can read more about any Wikidata plans
> in this direction?
> Sebastian
>
> On 05/16/2012 09:10 AM, Sebastian Hellmann wrote:
>
> Dear all,
> (Note: I could not find the document where your requirements regarding
> the tracking of facts on the web are written, so I am giving a general
> introduction to NIF. Please send me a link to the document that specifies
> your need for tracing facts on the web, thanks.)
>
> I would like to draw your attention to the URIs used in the NLP
> Interchange Format (NIF).
> NIF-URIs are quite easy to use, understand and implement. NIF has a
> one-triple-per-annotation paradigm.
> The latest documentation can be found
> here:
> http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
>
> The basic idea is to use URIs with hash fragment IDs to annotate or mark
> pages on the web.
> An example is the first occurrence of "Semantic Web" on
> http://www.w3.org/DesignIssues/LinkedData.html
> as highlighted here:
> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
>
> Here is a NIF example for linking a part of the document to the DBpedia
> entry of the Semantic Web:
> <http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
>     a str:StringInContext ;
>     sso:oen <http://dbpedia.org/resource/Semantic_Web> .
>
> We are currently preparing a new draft for the spec 2.0. The old one can
> be found here:
> http://nlp2rdf.org/nif-1-0/
>
> There are several EU projects that intend to use NIF. Furthermore, it is
> easier for everybody if we standardize a Web annotation format together.
> Please give feedback on your use cases.
> All the best,
> Sebastian
>
> --
> Dipl. Inf.
> Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Projects: http://nlp2rdf.org , http://dbpedia.org
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org
>
> _______________________________________________
> Wikidata-l mailing list
> Wikidata-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l

--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
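
For anyone who wants to experiment with the two URI recipes quoted in this thread (offset_717_729 and hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web), the following is a minimal Python sketch of the parsing step that the PHP fragment in point 4 illustrates. It is not taken from the NIF specification. The names parse_nif_uri, OffsetFragment, HashFragment and select are hypothetical, the field meanings of the hash recipe are inferred only from the variable names used in the thread ($clength, $slength, $hash), and the offsets are assumed to be zero-based with an exclusive end, which matches 729 - 717 = 12 = the length of "Semantic Web". The digest is only extracted, not verified.

```python
from dataclasses import dataclass
from urllib.parse import unquote, urldefrag


@dataclass(frozen=True)
class OffsetFragment:
    begin: int  # assumed: zero-based index of the first character
    end: int    # assumed: index one past the last character


@dataclass(frozen=True)
class HashFragment:
    context_length: int  # "$clength" in the thread
    string_length: int   # "$slength" in the thread
    digest: str          # "$hash" in the thread, not verified here
    text: str            # remaining part, percent-decoded


def parse_nif_uri(uri):
    """Split a NIF-style URI into (document URI, parsed fragment)."""
    document, fragment = urldefrag(uri)
    recipe, _, rest = fragment.partition("_")
    if recipe == "offset":
        begin, end = rest.split("_")
        return document, OffsetFragment(int(begin), int(end))
    if recipe == "hash":
        # Limit the split so that underscores in the trailing text survive.
        clength, slength, digest, text = rest.split("_", 3)
        return document, HashFragment(
            int(clength), int(slength), digest, unquote(text)
        )
    raise ValueError(f"unknown URI recipe: {recipe!r}")


def select(text, fragment):
    """Return the substring addressed by an offset fragment."""
    return text[fragment.begin:fragment.end]


if __name__ == "__main__":
    base = "http://www.w3.org/DesignIssues/LinkedData.html"
    print(parse_nif_uri(base + "#offset_717_729"))
    print(parse_nif_uri(
        base + "#hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web"
    ))
```

Splitting on "_" with a limit is the only design choice of note: it keeps the trailing text of the hash recipe intact even when that text itself contains underscores, which is what the "merge remaining" step in the PHP fragment is for.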