Re: [Wikidata-l] Provenance tracking on the Web with NIF-URIs

Martynas Jusevičius Fri, 22 Jun 2012 09:21:12 -0700

Denny, the statement-level granularity you're describing is achieved by
RDF reification. However, you describe it as a "deprecated mechanism" for
provenance without backing that up.


Why do you think there must be a better mechanism? Maybe you should take
another look at reification, or lower your provenance requirements, at
least initially?
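
For readers unfamiliar with reification, here is a minimal Python sketch of what it amounts to for the use case quoted below: a statement gets its own URI, is described with the standard rdf:Statement vocabulary, and then receives a source. The example.org URIs and the accordingTo property are placeholders for illustration, not actual Wikidata identifiers.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(stmt_uri, subject, predicate, obj, source):
    """Return N-Triples lines that reify one triple and attach a source.

    All arguments are URIs. The accordingTo property below is a
    hypothetical placeholder, not part of any published vocabulary.
    """
    triples = [
        (stmt_uri, RDF + "type", RDF + "Statement"),
        (stmt_uri, RDF + "subject", subject),
        (stmt_uri, RDF + "predicate", predicate),
        (stmt_uri, RDF + "object", obj),
        (stmt_uri, "http://example.org/accordingTo", source),
    ]
    return ["<%s> <%s> <%s> ." % t for t in triples]


if __name__ == "__main__":
    for line in reify(
        "http://example.org/Statement1",
        "http://example.org/Berlin",
        "http://example.org/country",
        "http://example.org/Germany",
        "http://example.org/SlashDot",
    ):
        print(line)
```

Four of the five triples only describe the statement; the fifth carries the provenance, and its object could equally be a more specific URI that points into the source document.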

Martynas
graphity.org
On Jun 22, 2012 5:20 PM, "Denny Vrandečić" <denny.vrande...@wikimedia.de>
wrote:

> Here's the use case:
>
> Every statement in Wikidata will have a URI. Every statement can have
> one or more references.
> In many cases, the reference might be text on a website.
>
> It is always possible and correct (and probably what we will do first)
> to state:
>
> Statement1 accordingTo SlashDot .
>
> However, it would be preferable to be a bit more specific, and ideally
> to go all the way down to the sentence, saying
>
> Statement1 accordingTo X .
>
> with X being a URI denoting the sentence that I mean in a specific
> Slashdot-Article.
>
> I would prefer a standard or widely adopted way to do that, and
> NIF-URIs seem to be a viable solution. We will come back to
> this once we start modeling references in more detail.
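
As an illustration of what such an X could look like, here is a minimal Python sketch that builds an offset-style URI for a sentence inside a document's text, following the offset_717_729 pattern used further down in this thread. It assumes zero-based character offsets with an exclusive end and counts Python string characters; the function name and the example.org URI are made up for this sketch and are not taken from the NIF specification.

```python
def nif_offset_uri(document_uri, text, snippet):
    """Build an offset-style fragment URI for the first occurrence of snippet.

    Assumption: offsets are zero-based character positions and the end
    offset is exclusive, as the offset_717_729 example for the 12-character
    string "Semantic Web" suggests.
    """
    begin = text.find(snippet)
    if begin < 0:
        raise ValueError("snippet not found in text")
    end = begin + len(snippet)
    return "%s#offset_%d_%d" % (document_uri, begin, end)


if __name__ == "__main__":
    text = "Intro. Berlin has 3,337,000 citizens. End."
    print(nif_offset_uri("http://example.org/article", text,
                         "Berlin has 3,337,000 citizens."))
```

A URI built this way is only stable as long as the text before the sentence does not change, which is presumably what the hash-based variant discussed below is meant to address.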
>
> The reference could be pointing to a book, to a video, to a
> Mesopotamian stone tablet, etc. (OK, I admit that the different media
> types will be prioritized differently).
>
> I hope this helps,
> Cheers,
> Denny
>
> 2012/6/21 Sebastian Hellmann <hellm...@informatik.uni-leipzig.de>:
> > Hello Denny,
> > I was traveling for the past few weeks and can finally answer your email.
> > See my comments inline.
> >
> > On 05/29/2012 05:25 PM, Denny Vrandečić wrote:
> >
> > Hello Sebastian,
> >
> >
> > Just a few questions - as you note, it is easier if we all use the same
> > standards, and so I want to ask about the relation to other related
> > standards:
> > * I understand that you dismiss IETF RFC 5147 because it is not stable
> > enough, right?
> >
> > The offset scheme of NIF is built on this RFC.
> > So the following would hold:
> > @prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
> > @prefix owl: <http://www.w3.org/2002/07/owl#> .
> > ld:offset_717_729  owl:sameAs ld:char=717,12 .
> >
> >
> > We might change the syntax and reuse the RFC syntax, but it has several
> > issues:
> > 1. The optional part is not easy to handle, because you would need to add
> > owl:sameAs statements:
> >
> > ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12;length=12 .
> > ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12 .
> > ld:char=717,12;UTF-8 owl:sameAs ld:char=717,12;length=9876 .
> >
> > So theoretically ok, but annoying to implement and check.
> >
> > 2. When implementing web services, NIF allows the client to choose the
> > prefix:
> >
> > http://nlp2rdf.lod2.eu/demo/NIFStemmer?input-type=text&nif=true&prefix=http%3A%2F%2Fthis.is%2Fa%2Fslash%2Fprefix%2F&urirecipe=offset&input=President+Obama+is+president
> > returning URIs like <http://this.is/a/slash/prefix/offset_10_15>
> > So RFC 5147 would look like:
> > <http://this.is/a/slash/prefix/char=717,12>
> > <http://this.is/a/slash/prefix/char=717,12;UTF-8>
> > or
> > <http://this.is/a/slash/prefix?char=717,12>
> > <http://this.is/a/slash/prefix?char=717,12;UTF-8>
> >
> > 3. Characters like = and , prevent the use of prefixes:
> > echo "@prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
> > @prefix owl: <http://www.w3.org/2002/07/owl#> .
> > ld:offset_717_729  owl:sameAs ld:char=717,12 .
> > " > test.ttl ; rapper -i turtle  test.ttl
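
One way a serializer could work around point 3 is to fall back to the full URI whenever the local part contains such characters. The Python sketch below does that with a deliberately conservative pattern (letters, digits and underscores only); this is an assumption made for illustration and not the full Turtle grammar, and the function name is hypothetical.

```python
import re

# Conservative assumption: only letters, digits and underscores are treated
# as safe in a local name. The real Turtle grammar allows more than this.
SAFE_LOCAL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def turtle_term(uri, prefix, namespace):
    """Return a prefixed name if the local part is safe, else a full <URI>."""
    if uri.startswith(namespace):
        local = uri[len(namespace):]
        if SAFE_LOCAL.fullmatch(local):
            return "%s:%s" % (prefix, local)
    return "<%s>" % uri


if __name__ == "__main__":
    ns = "http://www.w3.org/DesignIssues/LinkedData.html#"
    print(turtle_term(ns + "offset_717_729", "ld", ns))
    print(turtle_term(ns + "char=717,12", "ld", ns))
```

With this fallback the NIF-style fragment stays short in the serialized file, while the RFC-style fragment has to be written out in full every time.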
> >
> > 4. Implementation is a little bit more difficult than for NIF URIs,
> > which can be parsed like this:
> > // explode() replaces the deprecated split()
> > $arr = explode("_", "offset_717_729");
> > switch ($arr[0]) {
> >     case 'offset':
> >         $begin = $arr[1];
> >         $end = $arr[2];
> >         break;
> >     case 'hash':
> >         $clength = $arr[1];
> >         $slength = $arr[2];
> >         $hash = $arr[3];
> >         // merge the remaining parts with '_'
> >         $rest = implode("_", array_slice($arr, 4));
> >         break;
> > }
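
The same parsing logic as a self-contained Python sketch, with basic error handling added. Reading $clength and $slength as "context length" and "string length" is an assumption made here; the thread itself does not spell the names out.

```python
def parse_nif_fragment(fragment):
    """Split a NIF-style fragment id into its parts.

    Handles the two recipes shown in this thread:
      offset_<begin>_<end>
      hash_<clength>_<slength>_<hash>_<rest>
    The field names context_length and string_length are assumed meanings
    of the PHP variables $clength and $slength.
    """
    parts = fragment.split("_")
    recipe = parts[0]
    if recipe == "offset":
        if len(parts) != 3:
            raise ValueError("offset recipe needs begin and end")
        return {"recipe": "offset",
                "begin": int(parts[1]),
                "end": int(parts[2])}
    if recipe == "hash":
        if len(parts) < 5:
            raise ValueError("hash recipe needs four fields")
        return {"recipe": "hash",
                "context_length": int(parts[1]),
                "string_length": int(parts[2]),
                "hash": parts[3],
                # The trailing text may itself contain '_', so rejoin it.
                "rest": "_".join(parts[4:])}
    raise ValueError("unknown recipe: %r" % recipe)


if __name__ == "__main__":
    print(parse_nif_fragment("offset_717_729"))
    print(parse_nif_fragment(
        "hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web"))
```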
> >
> > 5. The RFC assumes a certain MIME type, i.e. plain text. NIF makes a
> > broader assumption.
> >
> > * what is the relation to the W3C media fragment URIs? Did not find a
> > pointer there.
> >
> > They are designed for media such as images, video, not strings.
> > Potentially, the same principle can be applied, but it is not yet
> > engineered/researched.
> >
> > * any plans of standardizing your approach?
> >
> > We will do NIF 2.0 as a community standard and finish it in a couple of
> > months. It will be published under open licences, so anybody, e.g. W3C or
> > ISO, might pick it up easily. Other than that, there are plans by several EU
> > projects (see e.g.
> > http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0101.html )
> > and a US project to use it, and there are already several third-party
> > implementations. We would rather have it adopted on a large scale first
> > and then standardized properly, i.e. by W3C. This worked quite well for
> > the FOAF project and for RDB2RDF mappers.
> > I would assume the chances for fast standardization are not bad.
> >
> > We would strongly prefer to just use a standard instead of advocating
> > contenders for one -- if one exists.
> >
> > You might want to look at:
> > http://www.w3.org/community/openannotation/wiki/TextCommentOnWebPage
> > and the same highlighting here:
> > http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
> >
> > NIF equivalent (4 triples instead of 14 and only one generated uuid):
> > ld:hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web a str:String ;
> >     oa:hasBody [
> >         oa:annotator <mailto:Bob> ;
> >         cnt:chars "Hey Tim, good idea that Semantic Web!"
> >     ] .
> >
> > So you need not think of this in a "contender" way. The approaches are
> > complementary.
> > NIF is simpler and the URIs have some features that might be wanted
> > (stability, uniqueness, easy to implement).
> > This is why I was asking for your *use case*.
> >
> > Note that there are still some problems when annotating the DOM with URIs,
> > e.g. XPointer was abandoned and never finished. XPath has its limits and
> > is also expensive (i.e. SAX parsing is not possible).
> > I think there is no proper solution as of now.
> > All the best,
> > Sebastian
> >
> >
> > Cheers,
> > Denny
> >
> >
> >
> >
> > 2012/5/18 Sebastian Hellmann <hellm...@informatik.uni-leipzig.de>
> >
> > Hello again,
> > maybe the question I asked was lost, as the text was TL;DR.
> >
> > I heard that it is planned to track the provenance of facts, e.g. Berlin has
> > 3,337,000 citizens, found here:
> > http://www.worldatlas.com/citypops.htm
> > Do you have a place where the use case and the requirements for this are
> > documented? Or is it out of scope?
> > Will it be coarse-grained, i.e. website level? Or fine-grained, i.e. text
> > paragraph level? See e.g. how Berlin is highlighted here:
> > http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.worldatlas.com%2Fcitypops.htm%23hash_4_30_7449e732716c8e68842289bf2e6667d5_Berlin%2C%2520Germany%2520-%25203%2C
> > in this very early prototype.
> >
> > Could you give me a link where I can read more about any Wikidata plans
> > in this direction?
> > Sebastian
> >
> >
> >
> > On 05/16/2012 09:10 AM, Sebastian Hellmann wrote:
> >
> > Dear all,
> > (Note: I could not find the document, where your requirements regarding
> > the tracking of facts on the web are written, so I am giving a general
> > introduction to NIF. Please send me a link to the document that specifies
> > your need for tracing facts on the web, thanks)
> >
> > I would like to draw your attention to the URIs used in the NLP
> > Interchange Format (NIF).
> > NIF-URIs are quite easy to use, understand and implement. NIF has a
> > one-triple-per-annotation paradigm. The latest documentation can be found
> > here:
> > http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
> >
> > The basic idea is to use URIs with hash fragment ids to annotate or mark
> > pages on the web:
> > An example is the first occurrence of "Semantic Web" on
> > http://www.w3.org/DesignIssues/LinkedData.html
> > as highlighted here:
> > http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
> >
> > Here is a NIF example for linking a part of the document to the DBpedia
> > entry of the Semantic Web:
> > <http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
> >      a str:StringInContext ;
> >      sso:oen <http://dbpedia.org/resource/Semantic_Web> .
> >
> >
> > We are currently preparing a new draft for the spec 2.0. The old one can
> > be found here:
> > http://nlp2rdf.org/nif-1-0/
> >
> > There are several EU projects that intend to use NIF. Furthermore, it is
> > easier for everybody, if we standardize a Web annotation format together.
> > Please give feedback on your use cases.
> > All the best,
> > Sebastian
> >
> >
> > --
> > Dipl. Inf. Sebastian Hellmann
> > Department of Computer Science, University of Leipzig
> > Projects: http://nlp2rdf.org , http://dbpedia.org
> > Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> > Research Group: http://aksw.org
> >
> >
> > _______________________________________________
> > Wikidata-l mailing list
> > Wikidata-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata-l
> >
> >
> >
> >
> >
> >
> >
> > --
> > Dipl. Inf. Sebastian Hellmann
> > Department of Computer Science, University of Leipzig
> > Projects: http://nlp2rdf.org , http://dbpedia.org
> > Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> > Research Group: http://aksw.org
>
>
>
> --
> Project director Wikidata
> Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
> Tel. +49-30-219 158 26-0 | http://wikimedia.de
>
> Wikimedia Deutschland - Society for the Promotion of Free Knowledge e.V.
> Registered in the register of associations of the Berlin-Charlottenburg
> district court under number 23855 B. Recognized as charitable by the
> Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
>
>

