[Wikidata] Solve legal uncertainty of Wikidata

2018-05-08 Thread mathieu stumpf guntz

Hello everybody,

There is a Phabricator ticket, "Solve legal uncertainty of Wikidata", that you might be interested in looking at and participating in.


As Denny suggested in the ticket to give it more visibility through a discussion on the Wikidata chat, I thought it was worth highlighting it a bit more here.


Cheers

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata and the LOD cloud

2018-05-08 Thread Sebastian Hellmann

Hi Lucas, Denny,

all you need to do is update your entry on old.datahub.io:

https://old.datahub.io/dataset/wikidata

It was edited by Lucie-Aimée Kaffee two years ago. You need to contact 
her, as she created the Wikimedia org in Datahub. I might be able to 
have someone switch ownership of the org to a new account.


But much essential metadata is missing:

Compare with the DBpedia entry: https://old.datahub.io/dataset/dbpedia

Especially the links and the triple count at the bottom. You need to 
keep this entry updated in order to appear in the LOD cloud.


Please tell me if you can't edit it. I know a former admin from the time 
datahub.io was first created ten years ago in the LOD2 and LATC EU 
projects; he might be able to do something in case nobody answers due 
to datahub.io switching to a new platform.


All the best,

Sebastian


On 07.05.2018 22:35, Lucas Werkmeister wrote:
Folks, I’m already in contact with John, there’s no need to contact 
him again :)


Cheers, Lucas

On Mon, 7 May 2018 at 19:32, Denny Vrandečić 
<vrande...@gmail.com> wrote:


Well, then, we have tried several times to get into that diagram,
and it never worked out.

So, given the page you linked, it says:


  Contributing to the Diagram

First, make sure that you publish data according to the Linked
Data principles .
We interpret this as:

  * There must be /resolvable http:// (or https://) URIs/.
  * They must resolve, with or without content negotiation, to
/RDF data/ in one of the popular RDF formats (RDFa, RDF/XML,
Turtle, N-Triples).
  * The dataset must contain /at least 1000 triples/. (Hence, your
FOAF file most likely does not qualify.)
  * The dataset must be connected via /RDF links/ to a dataset
that is already in the diagram. This means, either your
dataset must use URIs from the other dataset, or vice versa.
We arbitrarily require at least 50 links.
  * Access of the /entire/ dataset must be possible via /RDF
crawling/, via an /RDF dump/, or via a /SPARQL endpoint/.

The process for adding datasets is still under development; please
contact John P. McCrae to add a new dataset.
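The numeric criteria above (at least 1000 triples, at least 50 RDF links into a dataset already in the diagram) can be checked mechanically. A minimal sketch in Python, run over a tiny illustrative two-triple N-Triples snippet (a real check would run over a full dump, and this simple split only handles IRI objects, not literals):

```python
def triple_and_link_counts(ntriples, external_prefix):
    """Count total triples and RDF links, i.e. triples whose object IRI
    points into another dataset (here: anything under external_prefix)."""
    total = links = 0
    for line in ntriples.strip().splitlines():
        subj, pred, rest = line.split(" ", 2)  # N-Triples: <s> <p> <o> .
        obj = rest.rstrip(" .")
        total += 1
        if obj.startswith("<" + external_prefix):
            links += 1
    return total, links

# Illustrative data only: one owl:sameAs link from Wikidata into DBpedia.
nt = (
    "<http://www.wikidata.org/entity/Q42> "
    "<http://www.w3.org/2002/07/owl#sameAs> "
    "<http://dbpedia.org/resource/Douglas_Adams> .\n"
    "<http://www.wikidata.org/entity/Q42> "
    "<http://www.wikidata.org/prop/direct/P31> "
    "<http://www.wikidata.org/entity/Q5> ."
)
total, links = triple_and_link_counts(nt, "http://dbpedia.org/resource/")
print(total, links)  # 2 1
```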


Wikidata fulfills all the conditions easily. So, here we go, I am
adding John to this thread - although I know he already knows
about this request - and I am asking officially to enter Wikidata
into the LOD diagram.

Let's keep it all open, and see where it goes from here.

Cheers,
Denny


On Mon, May 7, 2018 at 4:15 AM Sebastian Hellmann
<hellm...@informatik.uni-leipzig.de> wrote:

Hi Denny, Maarten,

you should read your own emails. In fact it is quite easy to
join the LOD cloud diagram.

The most important step is to follow the instructions on the
page http://lod-cloud.net under "How to contribute", and then
add the metadata.

Some years ago I set up a WordPress site with Linked Data
enabled: http://www.klappstuhlclub.de/wp/ Even this is included, as I
simply added the metadata entry.

Do you really think John McCrae added a line in the code that
says "if (dataset==wikidata) skip; " ?

You just need to add it like everybody else in LOD; DBpedia
also created its entry and updates it now and then. The same
applies to http://lov.okfn.org: somebody from Wikidata needs
to upload the Wikidata properties as OWL. If nobody does it,
they will not be in there.

All the best,

Sebastian


On 04.05.2018 18:33, Maarten Dammers wrote:

It almost feels like someone doesn’t want Wikidata in there?
Maybe that website is maintained by DBpedia fans? Just
thinking out loud here, because DBpedia is very popular in the
academic world and Wikidata is a huge threat to that popularity.

Maarten

On 4 May 2018 at 17:20, Denny Vrandečić
<vrande...@gmail.com> wrote:


I'm pretty sure that Wikidata is doing better than 90% of
the current bubbles in the diagram.

If they wanted to have Wikidata in the diagram, it would have
been there back when it was still small enough to read. :)

On Tue, May 1, 2018 at 7:47 AM Peter F. Patel-Schneider
<pfpschnei...@gmail.com> wrote:

Thanks for the corrections.

So https://www.wikidata.org/entity/Q42 is *the* Wikidata IRI
for Douglas Adams. Retrieving from this IRI results in a
303 See Other to
https://www.wikidata.org/wiki/Special:EntityData/Q42,
which (I guess) is the main IRI for representations of
Douglas Adams and other pages with information about him.
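The entity IRI → document IRI mapping Peter describes can be written down directly; a format suffix on the Special:EntityData URL (e.g. .ttl, .json, .nt) selects a concrete serialization without content negotiation. The helper below is a hypothetical illustration of that mapping, not part of any Wikidata API:

```python
def entity_data_url(entity_iri, fmt="ttl"):
    """Map a Wikidata entity IRI to the Special:EntityData document URL
    that the 303 See Other redirect points at; an optional format suffix
    picks a serialization (ttl, nt, json, rdf) without content negotiation."""
    qid = entity_iri.rstrip("/").rsplit("/", 1)[-1]
    url = "https://www.wikidata.org/wiki/Special:EntityData/" + qid
    return url + "." + fmt if fmt else url

print(entity_data_url("https://www.wikidata.org/entity/Q42"))
# https://www.wikidata.org/wiki/Special:EntityData/Q42.ttl
print(entity_data_url("https://www.wikidata.org/entity/Q42", fmt=None))
# https://www.wikidata.org/wiki/Special:EntityData/Q42
```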


Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-08 Thread Laura Morales
I don't understand, is this just another project built on DBPedia, or a project 
to replace DBPedia entirely? Are you a DBPedia maintainer?



 
 

Sent: Tuesday, May 08, 2018 at 1:29 PM
From: "Sebastian Hellmann" 
To: "Discussion list for the Wikidata project." 
Subject: [Wikidata] DBpedia Databus (alpha version)


DBpedia Databus (alpha version)

 
The DBpedia Databus is a platform that allows multiple stakeholders to 
exchange, curate and access data. Any data entering the bus will be 
versioned, cleaned, mapped, linked, and its licenses and provenance tracked. 
Hosting in multiple formats will be provided, so the data can be accessed 
either as a dump download or via an API. Data governance stays with the data 
contributors.
 

Vision

Working with data is hard and repetitive. We envision a hub where everybody 
can upload data, and where useful operations like versioning, cleaning, 
transformation, mapping, linking, merging and hosting are done automagically 
on a central communication system (the bus) and then dispersed again in a 
decentralised network to the consumers and applications.
On the databus, data flows from data producers through the platform to the 
consumers (left to right), while any errors or feedback flow in the opposite 
direction and reach the data source, providing a continuous integration 
service that improves the data at the source.
 

Open Data vs. Closed (paid) Data

We have studied the data network for 10 years now, and we conclude that 
organisations with open data struggle to work together properly: although 
they could and should, they are hindered by technical and organisational 
barriers, and they duplicate work on the same data. On the other hand, 
companies selling data cannot do so in a scalable way. The loser is the 
consumer, who has the choice of inferior open data or buying from a 
jungle-like market.

Publishing data on the databus

If you are grinding your teeth about how to publish data on the web, you can 
just use the databus to do so. Data loaded on the bus will be highly visible, 
available and queryable. You should think of it as a service:

  * Visibility guarantees that your citations and reputation go up.
  * Besides a web download, we can also provide a Linked Data interface,
    SPARQL endpoint, Lookup (autocomplete) or many other means of
    availability (like AWS or Docker images).
  * Any distribution we are doing will funnel feedback and collaboration
    opportunities your way to improve your dataset and your internal data
    quality.
  * You will receive an enriched dataset, which is connected and complemented
    with any other available data (see the same folder names in data and
    fusion folders).
 
 

Data Sellers

If you are selling data, the databus provides numerous opportunities for you. 
You can link your offering to the open entities in the databus. This allows 
consumers to discover your services better, as they are shown with each request.
 

Data Consumers

Open data on the databus will be a commodity. We are greatly lowering the cost 
of understanding, retrieving and reformatting the data. We are constantly 
extending the ways the data can be used and are willing to implement any 
formats and APIs you need.
If you are lacking a certain kind of data, we can also scout for it and load it 
onto the databus.
 
 

How the Databus works at the moment

We are still in an initial state, but we already load 10 datasets (6 from 
DBpedia, 4 external) onto the bus using these phases:

  * Acquisition: data is downloaded from the source and logged in.
  * Conversion: data is converted to N-Triples and cleaned (syntax parsing,
    datatype validation and SHACL).
  * Mapping: the vocabulary is mapped onto the DBpedia Ontology and converted
    (we have been doing this for Wikipedia’s infoboxes and Wikidata, but now
    we do it for other datasets as well).
  * Linking: links are mainly collected from the sources, cleaned and enriched.
  * IDying: all entities found are given a new Databus ID for tracking.
  * Clustering: IDs are merged into clusters, using one of the Databus IDs as
    the cluster representative.
  * Data comparison: each dataset is compared with all other datasets. We have
    an algorithm that decides on the best value, but the main goal here is
    transparency, i.e. to see which data value was chosen and how it compares
    to the other sources.
  * A main knowledge graph is fused from all the sources, i.e. a transparent
    aggregate.
  * For each source, we produce a local fused version called the “Databus
    Complement”. This is a major feedback mechanism for all data providers,
    where they can see what data they are missing, what data differs in other
    sources and what links are available for their IDs.
  * You can compare all data via a webservice (early prototype, only works for
    the Eiffel Tower):
    http://88.99.242.78:9000/?s=http%3A%2F%2Fid.dbpedia.org%2Fglobal%2F12HpzV&p=http%3A%2F%2Fdbpedia.org%2Fontology%2Farchitect&src=general
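As an illustration of the Conversion phase above (syntax parsing and datatype validation; SHACL checking is out of scope here), a minimal sketch that flags typed literals whose lexical form does not match their declared XSD datatype. The data and the check are illustrative, not the actual Databus code:

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
# Matches a typed literal in N-Triples: "lexical form"^^<datatype IRI>
TYPED_LITERAL = re.compile(r'"([^"]*)"\^\^<([^>]+)>')

def bad_integer_literals(ntriples):
    """Return lexical forms declared xsd:integer that are not valid integers."""
    return [
        value
        for value, dtype in TYPED_LITERAL.findall(ntriples)
        if dtype == XSD_INTEGER and not re.fullmatch(r"[+-]?[0-9]+", value)
    ]

# Illustrative data: the second height literal fails datatype validation.
nt = (
    '<http://example.org/eiffel> <http://example.org/height> '
    '"330"^^<http://www.w3.org/2001/XMLSchema#integer> .\n'
    '<http://example.org/eiffel> <http://example.org/floors> '
    '"three"^^<http://www.w3.org/2001/XMLSchema#integer> .'
)
print(bad_integer_literals(nt))  # ['three']
```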

Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-08 Thread Sebastian Hellmann

Hi Laura,


I don't understand, is this just another project built on DBPedia, or a project 
to replace DBPedia entirely?


A valid question. DBpedia is quite decentralised and hard to understand 
in its entirety. So, in fact, some parts are improved and others will 
eventually be replaced (also an improvement, hopefully).


The main improvement here is that we no longer have large monolithic 
releases that take forever. The language chapters, and also the 
professional community, can work better with the "platform" in 
terms of turnaround, effective contribution and incentives for 
contribution. Another thing that will hopefully improve is that we can 
maintain contributions and add-ons more sustainably; these were formerly 
lost between releases. So the structure and processes will be clearer.


The DBpedia behind the "main endpoint" will still be there, in the same 
way that nl.dbpedia.org/sparql or wikidata.dbpedia.org/sparql are there. The 
new hosted service will be more a knowledge graph of knowledge graphs, 
where you can either get all the information in a fused way or 
quickly jump to the sources, compare them and make improvements there. Projects 
and organisations can also upload their data to query it there 
themselves, or to share it with others and persist it. Companies can sell or 
advertise their data. The core consists of the Wikipedia/Wikidata data, 
and we hope to be able to improve it and also send contributors and 
contributions back to the Wikiverse.



Are you a DBPedia maintainer?
Yes. I took it as my task to talk to everybody in the community over the 
last year, and to draft and aggregate the new strategy and innovate.


All the best,
Sebastian


Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-08 Thread Laura Morales
So, in short, DBPedia is turning into a business with a "community edition + 
enterprise edition" kind of model?


 
 


Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-08 Thread Sebastian Hellmann

Hi Laura,


On 08.05.2018 15:30, Laura Morales wrote:

So, in short, DBPedia is turning into a business with a "community edition + 
enterprise edition" kind of model?



No, definitely not. We were asked by many companies to make an 
enterprise edition, but we concluded that this would diminish the 
quality of available open data.


So the core tools are more like a GitHub for data, where you can fork, mix 
and republish. The business model is a clearing house 
(https://en.wikipedia.org/wiki/Clearing_house_(finance)): you can do 
the transactions yourself, or pay if you would like the convenience of 
somebody else doing the work. This is an adaptation of business models 
from open-source software.


There is also a vision of producing economically sustainable Linked Data. 
Many bubbles in the LOD cloud have deteriorated a lot, since they have run 
out of funding for maintenance. In the future, we hope to provide a revenue 
stream for them via the clearing-house mechanisms, i.e. the files are free, 
while querying via SPARQL/Linked Data is a paid service.


Also, since the data is open, there should be no conflict in synchronizing 
with Wikidata and making Wikidata richer.



All the best,
Sebastian


  
  



Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-08 Thread Thad Guidry
So basically...

where you get "compute" heavy (querying SPARQL)... you are going to charge
fees for providing that compute heavy query service.

where you are not "compute" heavy (providing download bandwidth to get
files) ... you are not going to charge fees.

-Thad


Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-08 Thread Laura Morales
Is this a question for Sebastian, or are you talking on behalf of the project?

 
 



Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-08 Thread Thad Guidry
I am asking Sebastian about the rationale for paid service.

