Re: Updated LOD Cloud Diagram - Missed data sources.

2014-08-18 Thread Christian Bizer
Hi David, Mark, and Hugh, But I wonder where so many other sites (including mine) went ? The problem with crawling the Web of Linked Data is really that it is hard to get the datasets on the edges that set RDF links to other sources but are not the target of links from well-connected

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-08-18 Thread Michael Brunnbauer
Hello Chris, On Mon, Aug 18, 2014 at 12:05:54PM +0200, Christian Bizer wrote: Sorry, we are not Google and simply did not have the resources to crawl the whole Web and ask for RDF/XML when dereferencing each URL. See http://www.sengine.info/ We try to crawl 1000 URLs from every site that has
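
For readers unfamiliar with what asking for RDF/XML when dereferencing a URL involves, here is a minimal sketch using Python's standard library. It only builds the request and does not send it. The function name and the example.org URI are hypothetical and do not come from the thread, and a real crawler would likely list several RDF media types rather than one.

```python
from urllib.request import Request


def rdf_xml_request(uri: str) -> Request:
    """Build a GET request that asks the server for RDF/XML via the Accept header."""
    # The Accept header is how a client states its preferred representation
    # (HTTP content negotiation). The server may still answer with HTML.
    return Request(uri, headers={"Accept": "application/rdf+xml"})


# Hypothetical URI, used for illustration only.
req = rdf_xml_request("http://example.org/resource/42")
print(req.get_header("Accept"))
```

Passing the returned object to `urllib.request.urlopen` would perform the actual dereference; that step is left out so the sketch runs without network access.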

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-08-18 Thread Kingsley Idehen
On 8/18/14 12:35 PM, Michael Brunnbauer wrote: Hello Chris, On Mon, Aug 18, 2014 at 12:05:54PM +0200, Christian Bizer wrote: Sorry, we are not Google and simply did not have the resources to crawl the whole Web and ask for RDF/XML when dereferencing each URL. See http://www.sengine.info/ We

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-08-18 Thread Michael Brunnbauer
Hello Kingsley, On Mon, Aug 18, 2014 at 12:57:35PM +0100, Kingsley Idehen wrote: Do you not have this data in RDF form? No. Part of our sales volume comes from selling such data (as CSV). Chris can have it for free for non-commercial use. Ideally, you should publish this data in a form

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-08-17 Thread Hugh Glaser
On 16 Aug 2014, at 12:57, David Wood da...@3roundstones.com wrote: On Aug 15, 2014, at 1:55 PM, Mark Baker dist...@acm.org wrote: On Fri, Jul 25, 2014 at 6:04 AM, Christian Bizer ch...@bizer.de wrote: Hi, But I wonder where so many other sites (including mine) went ? The problem

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-08-16 Thread David Wood
On Aug 15, 2014, at 1:55 PM, Mark Baker dist...@acm.org wrote: On Fri, Jul 25, 2014 at 6:04 AM, Christian Bizer ch...@bizer.de wrote: Hi, But I wonder where so many other sites (including mine) went ? The problem with crawling the Web of Linked Data is really that it is hard to get

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-08-15 Thread Mark Baker
On Fri, Jul 25, 2014 at 6:04 AM, Christian Bizer ch...@bizer.de wrote: Hi, But I wonder where so many other sites (including mine) went ? The problem with crawling the Web of Linked Data is really that it is hard to get the datasets on the edges that set RDF links to other sources but are

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-30 Thread Kingsley Idehen
On 7/25/14 6:31 PM, Michael Brunnbauer wrote: Hello Kingsley, On Fri, Jul 25, 2014 at 05:47:58PM -0400, Kingsley Idehen wrote: When you have a sense of the identity of an Agent and on behalf of whom it is operating, you can use RDF based Linked Data to construct and enforce usage policies.

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread Christian Bizer
Hi, But I wonder where so many other sites (including mine) went ? The problem with crawling the Web of Linked Data is really that it is hard to get the datasets on the edges that set RDF links to other sources but are not the target of links from well-connected sources. We tried to work

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread ahogan
On 25/07/2014 06:04, Christian Bizer wrote: These problems are also the reason why we ask people on the list to point us at additional data sources, so that the upcoming cloud diagram can be as comprehensive as possible and it would be great if you could also point us at your sites. Just to

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread Sarven Capadisli
On 2014-07-25 20:14, aho...@dcc.uchile.cl wrote: Finally I wanted to raise one other troubling observation re: the LOD Cloud, which was that *only one new dataset was added to the LOD Cloud group in datahub over a period of twelve months* [2,3]. Jerven just added the first dataset in 8 months,

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread Hugh Glaser
Hi Aiden, I think I probably agree with everything you say, but with one exception: On 25 Jul 2014, at 19:14, aho...@dcc.uchile.cl wrote: found that the crawl encountered many problems accessing the various datasets in the catalogue: robots.txt, 401s, 502s, bad conneg, 404/dead, etc. The idea

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread Sarven Capadisli
On 2014-07-25 20:46, Sarven Capadisli wrote: On 2014-07-25 20:14, aho...@dcc.uchile.cl wrote: Finally I wanted to raise one other troubling observation re: the LOD Cloud, which was that *only one new dataset was added to the LOD Cloud group in datahub over a period of twelve months* [2,3].

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread ahogan
On 25/07/2014 14:44, Hugh Glaser wrote: The idea that having a robots.txt that Disallows spiders is a “problem” for a dataset is rather bizarre. It is of course a problem for the spider, but is clearly not a problem for a typical consumer of the dataset. By that measure, serious numbers of

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread ahogan
On 25/07/2014 15:00, Sarven Capadisli wrote: On 2014-07-25 20:46, Sarven Capadisli wrote: On 2014-07-25 20:14, aho...@dcc.uchile.cl wrote: snip Given the state of CKAN, or at least the version that's running datahub.io now, looking solely at the lodcloud group [4], may be misleading. I

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread Hugh Glaser
Very interesting. On 25 Jul 2014, at 20:12, aho...@dcc.uchile.cl wrote: On 25/07/2014 14:44, Hugh Glaser wrote: The idea that having a robots.txt that Disallows spiders is a “problem” for a dataset is rather bizarre. It is of course a problem for the spider, but is clearly not a problem for

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread ahogan
On 25/07/2014 15:54, Hugh Glaser wrote: Very interesting. On 25 Jul 2014, at 20:12, aho...@dcc.uchile.cl wrote: On 25/07/2014 14:44, Hugh Glaser wrote: The idea that having a robots.txt that Disallows spiders is a “problem” for a dataset is rather bizarre. It is of course a problem for the

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread Kingsley Idehen
On 7/25/14 3:12 PM, aho...@dcc.uchile.cl wrote: Put simply, as far as I can see, a dereferenceable URI behind a robots.txt blacklist is no longer a dereferenceable URI ... at least for a respectful software agent. Linked Data behind a robots.txt blacklist is no longer Linked Data. When you
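
The point about a "respectful software agent" can be illustrated with a small sketch, assuming the agent consults robots.txt before dereferencing any URI. The robots.txt contents, the agent name `ExampleLDBot`, and the example.org URI are hypothetical; only `urllib.robotparser` from the Python standard library is used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every robot from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_dereference(robots_txt: str, agent: str, uri: str) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `uri`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, uri)


allowed = may_dereference(ROBOTS_TXT, "ExampleLDBot", "http://example.org/resource/42")
print(allowed)
```

Under this blanket `Disallow: /`, an agent that honours the check never issues the request, which is the sense in which the URI stops being dereferenceable for it. Whether a given Linked Data client counts as a robot at all is the question the thread goes on to debate.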

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread Kingsley Idehen
On 7/25/14 5:13 PM, aho...@dcc.uchile.cl wrote: Then it seems our core disagreement is on the notion of a robot, which is indeed a grey area. A Web Robot, Bot, or Spider is an Agent :-) You can use the identity of an Agent to control the privileges it has in your Linked Open Data Space. --

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread Hugh Glaser
Hi, Well, as you might guess, I can’t say I agree. Firstly, as you correctly say, if there is a robots.txt with Disallow / on the RDF on a LD site, then it effectively prohibits any LD app from accessing the LD. So clearly that can’t be what the publisher intended (the idea of publishing RDF

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread Luca Matteis
Robots.txt to me works well for a web of documents. That is, wanting only humans to access certain resources. But for a web of data, why resort to a robots.txt when you could simply not put the resource online in the first place? On Fri, Jul 25, 2014 at 11:54 PM, Hugh Glaser h...@glasers.org

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread Michael Brunnbauer
Hello Kingsley, On Fri, Jul 25, 2014 at 05:47:58PM -0400, Kingsley Idehen wrote: When you have a sense of the identity of an Agent and on behalf of whom it is operating, you can use RDF based Linked Data to construct and enforce usage policies. sarcasm Yes. Every Agent that does not use

Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread Hugh Glaser
Hi Luca, Thanks for asking. I have resources that number 100Ms and even 1Bs of resolvable URIs. I even have datasets with effectively infinite numbers of URIs. Some people seem to find them useful, in the sense that they want to look specific things up. These are not documents - they are