Hi David, Mark, and Hugh,
But I wonder where so many other sites (including mine) went?
The problem with crawling the Web of Linked Data is really that it is hard
to get the datasets on the edges that set RDF links to other sources but
are not the target of links from well-connected sources.
Hello Chris,
On Mon, Aug 18, 2014 at 12:05:54PM +0200, Christian Bizer wrote:
Sorry, we are not Google and simply did not have the resources to crawl the
whole Web and ask for RDF/XML when dereferencing each URL.
See http://www.sengine.info/
We try to crawl 1000 URLs from every site that has
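
For illustration, a minimal Python sketch of that kind of dereferencing: ask
for RDF/XML via the Accept header and treat anything else as failed content
negotiation. The function name and error handling are stand-ins, not the
crawler's actual code.

    import requests

    def dereference(uri):
        # Content negotiation: ask the server for RDF/XML explicitly.
        r = requests.get(uri, headers={"Accept": "application/rdf+xml"},
                         timeout=10)
        if r.ok and "rdf+xml" in r.headers.get("Content-Type", ""):
            return r.text
        return None  # bad conneg: the server ignored the Accept header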
On 8/18/14 12:35 PM, Michael Brunnbauer wrote:
Hello Chris,
On Mon, Aug 18, 2014 at 12:05:54PM +0200, Christian Bizer wrote:
Sorry, we are not Google and simply did not have the resources to crawl the
whole Web and ask for RDF/XML when dereferencing each URL.
See http://www.sengine.info/
We
Hello Kingsley,
On Mon, Aug 18, 2014 at 12:57:35PM +0100, Kingsley Idehen wrote:
Do you not have this data in RDF form?
No. Part of our sales volume comes from selling such data (as CSV). Chris
can have it for free for non-commercial use.
Ideally, you should publish this data
in a form
On 16 Aug 2014, at 12:57, David Wood da...@3roundstones.com wrote:
On Aug 15, 2014, at 1:55 PM, Mark Baker dist...@acm.org wrote:
On Fri, Jul 25, 2014 at 6:04 AM, Christian Bizer ch...@bizer.de wrote:
Hi,
But I wonder where so many other sites (including mine) went?
The problem
On Aug 15, 2014, at 1:55 PM, Mark Baker dist...@acm.org wrote:
On Fri, Jul 25, 2014 at 6:04 AM, Christian Bizer ch...@bizer.de wrote:
Hi,
But I wonder where so many other sites (including mine) went?
The problem with crawling the Web of Linked Data is really that it is hard
to get
On Fri, Jul 25, 2014 at 6:04 AM, Christian Bizer ch...@bizer.de wrote:
Hi,
But I wonder where so many other sites (including mine) went?
The problem with crawling the Web of Linked Data is really that it is hard to
get the datasets on the edges that set RDF links to other sources but are
On 7/25/14 6:31 PM, Michael Brunnbauer wrote:
Hello Kingsley,
On Fri, Jul 25, 2014 at 05:47:58PM -0400, Kingsley Idehen wrote:
When you have a sense of the identity of an Agent and on behalf of whom it
is operating, you can use RDF-based Linked Data to construct and enforce
usage policies.
Hi,
But I wonder where so many other sites (including mine) went?
The problem with crawling the Web of Linked Data is really that it is hard to
get the datasets on the edges that set RDF links to other sources but are not
the target of links from well-connected sources.
We tried to work
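
The reachability problem described above can be made concrete: a
link-following crawler only discovers URIs that appear as link targets in
data it has already fetched, so a dataset that sets outbound RDF links but
receives no inbound ones is never reached unless it is seeded explicitly. A
rough sketch with Python and rdflib (all names invented here, not the actual
crawler):

    from collections import deque
    from rdflib import Graph, URIRef

    def crawl(seed_uris, limit=100):
        seen, queue, fetched = set(seed_uris), deque(seed_uris), []
        while queue and len(fetched) < limit:
            uri = queue.popleft()
            g = Graph()
            try:
                g.parse(uri)      # dereference and parse whatever RDF comes back
            except Exception:
                continue          # dead links, bad conneg etc. just drop the node
            fetched.append(uri)
            for _, _, o in g:     # follow outbound links only
                if isinstance(o, URIRef) and o not in seen:
                    seen.add(o)
                    queue.append(o)
        return fetched

    # A site that only *sets* links never appears as an object o in
    # anyone else's data, so it can never enter the queue.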
On 25/07/2014 06:04, Christian Bizer wrote:
These problems are also the reason why we ask people on the list to
point us at additional data sources, so that the upcoming cloud diagram
can be as comprehensive as possible. It would be great if you could
also point us at your sites.
Just to
On 2014-07-25 20:14, aho...@dcc.uchile.cl wrote:
Finally I wanted to raise one other troubling observation re: the LOD
Cloud, which was that *only one new dataset was added to the LOD Cloud
group in datahub over a period of twelve months* [2,3]. Jerven just added
the first dataset in 8 months,
Hi Aidan,
I think I probably agree with everything you say, but with one exception:
On 25 Jul 2014, at 19:14, aho...@dcc.uchile.cl wrote:
found that the crawl encountered many problems accessing the various
datasets in the catalogue: robots.txt, 401s, 502s, bad conneg, 404/dead,
etc.
The idea
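
A hedged sketch of how such an accessibility check might classify each
catalogue URI (a hypothetical helper using Python requests; the robots.txt
case is treated separately further down the thread):

    import requests

    def classify(uri):
        try:
            r = requests.get(uri, headers={"Accept": "application/rdf+xml"},
                             timeout=10)
        except requests.RequestException:
            return "dead"              # DNS failure, timeout, connection refused
        if not r.ok:
            return str(r.status_code)  # 401, 404, 502, ...
        if "rdf" not in r.headers.get("Content-Type", ""):
            return "bad conneg"        # 200, but not the RDF that was asked for
        return "ok"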
On 2014-07-25 20:46, Sarven Capadisli wrote:
On 2014-07-25 20:14, aho...@dcc.uchile.cl wrote:
Finally I wanted to raise one other troubling observation re: the LOD
Cloud, which was that *only one new dataset was added to the LOD Cloud
group in datahub over a period of twelve months* [2,3].
On 25/07/2014 14:44, Hugh Glaser wrote:
The idea that having a robots.txt that Disallows spiders
is a problem for a dataset is rather bizarre.
It is of course a problem for the spider, but is clearly not a problem
for a typical consumer of the dataset.
By that measure, serious numbers of
On 25/07/2014 15:00, Sarven Capadisli wrote:
On 2014-07-25 20:46, Sarven Capadisli wrote:
On 2014-07-25 20:14, aho...@dcc.uchile.cl wrote:
snip
Given the state of CKAN, or at least the version that's running
datahub.io now, looking solely at the lodcloud group [4] may be
misleading. I
Very interesting.
On 25 Jul 2014, at 20:12, aho...@dcc.uchile.cl wrote:
On 25/07/2014 14:44, Hugh Glaser wrote:
The idea that having a robots.txt that Disallows spiders
is a “problem” for a dataset is rather bizarre.
It is of course a problem for the spider, but is clearly not a problem
for
On 25/07/2014 15:54, Hugh Glaser wrote:
Very interesting.
On 25 Jul 2014, at 20:12, aho...@dcc.uchile.cl wrote:
On 25/07/2014 14:44, Hugh Glaser wrote:
The idea that having a robots.txt that Disallows spiders
is a problem for a dataset is rather bizarre.
It is of course a problem for the
On 7/25/14 3:12 PM, aho...@dcc.uchile.cl wrote:
Put simply, as far as I can see, a dereferenceable URI behind a robots.txt
blacklist is no longer a dereferenceable URI ... at least for a respectful
software agent. Linked Data behind a robots.txt blacklist is no longer
Linked Data.
When you
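
What "respectful" amounts to in practice can be shown with the Python
standard library alone: consult robots.txt before dereferencing. Under a
blanket Disallow: / this returns False for every URI on the site (the agent
name is a placeholder):

    from urllib.parse import urlparse
    from urllib.robotparser import RobotFileParser

    def may_dereference(uri, agent="MyLDApp"):
        parts = urlparse(uri)
        rp = RobotFileParser("%s://%s/robots.txt" % (parts.scheme, parts.netloc))
        rp.read()  # fetch and parse the site's robots.txt
        return rp.can_fetch(agent, uri)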
On 7/25/14 5:13 PM, aho...@dcc.uchile.cl wrote:
Then it seems our core disagreement is on the notion of a robot, which is
indeed a grey area.
A Web Robot, Bot, or Spider is an Agent :-)
You can use the identity of an Agent to control the privileges it has in
your Linked Open Data Space.
--
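
One possible reading of that idea, sketched with rdflib and the W3C ACL
vocabulary (the WebID and resource URIs are invented for illustration, not a
description of any actual deployment): a policy graph grants Read access on a
resource to a named agent, and the server consults it before answering.

    from rdflib import Graph, Namespace, URIRef

    ACL = Namespace("http://www.w3.org/ns/auth/acl#")

    policy = Graph()
    policy.parse(data="""
    @prefix acl: <http://www.w3.org/ns/auth/acl#> .
    <#grant> a acl:Authorization ;
        acl:agent <https://example.org/bots/goodbot#me> ;
        acl:accessTo <https://example.org/data/dataset> ;
        acl:mode acl:Read .
    """, format="turtle")

    def may_read(agent_webid, resource):
        # Is there an Authorization giving this agent Read access?
        for auth in policy.subjects(ACL.agent, URIRef(agent_webid)):
            if (auth, ACL.accessTo, URIRef(resource)) in policy and \
               (auth, ACL.mode, ACL.Read) in policy:
                return True
        return False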
Hi,
Well, as you might guess, I can’t say I agree.
Firstly, as you correctly say, if there is a robots.txt with Disallow / covering
the RDF on a LD site, then it effectively prohibits any LD app from accessing
the LD.
So clearly that can’t be what the publisher intended (the idea of publishing
RDF
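
If the publisher's intent is "no bulk harvesting, but lookups are fine", a
robots.txt along these lines (paths hypothetical; Crawl-delay is a widely
honoured non-standard extension) would express that far better than a blanket
Disallow:

    # Throttle crawlers and fence off bulk dumps, but leave
    # individual URIs dereferenceable:
    User-agent: *
    Crawl-delay: 10
    Disallow: /dumps/

A blanket "User-agent: *" with "Disallow: /", by contrast, tells a respectful
LD app not to dereference anything on the site at all.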
Robots.txt to me works well for a web of documents. That is, wanting
only humans to access certain resources. But for a web of data, why
resort to a robots.txt when you could simply not put the resource
online in the first place?
On Fri, Jul 25, 2014 at 11:54 PM, Hugh Glaser h...@glasers.org wrote:
Hello Kingsley,
On Fri, Jul 25, 2014 at 05:47:58PM -0400, Kingsley Idehen wrote:
When you have a sense of the identity of an Agent and on behalf of whom it
is operating, you can use RDF-based Linked Data to construct and enforce
usage policies.
<sarcasm>
Yes. Every Agent that does not use
Hi Luca,
Thanks for asking.
I have resources that number in the 100Ms and even 1Bs of resolvable URIs.
I even have datasets with effectively infinite numbers of URIs.
Some people seem to find them useful, in the sense that they want to look
specific things up.
These are not documents - they are