Re: Help needed: *brief* online poll about blank-nodes

2011-06-23 Thread M. Scott Marshall
Please feel free to send this sort of poll to "HCLS" . I think that it's an important issue but I only got around to forwarding it to a small group at HCLS. So, maybe for the next round. Cheers, Scott -- M. Scott Marshall, W3C HCLS IG co-chair, http://www.w3.org/blog/hcls http://staff.science.uv

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Karl Dubost
On 23 June 2011 at 04:36, Martin Hepp wrote: > There already exist respective blacklists and services, e.g. > http://www.bot-trap.de/home/ Added to/Edited http://www.w3.org/wiki/Bad_Crawlers And modified http://www.w3.org/wiki/Write_Web_Crawler -- Karl Dubost - http://dev.opera.com/ Developer

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Kingsley Idehen
On 6/23/11 12:42 PM, Dieter Fensel wrote: At 01:32 PM 6/23/2011, Sebastian Schaffert wrote: I am very well aware of the problem of adoption. At the same time, we have a similar problem not only in the publication of the data but also in the consumption: if we do not let users consume our data e

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Kingsley Idehen
On 6/23/11 12:21 PM, Henry Story wrote: we are applying recursively linked data to solve a linked data problem. That's the neat bit:-) That's the heart and soul of this matter. Dogfood Linked Data and it goes viral. We don't have fine grained functional ACLs on the InterWeb. That's a mass

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Kingsley Idehen
On 6/23/11 12:13 PM, Michael Brunnbauer wrote: re On Thu, Jun 23, 2011 at 11:32:43AM +0100, Kingsley Idehen wrote: config = { 'Googlebot':['googlebot.com'], 'Mediapartners-Google':['googlebot.com'], 'msnbot':['live.com','msn.com','bing.com'], 'bingbot':['live.com','msn.com','bing.com'], 'Yahoo!

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Dieter Fensel
At 01:32 PM 6/23/2011, Sebastian Schaffert wrote: I am very well aware of the problem of adoption. At the same time, we have a similar problem not only in the publication of the data but also in the consumption: if we do not let users consume our data even in large scale, what use is the data a

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Henry Story
On 23 Jun 2011, at 13:27, Lin Clark wrote: > > > Radical, no? > If the Drupal RDF module takes up significantly more resources to generate > its output as compared to the HTML and other renderers, then YES, it should > protect itself. Instead of e-mailing every Drupal user, e-mail the Drupal

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Sebastian Schaffert
Martin, On 23.06.2011 at 10:30, Martin Hepp wrote: > Sebastian, all: > The community may not publicly admit it, but: SW and LOD have been BEGGING > for adoption for almost a decade. Now, if someone outside of a University > project publishes valuable RDF data in a well-above-the-standards way,

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Lin Clark
> > Radical, no? > If the Drupal RDF module takes up significantly more resources to generate > its output as compared to the HTML and other renderers, then YES, it should > protect itself. Instead of e-mailing every Drupal user, e-mail the Drupal > RDF module developers and request them to impleme

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Henry Story
On 23 Jun 2011, at 13:13, Michael Brunnbauer wrote: > > On Thu, Jun 23, 2011 at 11:32:43AM +0100, Kingsley Idehen wrote: >>> config = { >>> 'Googlebot':['googlebot.com'], >>> 'Mediapartners-Google':['googlebot.com'], >>> 'msnbot':['live.com','msn.com','bing.com'], >>> 'bingbot':['live.com','msn.

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Henry Story
On 23 Jun 2011, at 12:38, Lin Clark wrote: > > On Thu, Jun 23, 2011 at 10:12 AM, Pablo Mendes wrote: > Maybe we should also consider that companies/universities advising people > (esp. small companies) to publish Linked Data, should give them complete > advice, including protection. If their

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Michael Brunnbauer
re On Thu, Jun 23, 2011 at 11:32:43AM +0100, Kingsley Idehen wrote: > >config = { > >'Googlebot':['googlebot.com'], > >'Mediapartners-Google':['googlebot.com'], > >'msnbot':['live.com','msn.com','bing.com'], > >'bingbot':['live.com','msn.com','bing.com'], > >'Yahoo! Slurp':['yahoo.com','yahoo.net

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Pablo Mendes
> Sorry, but if paying for a service is what is required to protect > publishers from massive abuse of their resources No, not what I meant. You can also implement simple methods already discussed here. Now if your provider does not support that, then you'd be better off changing to another provi

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Lin Clark
On Thu, Jun 23, 2011 at 10:12 AM, Pablo Mendes wrote: > > Maybe we should also consider that companies/universities advising people > (esp. small companies) to publish Linked Data, should give them complete > advice, including protection. If their providers are not able to implement > such simple

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Kingsley Idehen
On 6/23/11 11:32 AM, Kingsley Idehen wrote: Google and friends are the real problem to come, it's the inadvertent SPARQL query that kicks off a transitive crawl that's going to wreak havoc. Basically, when FYN (Follow-Your-Nose) is executed by Bots -- smart Agents working on behalf of their ti

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread adasal
> > In the academic/business nonsense, you should look at how much IBM and co > put into SOAP, and where that got them. Pretty much nowhere. > I don't agree. SOAP is quite widely adopted in those areas where the use case (slow running - usually internal to internal - transactions) exists. Perhaps S

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Kingsley Idehen
On 6/23/11 9:20 AM, Michael Brunnbauer wrote: re On Thu, Jun 23, 2011 at 10:09:25AM +0200, Martin Hepp wrote: Yes, WebID is without question a good thing. I am not entirely sure, though, that you can make it a mandatory requirement for access to your site, because if a few major consumers do n

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Antoine Zimmermann
Richard, My concern is not really about the idea of blacklisting etc. I am concerned about the means. Certainly a public wikipage is not a good place to put accusations. On 23/06/2011 11:01, Richard Cyganiak wrote: Antoine, On 23 Jun 2011, at 07:27, Antoine Zimmermann wrote: I started

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Pablo Mendes
We might have learned a couple of things about teaching, although we seemed to have focused the discussion on only one. Maybe we should also consider that companies/universities advising people (esp. small companies) to publish Linked Data, should give them complete advice, including protection.

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Richard Cyganiak
Antoine, On 23 Jun 2011, at 07:27, Antoine Zimmermann wrote: >> I started a list here: http://www.w3.org/wiki/Bad_Crawlers > > What's the use of this list? > Assume it stays empty, as you hope. What's the use? That should be obvious. > Assume it gets filled with names: so what? It does not prov

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Henry Story
On 23 Jun 2011, at 10:20, Michael Brunnbauer wrote: > > re > > On Thu, Jun 23, 2011 at 10:09:25AM +0200, Martin Hepp wrote: >> Yes, WebID is without question a good thing. I am not entirely sure, though, >> that you can make it a mandatory requirement for access to your site, >> because if a f

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Martin Hepp
There already exist respective blacklists and services, e.g. http://www.bot-trap.de/home/ It is pretty easy to set up honey pots (e.g. a directory "/bottrap"), link to there from your main-page but disallow crawling in there via robots.txt. You can then quickly collect and share IPs or IP ranges
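
The honey-pot idea in the message above can be sketched roughly as follows. This is only an illustration, not Martin Hepp's actual setup: it assumes the web server writes Common Log Format access logs, and the names TRAP_PREFIX, ROBOTS_TXT and trapped_ips are hypothetical. Any client that requests a path under the disallowed directory has ignored robots.txt, so its IP is counted.

```python
import re
from collections import Counter

# Hypothetical trap directory, following the "/bottrap" example in the message.
TRAP_PREFIX = "/bottrap"

# robots.txt content that tells well-behaved crawlers to stay out of the trap.
ROBOTS_TXT = "User-agent: *\nDisallow: /bottrap/\n"

# Assumes Common Log Format: client IP, two fields, [timestamp], "METHOD path ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)')


def trapped_ips(log_lines, trap_prefix=TRAP_PREFIX):
    """Count requests per client IP that hit the trap directory."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ip, path = match.groups()
        # Match the directory itself or anything below it, but not
        # unrelated paths that merely share the prefix.
        if path == trap_prefix or path.startswith(trap_prefix + "/"):
            hits[ip] += 1
    return hits
```

Calling trapped_ips(open("access.log")) on a real log would give a per-IP hit count that could then be reviewed by hand before any address is blocked or shared.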

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Kingsley Idehen
On 6/23/11 9:09 AM, Martin Hepp wrote: Yes, WebID is without question a good thing. I am not entirely sure, though, that you can make it a mandatory requirement for access to your site, because if a few major consumers do not use WebID for their crawlers, site-owners cannot block anonymous craw

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Martin Hepp
Sebastian, all: The community may not publicly admit it, but: SW and LOD have been BEGGING for adoption for almost a decade. Now, if someone outside of a University project publishes valuable RDF data in a well-above-the-standards way, you make him pay several hundred Euros for traffic just for

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Michael Brunnbauer
re On Thu, Jun 23, 2011 at 10:09:25AM +0200, Martin Hepp wrote: > Yes, WebID is without question a good thing. I am not entirely sure, though, > that you can make it a mandatory requirement for access to your site, because > if a few major consumers do not use WebID for their crawlers, site-owne

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Martin Hepp
Andreas, The difference is that amateur publishers enrich the Web with their diverse content. Amateur consumers cause trouble to the Web. Martin On Jun 22, 2011, at 9:29 PM, Andreas Harth wrote: > Hi Martin, > > On 06/22/2011 09:08 PM, Martin Hepp wrote: >> Please make a survey among typical

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Martin Hepp
Yes, WebID is without question a good thing. I am not entirely sure, though, that you can make it a mandatory requirement for access to your site, because if a few major consumers do not use WebID for their crawlers, site-owners cannot block anonymous crawlers. On Jun 22, 2011, at 9:10 PM, Kingsl

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Kingsley Idehen
On 6/23/11 12:08 AM, Sebastian Schaffert wrote: On 22.06.2011 at 23:01, Lin Clark wrote: On Wed, Jun 22, 2011 at 9:33 PM, Sebastian Schaffert wrote: Your complaint sounds to me a bit like "help, too many clients access my data". I'm sure that Martin is really tired of saying this, so I wil

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Kingsley Idehen
On 6/22/11 11:26 PM, Henry Story wrote: On 23 Jun 2011, at 00:11, Alexandre Passant wrote: On 22 Jun 2011, at 22:49, Richard Cyganiak wrote: On 21 Jun 2011, at 10:44, Martin Hepp wrote: PS: I will not release the IP ranges from which the trouble originated, but rest assured, there were top

Re: Think before you write Semantic Web crawlers

2011-06-23 Thread Antoine Zimmermann
Just one more comment: such a list could be useful if it's published by a well identified person or group who can be contacted in case of disagreement or to get off the list. On 23/06/2011 08:27, Antoine Zimmermann wrote: On 22/06/2011 23:49, Richard Cyganiak wrote: On 21 Jun 2011, at 1