Please feel free to send this sort of poll to "HCLS". I think that it's an
important issue, but I only got around to forwarding it to a small group at
HCLS. So, maybe for the next round.
Cheers,
Scott
--
M. Scott Marshall, W3C HCLS IG co-chair, http://www.w3.org/blog/hcls
http://staff.science.uv
On 23 June 2011, at 04:36, Martin Hepp wrote:
> There already exist respective blacklists and services, e.g.
> http://www.bot-trap.de/home/
Added to/Edited http://www.w3.org/wiki/Bad_Crawlers
And modified http://www.w3.org/wiki/Write_Web_Crawler
--
Karl Dubost - http://dev.opera.com/
Developer
On 6/23/11 12:42 PM, Dieter Fensel wrote:
At 01:32 PM 6/23/2011, Sebastian Schaffert wrote:
I am very well aware of the problem of adoption. At the same time, we
have a similar problem not only in the publication of the data but
also in the consumption: if we do not let users consume our data e
On 6/23/11 12:21 PM, Henry Story wrote:
we are applying recursively linked data to solve a linked data problem.
That's the neat bit:-)
That's the heart and soul of this matter.
Dogfood Linked Data and it goes viral.
We don't have fine grained functional ACLs on the InterWeb. That's a
mass
On 6/23/11 12:13 PM, Michael Brunnbauer wrote:
re
On Thu, Jun 23, 2011 at 11:32:43AM +0100, Kingsley Idehen wrote:
config = {
'Googlebot':['googlebot.com'],
'Mediapartners-Google':['googlebot.com'],
'msnbot':['live.com','msn.com','bing.com'],
'bingbot':['live.com','msn.com','bing.com'],
'Yahoo!
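(A note on the config above: a mapping from user-agent tokens to DNS domains like this is commonly used for reverse-DNS verification of crawlers, though the quoted message is cut off before it says so, so treat that reading as an assumption. A minimal sketch follows. The two-entry EXAMPLE_CONFIG, the function names, and the injectable resolver parameters are hypothetical and are not taken from the original script.)

```python
import socket

# Hypothetical excerpt in the shape of the quoted config; the original
# mapping is truncated in the archive and is not reconstructed here.
EXAMPLE_CONFIG = {
    'Googlebot': ['googlebot.com'],
    'bingbot': ['live.com', 'msn.com', 'bing.com'],
}


def host_in_domains(hostname, domains):
    """True if hostname equals, or is a subdomain of, one of the domains."""
    hostname = hostname.rstrip('.').lower()
    return any(hostname == d or hostname.endswith('.' + d) for d in domains)


def _reverse(ip):
    # Reverse (PTR) lookup: IP address -> primary host name.
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname):
    # Forward lookup: host name -> list of IPv4 addresses.
    return socket.gethostbyname_ex(hostname)[2]


def is_genuine_crawler(user_agent, ip, config=EXAMPLE_CONFIG,
                       reverse=_reverse, forward=_forward):
    """Check a claimed crawler identity against DNS.

    Returns None if the user agent claims no crawler listed in config,
    True if the reverse lookup lands in an allowed domain and the forward
    lookup of that name returns the same IP, and False otherwise.
    """
    ua = user_agent.lower()
    for token, domains in config.items():
        if token.lower() in ua:
            try:
                hostname = reverse(ip)
                return host_in_domains(hostname, domains) and ip in forward(hostname)
            except OSError:  # covers socket.herror and socket.gaierror
                return False
    return None
```

The forward lookup matters because anyone who controls the reverse zone for their own IP range can make it claim any host name; only the round trip ties the name back to the address. The resolvers are passed in as parameters so the check can be exercised without network access.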
At 01:32 PM 6/23/2011, Sebastian Schaffert wrote:
I am very well aware of the problem of adoption. At the same time,
we have a similar problem not only in the publication of the data
but also in the consumption: if we do not let users consume our data
even in large scale, what use is the data a
On 23 Jun 2011, at 13:27, Lin Clark wrote:
>
>
> Radical, no?
> If the Drupal RDF module takes up significantly more resources to generate
> its output as compared to the HTML and other renderers, then YES, it should
> protect itself. Instead of e-mailing every Drupal user, e-mail the Drupal
Martin,
On 23.06.2011, at 10:30, Martin Hepp wrote:
> Sebastian, all:
> The community may not publicly admit it, but: SW and LOD have been BEGGING
> for adoption for almost a decade. Now, if someone outside of a University
> project publishes valuable RDF data in a well-above-the-standards way,
>
> Radical, no?
> If the Drupal RDF module takes up significantly more resources to generate
> its output as compared to the HTML and other renderers, then YES, it should
> protect itself. Instead of e-mailing every Drupal user, e-mail the Drupal
> RDF module developers and request them to impleme
On 23 Jun 2011, at 13:13, Michael Brunnbauer wrote:
>
> On Thu, Jun 23, 2011 at 11:32:43AM +0100, Kingsley Idehen wrote:
>>> config = {
>>> 'Googlebot':['googlebot.com'],
>>> 'Mediapartners-Google':['googlebot.com'],
>>> 'msnbot':['live.com','msn.com','bing.com'],
>>> 'bingbot':['live.com','msn.
On 23 Jun 2011, at 12:38, Lin Clark wrote:
>
> On Thu, Jun 23, 2011 at 10:12 AM, Pablo Mendes wrote:
> Maybe we should also consider that companies/universities advising people
> (esp. small companies) to publish Linked Data, should give them complete
> advice, including protection. If their
re
On Thu, Jun 23, 2011 at 11:32:43AM +0100, Kingsley Idehen wrote:
> >config = {
> >'Googlebot':['googlebot.com'],
> >'Mediapartners-Google':['googlebot.com'],
> >'msnbot':['live.com','msn.com','bing.com'],
> >'bingbot':['live.com','msn.com','bing.com'],
> >'Yahoo! Slurp':['yahoo.com','yahoo.net
> Sorry, but if paying for a service is what is required to protect
> publishers from massive abuse of their resources
No, not what I meant. You can also implement simple methods already
discussed here. Now if your provider does not support that, then you'd be
better off changing to another provi
On Thu, Jun 23, 2011 at 10:12 AM, Pablo Mendes wrote:
>
> Maybe we should also consider that companies/universities advising people
> (esp. small companies) to publish Linked Data, should give them complete
> advice, including protection. If their providers are not able to implement
> such simple
On 6/23/11 11:32 AM, Kingsley Idehen wrote:
Google and friends are the real problem to come; it's the inadvertent
SPARQL query that kicks off a transitive crawl that's going to wreak
havoc. Basically, when FYN (Follow-Your-Nose) is executed by Bots --
smart Agents working on behalf of their ti
>
> In the academic/business nonsense, you should look at how much IBM and co
> put into SOAP, and where that got them. Pretty much nowhere.
>
I don't agree. SOAP is quite widely adopted in those areas where the use
case (slow running - usually internal to internal - transactions) exists.
Perhaps S
On 6/23/11 9:20 AM, Michael Brunnbauer wrote:
re
On Thu, Jun 23, 2011 at 10:09:25AM +0200, Martin Hepp wrote:
Yes, WebID is without question a good thing. I am not entirely sure, though,
that you can make it a mandatory requirement for access to your site, because
if a few major consumers do n
Richard,
My concern is not really about the idea of blacklisting etc. I am
concerned about the means. Certainly a public wikipage is not a good
place to put accusations.
On 23/06/2011 11:01, Richard Cyganiak wrote:
Antoine,
On 23 Jun 2011, at 07:27, Antoine Zimmermann wrote:
I started
We might have learned a couple of things about teaching, although we seemed
to have focused the discussion on only one.
Maybe we should also consider that companies/universities advising people
(esp. small companies) to publish Linked Data, should give them complete
advice, including protection.
Antoine,
On 23 Jun 2011, at 07:27, Antoine Zimmermann wrote:
>> I started a list here: http://www.w3.org/wiki/Bad_Crawlers
>
> What's the use of this list?
> Assume it stays empty, as you hope. What's the use?
That should be obvious.
> Assume it gets filled with names: so what? It does not prov
On 23 Jun 2011, at 10:20, Michael Brunnbauer wrote:
>
> re
>
> On Thu, Jun 23, 2011 at 10:09:25AM +0200, Martin Hepp wrote:
>> Yes, WebID is without question a good thing. I am not entirely sure, though,
>> that you can make it a mandatory requirement for access to your site,
>> because if a f
There already exist respective blacklists and services, e.g.
http://www.bot-trap.de/home/
It is pretty easy to set up honey pots (e.g. a directory "/bottrap"), link to
there from your main-page but disallow crawling in there via robots.txt.
You can then quickly collect and share IPs or IP ranges
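(A rough sketch of the honey-pot idea described above, assuming an access log in Common Log Format. The "/bottrap" directory name comes from the message; the robots.txt text, the regular expression, and the function name trapped_ips are illustrative assumptions, not part of any existing tool.)

```python
import re

TRAP_PATH = '/bottrap'  # directory name taken from the message above

# robots.txt entry telling well-behaved crawlers to stay out of the trap.
ROBOTS_TXT = "User-agent: *\nDisallow: /bottrap/\n"

# Common Log Format: ip ident user [time] "METHOD path PROTO" status size
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*"')


def trapped_ips(log_lines, trap_path=TRAP_PATH):
    """Return the set of client IPs that requested anything under trap_path.

    Any client seen here either ignored robots.txt or never fetched it,
    since the trap directory is disallowed for all user agents.
    """
    ips = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed log format
        ip, path = match.group(1), match.group(2)
        if path == trap_path or path.startswith(trap_path + '/'):
            ips.add(ip)
    return ips
```

Usage would be along the lines of trapped_ips(open('access.log')), with the resulting set reviewed by hand before anything is blocked or shared, since a single hit can also come from a curious human following the link.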
On 6/23/11 9:09 AM, Martin Hepp wrote:
Yes, WebID is without question a good thing. I am not entirely sure, though,
that you can make it a mandatory requirement for access to your site, because
if a few major consumers do not use WebID for their crawlers, site-owners
cannot block anonymous craw
Sebastian, all:
The community may not publicly admit it, but: SW and LOD have been BEGGING for
adoption for almost a decade. Now, if someone outside of a University project
publishes valuable RDF data in a well-above-the-standards way, you make him pay
several hundred Euros for traffic just for
re
On Thu, Jun 23, 2011 at 10:09:25AM +0200, Martin Hepp wrote:
> Yes, WebID is without question a good thing. I am not entirely sure, though,
> that you can make it a mandatory requirement for access to your site, because
> if a few major consumers do not use WebID for their crawlers, site-owne
Andreas,
The difference is that amateur publishers enrich the Web with their diverse
content.
Amateur consumers cause trouble to the Web.
Martin
On Jun 22, 2011, at 9:29 PM, Andreas Harth wrote:
> Hi Martin,
>
> On 06/22/2011 09:08 PM, Martin Hepp wrote:
>> Please make a survey among typical
Yes, WebID is without question a good thing. I am not entirely sure, though,
that you can make it a mandatory requirement for access to your site, because
if a few major consumers do not use WebID for their crawlers, site-owners
cannot block anonymous crawlers.
On Jun 22, 2011, at 9:10 PM, Kingsl
On 6/23/11 12:08 AM, Sebastian Schaffert wrote:
On 22.06.2011, at 23:01, Lin Clark wrote:
On Wed, Jun 22, 2011 at 9:33 PM, Sebastian Schaffert wrote:
Your complaint sounds to me a bit like "help, too many clients access my data".
I'm sure that Martin is really tired of saying this, so I wil
On 6/22/11 11:26 PM, Henry Story wrote:
On 23 Jun 2011, at 00:11, Alexandre Passant wrote:
On 22 Jun 2011, at 22:49, Richard Cyganiak wrote:
On 21 Jun 2011, at 10:44, Martin Hepp wrote:
PS: I will not release the IP ranges from which the trouble originated, but
rest assured, there were top
Just one more comment: such a list could be useful if it's published by
a well identified person or group who can be contacted in case of
disagreement or to get off the list.
On 23/06/2011 08:27, Antoine Zimmermann wrote:
On 22/06/2011 23:49, Richard Cyganiak wrote:
On 21 Jun 2011, at 1