Roger Benningfield wrote:
However, if I put something like:
User-agent: PubSub
Disallow: /
...in my robots.txt and you ignore it, then you very much
belong on the Bad List.
I don't think so. The reason is that I believe that robots.txt has
nothing to do with any service I provide or
On 26/8/05 3:55 PM, Bob Wyman wrote:
Remember, PubSub never does
anything that a desktop client doesn't do.
Periodic re-fetching is a robotic behaviour, common to both desktop
aggregators and server-based aggregators. Robots.txt was established to
minimise harm caused by
On Friday, August 26, 2005, at 04:39 AM, Eric Scheid wrote:
On 26/8/05 3:55 PM, Bob Wyman wrote:
Remember, PubSub never does
anything that a desktop client doesn't do.
Periodic re-fetching is a robotic behaviour, common to both desktop
aggregators and server-based
* Bob Wyman [2005-08-26 01:00]:
My impression has always been that robots.txt was intended to
stop robots that crawl a site (i.e. they read one page, extract
the URLs from it and then read those pages). I don't believe
robots.txt is intended to stop processes that simply
There are no wildcards in /robots.txt, only path prefixes and user-agent
names. There is one special user-agent, *, which means all.
I can't think of any good reason to always ignore the disallows for *.
I guess it is OK to implement the parts of a spec that you want.
Just don't answer yes when
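For what it's worth, a minimal sketch of how those matching rules combine (the
paths and the PubSub record are illustrative, not taken from anyone's actual
robots.txt):

    User-agent: PubSub
    Disallow: /private

    User-agent: *
    Disallow: /cgi-bin

A robot obeys the most specific record that names it and falls back to the *
record otherwise, and Disallow values are plain path prefixes, so /private
also blocks /private/feed.xml.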
Antone Roundy wrote:
I'm with Bob on this. If a person publishes a feed without limiting
access to it, they either don't know what they're doing, or they're
EXPECTING it to be polled on a regular basis. As long as PubSub
doesn't poll too fast, the publisher is getting exactly what they
Ok, so this discussion has definitely been interesting... let's see if
we can turn it into something actionable.
1. Desktop aggregators and services like PubSub really do not fall into
the same category as robots/crawlers and therefore should not
necessarily be paying attention to
Hi,
I am developing a web app framework and am using an RSS-esque XML format to let
the application layer talk to a higher layer. XSL is used to combine this and
then pass it on up to the GUI layer.
I would like to move to Atom instead of RSS, since Atom seems to encapsulate
more things. I
* [EMAIL PROTECTED] [2005-08-26 19:20]:
Is there an attribute to the link tag I can use as some sort of id that I can
assign to the feed tag and link to it via the link tag?
My first thought would be to use xml:id on the feed element and a
fragment identifier in the @href.
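A minimal sketch of that suggestion, assuming the feed and the link sit in the
same document (the id value and the rel are illustrative, not anything the
spec mandates):

    <feed xmlns="http://www.w3.org/2005/Atom" xml:id="app-feed">
      ...
      <link rel="related" href="#app-feed"/>
    </feed>

Whether a bare fragment like #app-feed resolves to the xml:id value depends on
the media type doing the resolving, so treat this as a starting point rather
than settled behaviour.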
On 8/25/05, Roger B. wrote:
Mhh. I have not looked into this. But is not every desktop aggregator
a robot?
Henry: Depends on who you ask. (See the Newsmonster debates from a
couple years ago.)
As I am the one who kicked off the Newsmonster debates a couple years
ago, I
--On August 26, 2005 9:51:10 AM -0700 James M Snell wrote:
Add a new link rel=readers whose href points to a robots.txt-like file that
either allows or disallows the aggregator for specific URIs and establishes
polling-rate preferences
User-agent: {aggregator-ua}
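A sketch of how that might hang together; the rel name is from the proposal
above, but the file name, paths, and the Poll-interval directive are
assumptions of mine, not anything that has been specified:

    <link rel="readers" href="http://example.org/readers.txt"/>

    User-agent: PubSub
    Disallow: /drafts/
    Poll-interval: 3600

    User-agent: *
    Allow: /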
Graham wrote:
(And before you say but my aggregator is nothing but a podcast
client, and the feeds are nothing but links to enclosures, so it's
obvious that the publisher wanted me to download them -- WRONG! The
publisher might want that, or they might not ...
So you're saying browsers
On 05-08-25 at 18:51, Bob Wyman wrote:
At PubSub we *never* crawl to discover feed URLs. The only feeds
we know about are:
1. Feeds that have announced their presence with a ping
2. Feeds that have been announced to us via a FeedMesh message.
3. Feeds that have been manually
On 26 Aug 2005, at 7:46 pm, Mark Pilgrim wrote:
2. If a user gives a feed URL to a program *and then the program finds
all the URLs in that feed and requests them too*, the program needs to
support robots.txt exclusions for all the URLs other than the original
URL it was given.
...
(And
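For the curious, a minimal Python sketch of rule 2 using the standard
library's urllib.robotparser; the user-agent string and URLs are illustrative
only:

    import urllib.robotparser
    from urllib.parse import urljoin, urlparse

    USER_AGENT = "ExampleFetcher/1.0"  # illustrative, not a real product

    def may_fetch(url):
        # Consult the origin server's /robots.txt before fetching a discovered URL.
        root = "{0.scheme}://{0.netloc}".format(urlparse(url))
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(urljoin(root, "/robots.txt"))
        rp.read()
        return rp.can_fetch(USER_AGENT, url)

    # The feed URL the user handed us is exempt; URLs discovered inside it are not.
    feed_url = "http://example.org/feed.atom"
    discovered = ["http://example.org/entries/1", "http://other.example/audio.mp3"]
    fetchable = [u for u in discovered if may_fetch(u)]

Note the exemption only covers the URL the user typed in; everything the
program finds on its own goes through the check.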
Mark Pilgrim wrote (among other things):
(And before you say but my aggregator is nothing but a podcast
client, and the feeds are nothing but links to enclosures, so
it's obvious that the publisher wanted me to download them -- WRONG!
I agree with just about everything that Mark wrote
Remember, PubSub never does
anything that a desktop client doesn't do.
Bob: What about FeedMesh? If I ping blo.gs, they pass that ping along
to you, and PubSub fetches my feed, then PubSub is doing something a
desktop client doesn't do. It's following a link found in one place
and
* Bob Wyman [2005-08-26 22:50]:
It strikes me that not all URIs are created equally and not
everything that looks like crawling is really crawling.
@xlink:type?
Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
Roger Benningfield wrote:
We've got a mechanism that allows any user with his own domain
and a text editor to tell us whether or not he wants us messing with
his stuff. I think it's foolish to ignore that.
The problem is that we have *many* such mechanisms. Robots.txt is
only one.
Karl Dubost wrote:
- How does one who has previously submitted a feed URL remove it from
the index? (change of opinion)
If you are the publisher of a feed and you don't want us to monitor
your content, complain to us and we'll filter you out. Folk do this every
once in a while. Send us an
On 05-08-26 at 17:53, Bob Wyman wrote:
Karl Dubost wrote:
- How does one who has previously submitted a feed URL remove it from
the index? (change of opinion)
If you are the publisher of a feed and you don't want us to
monitor your content, complain to us and we'll filter you out. Folk
Karl Dubost points out that it is hard to figure out what email address to
send messages to if you want to de-list from PubSub...:
Karl, Please, accept my apologies for this. I could have sworn we
had the policy prominently displayed on the site. I know we used to have it
there. This must
On 27/8/05 6:40 AM, Bob Wyman wrote:
I think crawling URIs found in <link/> tags,
<img/> tags and enclosures isn't crawling... Or... Is there something I'm
missing here?
Crawling <img/> tags isn't a huge problem because it doesn't lead to a
recursive situation. Same with
I'm adding robots@mccmedia.com to this discussion. That is the classic
list for robots.txt discussion.
Robots list: this is a discussion about the interactions of /robots.txt
and clients or robots that fetch RSS feeds. Atom is a new format in
the RSS family.