://rss.hurriyet.com.tr/robots.txt does not
exist, but http://www.hurriyet.com.tr/robots.txt exists.
Ahmet
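For reference, robots.txt is looked up per host, so the two URLs above are separate resources and a crawler fetching a feed from the rss host would ask that host for its own robots.txt. A minimal sketch of deriving that location from a document URL, assuming a hypothetical helper named robotsUrlFor (this is not ManifoldCF code, and the feed path is made up for illustration):

```java
import java.net.URI;

class RobotsUrl {
    // Hypothetical helper, not part of ManifoldCF: builds the robots.txt URL
    // for the host that serves the given document URL.
    static String robotsUrlFor(String documentUrl) {
        URI uri = URI.create(documentUrl);
        // getPort() returns -1 when the URL names no explicit port.
        int port = uri.getPort();
        String authority = uri.getHost() + (port == -1 ? "" : ":" + port);
        return uri.getScheme() + "://" + authority + "/robots.txt";
    }

    public static void main(String[] args) {
        // The feed path below is invented for illustration only.
        System.out.println(robotsUrlFor("http://rss.hurriyet.com.tr/feed.xml"));
        System.out.println(robotsUrlFor("http://www.hurriyet.com.tr/"));
    }
}
```

Each host yields a different robots.txt URL, so a missing file on one host says nothing about the other.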
--- On Sun, 11/18/12, Karl Wright daddy...@gmail.com wrote:
From: Karl Wright daddy...@gmail.com
Subject: Re: Anyone out there using RSS connector, who wants to help?
To: Ahmet Arslan iori
Subject: Re: Anyone out there using RSS connector, who wants to help?
Hi,
Regarding WARN 2012-11-17 23:01:17,649 (Worker thread '31') -
Pre-ingest service interruption reported for job 1353185325276
connection 'rss': Couldn't fetch robots.txt from
http://www.milliyet.com.tr:-1;
I see that http
/2012 4:47 PM
To: dev@manifoldcf.apache.org
Subject: Re: Anyone out there using RSS connector, who wants to help?
From: Karl Wright daddy...@gmail.com
Subject: Re: Anyone out there using RSS connector, who wants to help?
To: Ahmet Arslan iori...@yahoo.com, dev@manifoldcf.apache.org
Date: Sunday, November 18, 2012, 8:04 PM
Hi Ahmet,
I tried your example, but it looked like
/12, Ahmet Arslan iori...@yahoo.com wrote:
From: Ahmet Arslan iori...@yahoo.com
Subject: Re: Anyone out there using RSS connector, who wants to help?
To: dev@manifoldcf.apache.org
Date: Saturday, November 17, 2012, 11:11 PM
Hi Karl,
Never used rss connector. But here is what I have done
Hi all,
The branch https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-120
contains an RSS connector that has been updated to use httpcomponents
4.2.2. I'd love for people who are in a position to do significant
RSS crawling to try it out before I pull it into trunk. Any takers?