Hi Chris
> There are currently 2 plugins that parse feeds and get them indexed:
> parse-rss - older, but gets the job done
> feed - newer, and takes advantage of the ability to parse/index feeds in
> one step, rather than in many
[..]
> Parse-rss indexes the whole feed, whereas the feed plugi
Hi Brian,
Sorry for taking so long to reply. Here ya go:
> Do you have any URLs for feeds that are reliably parsed and indexed by
> the feed parser?
I haven't tested/used this plugin in quite a while. There was someone on
the nutch-user list before, nutch.newbie, who was doing quite a bit
Hi Pike,
Parse-rss indexes the whole feed as a single record, whereas the feed plugin
takes advantage of NUTCH-443, which allows Parsers to return multiple Parse
objects, so that each item in the feed is indexed as its own record.
HTH,
Chris
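To make the one-record-per-feed vs. one-record-per-item difference concrete, here is a minimal Python sketch. It is not Nutch code: the function names feed_as_one_record and feed_as_item_records are hypothetical, and only RSS 2.0 <channel>/<item> elements are handled (no Atom). The first function collapses the whole feed into one record, roughly the parse-rss behaviour described above; the second emits one record per <item>, roughly what the feed plugin achieves by returning multiple Parse objects.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def feed_as_one_record(root: ET.Element) -> Dict[str, str]:
    """Hypothetical: index the whole RSS 2.0 feed as a single record."""
    channel = root.find("channel")
    title = channel.findtext("title", default="")
    # Concatenate every item's title and description into one blob of text.
    texts = [
        " ".join(filter(None, (item.findtext("title"), item.findtext("description"))))
        for item in channel.findall("item")
    ]
    return {"title": title, "content": " ".join(texts)}


def feed_as_item_records(root: ET.Element) -> List[Dict[str, str]]:
    """Hypothetical: emit one record per <item>, keyed by the item's link."""
    channel = root.find("channel")
    return [
        {
            "url": item.findtext("link", default=""),
            "title": item.findtext("title", default=""),
            "content": item.findtext("description", default=""),
        }
        for item in channel.findall("item")
    ]
```

With a two-item feed, the first function returns one record whose content mixes both posts, while the second returns two records that can be searched and linked to individually.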
On 10/15/07 7:25 AM, "Pike" <[EMAIL PROTECTED]> wrote:
> Hi
>
>>> I hav
Hi
>> I have this with all results: what is indexed
>> seems to be 1 record per feed, containing a
>> parsed version of the content including all its items,
>> with sometimes bits of xml and html markup in it.
>>
>> I was assuming this is the intended behaviour?
>
> It may well be the intended
Pike wrote:
Hi Ricky, Chris
I've not noticed much
difference, with both plugins failing on the feedburner feed:
- http://feeds.feedburner.com/Techcrunch
Strange, but that feed is indeed invalid xml if I wget it.
It starts with newlines and ends with comments. Very
picky, but that's not all
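For what it's worth, the XML 1.0 spec requires the <?xml ...?> declaration, when present, to be the very first thing in the document, so leading newlines make such a feed not well-formed; comments after the root element, by contrast, are permitted. A minimal Python sketch of a lenient workaround, assuming you can pre-process the raw bytes before parsing (parse_feed_lenient is a hypothetical helper, not part of parse-rss or the feed plugin):

```python
import xml.etree.ElementTree as ET


def parse_feed_lenient(raw: bytes) -> ET.Element:
    """Hypothetical helper: parse feed bytes that may have junk whitespace up front.

    An XML declaration must start at the very first byte of the document,
    so a strict parser rejects a feed that begins with newlines. Stripping
    leading ASCII whitespace fixes that case; it is harmless when there is
    no declaration, since whitespace before the root element is then allowed.
    """
    cleaned = raw.lstrip()
    # Trailing comments after the root element are legal XML, so no
    # cleanup is needed at the end of the document.
    return ET.fromstring(cleaned)
```

This only papers over the leading-whitespace problem; it does not attempt to repair other kinds of malformed feeds.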
Hi Ricky, Chris
> I've not noticed much
> difference, with both plugins failing on the feedburner feed:
>
> - http://feeds.feedburner.com/Techcrunch
>
Strange, but that feed is indeed invalid xml if I wget it.
It starts with newlines and ends with comments. Very
picky, but that's not allowed af
Chris Mattmann wrote:
There are currently 2 plugins that parse feeds and get them indexed:
parse-rss - older, but gets the job done
feed - newer, and takes advantage of the ability to parse/index feeds in
one step, rather than in many
I didn't realise this as I was using 0.9 where only pars
Chris,
Recently, I've been playing around with the feed plugin from the nightly
build, without success. I can't get any indexed fields from feeds in
the wild.
Do you have any URLs for feeds that are reliably parsed and indexed by
the feed parser? Does it actually index Atom at present? There
Hi Rick,
Glad to hear that you're interested in using Nutch!
There are currently 2 plugins that parse feeds and get them indexed:
parse-rss - older, but gets the job done
feed - newer, and takes advantage of the ability to parse/index feeds in
one step, rather than in many
There are other
Hi all,
I've recently downloaded Nutch v0.9, to experiment in searching blog
posts and RSS/Atom feeds. So far I have managed to get it to
successfully crawl, index and search some websites.
I am now starting my investigations into using Nutch to crawl/index/search
news/blog feeds, and have inc