Re: Get all the feed metadata

harsh Mon, 28 Mar 2016 21:00:40 -0700

Hi Lewis

The seedurl.txt file is as follows:


http://timesofindia.feedsportal.com/c/33039/f/533917/index.rss
http://timesofindia.feedsportal.com/c/33039/f/533922/index.rss
http://www.thehindu.com/news/cities/Delhi/?service=rss
http://www.thehindu.com/news/international/?service=rss
http://indianexpress.com/section/sports/cricket/feed/
http://indianexpress.com/section/sports/feed/
http://news.google.co.in/news?cf=all&hl=en&pz=1&ned=in&output=rss

During the parse phase, all the URLs extracted from the RSS feeds are kept as out_links with their corresponding titles, but Nutch fails to extract and store pub_date, author, etc. for each URL (through the ROME API, which is already used in Nutch).
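To make clear which per-item fields I mean, here is a minimal sketch that reads them from an RSS 2.0 document. It is only an illustration under stated assumptions: it uses the plain JDK DOM parser rather than ROME, it is not Nutch plugin code, and the class name FeedItemMetadata, the SAMPLE feed and the example.com URL are all made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch or ROME.
public class FeedItemMetadata {

    // Made-up sample feed containing a single item.
    static final String SAMPLE =
        "<rss version=\"2.0\"><channel><title>Sample feed</title>"
        + "<item><title>Sample article</title>"
        + "<link>http://example.com/article-1</link>"
        + "<author>editor@example.com (Sample Author)</author>"
        + "<category>Sports</category>"
        + "<pubDate>Mon, 28 Mar 2016 10:00:00 GMT</pubDate>"
        + "<description>Short snippet of the article.</description>"
        + "</item></channel></rss>";

    // Text of the first descendant element with the given tag, or "" if absent.
    static String childText(Element item, String tag) {
        NodeList nodes = item.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    // Returns one map per <item>, holding the fields that are currently lost.
    static List<Map<String, String>> extract(String rssXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));

        List<Map<String, String>> result = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            Map<String, String> meta = new LinkedHashMap<>();
            meta.put("link", childText(item, "link"));
            meta.put("title", childText(item, "title"));
            meta.put("author", childText(item, "author"));
            meta.put("category", childText(item, "category"));
            meta.put("pubDate", childText(item, "pubDate"));
            meta.put("description", childText(item, "description"));
            result.add(meta);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (Map<String, String> meta : extract(SAMPLE)) {
            System.out.println(meta);
        }
    }
}
```

What I am looking for is the equivalent of these per-item values stored alongside each out_link, instead of only the link and the title.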



On Monday 28 March 2016 06:24 PM, Lewis John Mcgibbney wrote:

Hi harsh,
Do you have an example URL?

On Mon, Mar 28, 2016 at 2:59 AM, <user-digest-h...@nutch.apache.org> wrote:

From: harsh <harsh.sha...@orkash.com>
To: user@nutch.apache.org
Cc:
Date: Mon, 28 Mar 2016 15:29:26 +0530
Subject: Get all the feed metadata
Hi All
I want to get all the articles in a feed along with their metadata (category,
publishing date, snippet) through RSS. I explored Nutch 2.x, but it gives only
the URL and title. What should I do to get the metadata?

Thanks
Harsh
