Hi Lewis
My seedurl.txt file is as follows:
http://timesofindia.feedsportal.com/c/33039/f/533917/index.rss
http://timesofindia.feedsportal.com/c/33039/f/533922/index.rss
http://www.thehindu.com/news/cities/Delhi/?service=rss
http://www.thehindu.com/news/international/?service=rss
http://indianexpress.com/section/sports/cricket/feed/
http://indianexpress.com/section/sports/feed/
http://news.google.co.in/news?cf=all&hl=en&pz=1&ned=in&output=rss
While executing the parse phase, all the URLs extracted from the RSS feeds
are kept as out_links with their corresponding titles, but Nutch fails to
extract and store pub_date, author, etc. for each URL (through the ROME API,
which is already used in Nutch).
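To make clear which fields I mean, here is a minimal standalone sketch in Python. It uses only the standard library, not Nutch or ROME, so it only illustrates the per-item data I would like stored and is not how the Nutch feed plugin works. The sample feed, the function name extract_items, and the dictionary keys are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace; some RSS 2.0 feeds carry the author in <dc:creator>.
DC = "{http://purl.org/dc/elements/1.1/}"

# Made-up sample feed, used only for this illustration.
SAMPLE_RSS = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Example article</title>
      <link>http://example.com/article-1</link>
      <description>Short snippet of the article.</description>
      <pubDate>Mon, 28 Mar 2016 15:29:26 +0530</pubDate>
      <category>Sports</category>
      <dc:creator>Example Author</dc:creator>
    </item>
  </channel>
</rss>"""


def extract_items(rss_text):
    """Return one dict per <item>; fields the feed omits come back as None."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "url": item.findtext("link"),
            "title": item.findtext("title"),
            "pub_date": item.findtext("pubDate"),
            # Fall back to <dc:creator> when <author> is absent.
            "author": item.findtext("author") or item.findtext(DC + "creator"),
            "categories": [c.text for c in item.findall("category")],
            "snippet": item.findtext("description"),
        })
    return items


if __name__ == "__main__":
    for entry in extract_items(SAMPLE_RSS):
        print(entry)
```

Currently I only get the equivalent of "url" and "title" for each out_link; the remaining fields are what I am missing.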
On Monday 28 March 2016 06:24 PM, Lewis John Mcgibbney wrote:
Hi harsh,
Do you have an example URL?
On Mon, Mar 28, 2016 at 2:59 AM, <user-digest-h...@nutch.apache.org> wrote:
From: harsh <harsh.sha...@orkash.com>
To: user@nutch.apache.org
Cc:
Date: Mon, 28 Mar 2016 15:29:26 +0530
Subject: Get all the feed metadata
Hi All
I want to get all the articles in a feed along with their metadata (category,
publishing date, snippet) through RSS. I explored Nutch 2.x, but it gives only
the URL and title. What should I do to get the metadata?
Thanks
Harsh