Run Nutch 2.x in Eclipse

2015-07-02 Thread ThiepLV
Hello, I have just begun studying Nutch, and I am trying to run Nutch 2.x in Eclipse by following the tutorial https://wiki.apache.org/nutch/RunNutchInEclipse, but when I run InjectorJob I get the following exception: 2015-07-03 00:42:18,532 ERROR crawl.InjectorJob (InjectorJob.java:run(278)) - InjectorJob: java.lang.Clas

Re: Parent URL

2015-07-02 Thread Jorge Luis Betancourt González
If by parent/root page you mean the URL that points to the page (a kind of referrer), i.e. page A has a link to page B (A -> B), you can index the inlinks of each page in Solr. This can be done with a simple indexing filter; perhaps [1] can be helpful, but it depends on what you're trying
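Indexing inlinks amounts to inverting the link graph: if A links to B, then A is an inlink of B, and that list is what would be stored in a Solr field on B's document. The sketch below models only that data transformation in plain Java, assuming an in-memory map of outlinks. The class `InlinkIndexSketch`, its methods, and the `inlinks` field name are hypothetical and are not the Nutch `IndexingFilter` API. In Nutch 1.x, inlinks are built into the LinkDb by the `invertlinks` step and passed to indexing filters, so a real filter would read them from that argument rather than compute them itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, self-contained model of "index the inlinks of each page".
// This is NOT the Nutch IndexingFilter API; it only illustrates the data flow.
final class InlinkIndexSketch {

    /** Inverts an outlink graph (page -> pages it links to) into inlinks (page -> pages linking to it). */
    static Map<String, List<String>> invertLinks(Map<String, List<String>> outlinks) {
        // TreeSet removes duplicate links and keeps each inlink list sorted.
        Map<String, TreeSet<String>> inlinks = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : outlinks.entrySet()) {
            for (String target : e.getValue()) {
                inlinks.computeIfAbsent(target, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        Map<String, List<String>> result = new TreeMap<>();
        inlinks.forEach((page, sources) -> result.put(page, new ArrayList<>(sources)));
        return result;
    }

    /** Builds a toy document for one page, adding an "inlinks" field as an indexing filter might. */
    static Map<String, Object> toDocument(String url, Map<String, List<String>> inlinks) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("url", url);
        // Pages nobody links to (e.g. seeds) get an empty list.
        doc.put("inlinks", inlinks.getOrDefault(url, Collections.emptyList()));
        return doc;
    }
}
```

A multi-valued field is the natural fit here, since a page can have many referrers; which one counts as "the" parent is a policy decision the index alone does not make.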

Re: Parent URL

2015-07-02 Thread Julien Nioche
Hi Shani, Tracking the seed URL that led to a given page is easy: you can add custom metadata to the seeds whose value is the seed URL itself, e.g. *http://www.guardian.co.uk seed=http://www.guardian.co.uk*, then specify 'seed' as a value for the co
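The idea is to tag each seed with its own URL and have that tag copied from every fetched page to the outlinks it discovers, so any page reached from a seed ends up carrying that seed's URL. The sketch below simulates this over an in-memory link graph. It assumes the seed line puts `key=value` pairs after the URL separated by tabs; the class `SeedPropagationSketch` and its methods are hypothetical, and it does not model how Nutch itself is configured to copy the tag.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of seed-metadata propagation; not Nutch code.
final class SeedPropagationSketch {

    static final class Seed {
        final String url;
        final Map<String, String> metadata;

        Seed(String url, Map<String, String> metadata) {
            this.url = url;
            this.metadata = metadata;
        }
    }

    /** Parses "url<TAB>key=value<TAB>..." (assumed format) into a URL plus its metadata. */
    static Seed parseSeedLine(String line) {
        String[] parts = line.trim().split("\t");
        Map<String, String> metadata = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');  // split at the first '=' so URL values stay intact
            if (eq > 0) {
                metadata.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
            }
        }
        return new Seed(parts[0], metadata);
    }

    /** Breadth-first walk: each newly discovered page inherits {@code key}'s value from its parent. */
    static Map<String, String> propagate(List<Seed> seeds, Map<String, List<String>> outlinks, String key) {
        Map<String, String> valueByPage = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        for (Seed s : seeds) {
            String v = s.metadata.get(key);
            if (v != null && !valueByPage.containsKey(s.url)) {
                valueByPage.put(s.url, v);
                queue.add(s.url);
            }
        }
        while (!queue.isEmpty()) {
            String page = queue.poll();
            for (String child : outlinks.getOrDefault(page, Collections.emptyList())) {
                // In this sketch the first discovery wins; a real crawl may resolve conflicts differently.
                if (!valueByPage.containsKey(child)) {
                    valueByPage.put(child, valueByPage.get(page));
                    queue.add(child);
                }
            }
        }
        return valueByPage;
    }
}
```

Once every page carries its `seed` value, indexing it as an ordinary field makes the originating seed visible in Solr.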

Parent URL

2015-07-02 Thread Chaushu, Shani
Hi, I'm using Nutch 1.9 with Solr 4.10. Is there any way to see in Solr, for each page, the parent/root page it came from? Thanks, Shani