Help on adding custom headers

2017-01-01 Thread AshokRaj.Lourdusamy
Hi, is there a way to add headers whenever the URL to be fetched matches a condition? For example, for http://www.abc123.com: if the URL contains "abc", then something like addheader("username", "nutch")? I think we can add headers for all requests in the config. But is this condition che
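
The condition described in the question can be sketched in plain Java. This is only an illustration of the check itself, not a Nutch API: the class ConditionalHeaders and the method headersFor are hypothetical names, and how the resulting headers would be handed to a protocol plugin is not covered by the message above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConditionalHeaders {

    // Hypothetical helper (not part of Nutch): returns the extra headers
    // that should accompany a request for the given URL.
    public static Map<String, String> headersFor(String url) {
        Map<String, String> headers = new LinkedHashMap<>();
        // The condition from the question: the URL contains "abc".
        if (url != null && url.contains("abc")) {
            headers.put("username", "nutch");
        }
        return headers;
    }

    public static void main(String[] args) {
        // One URL that matches the condition and one that does not.
        System.out.println(headersFor("http://www.abc123.com"));
        System.out.println(headersFor("http://www.example.org"));
    }
}
```

Returning a map rather than mutating a request keeps the decision separate from whatever HTTP client ends up sending the headers.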

Need help on getting HTML content

2016-12-15 Thread AshokRaj.Lourdusamy
Hi, for a particular tag (), I need to save the entire HTML of the tag. Right now I am able to save only the text content, via getText() called in HTMLParser.java, and there is no way to store the HTML content. Please share your thoughts on this. Thanks
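
One possible approach, assuming the parser exposes the page as a standard org.w3c.dom tree, is to serialize the element node itself instead of collecting only its text. The sketch below uses only the JDK's javax.xml.transform API. The class OuterHtml, its method names, and the "div" tag in the demo are hypothetical (the tag name in the question did not survive), and the demo parses a small well-formed snippet rather than real-world HTML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class OuterHtml {

    // Serializes a DOM node, including its own tag and all children, to markup.
    public static String serialize(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Leave out the <?xml ...?> prolog so only the element markup remains.
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize node", e);
        }
    }

    // Demo helper: parses well-formed markup and serializes the first
    // element with the given tag name, or returns null if there is none.
    public static String firstElementMarkup(String markup, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            Node first = doc.getElementsByTagName(tagName).item(0);
            return first == null ? null : serialize(first);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><div id=\"a\"><b>hi</b> there</div></body></html>";
        System.out.println(firstElementMarkup(page, "div"));
    }
}
```

The default output method here is XML, so empty elements come out self-closed; whether that is acceptable depends on how the stored markup is used later.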

Re: Need to index Parent URL also

2016-11-29 Thread AshokRaj.Lourdusamy
Thanks, Sebastian. I did as you suggested, and it worked like a charm. It would have taken me days otherwise. :) The targets for-loop handles each link, so that is where I am adding it to the metadata. From: Sebastian Na
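
The idea in this reply, attaching the parent URL to each link inside the loop over targets, can be sketched with stand-in types. Nothing here is Nutch code: the Link class, the tagWithParent method, and the "parentUrl" metadata key are hypothetical, chosen only to show the shape of the loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParentUrlTagger {

    // Stand-in for an outlink: a target URL plus free-form metadata.
    public static final class Link {
        public final String toUrl;
        public final Map<String, String> metadata = new HashMap<>();

        public Link(String toUrl) {
            this.toUrl = toUrl;
        }
    }

    // For each target found on the page at fromUrl, record fromUrl as the
    // immediate parent in that link's metadata.
    public static List<Link> tagWithParent(String fromUrl, List<String> targets) {
        List<Link> links = new ArrayList<>();
        for (String target : targets) {
            Link link = new Link(target);
            link.metadata.put("parentUrl", fromUrl); // hypothetical key name
            links.add(link);
        }
        return links;
    }

    public static void main(String[] args) {
        List<String> targets = new ArrayList<>();
        targets.add("http://example.org/a");
        targets.add("http://example.org/b");
        for (Link link : tagWithParent("http://example.org/", targets)) {
            System.out.println(link.toUrl + " <- " + link.metadata.get("parentUrl"));
        }
    }
}
```

Only the immediate parent is stored, which matches the original request: no chain back to the seed URL is kept.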

Need to index Parent URL also

2016-11-27 Thread AshokRaj.Lourdusamy
Hi, while Nutch 1.x is indexing into Solr (or Elasticsearch), I need to include the immediate parent URL too. There is no clear help online on where to do this. I don't need the hierarchy up to the seed URL, just the immediate parent of the document currently being parsed. Someone suggested to do it on out