Hi folks,

A very happy new year to all of you.

I am currently using Apache Nutch 1.16 and am successfully extracting the HTML content for a given set of seed URLs. One of my requirements is to extract all the image and video links from the HTML into a separate object. Since I already have the HTML content, I could use a library like jsoup to parse it and extract the img tags, but I was wondering whether there is a way to do this within Nutch itself.

I am assuming I will have to implement the HtmlParseFilter interface in a custom plugin and add my extraction logic there. Is my understanding correct? Any sample code reference would be helpful as well.

Thanks,
Prateek
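
P.S. To make the question concrete, here is a rough sketch of the extraction logic I have in mind. It uses only the standard org.w3c.dom API, on the assumption (as far as I understand it) that HtmlParseFilter.filter(...) is handed the parsed page as a DocumentFragment, so the same tree walk could be called from there. The names MediaLinkExtractor, collect and parseXhtml are hypothetical and not part of Nutch; parseXhtml exists only so the sketch can be run on its own against a well-formed snippet.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not a Nutch class: walks a DOM tree and collects media links.
public class MediaLinkExtractor {

    // Recursively visits every node under `node`, adding the src of <img> elements
    // to `images` and the src of <video> elements (or <source> children of a
    // <video>) to `videos`. Tag names are compared case-insensitively because
    // HTML parsers may report them in upper case.
    public static void collect(Node node, List<String> images, List<String> videos) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            String tag = el.getTagName();
            String src = el.getAttribute("src"); // empty string when the attribute is absent
            if (!src.isEmpty()) {
                Node parent = el.getParentNode();
                if (tag.equalsIgnoreCase("img")) {
                    images.add(src);
                } else if (tag.equalsIgnoreCase("video")) {
                    videos.add(src);
                } else if (tag.equalsIgnoreCase("source") && parent != null
                        && "video".equalsIgnoreCase(parent.getNodeName())) {
                    videos.add(src);
                }
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, images, videos);
        }
    }

    // Stand-alone test helper (assumption: the input is well-formed XHTML).
    // Inside Nutch the DOM would come from the parser, not from this method.
    public static Document parseXhtml(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<img src=\"logo.png\"/>"
                + "<video><source src=\"clip.webm\"/></video>"
                + "</body></html>";
        List<String> images = new ArrayList<>();
        List<String> videos = new ArrayList<>();
        collect(parseXhtml(page), images, videos);
        System.out.println("images: " + images);
        System.out.println("videos: " + videos);
    }
}
```

The links are collected exactly as written in the markup, so relative URLs would still need to be resolved against the page's base URL before being stored.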

