Hi - I think I would implement a custom parse filter that looks for the specific tags and attributes and adds them to the parse metadata. Using the index-metatags plugin, I would then have those newly added fields indexed.
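
As a rough illustration of the tag-and-attribute lookup, here is a minimal sketch in plain Java that walks a DOM tree and collects the src of every iframe pointing at YouTube. It uses only the JDK's org.w3c.dom API so that it runs on its own. The class name YoutubeEmbedFinder, the helper findInMarkup and the URL patterns are assumptions made for this example and are not part of Nutch. In a real plugin the same walk would presumably start from the DocumentFragment that Nutch hands to an HtmlParseFilter, and the collected URLs would be added to the parse metadata under a field name of your choosing.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of Nutch.
public class YoutubeEmbedFinder {

    // Assumed URL patterns for YouTube embeds; adjust to the pages being crawled.
    public static boolean isYoutubeEmbed(String src) {
        String s = src.toLowerCase();
        return s.contains("youtube.com/embed/")
            || s.contains("youtube-nocookie.com/embed/");
    }

    // Depth-first walk: record the src of each matching iframe, then recurse.
    private static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "iframe".equalsIgnoreCase(node.getNodeName())) {
            // getAttribute returns "" when the attribute is absent.
            String src = ((Element) node).getAttribute("src");
            if (isYoutubeEmbed(src)) {
                out.add(src);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    // Entry point for any DOM node (Document, DocumentFragment or Element).
    public static List<String> findYoutubeEmbeds(Node root) {
        List<String> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    // Demo-only helper: parses well-formed XHTML with the JDK XML parser.
    public static List<String> findInMarkup(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    xhtml.getBytes(StandardCharsets.UTF_8)));
            return findYoutubeEmbeds(doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
            + "<iframe src=\"https://www.youtube.com/embed/abc123\"></iframe>"
            + "<iframe src=\"https://example.com/widget\"></iframe>"
            + "</body></html>";
        System.out.println(findInMarkup(page));
    }
}
```

Keeping the lookup in a parse filter means the URLs are taken from the DOM directly, so nothing has to be appended to "content" with special tags and parsed back out again at indexing time.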
Markus

-----Original message-----
From: Alan Francis <[email protected]>
Sent: Tue 27-05-2014 15:47
Subject: Identifying Video Links in Pages
To: [email protected]

I have a use case in which we want to separate out pages that have an iframe embed tag from YouTube and add it as an additional field for indexing. I am using Apache Nutch 1.8 with Solr 4.8.

What I have done so far is to override the "parse-html" plugin, identify iframe tags with YouTube URLs in ComContentUtils.getTextHelper(), and append them to "content" with some special tags. I then receive the content in a custom indexing filter plugin, extract the URLs from the content, and add them as a new field in NutchDocument.

Is there a better way to do this?

--
-Alan Francis

