Markus,

Thanks, I will try that too. I am a newbie, so I will have to read up on the
source for index-metatags.


On Tue, May 27, 2014 at 7:23 PM, Markus Jelsma
<[email protected]> wrote:

> Hi - I think I would implement a custom parse filter that looks for
> specific tags and attributes and adds them to the parse metadata. Using the
> index-metatags plugin, I would then have those newly added fields indexed.
>
>
> Markus
>
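A minimal sketch of the DOM walk such a parse filter could perform, assuming the embeds are `iframe` elements whose `src` points at `youtube.com/embed/` or `youtube-nocookie.com/embed/`. In Nutch 1.x this logic would sit inside an `HtmlParseFilter` implementation, which receives the page's `DocumentFragment`, and the collected URLs would be stored in the parse metadata under a key of your choosing. The class name `YouTubeEmbedFinder` and the URL pattern are hypothetical. To keep the sketch self-contained and runnable, it builds the DOM from a well-formed XHTML string with the JDK's XML parser instead of Nutch's HTML parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class YouTubeEmbedFinder {

    // Hypothetical pattern for YouTube embed URLs (plain and nocookie hosts,
    // with or without a scheme).
    private static final Pattern YOUTUBE_EMBED = Pattern.compile(
        "^(https?:)?//(www\\.)?youtube(-nocookie)?\\.com/embed/.+",
        Pattern.CASE_INSENSITIVE);

    // Recursively walks the DOM and collects the src of every YouTube iframe.
    static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            if ("iframe".equalsIgnoreCase(el.getTagName())) {
                // getAttribute returns "" when the attribute is missing.
                String src = el.getAttribute("src").trim();
                if (YOUTUBE_EMBED.matcher(src).matches()) {
                    out.add(src);
                }
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    // Entry point for a DOM node, e.g. the DocumentFragment a parse filter gets.
    static List<String> findEmbeds(Node root) {
        List<String> urls = new ArrayList<>();
        collect(root, urls);
        return urls;
    }

    // Convenience wrapper for this sketch: parses well-formed XHTML only.
    static List<String> findEmbedsInXhtml(String xhtml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            return findEmbeds(doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body>"
            + "<iframe src=\"https://www.youtube.com/embed/abc123\"></iframe>"
            + "<iframe src=\"https://example.com/widget\"></iframe>"
            + "<div><IFRAME src=\"//www.youtube-nocookie.com/embed/xyz\"/></div>"
            + "</body></html>";
        for (String url : findEmbedsInXhtml(xhtml)) {
            System.out.println(url);
        }
    }
}
```

Doing the detection at parse time keeps the page text clean. The indexer then only has to copy a metadata value into a field rather than re-scan the content.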
>
> -----Original message-----
> From: Alan Francis <[email protected]>
> Sent: Tue 27-05-2014 15:47
> Subject: Identifying Video Links in Pages
> To: [email protected];
> I have a use case in which we want to separate out pages that have an iframe
> embed from YouTube and add the embed as an additional field for indexing.
>
> I am using Apache Nutch 1.8 with Solr 4.8.
>
> What I have done so far is to override the "parse-html" plugin, identify
> iframe tags with YouTube URLs in DOMContentUtils.getTextHelper(), and append
> them to "content" wrapped in some special tags.
>
> I then read the content in a custom indexing filter plugin, extract the
> URLs from it, and add them as a new field in the NutchDocument.
>
> Is there a better way to do this?
>
>
>
> --
> -Alan Francis
>
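For comparison, the extraction step of the current approach (pulling the URLs back out of the `content` text inside the indexing filter) could look like the sketch below. The actual "special tags" are not given, so this assumes a hypothetical marker format `[[youtube:URL]]`. The class name `EmbedMarkerExtractor` is also hypothetical. The returned URLs would then be added to the `NutchDocument` as a multi-valued field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmbedMarkerExtractor {

    // Hypothetical marker written into "content" by the modified parse-html
    // plugin: [[youtube:URL]] (URL may not contain ']' or whitespace).
    private static final Pattern MARKER =
        Pattern.compile("\\[\\[youtube:([^\\]\\s]+)\\]\\]");

    // Returns every URL wrapped in a marker, in document order.
    static List<String> extractUrls(String content) {
        List<String> urls = new ArrayList<>();
        Matcher m = MARKER.matcher(content);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    // Removes the markers and collapses the leftover whitespace, so the
    // markers do not leak into the indexed text.
    static String stripMarkers(String content) {
        return MARKER.matcher(content).replaceAll("")
            .replaceAll("\\s{2,}", " ")
            .trim();
    }

    public static void main(String[] args) {
        String content =
            "Intro text [[youtube:https://www.youtube.com/embed/abc123]] more text";
        System.out.println(extractUrls(content));
        System.out.println(stripMarkers(content));
    }
}
```

This works, but it couples the parser and the indexer through an ad-hoc text format, and any copy of "content" that skips the stripping step will still contain the markers. Passing the URLs through parse metadata avoids both problems.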



-- 
-Alan Francis
