Hi Tony!

Something like this:

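// host -> XPath expression map
static Map<String, String> xpaths = ...; // initialize the map only once
...
// content.getUrl() returns a String, so parse it (java.net.URL) to get the host
String xpath = xpaths.get(new URL(content.getUrl()).getHost());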
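A minimal, self-contained sketch of the same idea, assuming the host -> XPath pairs are kept in one properties-style file (one `host=xpath` line per site) that is read once, so adding a site means adding a line rather than a switch case. The class name `HostXPathLookup`, the hosts, the XPath expressions and the sample page are all hypothetical. Inside a real Nutch HtmlParseFilter you would evaluate the XPath against the DOM the filter receives instead of parsing a string as done here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class HostXPathLookup {

    // Hypothetical config, one "host=xpath" line per site. In a real plugin
    // this text would be read from a file; hosts and XPaths are made up.
    static final String CONFIG =
            "www.example.com=//div[@id='price']\n"
          + "shop.example.org=//span[@class='cost']\n";

    // host -> XPath map, built only once; a map lookup replaces the switch.
    static final Map<String, String> XPATHS = loadXPaths(CONFIG);

    static Map<String, String> loadXPaths(String config) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(config));
        } catch (IOException e) {
            throw new IllegalStateException("cannot read XPath config", e);
        }
        Map<String, String> map = new HashMap<>();
        for (String host : props.stringPropertyNames()) {
            map.put(host, props.getProperty(host));
        }
        return map;
    }

    // Returns the XPath configured for the URL's host, or null if the site is unknown.
    static String xpathFor(String url) {
        return XPATHS.get(URI.create(url).getHost());
    }

    public static void main(String[] args) throws Exception {
        String url = "http://www.example.com/item/42";
        String xpath = xpathFor(url);
        if (xpath == null) {
            System.out.println("no XPath configured for " + url);
            return;
        }

        // Stand-in for the fetched page; a real HtmlParseFilter already has a DOM.
        String page = "<html><body><div id='price'>9.99</div></body></html>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8)));

        // Evaluate the site-specific XPath and read the matched text.
        String value = (String) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.STRING);
        System.out.println(xpath + " -> " + value); // prints //div[@id='price'] -> 9.99
    }
}
```

Unknown hosts simply return null, so the filter can skip pages from sites it has no rule for.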


Best regards, Alexander

--- On Tue, 11.6.13, Tony Mullins <tonymullins...@gmail.com> wrote:

> From: Tony Mullins <tonymullins...@gmail.com>
> Subject: Re: Data Extraction from 100+ different sites...
> To: user@nutch.apache.org
> Date: Tuesday, 11 June 2013, 20:59
> Hi Markus,
> 
> I couldn't understand how I can avoid switch cases with your
> suggested idea...
> 
> I would have one plugin that implements HtmlParseFilter, and I would
> have to check the current URL via content.getUrl(). All of this
> happens in the same class, so I would have to add switch cases. I
> could put the XPath expression for each site in a separate file, but
> to get the XPath expression I would have to decide which file to
> read, and for that I would have to put this logic in a switch case...
> 
> Please correct me if I have got this all wrong!
> 
> And I think getting custom data from pages is a common requirement
> for web crawling solutions, so aren't there any such Nutch plugins
> available on the web?
> 
> Thanks,
> Tony.
