Hi Tony!

Something like this:

    // host -> XPath map
    static Map<String, String> xpaths = ... // initialize the map only once
    ...
    // content.getUrl() returns a String, so parse it before asking for the host
    String xpath = xpaths.get(new URL(content.getUrl()).getHost());

Best regards,
Alexander

--- On Tue, 11 Jun 2013, Tony Mullins <tonymullins...@gmail.com> wrote:

> From: Tony Mullins <tonymullins...@gmail.com>
> Subject: Re: Data Extraction from 100+ different sites...
> To: user@nutch.apache.org
> Date: Tuesday, 11 June 2013, 20:59
>
> Hi Markus,
>
> I couldn't understand how I can avoid switch cases in your
> suggested idea.
>
> I would have one plugin which implements HtmlParseFilter, and I
> would have to check the current URL by calling content.getUrl().
> This would all happen in the same class, so I would have to add
> switch cases. I could put the XPath expression for each site in a
> separate file, but to get the XPath expression I would have to
> decide which file to read, and for that I would again have to put
> this logic in a switch case.
>
> Please correct me if I am getting this all wrong!
>
> And I think getting custom data from a page is a common
> requirement for web crawling solutions, so aren't there any such
> Nutch plugins available on the web?
>
> Thanks,
> Tony.
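
P.S. To make the host -> XPath lookup from the top of this reply more concrete, here is a minimal, self-contained sketch in plain Java. It is only an illustration under assumptions, not an actual Nutch plugin: the class name HostXPathLookup, the method xpathFor, the example hosts and the XPath strings are all made up. In a real HtmlParseFilter the URL string would come from content.getUrl(), and the map could just as well be loaded from a config file.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps a page's host to the XPath used for extraction,
// so no switch/case over sites is needed.
final class HostXPathLookup {

    // host -> XPath map, filled once when the class is loaded.
    // Hosts and expressions below are placeholders, not real site rules.
    private static final Map<String, String> XPATHS = new HashMap<>();
    static {
        XPATHS.put("www.example.com", "//div[@id='price']");
        XPATHS.put("shop.example.org", "//span[@class='title']");
    }

    // Returns the XPath registered for the URL's host,
    // or null if the URL cannot be parsed or the host is unknown.
    static String xpathFor(String url) {
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return null;
            }
            return XPATHS.get(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(xpathFor("http://www.example.com/item/42"));
        System.out.println(xpathFor("http://unknown.example.net/"));
    }
}
```

Adding a new site then means adding one map entry (or one line in a config file) instead of another case branch.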