On 11 March 2013 15:04, Rohan Thakur <[email protected]> wrote:
> hi
>
> I am new to Nutch. I wanted to know whether Nutch takes care of format
> changes in the URLs that we have set it to crawl, without requiring any
> manual changes on our side. For example, suppose we want to extract the
> price and model number from particular URLs and have configured this in
> Nutch. If the site then changes the way the model name and price are
> displayed at those URLs, such as a change in the tags, will we still be
> able to extract the required data without changing anything in Nutch?

If I understand you correctly, you are asking if Nutch
can automatically adapt to changes in a web page's
structure. The answer is no, beyond maybe something
trivial that can be captured by an extension to Nutch's
HtmlParser. Maybe you could give an example of what
you are trying to accomplish.
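
To illustrate why such extraction rules break, here is a minimal
sketch in plain Java. It does not use any Nutch API: the class
name PriceExtractor, the method extractPrice, the markup, and the
price value are all made up for illustration. It assumes the price
was originally wrapped in <span class="price">, and shows that the
same rule finds nothing once the site switches to a different tag.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone example; not part of Nutch.
class PriceExtractor {
    // Assumed extraction rule: the price sits in <span class="price">...</span>.
    private static final Pattern PRICE =
        Pattern.compile("<span\\s+class=\"price\">\\s*([^<]+?)\\s*</span>");

    // Returns the price text if the page still matches the rule, else empty.
    static Optional<String> extractPrice(String html) {
        Matcher m = PRICE.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up markup before and after a site redesign.
        String oldPage = "<div><span class=\"price\">Rs. 12,999</span></div>";
        String newPage = "<div><b id=\"cost\">Rs. 12,999</b></div>";

        System.out.println(extractPrice(oldPage)); // Optional[Rs. 12,999]
        System.out.println(extractPrice(newPage)); // Optional.empty
    }
}
```

The rule is tied to one specific tag and attribute, so after the
redesign someone has to update it by hand. A parser extension in
Nutch would carry the same kind of hard-coded assumption.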

Regards,
Gora
