Hi Team,

I need to crawl a website using Apache Nutch. Currently, I am using Nutch
1.x.

I have followed the steps in the tutorial at the following URL, up to the
'invertlinks' step.

https://wiki.apache.org/nutch/NutchTutorial

Then I used the 'readseg' command to dump the segments. The dump file was
created successfully.

Now I have the following questions:

1. Is the segment dump file the right file to read the contents of a
website from? If yes, how do I read the contents from the dump file? I am
unable to read it, as it looks encrypted.
2. If not, how can I read the contents of a website?

Thanks,
Vijay
