hi,
How Nutch works:

1. Create a new WebDB (admin db -create).
2. Inject root URLs into the WebDB (inject).
3. Generate a fetchlist from the WebDB in a new segment (generate).
4. Fetch content from URLs in the fetchlist (fetch).
5. Update the WebDB with links from fetched pages (updatedb).
6. Repeat steps 3-5 until the required depth is reached.
7. Update segments with scores and links from the WebDB (updatesegs).
8. Index the fetched pages (index).
9. Eliminate duplicate content (and duplicate URLs) from the indexes (dedup).
10. Merge the indexes into a single index for searching (merge).

mehdi

> Date: Tue, 1 Feb 2011 04:48:14 -0500
> Subject: Help : Nutch indexing mechanism
> From: [email protected]
> To: [email protected]
>
> Hello everybody,
> I want to know how nutch actually does indexing..What are the steps involved
> in indexing..
> Thanks in advance
> Regards
> Amna Waqar
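
P.S. In case it helps to see the ten steps as a data flow, here is a toy in-memory sketch in Python. It is not Nutch code and does not call any Nutch API: TOY_WEB, crawl_and_index, the use of inlink counts as "scores", and the rule of keeping the highest-scoring copy during dedup are all assumptions made up for illustration. The step numbers in the comments refer to the list in the reply.

```python
# Toy model of the crawl/index cycle. NOT Nutch code; every name is hypothetical.
from collections import defaultdict

# Hypothetical miniature "web": URL -> (page text, outgoing links)
TOY_WEB = {
    "a": ("apple pie", ["b", "c"]),
    "b": ("banana bread", ["c"]),
    "c": ("apple pie", ["a"]),  # same text as "a", so dedup keeps only one
}


def crawl_and_index(roots, depth, web=TOY_WEB):
    webdb = set()                # 1. create an empty WebDB
    webdb.update(roots)          # 2. inject root URLs
    inlinks = defaultdict(int)   # link counts kept alongside the WebDB
    segments, seen = [], set()

    for _ in range(depth):       # 6. repeat steps 3-5 up to the depth
        fetchlist = sorted(webdb - seen)        # 3. generate a fetchlist
        if not fetchlist:
            break
        segment = {}
        for url in fetchlist:                   # 4. fetch each URL
            text, links = web.get(url, ("", []))
            segment[url] = {"text": text, "links": links}
        for page in segment.values():           # 5. update the WebDB
            for link in page["links"]:
                webdb.add(link)
                inlinks[link] += 1
        seen.update(segment)
        segments.append(segment)

    pages = {}
    for segment in segments:                    # 7. copy scores into segments
        for url, page in segment.items():
            page["score"] = inlinks[url]        # assumption: score = inlinks
            pages[url] = page

    indexes = []
    for segment in segments:                    # 8. one index per segment
        index = defaultdict(set)
        for url, page in segment.items():
            for word in page["text"].split():
                index[word].add(url)
        indexes.append(index)

    best = {}                                   # 9. dedup identical content
    for url in sorted(pages):
        text = pages[url]["text"]
        if text not in best or pages[url]["score"] > pages[best[text]]["score"]:
            best[text] = url
    kept = set(best.values())

    merged = {}                                 # 10. merge into one index
    for index in indexes:
        for word, urls in index.items():
            live = urls & kept
            if live:
                merged.setdefault(word, set()).update(live)
    return merged


if __name__ == "__main__":
    print(crawl_and_index(["a"], depth=3))
```

With root "a" and depth 3, pages "a" and "c" carry the same text, so only the one with more inlinks survives in the merged index. The real tools work on on-disk segments and Lucene indexes, so treat this only as a picture of the order of operations.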

