I have a question about the contents of the crawldb folder with Nutch 1.6. After I run the updatedb step, the crawldb folder contains the directories shown in the listing below. Is this the result I should get? If not, how can I fix it?
If I run "generate" on this crawldb, will it generate the full URL list? My concern is that the updatedb process did not complete fully, because we have the "624730206" folder and the "current" folder at the same time. Does Nutch take care of this?

    hduser@hadoopdev1:~$ hadoop dfs -ls 160milyonurls/crawldb
    Warning: $HADOOP_HOME is deprecated.

    Found 3 items
    drwxr-xr-x   - hduser supergroup          0 2013-07-05 23:55 /user/hduser/160milyonurls/crawldb/624730206
    drwxr-xr-x   - hduser supergroup          0 2013-07-08 18:59 /user/hduser/160milyonurls/crawldb/current
    drwxr-xr-x   - hduser supergroup          0 2013-07-03 14:39 /user/hduser/160milyonurls/crawldb/old

I appreciate your help.

--
View this message in context: http://lucene.472066.n3.nabble.com/crawldb-contents-tp4076345.html
Sent from the Nutch - User mailing list archive at Nabble.com.
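For anyone in the same situation, spotting extra directories can be scripted. The following is a minimal Python sketch, not part of Nutch or Hadoop. It parses `hadoop dfs -ls` output in the format shown in the listing in this post and reports every crawldb subdirectory whose name is not `current` or `old`. Treating those two names as the only expected ones is an assumption, and `EXPECTED`, `ENTRY` and `unexpected_dirs` are hypothetical names chosen for illustration. The sketch only flags a directory such as `624730206`. It does not answer whether `generate` reads that directory.

```python
import re

# Assumption: these are the only subdirectories expected under crawldb
# once updatedb has finished. Anything else is reported for inspection.
EXPECTED = {"current", "old"}

# One directory entry of `hadoop dfs -ls` output, as in the listing in this post:
# permissions, "-" (no replication for directories), owner, group, size,
# date, time, full path. Works whether entries are on separate lines or flattened.
ENTRY = re.compile(
    r"d[rwx-]{9}\s+-\s+\S+\s+\S+\s+\d+\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+\d{2}:\d{2}\s+(?P<path>\S+)"
)


def unexpected_dirs(listing: str) -> list[tuple[str, str]]:
    """Return (directory name, modification date) for unexpected crawldb subdirectories."""
    found = []
    for match in ENTRY.finditer(listing):
        # Keep only the last path component, e.g. "624730206".
        name = match.group("path").rstrip("/").rsplit("/", 1)[-1]
        if name not in EXPECTED:
            found.append((name, match.group("date")))
    return found
```

Called on the three directory lines from this post, `unexpected_dirs` returns `[("624730206", "2013-07-05")]`. For a listing that contains only `current` and `old`, it returns an empty list.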