Hi,

Small suggestion, but I do not see any -dir argument passed alongside your initial invertlinks command. I understand that you have multiple segment directories, fetched over the last few days, and that the output suggests the process executed properly. However, I have never used the command without the -dir option (and it has always worked for me that way), so I can only suggest that this may be the problem.
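For reference, something like the following is what I would try (a sketch, assuming your linkdb and segments live under crawl/ as shown in your output; the -dir form points LinkDb at the parent segments directory rather than expanding individual segments with a shell glob):

```shell
# Invert links using the -dir form: LinkDb itself enumerates every
# segment under crawl/segments instead of receiving a glob expansion.
./nutch invertlinks crawl/linkdb -dir crawl/segments -noNormalize -noFilter

# Then dump the linkdb again to check whether it is still empty.
./nutch readlinkdb crawl/linkdb -dump linkdump
```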
On Tue, Aug 23, 2011 at 3:29 PM, Marek Bachmann <[email protected]> wrote:

> Hi Markus,
>
> thank you for the quick reply. I already searched for this Configuration
> error and found:
>
> http://www.mail-archive.com/[email protected]/msg15397.html
>
> Where they say that "This exception is innocuous - it helps to debug at
> which points in the code the Configuration instances are being created.
> (...)"
>
> I have indeed not much disk space on the machine but it should be enough
> at the moment:
>
> root@hrz-vm180:/home/nutchServer/relaunch_nutch/runtime/local/bin# df -h .
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/vda1              20G  5.9G   15G  30% /home
>
> As I am root and all directories under
> /home/nutchServer/relaunch_nutch/runtime/local/bin
> are set to root:root and 755, permissions shouldn't be the problem.
>
> Any further suggestions? :-/
>
> Thank you once again
>
> Am 23.08.2011 16:10, schrieb Markus Jelsma:
>> There are some peculiarities in your log:
>>
>> 2011-08-23 14:47:34,833 DEBUG conf.Configuration - java.io.IOException: config()
>>     at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:211)
>>     at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:198)
>>     at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:213)
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:93)
>>     at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:373)
>>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:800)
>>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>>     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:190)
>>     at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:292)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:255)
>>
>> 2011-08-23 14:47:34,922 INFO  mapred.JobClient - Running job: job_local_0002
>> 2011-08-23 14:47:34,923 DEBUG conf.Configuration - java.io.IOException: config(config)
>>     at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:226)
>>     at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:184)
>>     at org.apache.hadoop.mapreduce.JobContext.<init>(JobContext.java:52)
>>     at org.apache.hadoop.mapred.JobContext.<init>(JobContext.java:32)
>>     at org.apache.hadoop.mapred.JobContext.<init>(JobContext.java:38)
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:111)
>>
>> Can you check permissions, disk space etc?
>>
>> On Tuesday 23 August 2011 16:05:16 Marek Bachmann wrote:
>>> Hey Ho,
>>>
>>> for some reason the invertlinks command produces an empty linkdb.
>>>
>>> I did:
>>>
>>> root@hrz-vm180:/home/nutchServer/relaunch_nutch/runtime/local/bin#
>>> ./nutch invertlinks crawl/linkdb crawl/segments/* -noNormalize -noFilter
>>> LinkDb: starting at 2011-08-23 14:47:21
>>> LinkDb: linkdb: crawl/linkdb
>>> LinkDb: URL normalize: false
>>> LinkDb: URL filter: false
>>> LinkDb: adding segment: crawl/segments/20110817164804
>>> LinkDb: adding segment: crawl/segments/20110817164912
>>> LinkDb: adding segment: crawl/segments/20110817165053
>>> LinkDb: adding segment: crawl/segments/20110817165524
>>> LinkDb: adding segment: crawl/segments/20110817170729
>>> LinkDb: adding segment: crawl/segments/20110817171757
>>> LinkDb: adding segment: crawl/segments/20110817172919
>>> LinkDb: adding segment: crawl/segments/20110819135218
>>> LinkDb: adding segment: crawl/segments/20110819165658
>>> LinkDb: adding segment: crawl/segments/20110819170807
>>> LinkDb: adding segment: crawl/segments/20110819171841
>>> LinkDb: adding segment: crawl/segments/20110819173350
>>> LinkDb: adding segment: crawl/segments/20110822135934
>>> LinkDb: adding segment: crawl/segments/20110822141229
>>> LinkDb: adding segment: crawl/segments/20110822143419
>>> LinkDb: adding segment: crawl/segments/20110822143824
>>> LinkDb: adding segment: crawl/segments/20110822144031
>>> LinkDb: adding segment: crawl/segments/20110822144232
>>> LinkDb: adding segment: crawl/segments/20110822144435
>>> LinkDb: adding segment: crawl/segments/20110822144617
>>> LinkDb: adding segment: crawl/segments/20110822144750
>>> LinkDb: adding segment: crawl/segments/20110822144927
>>> LinkDb: adding segment: crawl/segments/20110822145249
>>> LinkDb: adding segment: crawl/segments/20110822150757
>>> LinkDb: adding segment: crawl/segments/20110822152354
>>> LinkDb: adding segment: crawl/segments/20110822152503
>>> LinkDb: adding segment: crawl/segments/20110822153900
>>> LinkDb: adding segment: crawl/segments/20110822155321
>>> LinkDb: adding segment: crawl/segments/20110822155732
>>> LinkDb: merging with existing linkdb: crawl/linkdb
>>> LinkDb: finished at 2011-08-23 14:47:35, elapsed: 00:00:14
>>>
>>> After that:
>>>
>>> root@hrz-vm180:/home/nutchServer/relaunch_nutch/runtime/local/bin#
>>> ./nutch readlinkdb crawl/linkdb/ -dump linkdump
>>> LinkDb dump: starting at 2011-08-23 14:48:26
>>> LinkDb dump: db: crawl/linkdb/
>>> LinkDb dump: finished at 2011-08-23 14:48:27, elapsed: 00:00:01
>>>
>>> And then:
>>>
>>> root@hrz-vm180:/home/nutchServer/relaunch_nutch/runtime/local/bin# cd linkdump/
>>> root@hrz-vm180:/home/nutchServer/relaunch_nutch/runtime/local/bin/linkdump# ll
>>> total 0
>>> -rwxrwxrwx 1 root root 0 Aug 23 14:48 part-00000
>>>
>>> As you see, the dump size is 0 bytes.
>>>
>>> Unfortunately I have no idea what went wrong.
>>>
>>> I have attached the hadoop.log for the invertlinks process. Perhaps that
>>> helps anybody?

-- 
*Lewis*

