Hi Markus,
thank you for the quick reply. I already searched for this Configuration
error and found:
http://www.mail-archive.com/[email protected]/msg15397.html
where it says: "This exception is innocuous - it helps to debug at
which points in the code the Configuration instances are being created.
(...)"
I indeed don't have much disk space on the machine, but it should be
enough for the moment:
root@hrz-vm180:/home/nutchServer/relaunch_nutch/runtime/local/bin# df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 20G 5.9G 15G 30% /home
As I am root, and all directories under
/home/nutchServer/relaunch_nutch/runtime/local/bin are owned by root:root
with 755 permissions, permissions shouldn't be the problem.
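In case it helps to rule things out, here is a minimal sketch of the checks Markus asked for, run from the working directory (paths are whatever applies locally). Note that an exhausted inode table can also produce zero-byte output files even while `df -h` still shows free blocks:

```shell
# Minimal sketch: check block space, inode headroom, and directory perms.
df -h .     # free blocks (as shown above)
df -i .     # free inodes -- exhaustion here also yields 0-byte files
ls -ld .    # ownership and mode bits of the working directory
```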
Any further suggestions? :-/
Thank you once again
On 23.08.2011 16:10, Markus Jelsma wrote:
There are some peculiarities in your log:
2011-08-23 14:47:34,833 DEBUG conf.Configuration - java.io.IOException: config()
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:211)
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:198)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:213)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:93)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:373)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:800)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:190)
    at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:292)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:255)
2011-08-23 14:47:34,922 INFO mapred.JobClient - Running job: job_local_0002
2011-08-23 14:47:34,923 DEBUG conf.Configuration - java.io.IOException: config(config)
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:226)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:184)
    at org.apache.hadoop.mapreduce.JobContext.<init>(JobContext.java:52)
    at org.apache.hadoop.mapred.JobContext.<init>(JobContext.java:32)
    at org.apache.hadoop.mapred.JobContext.<init>(JobContext.java:38)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:111)
Can you check permissions, disk space, etc.?
On Tuesday 23 August 2011 16:05:16 Marek Bachmann wrote:
Hey Ho,
for some reason the invertlinks command produces an empty linkdb.
I did:
root@hrz-vm180:/home/nutchServer/relaunch_nutch/runtime/local/bin#
./nutch invertlinks crawl/linkdb crawl/segments/* -noNormalize -noFilter
LinkDb: starting at 2011-08-23 14:47:21
LinkDb: linkdb: crawl/linkdb
LinkDb: URL normalize: false
LinkDb: URL filter: false
LinkDb: adding segment: crawl/segments/20110817164804
LinkDb: adding segment: crawl/segments/20110817164912
LinkDb: adding segment: crawl/segments/20110817165053
LinkDb: adding segment: crawl/segments/20110817165524
LinkDb: adding segment: crawl/segments/20110817170729
LinkDb: adding segment: crawl/segments/20110817171757
LinkDb: adding segment: crawl/segments/20110817172919
LinkDb: adding segment: crawl/segments/20110819135218
LinkDb: adding segment: crawl/segments/20110819165658
LinkDb: adding segment: crawl/segments/20110819170807
LinkDb: adding segment: crawl/segments/20110819171841
LinkDb: adding segment: crawl/segments/20110819173350
LinkDb: adding segment: crawl/segments/20110822135934
LinkDb: adding segment: crawl/segments/20110822141229
LinkDb: adding segment: crawl/segments/20110822143419
LinkDb: adding segment: crawl/segments/20110822143824
LinkDb: adding segment: crawl/segments/20110822144031
LinkDb: adding segment: crawl/segments/20110822144232
LinkDb: adding segment: crawl/segments/20110822144435
LinkDb: adding segment: crawl/segments/20110822144617
LinkDb: adding segment: crawl/segments/20110822144750
LinkDb: adding segment: crawl/segments/20110822144927
LinkDb: adding segment: crawl/segments/20110822145249
LinkDb: adding segment: crawl/segments/20110822150757
LinkDb: adding segment: crawl/segments/20110822152354
LinkDb: adding segment: crawl/segments/20110822152503
LinkDb: adding segment: crawl/segments/20110822153900
LinkDb: adding segment: crawl/segments/20110822155321
LinkDb: adding segment: crawl/segments/20110822155732
LinkDb: merging with existing linkdb: crawl/linkdb
LinkDb: finished at 2011-08-23 14:47:35, elapsed: 00:00:14
After that:
root@hrz-vm180:/home/nutchServer/relaunch_nutch/runtime/local/bin#
./nutch readlinkdb crawl/linkdb/ -dump linkdump
LinkDb dump: starting at 2011-08-23 14:48:26
LinkDb dump: db: crawl/linkdb/
LinkDb dump: finished at 2011-08-23 14:48:27, elapsed: 00:00:01
And then:
root@hrz-vm180:/home/nutchServer/relaunch_nutch/runtime/local/bin# cd
linkdump/
root@hrz-vm180:/home/nutchServer/relaunch_nutch/runtime/local/bin/linkdump#
ll
total 0
-rwxrwxrwx 1 root root 0 Aug 23 14:48 part-00000
root@hrz-vm180:/home/nutchServer/relaunch_nutch/runtime/local/bin/linkdump#
As you can see, the dump size is 0 bytes.
Unfortunately I have no idea what went wrong.
I have attached the hadoop.log for the invertlinks process. Perhaps that
helps somebody?
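One thing worth checking for an empty linkdb: invertlinks builds its input from each segment's parse_data directory, so a segment that was fetched with parsing disabled contributes no outlinks, and if no segment carries parse_data the resulting linkdb is empty. A quick sketch to spot such segments (assuming the crawl/segments layout shown above):

```shell
# Sketch: list segments that lack a parse_data subdirectory.
# invertlinks reads outlinks from <segment>/parse_data; a segment fetched
# with parsing disabled has none and adds nothing to the linkdb.
for seg in crawl/segments/*/; do
  [ -d "${seg}parse_data" ] || echo "no parse_data in ${seg}"
done
```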