Hi all, I tried to get Apache Nutch 2.2.1 running in a "getting started" manner (not the best approach, I know, but the only one possible) and compiled some information from the 1.x and 2.x tutorials and the command line options listing (http://wiki.apache.org/nutch/CommandLineOptions). Maybe you can help me by pointing out the step where I went wrong, and help others by improving the tutorial for 2.x:
1. wget http://apache.openmirror.de/nutch/2.2.1/apache-nutch-2.2.1-src.tar.gz, untar, cd, etc.
2. Downloaded and started HBase; the shell is running and a test database creation was successful.
3. Adjusted ivy/ivy.xml to include HBase and adjusted conf/gora.properties.
4. ant && ant runtime && cd runtime/local
5. Added a value for the property http.agent.name to conf/nutch-site.xml.
6. Added <pre>http://nutch.apache.org/</pre> to urls/seed.txt.
7. Invoking $ bin/nutch org.apache.nutch.crawl.InjectorJob urls/seed.txt causes
<code>
bin/nutch org.apache.nutch.crawl.InjectorJob urls/seed.txt
InjectorJob: starting at 2014-05-28 19:19:08
InjectorJob: Injecting urlDir: urls/seed.txt
InjectorJob: org.apache.gora.util.GoraException: java.lang.RuntimeException: java.lang.NumberFormatException: For input string: "55723B܂^ �OPBUF richter-local.de����ߛ�("
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
    at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
    at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)
Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: For input string: "55723B܂^ �OPBUF richter-local.de����ߛ�("
    at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:127)
    at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
    ... 7 more
Caused by: java.lang.NumberFormatException: For input string: "55723B܂^ �OPBUF richter-local.de����ߛ�("
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:492)
    at java.lang.Integer.parseInt(Integer.java:527)
    at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:63)
    at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:63)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:354)
    at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:94)
    at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:109)
    ... 9 more
</code>

My /etc/hosts looks like this:
<code>
127.0.0.1 localhost
</code>

I recognize the substring "richter-local.de" in the error message; it is a hostname I used before I reset /etc/hosts. I also found that http://etechnologytips.com/create-web-crawler-data-miner/ describes pretty much the same setup, but I didn't follow it in the first place.

Any help is appreciated :)

Best regards,
Kalle Richter