Hi all,
I tried to get Apache Nutch 2.2.1 running in a "getting started" manner
(not ideal, I know, but the only option I had) by combining information
from the 1.x and 2.x tutorials and the command line options listing
(http://wiki.apache.org/nutch/CommandLineOptions). Maybe you can help me
by pointing out the step where I went wrong, and help others by
improving the tutorial for 2.x:

1. wget
http://apache.openmirror.de/nutch/2.2.1/apache-nutch-2.2.1-src.tar.gz,
untarring, cd, etc.
2. downloaded and started HBase; the shell is running and creating a
test database succeeded
3. adjusted ivy/ivy.xml to include HBase and adjusted conf/gora.properties
4. ant && ant runtime && cd runtime/local
5. added value for property http.agent.name to conf/nutch-site.xml
6. added <pre>http://nutch.apache.org/</pre> to urls/seed.txt
7. invoking
$ bin/nutch org.apache.nutch.crawl.InjectorJob urls/seed.txt
causes
<code>
bin/nutch org.apache.nutch.crawl.InjectorJob urls/seed.txt
InjectorJob: starting at 2014-05-28 19:19:08
InjectorJob: Injecting urlDir: urls/seed.txt
InjectorJob: org.apache.gora.util.GoraException:
java.lang.RuntimeException: java.lang.NumberFormatException: For input
string: "55723B܂^
�OPBUF

richter-local.de����ߛ�("
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
        at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
        at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)
Caused by: java.lang.RuntimeException: java.lang.NumberFormatException:
For input string: "55723B܂^
�OPBUF

richter-local.de����ߛ�("
        at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:127)
        at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
        ... 7 more
Caused by: java.lang.NumberFormatException: For input string: "55723B܂^
�OPBUF

richter-local.de����ߛ�("
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:492)
        at java.lang.Integer.parseInt(Integer.java:527)
        at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:63)
        at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:63)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:354)
        at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:94)
        at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:109)
        ... 9 more

</code>
My /etc/hosts looks like this:
<code>
127.0.0.1       localhost
</code>
I recognize the substring "richter-local.de", which is a hostname I
used before resetting /etc/hosts.
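
To make the failure mode concrete: the innermost frames show
Integer.parseInt being called from the HServerAddress constructor, so the
client seems to be treating the bytes it read as a "host:port" string. Below
is a minimal sketch of that kind of parse. The class PortParseSketch, the
helper parsePort and the sample inputs are hypothetical and only for
illustration; this is not HBase's actual code. It fails in the same way
whenever the text after the last colon is not a plain number:

```java
public class PortParseSketch {

    // Hypothetical helper (not HBase code): treat everything after the
    // last ':' as the port and parse it as a decimal integer.
    static int parsePort(String hostAndPort) {
        int colon = hostAndPort.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a host:port pair: " + hostAndPort);
        }
        return Integer.parseInt(hostAndPort.substring(colon + 1));
    }

    public static void main(String[] args) {
        // A well-formed address parses fine (port number is illustrative).
        System.out.println(parsePort("localhost:60000"));

        // If the "port" is followed by other bytes, parseInt rejects the
        // whole substring with a NumberFormatException.
        try {
            parsePort("localhost:55723BPBUF");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```

If this reading is right, the text being parsed as a port is not a plain
"host:port" value at all, which would explain why the old hostname shows up
inside the exception message.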

I also found that
http://etechnologytips.com/create-web-crawler-data-miner/ describes
pretty much the same setup, but I didn't follow it initially.

Any help is appreciated :)

Best regards,
Kalle Richter
