Hello and good evening. 
I'm new to Nutch. I am using version 2.2.1 with MySQL as the datastore. I
followed this tutorial: http://nlp.solutions.asia/?p=362#more-362. The
first crawl I ran was a success. I started with this URL:
nutch.apache.org. I could see the results in my database in Workbench. But
when I tried a different URL, the crawls began to fail one after another.

I have this in regex-urlfilter.txt:

# accept anything else
#+.

+^http://([a-z0-9]*\.)* nutch.apache.org/

#
-.
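
To make the effect of these rules concrete, here is a minimal Python sketch of how such a rule list might be evaluated. It assumes the filter checks rules from top to bottom, lets the first matching rule decide, and rejects a URL that matches no rule. It also assumes the space after `*` in the accept rule above is a mail-wrapping artifact, so it is left out of `RULES`. The names `RULES`, `accepts` and `count_rejected` and the `example.com` seed are hypothetical and are not part of Nutch.

```python
import re

# Rules copied from the regex-urlfilter.txt excerpt above, as (sign, pattern) pairs.
# Assumption: the space after "*" in the posted accept rule is a mail-wrapping
# artifact, so it is omitted here.
RULES = [
    ("+", r"^http://([a-z0-9]*\.)*nutch.apache.org/"),
    ("-", r"."),
]


def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern is found in url is a '+' rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    # No rule matched: treat the URL as rejected.
    return False


def count_rejected(seeds, rules=RULES):
    """Count the seed URLs that the rules would reject."""
    return sum(1 for url in seeds if not accepts(url, rules))


if __name__ == "__main__":
    # Hypothetical seed list: one URL on the allowed host, one on another host.
    seeds = ["http://nutch.apache.org/", "http://example.com/"]
    for url in seeds:
        print(url, "accepted" if accepts(url) else "rejected")
```

Under these assumptions, any seed outside nutch.apache.org falls through to the final `-.` rule and is rejected. If the space in the accept rule is taken literally, the pattern cannot match a URL without a space, so even `http://nutch.apache.org/` would be rejected.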

And in nutch-site.xml, I have:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>



<configuration>
<property>
<name>http.agent.name</name>
<value>Maria</value>
</property>

<property> 
<name>http.robots.agents</name> 
<value>Maria,*</value> ....
</description> 
</property>

<property>
<name>http.accept.language</name>
<value>ja-jp, en-us,en-gb,en;q=0.7,*;q=0.3</value>
<description>Value of the “Accept-Language” request header field.
This allows selecting non-English language as default one to retrieve.
It is a useful setting for search engines build for certain national group.
</description>
</property>

<property>
<name>parser.character.encoding.default</name>
<value>utf-8</value>
<description>The character encoding to fall back to when no other
information is available</description>
</property>

<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.sql.store.SqlStore</value>
<description>The Gora DataStore class for storing and retrieving data.
Currently the following stores are available: ….
</description>
</property>

</configuration>

First I looked for a solution to this problem:

InjectorJob: total number of urls rejected by filters: 2
InjectorJob: total number of urls injected after normalization: 0

But I didn't find anything that solved it. So now I'm searching for a
solution to this exception, which appears in hadoop.log:

java.lang.NullPointerException
        at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
        at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)

I haven't found anything that solves this yet either.
I would appreciate any suggestions.
Thanks for reading.

Maria *.*



--
View this message in context: 
http://lucene.472066.n3.nabble.com/InjectorJob-total-number-of-urls-injected-after-normalization-and-filtering-0-looking-for-solutions-tp4111993.html
Sent from the Nutch - User mailing list archive at Nabble.com.
