Hi Luis,
I am not sure what is causing that. Did you check your Solr index for
committed documents? Maybe the commit did not happen. You don't need to
rerun all the Nutch jobs; the other jobs work fine. You can run only the
dedup job with:
bin/nutch solrdedup solr_url
After that, can you share your solr.log?
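One way to check whether anything was committed is to ask Solr how many documents it holds. The following is only a minimal sketch: the function names are made up for illustration, and the URL in the usage comment is an assumption (default Jetty port, with the core name 'collection1' taken from the log), so replace it with the Solr URL you pass to bin/nutch. It relies on the standard select handler returning `response.numFound` for a `*:*` query.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_count_url(solr_url):
    """Build a select URL that matches every document but returns no rows."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return solr_url.rstrip("/") + "/select?" + params


def parse_num_found(body):
    """Extract the document count from a Solr JSON select response."""
    return json.loads(body)["response"]["numFound"]


def count_committed_docs(solr_url):
    """Query Solr and return how many documents are currently searchable."""
    with urlopen(build_count_url(solr_url), timeout=10) as resp:
        return parse_num_found(resp.read().decode("utf-8"))


# Usage (assumed URL, adjust to your setup):
#   count_committed_docs("http://localhost:8983/solr/collection1")
```

If the count is 0, the indexing step never committed anything, and the dedup job has nothing to work on.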
Talat
On 19-10-2013 04:43, Luis Armando Roca Fumero wrote:
Thanks a lot, Talat :). I truly appreciate your help, and that of the other
people who gave me ideas.
I fixed the Solr schema. Following the Nutch tutorial, I had replaced the line
<field name="content" type="text_general" stored="true" indexed="true"/>
with
<field name="content" type="text" stored="true" indexed="true"/>
but that was wrong.
I fixed that and ran Nutch 1.7 again, but I am still getting problems :(. You
can see a new hadoop.log here: http://pastebin.com/2qY0sUJh
The errors are:
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:160)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
Any ideas are welcome!
Thanks in advance,
Luis Armando
________________________________________
From: Talat UYARER [[email protected]]
Sent: Friday, October 18, 2013 03:39 p.m.
To: [email protected]
Subject: Re: Nutch 1.7 and Solr 4.4.0 Integrate
Ok Luis,
I found your problem. :) There is a problem with your Solr schema. In your
hadoop.log you can see this line:
org.apache.solr.common.SolrException: {msg=SolrCore 'collection1' is not available due to init failure: Unknown fieldType 'text' specified on field content,trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Unknown fieldType 'text' specified on field content
    at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:251)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
a
As you can see, when Nutch tries to commit, Solr throws an exception. You
should check your Solr schema. You may ask why solrdedup throws an
exception: it is because the IndexerJob did not commit your documents to
Solr, so when dedup ran it did not find any documents to check for duplicates.
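The "Unknown fieldType" error means a field refers to a type that the schema never declares. A quick way to spot such fields before restarting Solr could look like the sketch below. It is only an illustration: the function name is hypothetical, it assumes the schema uses the usual `<fieldType>`, `<field>` and `<dynamicField>` elements, and it does not replace Solr's own validation.

```python
import xml.etree.ElementTree as ET


def undefined_field_types(schema_xml):
    """Return (field name, type) pairs whose type has no matching fieldType."""
    root = ET.fromstring(schema_xml)
    # Collect every declared type name; Solr also accepts the lowercase tag.
    declared = {
        el.get("name")
        for tag in ("fieldType", "fieldtype")
        for el in root.iter(tag)
    }
    missing = []
    for tag in ("field", "dynamicField"):
        for el in root.iter(tag):
            if el.get("type") not in declared:
                missing.append((el.get("name"), el.get("type")))
    return missing


# Usage (path is an assumption, adjust to your Solr core's conf directory):
#   with open("solr/collection1/conf/schema.xml", encoding="utf-8") as f:
#       print(undefined_field_types(f.read()))
```

For a schema that declares "text_general" but has the content field set to type "text", this would report the content field, which matches the message in the log.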
Talat