Hi Mourad, I haven't understood your suggestion.
Can you please explain?

Erol Akarsu

On Tue, Nov 13, 2012 at 10:53 AM, Mouradk <mourad...@gmail.com> wrote:

> Hello karl,
>
> I have restarted a new one, please let me know if that helps.
>
> Regards,
>
> Mourad
> On 13 Nov 2012, at 15:45, Erol Akarsu <eaka...@gmail.com> wrote:
>
> > Lewis,
> >
> > Thanks for looking at this. Solr has the newest patched schema and I
> > restarted Tomcat.
> >
> > I set DEBUG for SolrIndexerJob in the log4j.properties file:
> >
> > log4j.logger.org.apache.nutch.indexer.solr.SolrIndexerJob=DEBUG,cmdstdout
> >
> >> Can I
> >> also suggest that you experiment with the crawl script (which
> >> accompanies the nutch script) instead of using the deprecated crawl
> >> command.
> >
> > Where is this script? The bin folder has only the nutch script.
> >
> >> Perhaps attempt to set the SolrIndexerJob logging to DEBUG and review
> >> your hadoop.log as well. I can confirm that I was able to get Nutch
> >> trunk working with a standalone Solr 4.0 multicore server with the
> >> patch applied just last week.
> >
> > I am using Nutch 2.1, not trunk. Does that make any difference to the
> > behavior of the nutch script?
> > Can you give me the main points, maybe a script of your full steps,
> > on how you tested and got this working last week?
> >
> >
> > I am getting this in hadoop.log:
> >
> > 2012-11-13 10:34:50,466 INFO solr.SolrIndexerJob - SolrIndexerJob: starting
> > 2012-11-13 10:34:50,805 INFO plugin.PluginRepository - Plugins: looking in: /home/eakarsu/searchProject/apache-nutch-2.1/runtime/local/plugins
> > 2012-11-13 10:34:50,867 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
> > 2012-11-13 10:34:50,867 INFO plugin.PluginRepository - Registered Plugins:
> > 2012-11-13 10:34:50,867 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
> > 2012-11-13 10:34:50,867 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - HTTP Framework (lib-http)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Tika Parser Plug-in (parse-tika)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Registered Extension-Points:
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Parse Filter (org.apache.nutch.parse.ParseFilter)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
> > 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
> > 2012-11-13 10:34:50,872 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
> > 2012-11-13 10:34:50,872 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> > 2012-11-13 10:34:50,875 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
> > 2012-11-13 10:34:50,875 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> > 2012-11-13 10:34:51,891 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > 2012-11-13 10:34:52,765 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
> > 2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: content dest: content
> > 2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: site dest: site
> > 2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: title dest: title
> > 2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: host dest: host
> > 2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: segment dest: segment
> > 2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: boost dest: boost
> > 2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: digest dest: digest
> > 2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
> > 2012-11-13 10:34:52,821 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
> > 2012-11-13 10:34:52,821 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> > 2012-11-13 10:34:52,821 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
> > 2012-11-13 10:34:52,821 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> > 2012-11-13 10:34:55,434 WARN mapred.FileOutputCommitter - Output path is null in cleanup
> > 2012-11-13 10:34:56,455 ERROR solr.SolrIndexerJob - SolrIndexerJob: org.apache.solr.common.SolrException: Not Found
> >
> > Not Found
> >
> > request: http://localhost:8080/sol40/update
> > at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
> > at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> > at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> > at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86)
> > at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:75)
> > at org.apache.nutch.indexer.solr.SolrIndexerJob.indexSolr(SolrIndexerJob.java:60)
> > at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:75)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > at org.apache.nutch.indexer.solr.SolrIndexerJob.main(SolrIndexerJob.java:84)
> >
> >
> > On Tue, Nov 13, 2012 at 9:53 AM, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> On Tue, Nov 13, 2012 at 2:36 PM, Erol Akarsu <eaka...@gmail.com> wrote:
> >>> Lewis,
> >>>
> >>> I applied the patch you told me about. I replaced the schema.xml of the
> >>> Solr 4 installation with schema-solr4.xml. The Solr 4.0 system is up and
> >>> running and I can see its web page at http://localhost:8080/sol40.
> >>
> >> You would need to either rename schema-solr4.xml to schema.xml, then copy
> >> this to your tomcat solr installation before starting/restarting the
> >> server, or alternatively copy the contents of the newly patched file to
> >> the existing Solr schema.xml.
> >>
> >>>
> >>> I followed the tutorial blindly. Crawling went fine but it seems very slow
> >>> compared to before the patch was applied.
> >>
> >> Considering the patch only applies to the Solr indexing stage, crawl
> >> performance should not be affected in the slightest. Especially when
> >> you are not passing the solr server URL during the crawl phase. Can I
> >> also suggest that you experiment with the crawl script (which
> >> accompanies the nutch script) instead of using the deprecated crawl
> >> command.
> >>
> >> Perhaps attempt to set the SolrIndexerJob logging to DEBUG and review
> >> your hadoop.log as well. I can confirm that I was able to get Nutch
> >> trunk working with a standalone Solr 4.0 multicore server with the
> >> patch applied just last week.
> >>
> >> As I said, Markus has also suggested some additions to the patch so
> >> maybe try catching some irregularities... trial and error.
> >>
> >> hth
> >>
> >> Lewis
> >>
> >