Hi Mourad,

I haven't understood your suggestion.

Can you please explain?

Erol Akarsu

On Tue, Nov 13, 2012 at 10:53 AM, Mouradk <mourad...@gmail.com> wrote:

> Hello karl,
>
> I have restarted a new one, please let me know if that helps.
>
> Regards,
>
> Mourad
> On 13 Nov 2012, at 15:45, Erol Akarsu <eaka...@gmail.com> wrote:
>
> > Lewis,
> >
> > Thanks for looking at this. Solr has the newest patched schema and I restarted
> > Tomcat.
> >
> > I set DEBUG for SolrIndexerJob in the log4j.properties file:
> >
> > log4j.logger.org.apache.nutch.indexer.solr.SolrIndexerJob=DEBUG,cmdstdout
> >
> >> Can I
> >> also suggest that you experiment with the crawl script (which
> >> accompanies the nutch script) instead of using the deprecated crawl
> >> command.
> >
> > Where is this script? The bin folder has only the nutch script.
> >
> >> perhaps attempt to set the SolrIndexerJob logging to DEBUG and review
> >> your hadoop.log as well. I can confirm that I was able to get Nutch
> >> trunk working with a standalone Solr 4.0 multicore server with the
> >> patch applied just last week.
> >
> > I am using Nutch 2.1, not trunk. Does that make any difference to the
> > behavior of the nutch script?
> > Can you give me the main points, maybe a script of your full steps,
> > on how you tested and got this working last week?
> >
> >
> > I am getting this in hadoop.log:
> >
> > 2012-11-13 10:34:50,466 INFO  solr.SolrIndexerJob - SolrIndexerJob: starting
> > 2012-11-13 10:34:50,805 INFO  plugin.PluginRepository - Plugins: looking in: /home/eakarsu/searchProject/apache-nutch-2.1/runtime/local/plugins
> > 2012-11-13 10:34:50,867 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
> > 2012-11-13 10:34:50,867 INFO  plugin.PluginRepository - Registered Plugins:
> > 2012-11-13 10:34:50,867 INFO  plugin.PluginRepository -     the nutch core extension points (nutch-extensionpoints)
> > 2012-11-13 10:34:50,867 INFO  plugin.PluginRepository -     Basic URL Normalizer (urlnormalizer-basic)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Basic Indexing Filter (index-basic)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Html Parse Plug-in (parse-html)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     HTTP Framework (lib-http)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Pass-through URL Normalizer (urlnormalizer-pass)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Regex URL Filter (urlfilter-regex)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Http Protocol Plug-in (protocol-http)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Regex URL Normalizer (urlnormalizer-regex)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Tika Parser Plug-in (parse-tika)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     OPIC Scoring Plug-in (scoring-opic)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     CyberNeko HTML Parser (lib-nekohtml)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Anchor Indexing Filter (index-anchor)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Regex URL Filter Framework (lib-regex-filter)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository - Registered Extension-Points:
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Nutch Protocol (org.apache.nutch.protocol.Protocol)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Parse Filter (org.apache.nutch.parse.ParseFilter)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Nutch URL Filter (org.apache.nutch.net.URLFilter)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Nutch Content Parser (org.apache.nutch.parse.Parser)
> > 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
> > 2012-11-13 10:34:50,872 INFO  basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
> > 2012-11-13 10:34:50,872 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> > 2012-11-13 10:34:50,875 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
> > 2012-11-13 10:34:50,875 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> > 2012-11-13 10:34:51,891 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > 2012-11-13 10:34:52,765 INFO  mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
> > 2012-11-13 10:34:52,818 INFO  solr.SolrMappingReader - source: content dest: content
> > 2012-11-13 10:34:52,818 INFO  solr.SolrMappingReader - source: site dest: site
> > 2012-11-13 10:34:52,818 INFO  solr.SolrMappingReader - source: title dest: title
> > 2012-11-13 10:34:52,818 INFO  solr.SolrMappingReader - source: host dest: host
> > 2012-11-13 10:34:52,818 INFO  solr.SolrMappingReader - source: segment dest: segment
> > 2012-11-13 10:34:52,818 INFO  solr.SolrMappingReader - source: boost dest: boost
> > 2012-11-13 10:34:52,818 INFO  solr.SolrMappingReader - source: digest dest: digest
> > 2012-11-13 10:34:52,818 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
> > 2012-11-13 10:34:52,821 INFO  basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
> > 2012-11-13 10:34:52,821 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> > 2012-11-13 10:34:52,821 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
> > 2012-11-13 10:34:52,821 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> > 2012-11-13 10:34:55,434 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
> > 2012-11-13 10:34:56,455 ERROR solr.SolrIndexerJob - SolrIndexerJob: org.apache.solr.common.SolrException: Not Found
> >
> > Not Found
> >
> > request: http://localhost:8080/sol40/update
> >    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
> >    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> >    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> >    at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86)
> >    at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:75)
> >    at org.apache.nutch.indexer.solr.SolrIndexerJob.indexSolr(SolrIndexerJob.java:60)
> >    at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:75)
> >    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >    at org.apache.nutch.indexer.solr.SolrIndexerJob.main(SolrIndexerJob.java:84)
> >
> >
> > On Tue, Nov 13, 2012 at 9:53 AM, Lewis John Mcgibbney <
> > lewis.mcgibb...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> On Tue, Nov 13, 2012 at 2:36 PM, Erol Akarsu <eaka...@gmail.com> wrote:
> >>> Lewis,
> >>>
> >>> I applied the patch you told me about. I replaced the schema.xml of the
> >>> Solr 4 installation with schema-solr4.xml. The Solr 4.0 system is up and
> >>> running and I can see its web page at http://localhost:8080/sol40.
> >>
> >> You would need to either rename schema-solr4.xml to schema.xml, then copy
> >> this to your Tomcat Solr installation before starting/restarting the
> >> server, or alternatively copy the contents of the newly patched file to
> >> the existing Solr schema.xml.
> >>
> >>>
> >>> I followed the tutorial blindly. Crawling went fine, but it seems very slow
> >>> compared to before the patch was applied.
> >>
> >> Considering the patch only applies to the Solr indexing stage, crawl
> >> performance should not be affected in the slightest, especially when
> >> you are not passing the Solr server URL during the crawl phase. Can I
> >> also suggest that you experiment with the crawl script (which
> >> accompanies the nutch script) instead of using the deprecated crawl
> >> command.
> >>
> >> perhaps attempt to set the SolrIndexerJob logging to DEBUG and review
> >> your hadoop.log as well. I can confirm that I was able to get Nutch
> >> trunk working with a standalone Solr 4.0 multicore server with the
> >> patch applied just last week.
> >>
> >> As I said, Markus has also suggested some additions to the patch so
> >> maybe try catching some irregularities... trial and error.
> >>
> >> hth
> >>
> >> Lewis
> >>
>
>
