Following are the logs from hadoop.log:

2018-03-02 11:18:45,220 INFO indexer.IndexingJob - IndexingJob: starting
2018-03-02 11:18:45,791 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-03-02 11:18:46,138 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: -1
2018-03-02 11:18:46,138 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2018-03-02 11:18:46,140 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2018-03-02 11:18:46,140 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2018-03-02 11:18:46,157 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.metadata.MetadataIndexer
2018-03-02 11:18:46,535 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.more.MoreIndexingFilter
2018-03-02 11:18:48,663 WARN conf.Configuration - file:/tmp/hadoop-yasht/mapred/staging/yasht1100834069/.staging/job_local1100834069_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2018-03-02 11:18:48,666 WARN conf.Configuration - file:/tmp/hadoop-yasht/mapred/staging/yasht1100834069/.staging/job_local1100834069_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2018-03-02 11:18:48,792 WARN conf.Configuration - file:/tmp/hadoop-yasht/mapred/local/localRunner/yasht/job_local1100834069_0001/job_local1100834069_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2018-03-02 11:18:48,798 WARN conf.Configuration - file:/tmp/hadoop-yasht/mapred/local/localRunner/yasht/job_local1100834069_0001/job_local1100834069_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2018-03-02 11:18:49,093 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
2018-03-02 11:18:54,737 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: -1
2018-03-02 11:18:54,737 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2018-03-02 11:18:54,737 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2018-03-02 11:18:54,737 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2018-03-02 11:18:54,737 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.metadata.MetadataIndexer
2018-03-02 11:18:54,738 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.more.MoreIndexingFilter
2018-03-02 11:18:56,883 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
2018-03-02 11:18:56,884 INFO indexer.IndexingJob - Active IndexWriters :
ElasticIndexWriter
    elastic.cluster : elastic prefix cluster
    elastic.host : hostname
    elastic.port : port (default 9200)
    elastic.index : elastic index command
    elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
    elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
2018-03-02 11:18:56,939 INFO indexer.IndexingJob - IndexingJob: done.

On Thu, Mar 1, 2018 at 10:11 PM, Sebastian Nagel <wastl.na...@googlemail.com> wrote:

> It's impossible to find the reason from the console output.
> Please check the hadoop.log, it should contain more logs,
> including those from ElasticIndexWriter.
>
> Sebastian
>
> On 03/01/2018 06:38 AM, Yash Thenuan Thenuan wrote:
> > Hi Sebastian, all of this is coming, but the problem is that the content is not
> > sent. Nothing is indexed to ES.
> > This is the output on debug level.
> >
> > ElasticIndexWriter
> >     elastic.cluster : elastic prefix cluster
> >     elastic.host : hostname
> >     elastic.port : port (default 9200)
> >     elastic.index : elastic index command
> >     elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
> >     elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
> >
> > no modules loaded
> > loaded plugin [org.elasticsearch.index.reindex.ReindexPlugin]
> > loaded plugin [org.elasticsearch.join.ParentJoinPlugin]
> > loaded plugin [org.elasticsearch.percolator.PercolatorPlugin]
> > loaded plugin [org.elasticsearch.script.mustache.MustachePlugin]
> > loaded plugin [org.elasticsearch.transport.Netty4Plugin]
> > created thread pool: name [force_merge], size [1], queue size [unbounded]
> > created thread pool: name [fetch_shard_started], core [1], max [8], keep alive [5m]
> > created thread pool: name [listener], size [2], queue size [unbounded]
> > created thread pool: name [index], size [4], queue size [200]
> > created thread pool: name [refresh], core [1], max [2], keep alive [5m]
> > created thread pool: name [generic], core [4], max [128], keep alive [30s]
> > created thread pool: name [warmer], core [1], max [2], keep alive [5m]
> > thread pool [search] will adjust queue by [50] when determining automatic queue size
> > created thread pool: name [search], size [7], queue size [1k]
> > created thread pool: name [flush], core [1], max [2], keep alive [5m]
> > created thread pool: name [fetch_shard_store], core [1], max [8], keep alive [5m]
> > created thread pool: name [management], core [1], max [5], keep alive [5m]
> > created thread pool: name [get], size [4], queue size [1k]
> > created thread pool: name [bulk], size [4], queue size [200]
> > created thread pool: name [snapshot], core [1], max [2], keep alive [5m]
> > node_sampler_interval[5s]
> > adding address [{#transport#-1}{nNtPR9OJShWSW-ayXRDILA}{localhost}{127.0.0.1:9300}]
> > connected to node [{tzfqJn0}{tzfqJn0sS5OPV4lKreU60w}{QCGd9doAQaGw4Q_lOqniLQ}{127.0.0.1}{127.0.0.1:9300}]
> >
> > IndexingJob: done
> >
> > On Wed, Feb 28, 2018 at 10:05 PM, Sebastian Nagel <wastl.na...@googlemail.com> wrote:
> >
> >> I never tried ES with Nutch 2.3 but the setup should be similar to 1.x:
> >>
> >> - enable the plugin "indexer-elastic" in plugin.includes
> >>   (upgraded and renamed to "indexer-elastic2" in 2.4)
> >>
> >> - expects ES 1.4.1
> >>
> >> - available/required options are found in the log file (hadoop.log):
> >>     ElasticIndexWriter
> >>       elastic.cluster : elastic prefix cluster
> >>       elastic.host : hostname
> >>       elastic.port : port (default 9300)
> >>       elastic.index : elastic index command
> >>       elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
> >>       elastic.max.bulk.size : elastic bulk index length.
> >>         (default 2500500 ~2.5MB)
> >>
> >> Sebastian
> >>
> >> On 02/28/2018 01:26 PM, Yash Thenuan Thenuan wrote:
> >>> Yeah,
> >>> I was also thinking that.
> >>> Can somebody help me with Nutch 2.3?
> >>>
> >>> On 28 Feb 2018 17:53, "Yossi Tamari" <yossi.tam...@pipl.com> wrote:
> >>>
> >>>> Sorry, I just realized that you're using Nutch 2.x and I'm answering for
> >>>> Nutch 1.x. I'm afraid I can't help you.
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Yash Thenuan Thenuan [mailto:rit2014...@iiita.ac.in]
> >>>>> Sent: 28 February 2018 14:20
> >>>>> To: user@nutch.apache.org
> >>>>> Subject: RE: Regarding Indexing to elasticsearch
> >>>>>
> >>>>> IndexingJob (<batchId> | -all | -reindex) [-crawlId <id>]
> >>>>> This is the output of nutch index. I have already configured the nutch-site.xml.
> >>>>>
> >>>>> On 28 Feb 2018 17:41, "Yossi Tamari" <yossi.tam...@pipl.com> wrote:
> >>>>>
> >>>>>> I suggest you run "nutch index", take a look at the returned help
> >>>>>> message, and continue from there.
> >>>>>> Broadly, first of all you need to configure your elasticsearch
> >>>>>> environment in nutch-site.xml, and then you need to run nutch index
> >>>>>> with the location of your CrawlDB and either the segment you want to
> >>>>>> index or the directory that contains all the segments you want to index.
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Yash Thenuan Thenuan [mailto:rit2014...@iiita.ac.in]
> >>>>>>> Sent: 28 February 2018 14:06
> >>>>>>> To: user@nutch.apache.org
> >>>>>>> Subject: RE: Regarding Indexing to elasticsearch
> >>>>>>>
> >>>>>>> All I want is to index my parsed data to elasticsearch.
> >>>>>>>
> >>>>>>> On 28 Feb 2018 17:34, "Yossi Tamari" <yossi.tam...@pipl.com> wrote:
> >>>>>>>
> >>>>>>> Hi Yash,
> >>>>>>>
> >>>>>>> The nutch index command does not have a -all flag, so I'm not sure what
> >>>>>>> you're trying to achieve here.
> >>>>>>>
> >>>>>>> Yossi.
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Yash Thenuan Thenuan [mailto:rit2014...@iiita.ac.in]
> >>>>>>>> Sent: 28 February 2018 13:55
> >>>>>>>> To: user@nutch.apache.org
> >>>>>>>> Subject: Regarding Indexing to elasticsearch
> >>>>>>>>
> >>>>>>>> Can somebody please tell me what happens when we hit the bin/nutch
> >>>>>>>> index -all command.
> >>>>>>>> Because I can't figure out why the write function inside the
> >>>>>>>> elastic-indexer is not getting executed.
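
For anyone following along, here is a minimal nutch-site.xml sketch based on the options
listed above. It is illustrative only, not a verified configuration: the plugin.includes
value is just an example plugin set, and the host, port, cluster and index values are
placeholders that must match your own Elasticsearch instance.

  <?xml version="1.0"?>
  <configuration>

    <!-- Enable the Elasticsearch index writer ("indexer-elastic" in Nutch 2.3,
         renamed "indexer-elastic2" in 2.4). The other plugins listed here are
         only an example set, not a recommendation. -->
    <property>
      <name>plugin.includes</name>
      <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|metadata|more)|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
    </property>

    <!-- Placeholder connection settings. elastic.cluster should match the
         cluster.name of the target Elasticsearch instance; elastic.port is the
         transport port (9300 in the log above), not the HTTP port 9200. -->
    <property>
      <name>elastic.host</name>
      <value>localhost</value>
    </property>
    <property>
      <name>elastic.port</name>
      <value>9300</value>
    </property>
    <property>
      <name>elastic.cluster</name>
      <value>elasticsearch</value>
    </property>
    <property>
      <name>elastic.index</name>
      <value>nutch</value>
    </property>

    <!-- Bulk request limits, using the defaults printed in the log. -->
    <property>
      <name>elastic.max.bulk.docs</name>
      <value>250</value>
    </property>
    <property>
      <name>elastic.max.bulk.size</name>
      <value>2500500</value>
    </property>

  </configuration>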
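
And a rough sketch of how the index step is invoked. The Nutch 2.x form follows the usage
string quoted earlier in the thread; the 1.x form follows Yossi's description. The crawl id
and paths are placeholders, and flags can differ between versions, so check the help printed
by running bin/nutch index with no arguments.

  # Nutch 2.x, as used in this thread: index everything for a crawl
  # ('mycrawl' is a placeholder crawl id)
  bin/nutch index -all -crawlId mycrawl

  # Nutch 1.x form, per Yossi's description: pass the CrawlDB plus the
  # segment(s) to index (paths below are placeholders)
  bin/nutch index crawl/crawldb -linkdb crawl/linkdb -dir crawl/segments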