Hi, it looks more like there is nothing to index.
Unfortunately, in 2.x there are no log messages enabled by default that indicate how many
documents are sent to the indexing back-ends. The easiest way is to enable the job counters
in conf/log4j.properties by adding the line

  log4j.logger.org.apache.hadoop.mapreduce.Job=INFO

or by raising the level to INFO in the existing line

  log4j.logger.org.apache.hadoop=WARN

Make sure log4j.properties is correctly deployed (if in doubt, run "ant runtime").
Then check the hadoop.log again: there should be a counter DocumentCount with a non-zero value.

Best,
Sebastian

On 03/02/2018 06:50 AM, Yash Thenuan Thenuan wrote:
> Following are the logs from hadoop.log
>
> 2018-03-02 11:18:45,220 INFO indexer.IndexingJob - IndexingJob: starting
> 2018-03-02 11:18:45,791 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2018-03-02 11:18:46,138 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: -1
> 2018-03-02 11:18:46,138 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2018-03-02 11:18:46,140 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
> 2018-03-02 11:18:46,140 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2018-03-02 11:18:46,157 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.metadata.MetadataIndexer
> 2018-03-02 11:18:46,535 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.more.MoreIndexingFilter
> 2018-03-02 11:18:48,663 WARN conf.Configuration - file:/tmp/hadoop-yasht/mapred/staging/yasht1100834069/.staging/job_local1100834069_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
> 2018-03-02 11:18:48,666 WARN conf.Configuration - file:/tmp/hadoop-yasht/mapred/staging/yasht1100834069/.staging/job_local1100834069_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
> 2018-03-02 11:18:48,792 WARN conf.Configuration - file:/tmp/hadoop-yasht/mapred/local/localRunner/yasht/job_local1100834069_0001/job_local1100834069_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
> 2018-03-02 11:18:48,798 WARN conf.Configuration - file:/tmp/hadoop-yasht/mapred/local/localRunner/yasht/job_local1100834069_0001/job_local1100834069_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
> 2018-03-02 11:18:49,093 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
> 2018-03-02 11:18:54,737 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: -1
> 2018-03-02 11:18:54,737 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2018-03-02 11:18:54,737 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
> 2018-03-02 11:18:54,737 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2018-03-02 11:18:54,737 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.metadata.MetadataIndexer
> 2018-03-02 11:18:54,738 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.more.MoreIndexingFilter
> 2018-03-02 11:18:56,883 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
> 2018-03-02 11:18:56,884 INFO indexer.IndexingJob - Active IndexWriters :
> ElasticIndexWriter
>     elastic.cluster : elastic prefix cluster
>     elastic.host : hostname
>     elastic.port : port (default 9200)
>     elastic.index : elastic index command
>     elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
>     elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
>
> 2018-03-02 11:18:56,939 INFO indexer.IndexingJob - IndexingJob: done.
>
> On Thu, Mar 1, 2018 at 10:11 PM, Sebastian Nagel <wastl.na...@googlemail.com> wrote:
>
>> It's impossible to find the reason from console output.
>> Please check the hadoop.log, it should contain more logs
>> including those from ElasticIndexWriter.
>>
>> Sebastian
>>
>> On 03/01/2018 06:38 AM, Yash Thenuan Thenuan wrote:
>>> Hi Sebastian, all of this is coming, but the problem is that the content is not
>>> sent. Nothing is indexed to ES.
>>> This is the output on debug level.
>>>
>>> ElasticIndexWriter
>>>     elastic.cluster : elastic prefix cluster
>>>     elastic.host : hostname
>>>     elastic.port : port (default 9200)
>>>     elastic.index : elastic index command
>>>     elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
>>>     elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
>>>
>>> no modules loaded
>>> loaded plugin [org.elasticsearch.index.reindex.ReindexPlugin]
>>> loaded plugin [org.elasticsearch.join.ParentJoinPlugin]
>>> loaded plugin [org.elasticsearch.percolator.PercolatorPlugin]
>>> loaded plugin [org.elasticsearch.script.mustache.MustachePlugin]
>>> loaded plugin [org.elasticsearch.transport.Netty4Plugin]
>>> created thread pool: name [force_merge], size [1], queue size [unbounded]
>>> created thread pool: name [fetch_shard_started], core [1], max [8], keep alive [5m]
>>> created thread pool: name [listener], size [2], queue size [unbounded]
>>> created thread pool: name [index], size [4], queue size [200]
>>> created thread pool: name [refresh], core [1], max [2], keep alive [5m]
>>> created thread pool: name [generic], core [4], max [128], keep alive [30s]
>>> created thread pool: name [warmer], core [1], max [2], keep alive [5m]
>>> thread pool [search] will adjust queue by [50] when determining automatic queue size
>>> created thread pool: name [search], size [7], queue size [1k]
>>> created thread pool: name [flush], core [1], max [2], keep alive [5m]
>>> created thread pool: name [fetch_shard_store], core [1], max [8], keep alive [5m]
>>> created thread pool: name [management], core [1], max [5], keep alive [5m]
>>> created thread pool: name [get], size [4], queue size [1k]
>>> created thread pool: name [bulk], size [4], queue size [200]
>>> created thread pool: name [snapshot], core [1], max [2], keep alive [5m]
>>> node_sampler_interval[5s]
>>> adding address [{#transport#-1}{nNtPR9OJShWSW-ayXRDILA}{localhost}{127.0.0.1:9300}]
>>> connected to node [{tzfqJn0}{tzfqJn0sS5OPV4lKreU60w}{QCGd9doAQaGw4Q_lOqniLQ}{127.0.0.1}{127.0.0.1:9300}]
>>>
>>> IndexingJob: done
>>>
>>> On Wed, Feb 28, 2018 at 10:05 PM, Sebastian Nagel <wastl.na...@googlemail.com> wrote:
>>>
>>>> I never tried ES with Nutch 2.3 but it should be similar to set up as for 1.x:
>>>>
>>>> - enable the plugin "indexer-elastic" in plugin.includes
>>>>   (upgraded and renamed to "indexer-elastic2" in 2.4)
>>>>
>>>> - it expects ES 1.4.1
>>>>
>>>> - available/required options are found in the log file (hadoop.log):
>>>>     ElasticIndexWriter
>>>>         elastic.cluster : elastic prefix cluster
>>>>         elastic.host : hostname
>>>>         elastic.port : port (default 9300)
>>>>         elastic.index : elastic index command
>>>>         elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
>>>>         elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
>>>>
>>>> Sebastian
>>>>
>>>> On 02/28/2018 01:26 PM, Yash Thenuan Thenuan wrote:
>>>>> Yeah, I was also thinking that.
>>>>> Can somebody help me with Nutch 2.3?
>>>>>
>>>>> On 28 Feb 2018 17:53, "Yossi Tamari" <yossi.tam...@pipl.com> wrote:
>>>>>
>>>>>> Sorry, I just realized that you're using Nutch 2.x and I'm answering
>>>>>> for Nutch 1.x. I'm afraid I can't help you.
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Yash Thenuan Thenuan [mailto:rit2014...@iiita.ac.in]
>>>>>>> Sent: 28 February 2018 14:20
>>>>>>> To: user@nutch.apache.org
>>>>>>> Subject: RE: Regarding Indexing to elasticsearch
>>>>>>>
>>>>>>> IndexingJob (<batchId> | -all | -reindex) [-crawlId <id>]
>>>>>>> This is the output of nutch index. I have already configured the nutch-site.xml.
>>>>>>>
>>>>>>> On 28 Feb 2018 17:41, "Yossi Tamari" <yossi.tam...@pipl.com> wrote:
>>>>>>>
>>>>>>>> I suggest you run "nutch index", take a look at the returned help
>>>>>>>> message, and continue from there.
>>>>>>>> Broadly, first of all you need to configure your elasticsearch
>>>>>>>> environment in nutch-site.xml, and then you need to run nutch index
>>>>>>>> with the location of your CrawlDB and either the segment you want to
>>>>>>>> index or the directory that contains all the segments you want to index.
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Yash Thenuan Thenuan [mailto:rit2014...@iiita.ac.in]
>>>>>>>>> Sent: 28 February 2018 14:06
>>>>>>>>> To: user@nutch.apache.org
>>>>>>>>> Subject: RE: Regarding Indexing to elasticsearch
>>>>>>>>>
>>>>>>>>> All I want is to index my parsed data to elasticsearch.
>>>>>>>>>
>>>>>>>>> On 28 Feb 2018 17:34, "Yossi Tamari" <yossi.tam...@pipl.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Yash,
>>>>>>>>>
>>>>>>>>> The nutch index command does not have a -all flag, so I'm not sure
>>>>>>>>> what you're trying to achieve here.
>>>>>>>>>
>>>>>>>>> Yossi.
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Yash Thenuan Thenuan [mailto:rit2014...@iiita.ac.in]
>>>>>>>>>> Sent: 28 February 2018 13:55
>>>>>>>>>> To: user@nutch.apache.org
>>>>>>>>>> Subject: Regarding Indexing to elasticsearch
>>>>>>>>>>
>>>>>>>>>> Can somebody please tell me what happens when we hit the
>>>>>>>>>> bin/nutch index -all command.
>>>>>>>>>> Because I can't figure out why the write function inside the
>>>>>>>>>> elastic-indexer is not getting executed.
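For reference, the log4j.properties change suggested at the top of the thread might look like
the following. This is only a sketch of the relevant excerpt, not a complete file; the exact
contents of a stock conf/log4j.properties vary between Nutch versions.

```properties
# conf/log4j.properties (excerpt) -- sketch, not a full file.
# Adding this line turns on INFO logging for MapReduce jobs, so that the
# job counters (including DocumentCount) are written to logs/hadoop.log:
log4j.logger.org.apache.hadoop.mapreduce.Job=INFO

# Alternatively, raise the existing Hadoop logger from WARN to INFO:
# log4j.logger.org.apache.hadoop=INFO
```

After redeploying the configuration (e.g. via "ant runtime") and re-running the indexing job,
hadoop.log should show the counters, and a DocumentCount of zero would confirm that no
documents reached the index writer.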
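The nutch-site.xml setup described in the thread (enabling "indexer-elastic" in
plugin.includes and setting the elastic.* options) might look roughly like this. The plugin
list and all values below are illustrative assumptions, not canonical settings; adjust the
plugin list to whatever your installation already uses.

```xml
<!-- nutch-site.xml (sketch; plugin list and values are examples only) -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- "indexer-elastic" must appear here, alongside your existing plugins -->
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|metadata|more)|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <property>
    <name>elastic.host</name>
    <value>localhost</value>
  </property>
  <property>
    <name>elastic.port</name>
    <!-- transport port; 9300 per the option listing in the thread -->
    <value>9300</value>
  </property>
  <property>
    <name>elastic.cluster</name>
    <!-- must match cluster.name of the target Elasticsearch cluster -->
    <value>elasticsearch</value>
  </property>
  <property>
    <name>elastic.index</name>
    <value>nutch</value>
  </property>
</configuration>
```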