Following are the logs from hadoop.log:

2018-03-02 11:18:45,220 INFO indexer.IndexingJob - IndexingJob: starting
2018-03-02 11:18:45,791 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-03-02 11:18:46,138 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: -1
2018-03-02 11:18:46,138 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2018-03-02 11:18:46,140 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2018-03-02 11:18:46,140 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2018-03-02 11:18:46,157 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.metadata.MetadataIndexer
2018-03-02 11:18:46,535 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.more.MoreIndexingFilter
2018-03-02 11:18:48,663 WARN conf.Configuration - file:/tmp/hadoop-yasht/mapred/staging/yasht1100834069/.staging/job_local1100834069_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2018-03-02 11:18:48,666 WARN conf.Configuration - file:/tmp/hadoop-yasht/mapred/staging/yasht1100834069/.staging/job_local1100834069_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2018-03-02 11:18:48,792 WARN conf.Configuration - file:/tmp/hadoop-yasht/mapred/local/localRunner/yasht/job_local1100834069_0001/job_local1100834069_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2018-03-02 11:18:48,798 WARN conf.Configuration - file:/tmp/hadoop-yasht/mapred/local/localRunner/yasht/job_local1100834069_0001/job_local1100834069_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2018-03-02 11:18:49,093 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
2018-03-02 11:18:54,737 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: -1
2018-03-02 11:18:54,737 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2018-03-02 11:18:54,737 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2018-03-02 11:18:54,737 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2018-03-02 11:18:54,737 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.metadata.MetadataIndexer
2018-03-02 11:18:54,738 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.more.MoreIndexingFilter
2018-03-02 11:18:56,883 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
2018-03-02 11:18:56,884 INFO indexer.IndexingJob - Active IndexWriters :
ElasticIndexWriter
    elastic.cluster : elastic prefix cluster
    elastic.host : hostname
    elastic.port : port (default 9200)
    elastic.index : elastic index command
    elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
    elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
2018-03-02 11:18:56,939 INFO indexer.IndexingJob - IndexingJob: done.

On Thu, Mar 1, 2018 at 10:11 PM, Sebastian Nagel <wastl.na...@googlemail.com> wrote:

> It's impossible to find the reason from the console output.
> Please check the hadoop.log, it should contain more logs,
> including those from ElasticIndexWriter.
>
> Sebastian
>
> On 03/01/2018 06:38 AM, Yash Thenuan Thenuan wrote:
> > Hi Sebastian, all of this is coming, but the problem is that the content is not
> > sent. Nothing is indexed to ES.
> > This is the output on debug level.
> >
> > ElasticIndexWriter
> >     elastic.cluster : elastic prefix cluster
> >     elastic.host : hostname
> >     elastic.port : port (default 9200)
> >     elastic.index : elastic index command
> >     elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
> >     elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
> >
> > no modules loaded
> > loaded plugin [org.elasticsearch.index.reindex.ReindexPlugin]
> > loaded plugin [org.elasticsearch.join.ParentJoinPlugin]
> > loaded plugin [org.elasticsearch.percolator.PercolatorPlugin]
> > loaded plugin [org.elasticsearch.script.mustache.MustachePlugin]
> > loaded plugin [org.elasticsearch.transport.Netty4Plugin]
> > created thread pool: name [force_merge], size [1], queue size [unbounded]
> > created thread pool: name [fetch_shard_started], core [1], max [8], keep alive [5m]
> > created thread pool: name [listener], size [2], queue size [unbounded]
> > created thread pool: name [index], size [4], queue size [200]
> > created thread pool: name [refresh], core [1], max [2], keep alive [5m]
> > created thread pool: name [generic], core [4], max [128], keep alive [30s]
> > created thread pool: name [warmer], core [1], max [2], keep alive [5m]
> > thread pool [search] will adjust queue by [50] when determining automatic queue size
> > created thread pool: name [search], size [7], queue size [1k]
> > created thread pool: name [flush], core [1], max [2], keep alive [5m]
> > created thread pool: name [fetch_shard_store], core [1], max [8], keep alive [5m]
> > created thread pool: name [management], core [1], max [5], keep alive [5m]
> > created thread pool: name [get], size [4], queue size [1k]
> > created thread pool: name [bulk], size [4], queue size [200]
> > created thread pool: name [snapshot], core [1], max [2], keep alive [5m]
> > node_sampler_interval[5s]
> > adding address [{#transport#-1}{nNtPR9OJShWSW-ayXRDILA}{localhost}{127.0.0.1:9300}]
> > connected to node [{tzfqJn0}{tzfqJn0sS5OPV4lKreU60w}{QCGd9doAQaGw4Q_lOqniLQ}{127.0.0.1}{127.0.0.1:9300}]
> >
> > IndexingJob: done
> >
> > On Wed, Feb 28, 2018 at 10:05 PM, Sebastian Nagel <wastl.na...@googlemail.com> wrote:
> >
> >> I never tried ES with Nutch 2.3 but the setup should be similar to 1.x:
> >>
> >> - enable the plugin "indexer-elastic" in plugin.includes
> >>   (upgraded and renamed to "indexer-elastic2" in 2.4)
> >>
> >> - expects ES 1.4.1
> >>
> >> - available/required options are found in the log file (hadoop.log):
> >>     ElasticIndexWriter
> >>       elastic.cluster : elastic prefix cluster
> >>       elastic.host : hostname
> >>       elastic.port : port (default 9300)
> >>       elastic.index : elastic index command
> >>       elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
> >>       elastic.max.bulk.size : elastic bulk index length.
> >>         (default 2500500 ~2.5MB)
> >>
> >> Sebastian
> >>
> >> On 02/28/2018 01:26 PM, Yash Thenuan Thenuan wrote:
> >>> Yeah,
> >>> I was also thinking that.
> >>> Can somebody help me with Nutch 2.3?
> >>>
> >>> On 28 Feb 2018 17:53, "Yossi Tamari" <yossi.tam...@pipl.com> wrote:
> >>>
> >>>> Sorry, I just realized that you're using Nutch 2.x and I'm answering for
> >>>> Nutch 1.x. I'm afraid I can't help you.
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Yash Thenuan Thenuan [mailto:rit2014...@iiita.ac.in]
> >>>>> Sent: 28 February 2018 14:20
> >>>>> To: user@nutch.apache.org
> >>>>> Subject: RE: Regarding Indexing to elasticsearch
> >>>>>
> >>>>> IndexingJob (<batchId> | -all | -reindex) [-crawlId <id>]
> >>>>> This is the output of nutch index. I have already configured the nutch-site.xml.
> >>>>>
> >>>>> On 28 Feb 2018 17:41, "Yossi Tamari" <yossi.tam...@pipl.com> wrote:
> >>>>>
> >>>>>> I suggest you run "nutch index", take a look at the returned help
> >>>>>> message, and continue from there.
> >>>>>> Broadly, first of all you need to configure your elasticsearch
> >>>>>> environment in nutch-site.xml, and then you need to run nutch index
> >>>>>> with the location of your CrawlDB and either the segment you want to
> >>>>>> index or the directory that contains all the segments you want to index.
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Yash Thenuan Thenuan [mailto:rit2014...@iiita.ac.in]
> >>>>>>> Sent: 28 February 2018 14:06
> >>>>>>> To: user@nutch.apache.org
> >>>>>>> Subject: RE: Regarding Indexing to elasticsearch
> >>>>>>>
> >>>>>>> All I want is to index my parsed data to elasticsearch.
> >>>>>>>
> >>>>>>> On 28 Feb 2018 17:34, "Yossi Tamari" <yossi.tam...@pipl.com> wrote:
> >>>>>>>
> >>>>>>> Hi Yash,
> >>>>>>>
> >>>>>>> The nutch index command does not have a -all flag, so I'm not sure what
> >>>>>>> you're trying to achieve here.
> >>>>>>>
> >>>>>>> Yossi.
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Yash Thenuan Thenuan [mailto:rit2014...@iiita.ac.in]
> >>>>>>>> Sent: 28 February 2018 13:55
> >>>>>>>> To: user@nutch.apache.org
> >>>>>>>> Subject: Regarding Indexing to elasticsearch
> >>>>>>>>
> >>>>>>>> Can somebody please tell me what happens when we hit the bin/nutch
> >>>>>>>> index -all command.
> >>>>>>>> Because I can't figure out why the write function inside the
> >>>>>>>> elastic-indexer is not getting executed.
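
For anyone following along, here is a minimal nutch-site.xml sketch based on the options
listed above. It is illustrative only, not a verified configuration: the plugin.includes
value is just an example plugin set, and the host, port, cluster and index values are
placeholders that must match your own Elasticsearch instance.

  <?xml version="1.0"?>
  <configuration>

    <!-- Enable the Elasticsearch index writer ("indexer-elastic" in Nutch 2.3,
         renamed "indexer-elastic2" in 2.4). The other plugins listed here are
         only an example set, not a recommendation. -->
    <property>
      <name>plugin.includes</name>
      <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|metadata|more)|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
    </property>

    <!-- Placeholder connection settings. elastic.cluster should match the
         cluster.name of the target Elasticsearch instance; elastic.port is the
         transport port (9300 in the log above), not the HTTP port 9200. -->
    <property>
      <name>elastic.host</name>
      <value>localhost</value>
    </property>
    <property>
      <name>elastic.port</name>
      <value>9300</value>
    </property>
    <property>
      <name>elastic.cluster</name>
      <value>elasticsearch</value>
    </property>
    <property>
      <name>elastic.index</name>
      <value>nutch</value>
    </property>

    <!-- Bulk request limits, using the defaults printed in the log. -->
    <property>
      <name>elastic.max.bulk.docs</name>
      <value>250</value>
    </property>
    <property>
      <name>elastic.max.bulk.size</name>
      <value>2500500</value>
    </property>

  </configuration>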
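
And a rough sketch of how the index step is invoked. The Nutch 2.x form follows the usage
string quoted earlier in the thread; the 1.x form follows Yossi's description. The crawl id
and paths are placeholders, and flags can differ between versions, so check the help printed
by running bin/nutch index with no arguments.

  # Nutch 2.x, as used in this thread: index everything for a crawl
  # ('mycrawl' is a placeholder crawl id)
  bin/nutch index -all -crawlId mycrawl

  # Nutch 1.x form, per Yossi's description: pass the CrawlDB plus the
  # segment(s) to index (paths below are placeholders)
  bin/nutch index crawl/crawldb -linkdb crawl/linkdb -dir crawl/segments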