Re: MRIT's morphline mapper doesn't co-locate with data

Tom Chen Thu, 25 Sep 2014 06:53:42 -0700

Do you have the Solr Jira number for the new ingestion tool?

Thanks


On Wed, Sep 24, 2014 at 7:57 PM, Wolfgang Hoschek <whosc...@cloudera.com>
wrote:

> Based on our measurements, Lucene indexing is so CPU intensive that it
> wouldn’t really help much to exploit data locality on read. The
> overwhelming bottleneck remains the same. Having said that, we have an
> ingestion tool in the works that will take advantage of data locality for
> splittable files as well.
>
> Wolfgang.
>
> On Sep 24, 2014, at 9:38 AM, Tom Chen <tomchen1...@gmail.com> wrote:
>
> > Hi,
> >
> > The MRIT (MapReduceIndexerTool) uses NLineInputFormat for the morphline
> > mapper. The mapper doesn't co-locate with the input data that it processes.
> > Isn't this a performance hit?
> >
> > Ideally, the morphline mapper should run on the hosts that hold most of the
> > data blocks for the input files it processes.
> >
> > Regards,
> > Tom
>
>
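Tom's suggestion (run each mapper on the hosts that hold most of its input blocks) amounts to ranking hosts by how many bytes of a task's input they store locally. The sketch below shows that ranking. It is a hypothetical helper, not MRIT or Hadoop code: `Block` and `preferred_hosts` are illustrative names, and in a real Hadoop job the block lengths and replica hosts would come from something like `FileSystem.getFileBlockLocations` and `BlockLocation.getHosts()`, with the chosen hosts reported as scheduling hints via `InputSplit.getLocations()`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One HDFS block of an input file (hypothetical model, not the Hadoop API)."""
    length: int              # bytes in this block
    hosts: tuple[str, ...]   # DataNodes holding a replica of this block


def preferred_hosts(blocks, top_n=3):
    """Rank hosts by how many bytes of the given blocks they store locally."""
    local_bytes = Counter()
    for block in blocks:
        for host in block.hosts:
            local_bytes[host] += block.length
    # Most local bytes first; host name breaks ties so the result is deterministic.
    ranked = sorted(local_bytes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [host for host, _ in ranked[:top_n]]


if __name__ == "__main__":
    MiB = 2**20
    blocks = [
        Block(128 * MiB, ("dn1", "dn2", "dn3")),
        Block(128 * MiB, ("dn2", "dn3", "dn4")),
        Block(64 * MiB, ("dn2", "dn5", "dn6")),
    ]
    print(preferred_hosts(blocks))  # ['dn2', 'dn3', 'dn1']
```

Ranking by bytes rather than block count keeps a short trailing block from counting as much as a full one. The scheduler treats these hosts only as hints, so a busy cluster may still run the task elsewhere.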

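Wolfgang's argument, that locality barely helps when Lucene indexing dominates, can be made concrete with an Amdahl's-law style estimate. This is an illustrative model, not his measurement: assume each map task's time splits into a read part and a CPU-bound indexing part, let $f$ be the fraction spent reading, and let local reads be $s$ times faster than remote ones. The values $f = 0.1$ and $s = 2$ below are hypothetical.

```latex
T = T_{\text{read}} + T_{\text{index}}, \qquad
f = \frac{T_{\text{read}}}{T}

S = \frac{T}{(1 - f)\,T + \frac{f}{s}\,T}
  = \frac{1}{(1 - f) + \frac{f}{s}},
\qquad
\lim_{s \to \infty} S = \frac{1}{1 - f}

f = 0.1,\; s = 2:\quad
S = \frac{1}{0.9 + 0.05} = \frac{1}{0.95} \approx 1.053,
\qquad
\frac{1}{1 - f} = \frac{1}{0.9} \approx 1.111
```

Under these assumptions, even infinitely fast local reads would cut task time by at most about 10%, because the indexing term $(1 - f)\,T$ is untouched.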