Do you have the Solr JIRA number for the new ingestion tool? Thanks.

On Wed, Sep 24, 2014 at 7:57 PM, Wolfgang Hoschek <whosc...@cloudera.com> wrote:

> Based on our measurements, Lucene indexing is so CPU-intensive that it
> wouldn’t really help much to exploit data locality on read. The
> overwhelming bottleneck remains the same. Having said that, we have an
> ingestion tool in the works that will take advantage of data locality for
> splittable files as well.
>
> Wolfgang.
>
> On Sep 24, 2014, at 9:38 AM, Tom Chen <tomchen1...@gmail.com> wrote:
>
> > Hi,
> >
> > The MRIT (MapReduceIndexerTool) uses NLineInputFormat for the morphline
> > mapper. The mapper doesn't co-locate with the input data that it processes.
> > Isn't this a performance hit?
> >
> > Ideally, the morphline mapper should run on the hosts that contain most
> > of the data blocks for the input files it processes.
> >
> > Regards,
> > Tom