Re: Solr on HDFS: increase in query time with increase in data

2016-12-16 Thread Shawn Heisey
On 12/16/2016 11:58 AM, Chetas Joshi wrote:
> How different the index data caching mechanism is for the Streaming
> API from the cursor approach?

Solr and Lucene do not handle that caching. Systems external to Solr (like the OS, or HDFS) handle the caching. The cache effectiveness will be a comb
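For readers unfamiliar with the cursor approach mentioned in the question, the following is a minimal Python sketch of Solr's cursorMark paging loop. The `fetch_page` callable, the `id` sort field, and the page size are assumptions made for illustration and do not come from this thread; in a real client, `fetch_page` would send the parameters to a `/select` handler and return the decoded JSON.

```python
from typing import Callable, Dict, Iterator


def iterate_cursor(fetch_page: Callable[[Dict[str, str]], dict],
                   query: str = "*:*",
                   rows: int = 1000,
                   unique_key: str = "id") -> Iterator[dict]:
    """Yield every matching document using cursorMark deep paging.

    fetch_page is a hypothetical callable: it takes the request
    parameters and returns the parsed JSON response as a dict.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": query,
            "rows": str(rows),
            # Cursors require a sort that includes the uniqueKey field.
            "sort": f"{unique_key} asc",
            "cursorMark": cursor,
            "wt": "json",
        }
        response = fetch_page(params)
        for doc in response["response"]["docs"]:
            yield doc
        next_cursor = response["nextCursorMark"]
        # Solr signals the end by returning the cursor it was given.
        if next_cursor == cursor:
            break
        cursor = next_cursor
```

Each request in this loop still reads index data through whatever cache sits below Solr, so the sketch only illustrates the request pattern and says nothing about how well those reads are cached.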

Re: Solr on HDFS: increase in query time with increase in data

2016-12-16 Thread Chetas Joshi
Thank you, everyone. I will add nodes to the SolrCloud and split the shards. Shawn, thank you for explaining why putting index data on the local file system could be a better idea than using HDFS. I need to find out how HDFS caches the index files in a resource-constrained environment. I would also
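As a rough illustration of the shard split mentioned above, this Python sketch builds a request URL for the Collections API `SPLITSHARD` action. The base URL, collection name, and shard name are placeholders chosen for the example and are not taken from this thread.

```python
from urllib.parse import urlencode


def split_shard_url(base_url: str, collection: str, shard: str) -> str:
    """Build a Collections API URL that asks Solr to split one shard."""
    params = {
        "action": "SPLITSHARD",
        "collection": collection,  # placeholder collection name
        "shard": shard,            # placeholder shard name, e.g. "shard1"
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and names, for illustration only.
    print(split_shard_url("http://localhost:8983/solr", "mycollection", "shard1"))
```

The function only constructs the URL; sending it (for example with an HTTP GET) and checking the response are left to the caller.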

Re: Solr on HDFS: increase in query time with increase in data

2016-12-16 Thread Shawn Heisey
On 12/14/2016 11:58 AM, Chetas Joshi wrote:
> I am running Solr 5.5.0 on HDFS. It is a solrCloud of 50 nodes and I have
> the following config.
> maxShardsperNode: 1
> replicationFactor: 1
>
> I have been ingesting data into Solr for the last 3 months. With increase
> in data, I am observing increa

Re: Solr on HDFS: increase in query time with increase in data

2016-12-16 Thread Piyush Kunal
I think 70GB is too large for a shard. How much memory does the system have? In case Solr does not have sufficient memory to load the indexes, it will use only the amount of memory defined in your Solr caches. Although you are on HDFS, Solr performance will be really bad if it has to do disk IO a

Re: Solr on HDFS: increase in query time with increase in data

2016-12-15 Thread Reth RM
I think the shard index size is huge and should be split.

On Wed, Dec 14, 2016 at 10:58 AM, Chetas Joshi wrote:
> Hi everyone,
>
> I am running Solr 5.5.0 on HDFS. It is a solrCloud of 50 nodes and I have
> the following config.
> maxShardsperNode: 1
> replicationFactor: 1
>
> I have been ingest

Solr on HDFS: increase in query time with increase in data

2016-12-14 Thread Chetas Joshi
Hi everyone,

I am running Solr 5.5.0 on HDFS. It is a solrCloud of 50 nodes and I have the following config.
maxShardsperNode: 1
replicationFactor: 1

I have been ingesting data into Solr for the last 3 months. With increase in data, I am observing increase in the query time. Currently the size of