leader split-brain at least once a day - need help

2015-01-07 Thread Thomas Lamy
Hi there, we are running a 3 server cloud serving a dozen single-shard/replicate-everywhere collections. The 2 biggest collections are ~15M docs, and about 13GiB / 2.5GiB size. Solr is 4.10.2, ZK 3.4.5, Tomcat 7.0.56, Oracle Java 1.7.0_72-b14. 10 of the 12 collections (the small ones) get

Solr support for multi-tenant applications

2015-01-07 Thread Danesh Kuruppu
Hi all, I need to use Solr for a multi-tenant application. What is the best way I could achieve multi-tenancy with Solr? One possibility is to have a separate core for each tenant domain. 1. Is it recommended to do it? 2. Are there any issues with having a large number of Solr cores? Please

Re: How large is your solr index?

2015-01-07 Thread Bram Van Dam
On 01/06/2015 07:54 PM, Erick Erickson wrote: Have you considered pre-supposing SolrCloud and using the SPLITSHARD API command? I think that's the direction we'll probably be going. Index size (at least for us) can be unpredictable in some cases. Some clients start out small and then grow

Re: Solr support for multi-tenant applications

2015-01-07 Thread Bram Van Dam
One possibility is to have separate core for each tenant domain. You could do that, and it's probably the way to go if you have a lot of data. However, if you don't have much data, you can achieve multi-tenancy by adding a filter to all your queries, for instance: query = userQuery
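The filter approach mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the field name `tenant_id` and the helper `tenant_query_params` are hypothetical, and the restriction is expressed through Solr's standard `fq` (filter query) parameter rather than by rewriting the user's query string, which may differ from what the original message goes on to show.

```python
from urllib.parse import urlencode


def tenant_query_params(user_query, tenant_id, tenant_field="tenant_id"):
    """Build Solr request parameters that restrict a query to one tenant.

    tenant_field is a hypothetical field name; use whatever field your
    schema stores the tenant identifier in.
    """
    # Escape backslashes and double quotes so the tenant id stays inside
    # a single quoted phrase in the filter query.
    escaped = tenant_id.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "q": user_query,                       # the user's query, untouched
        "fq": f'{tenant_field}:"{escaped}"',   # tenant restriction as a filter
    }


params = tenant_query_params("title:solr", "acme")
# The encoded string would be appended to a /select URL of your collection.
print(urlencode(params))
```

Keeping the tenant restriction in `fq` means it is applied on the server side to every request built this way, independent of what the user typed.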

Re: leader split-brain at least once a day - need help

2015-01-07 Thread Ugo Matrangolo
Hi Thomas, I did not get these split brains (probably our use case is simpler) but we got the spammed Zk phenomenon. The easiest way to fix it is to: 1. Shut down all the Solr servers in the failing cluster 2. Connect to zk using its CLI 3. rmr overseer/queue 4. Restart Solr Think is way faster

Re: How to limit the number of result sets of the 'export' handler

2015-01-07 Thread Alexandre Rafalovitch
I believe export is streaming and it avoids building various caches, so it will not blow up Solr's memory on large datasets. You can read a lot more details in the JIRA that introduced it: https://issues.apache.org/jira/browse/SOLR-5244 I am not sure how it compares with deep-paging though.

Re: How large is your solr index?

2015-01-07 Thread Shawn Heisey
On 1/7/2015 2:26 PM, Joseph Obernberger wrote: Thank you Toke - yes - the data is indexed throughout the day. We are handling very few searches - probably 50 a day; this is an R&D system. Our HDFS cache, I believe, is too small at 10GBytes per shard. This comes out to 20GBytes of HDFS cache

Re: Determining the Number of Solr Shards

2015-01-07 Thread Shawn Heisey
On 1/7/2015 3:29 PM, Nishanth S wrote: I am working on coming up with a Solr architecture layout for my use case. We are a very write-heavy application with no downtime tolerance and have low SLAs on reads when compared with writes. I am looking at around 12K tps with average index size of

Re: How large is your solr index?

2015-01-07 Thread Peter Sturge
Is there a problem with multi-valued fields and distributed queries? No. But there are some components that don't do the right thing in distributed mode, joins for instance. The list is actually quite small and is getting smaller all the time. Yes, joins is the main one. There used to be

RE: How large is your solr index?

2015-01-07 Thread Toke Eskildsen
Joseph Obernberger [j...@lovehorsepower.com] wrote: [HDFS, 9M docs, 2.9TB, 22 shards, 11 bare metal boxes] A typical query takes about 7 seconds to run, but we also do faceting and clustering. Those can take in the 3 - 5 minute range depends on what was queried, but can be as little as 10

Re: Solr support for multi-tenant applications

2015-01-07 Thread Jack Krupansky
Indeed, it is all about the numbers. So, Danesh, what are your numbers - number of tenants and number of documents per tenant. What is the expected distribution curve of documents per tenant? The only limit I would suggest is that you not have more than low hundreds of cores/tenants. Will

Re: Solr Date Range not returning results for last 1 month

2015-01-07 Thread Chris Hostetter
: However the facets I am getting for the date is till last month, say today : is 24th December and I am getting it till 24th November. How should I : modify my query to obtain results till today? Tried a few options using HIT : and TRIAL :) but could not arrive at a solution. it's not clear

Re: How large is your solr index?

2015-01-07 Thread Joseph Obernberger
Thank you Toke - yes - the data is indexed throughout the day. We are handling very few searches - probably 50 a day; this is an R&D system. Our HDFS cache, I believe, is too small at 10GBytes per shard. This comes out to 20GBytes of HDFS cache per physical machine plus about 10G each for the
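The 20GBytes-per-machine figure follows from the cluster layout quoted elsewhere in this thread (22 shards on 11 bare metal boxes, 10GBytes of HDFS cache per shard). A minimal sketch of that arithmetic, assuming the shards are spread evenly across the machines; the function name is hypothetical:

```python
def hdfs_cache_per_machine_gb(shards, machines, cache_per_shard_gb):
    """Total HDFS block cache per physical machine, in GB.

    Assumes shards are distributed evenly across machines, which the
    thread implies but does not state outright.
    """
    shards_per_machine = shards / machines
    return shards_per_machine * cache_per_shard_gb


# 22 shards over 11 machines is 2 shards per machine, each with 10 GB of cache.
cache_gb = hdfs_cache_per_machine_gb(shards=22, machines=11, cache_per_shard_gb=10)
print(cache_gb)
```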

Re: How large is your solr index?

2015-01-07 Thread Erick Erickson
You shouldn't _have_ to keep track of this yourself since Solr 4.4, see SOLR-4965 and the associated Lucene JIRA. Those are supposed to make issuing a commit on an index that hasn't changed a no-op. If you do issue commits and do open new searchers when the index has NOT changed, it's worth a

Determining the Number of Solr Shards

2015-01-07 Thread Nishanth S
Hi All, I am working on coming up with a Solr architecture layout for my use case. We are a very write-heavy application with no downtime tolerance and have low SLAs on reads when compared with writes. I am looking at around 12K tps with average index size of solr document in the range of

Re: Determining the Number of Solr Shards

2015-01-07 Thread Walter Underwood
This is described as “write heavy”, so I think that is 12,000 writes/second, not queries. Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Jan 7, 2015, at 5:16 PM, Shawn Heisey apa...@elyograg.org wrote: On 1/7/2015 3:29 PM, Nishanth S wrote: I am working on coming

Re: UUIDUpdateProcessorFactory causes repeated documents when uploading csv files?

2015-01-07 Thread Chris Hostetter
: It's a single Solr Instance, and in my files, I used 'doc_key' everywhere, : but I changed it to id in the email I sent out wanting to make it easier : to read, sorry don't mean to confuse you :) https://wiki.apache.org/solr/UsingMailingLists - what version of solr? - how exactly are you

Re: Determining the Number of Solr Shards

2015-01-07 Thread Nishanth S
Thanks Shawn and Walter. Yes, those are 12,000 writes/second. Reads for the moment would be around 1,000 reads/second. I guess finding the right number of shards would be my starting point. Thanks, Nishanth On Wed, Jan 7, 2015 at 6:28 PM, Walter Underwood wun...@wunderwood.org wrote: This is

Re: Determining the Number of Solr Shards

2015-01-07 Thread Erick Erickson
1,000 queries/second is not trivial either. My starting point for QPS is about 50 per replica. But that's entirely a straw man and (as the link Shawn provided indicates) only testing will determine if that's realistic. So going for 1,000 queries/second, you're talking 20 replicas for each shard. And
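The replica estimate above is a simple division, sketched here for reference. The 50 QPS figure is the straw-man starting point from the message, not a measured value, and the function name is hypothetical; only load testing on real hardware and data gives a usable number.

```python
import math


def replicas_per_shard(target_qps, qps_per_replica):
    """Replicas of each shard needed to sustain target_qps.

    Every query touches one replica of every shard, so each shard needs
    enough replicas to absorb the full query rate. Rounds up, because a
    fraction of a replica cannot be deployed.
    """
    return math.ceil(target_qps / qps_per_replica)


# 1,000 queries/second at an assumed 50 queries/second per replica.
needed = replicas_per_shard(target_qps=1000, qps_per_replica=50)
print(needed)
```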

Re: Determining the Number of Solr Shards

2015-01-07 Thread Jack Krupansky
Anybody on the list have a feel for how many simultaneous queries Solr can handle in parallel? Will it be linear WRT the number of CPU cores? Or are there other bottlenecks or locks in Lucene or Solr such that even with more CPU cores the Solr server will be saturated with fewer queries than the

Re: Determining the Number of Solr Shards

2015-01-07 Thread Shawn Heisey
On 1/7/2015 7:14 PM, Nishanth S wrote: Thanks Shawn and Walter. Yes, those are 12,000 writes/second. Reads for the moment would be around 1,000 reads/second. I guess finding the right number of shards would be my starting point. I don't think indexing 12000 docs per second would be too much

Re: How to limit the number of result sets of the 'export' handler

2015-01-07 Thread Joel Bernstein
Sandy, Export uses a very different approach than the normal select approach. Export uses an incremental stream sorting approach that won't run out of memory when sorting very large result sets. And Export does not use stored fields to return results; it uses docValues caches to return results.

Re: Garbage Collection tuning - G1 is now a good option

2015-01-07 Thread Otis Gospodnetic
Not sure about AggressiveOpts, but G1 has been working for us nicely. We've successfully used it with HBase, Hadoop, Elasticsearch, and other custom Java apps (all still Java 7, but Java 8 should be even better). Not sure if we are using it on our Solr instances. e.g. see

Re: How large is your solr index?

2015-01-07 Thread Erick Erickson
See below: On Wed, Jan 7, 2015 at 1:25 AM, Bram Van Dam bram.van...@intix.eu wrote: On 01/06/2015 07:54 PM, Erick Erickson wrote: Have you considered pre-supposing SolrCloud and using the SPLITSHARD API command? I think that's the direction we'll probably be going. Index size (at least

problem with solr server start

2015-01-07 Thread paulding
I am new to lucene-solr. I downloaded solr 4.10.3 and installed it in windows server 2008. I tried to start the server following README in example template DIH, java -Dsolr.solr.home=./example-DIH/solr/ -jar start.jar There is no error message in the command line console. When I use a browser to

Re: Solr Memory Usage - How to reduce memory footprint for solr

2015-01-07 Thread Erick Erickson
And keep in mind that starving the OS of memory to give it to the JVM is an anti-pattern, see Uwe's excellent blog on MMapDirectory here: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html Best, Erick On Wed, Jan 7, 2015 at 5:55 AM, Shawn Heisey apa...@elyograg.org wrote:

Re: How large is your solr index?

2015-01-07 Thread Joseph Obernberger
Kinda late to the party on this very interesting thread, but I'm wondering if anyone has been using SolrCloud with HDFS at large scales? We really like this capability since our data is inside of Hadoop and we can run the Solr shards on the same nodes, and we only need to manage one pool of

Re: Is defining facet fields in solrconfig.xml mandatory ?

2015-01-07 Thread Erik Hatcher
No, that’s not mandatory. That is just an example of how a request handler could spell that out, but those parameters can be (and often are, depending on the nature of the application) specified per request. Erik On Jan 7, 2015, at 1:27 PM, Vishal Swaroop vishal@gmail.com wrote:

Re: Is defining facet fields in solrconfig.xml mandatory ?

2015-01-07 Thread Chris Hostetter
: I am exploring faceting in SOLR in collection1 example Faceting fields are : defined in solrconfig.xml under browse request handler which is used in : in-built VelocityResponseWriter context is everything -- you cut out the key line that would answer explain your question...

Is defining facet fields in solrconfig.xml mandatory ?

2015-01-07 Thread Vishal Swaroop
Hi, I am exploring faceting in SOLR in the collection1 example. Faceting fields are defined in solrconfig.xml under the browse request handler, which is used by the built-in VelocityResponseWriter: <requestHandler name="/browse" class="solr.SearchHandler"> ... <str name="facet">on</str> <str

Re: How large is your solr index?

2015-01-07 Thread Erick Erickson
bq: I'm wondering if anyone has been using SolrCloud with HDFS at large scales Absolutely, there are several companies doing this, see Lucidworks and Cloudera for two instances. Solr itself has the MapReduceIndexerTool for indexing to Solr's running on HDFS FWIW. About needing 3x the memory..

Re: ignoring bad documents during index

2015-01-07 Thread SolrUser1543
I have implemented an update processor as described above. On a single Solr instance it works fine. When I test it on SolrCloud with several nodes and try to index a few documents, some of which are incorrect, each instance creates its own response, but it is not aggregated by the

Re: leader split-brain at least once a day - need help

2015-01-07 Thread Alan Woodward
I had a similar issue, which was caused by https://issues.apache.org/jira/browse/SOLR-6763. Are you getting long GC pauses or similar before the leader mismatches occur? Alan Woodward www.flax.co.uk On 7 Jan 2015, at 10:01, Thomas Lamy wrote: Hi there, we are running a 3 server cloud

Re: Solr Memory Usage - How to reduce memory footprint for solr

2015-01-07 Thread Shawn Heisey
On 1/6/2015 1:10 PM, Abhishek Sharma wrote: *Q* - I am forced to set Java Xmx as high as 3.5g for my Solr app. If I keep this low, my CPU hits 100% and response time for indexing increases a lot. And I have hit an OOM error as well when this value is low. Is this too high? If so, how can I

Re: Running Multiple Solr Instances

2015-01-07 Thread Nishanth S
Hey Ganesh, This was not for clustering. I do not think you would need clustering with SolrCloud. With SolrCloud, when you create a collection from scratch it creates the data directories under solr home. Now if your drives are mounted as (/d/1, /d/2 etc.) you would want to use all the storage