Hi there,
we are running a 3-server cloud serving a dozen
single-shard/replicate-everywhere collections. The 2 biggest collections
are ~15M docs, at about 13GiB and 2.5GiB in size. Solr is 4.10.2, ZK 3.4.5,
Tomcat 7.0.56, Oracle Java 1.7.0_72-b14.
10 of the 12 collections (the small ones) get
Hi all,
I need to use Solr for a multi-tenant application. What is the best way I
could achieve multi-tenancy with Solr?
One possibility is to have a separate core for each tenant domain.
1. Is it recommended to do it?
2. Are there any issues with having a large number of Solr cores?
Please
On 01/06/2015 07:54 PM, Erick Erickson wrote:
Have you considered pre-supposing SolrCloud and using the SPLITSHARD
API command?
I think that's the direction we'll probably be going. Index size (at
least for us) can be unpredictable in some cases. Some clients start out
small and then grow
One possibility is to have a separate core for each tenant domain.
You could do that, and it's probably the way to go if you have a lot of
data.
However, if you don't have much data, you can achieve multi-tenancy by
adding a filter to all your queries, for instance:
query = userQuery
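The filter approach above can be sketched roughly as follows; the field name `tenant_id` and the helper function are assumptions for illustration, not part of the original post:

```python
def tenant_query(user_query, tenant_id):
    """Build Solr request params that restrict results to one tenant.

    The field name 'tenant_id' is hypothetical; use whatever field your
    schema stores the tenant identifier in. Putting the restriction in
    fq (a filter query) keeps it out of relevance scoring and lets Solr
    cache the filter per tenant.
    """
    return {
        "q": user_query,
        "fq": "tenant_id:%s" % tenant_id,
    }

params = tenant_query("title:solr", "acme")
```

Using `fq` rather than folding the tenant clause into `q` means the same cached filter is reused across all of a tenant's queries.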
Hi Thomas,
I did not get these split-brains (probably our use case is simpler) but we
did hit the spammed ZK phenomenon.
The easiest way to fix it is to:
1. Shut down all the Solr servers in the failing cluster
2. Connect to zk using its CLI
3. rmr /overseer/queue
4. Restart Solr
This is way faster
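If doing step 3 from code rather than the ZK CLI, any client with a recursive delete works (kazoo's `delete(path, recursive=True)`, for instance); this helper is a hedged sketch, not from the original post:

```python
OVERSEER_QUEUE = "/overseer/queue"

def clear_overseer_queue(zk_client):
    """Recursively delete Solr's overseer queue znode.

    Equivalent to 'rmr /overseer/queue' in the ZooKeeper CLI.
    zk_client is any connected ZooKeeper client exposing
    delete(path, recursive=...), e.g. a kazoo KazooClient.
    Only do this with all Solr nodes in the cluster stopped.
    """
    zk_client.delete(OVERSEER_QUEUE, recursive=True)
```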
I believe export is streaming and it avoids building various caches,
so it will not blow up Solr's memory on large datasets.
You can read a lot more details in the JIRA that introduced it:
https://issues.apache.org/jira/browse/SOLR-5244
I am not sure how it compares with deep-paging though.
On 1/7/2015 2:26 PM, Joseph Obernberger wrote:
Thank you Toke - yes - the data is indexed throughout the day. We are
handling very few searches - probably 50 a day; this is an R&D system.
Our HDFS cache, I believe, is too small at 10GBytes per shard. This
comes out to 20GBytes of HDFS cache
On 1/7/2015 3:29 PM, Nishanth S wrote:
I am working on coming up with a Solr architecture layout for my use
case. We are a very write-heavy application with no downtime tolerance and
have low SLAs on reads when compared with writes. I am looking at around
12K tps with average index size of
Is there a problem with multi-valued fields and distributed queries?
No. But there are some components that don't do the right thing in
distributed mode, joins for instance. The list is actually quite small and
is getting smaller all the time.
Yes, joins is the main one. There used to be
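For illustration, the join referred to above is Solr's `{!join}` query parser; a minimal sketch of building such a query (the field names are hypothetical). The point is that this kind of query works on a single core but misbehaves across shards:

```python
def join_query(from_field, to_field, inner_query):
    """Build a Solr join query string ({!join} query parser syntax).

    Joins require the 'from'-side documents to live on the same core as
    the 'to'-side, which is why join is one of the components that does
    not work correctly in distributed (multi-shard) mode.
    """
    return "{!join from=%s to=%s}%s" % (from_field, to_field, inner_query)

q = join_query("parent_id", "id", "type:parent")
```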
Joseph Obernberger [j...@lovehorsepower.com] wrote:
[HDFS, 9M docs, 2.9TB, 22 shards, 11 bare metal boxes]
A typical query takes about 7 seconds to run, but we also do faceting
and clustering. Those can take in the 3 - 5 minute range depending on
what was queried, but can be as little as 10
Indeed, it is all about the numbers. So, Danesh, what are your numbers -
number of tenants and number of documents per tenant. What is the expected
distribution curve of documents per tenant?
The only limit I would suggest is that you not have more than low
hundreds of cores/tenants.
Will
: However the facets I am getting for the date only go up to last month; say today
: is 24th December and I am getting them till 24th November. How should I
: modify my query to obtain results till today? Tried a few options by hit
: and trial :) but could not arrive at a solution.
it's not clear
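Without seeing the original query it is hard to say, but one common cause is a fixed or mis-rounded `facet.range.end`. A hedged sketch using Solr date math to extend the range through today (the field name and start value are assumptions):

```python
def date_range_facet(field, start, gap):
    """Range-facet params whose end is rounded up past 'now'.

    Setting facet.range.end to NOW/DAY+1DAY (Solr date math: rounded to
    the start of tomorrow) ensures today's documents fall inside the
    last bucket instead of being cut off at an earlier fixed date.
    """
    return {
        "facet": "true",
        "facet.range": field,
        "facet.range.start": start,
        "facet.range.end": "NOW/DAY+1DAY",
        "facet.range.gap": gap,
    }

p = date_range_facet("pub_date", "NOW/DAY-30DAYS", "+1DAY")
```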
Thank you Toke - yes - the data is indexed throughout the day. We are
handling very few searches - probably 50 a day; this is an R&D system.
Our HDFS cache, I believe, is too small at 10GBytes per shard. This
comes out to 20GBytes of HDFS cache per physical machine plus about 10G
each for the
You shouldn't _have_ to keep track of this yourself since Solr 4.4,
see SOLR-4965 and the associated Lucene JIRA. Those are supposed to
make issuing a commit on an index that hasn't changed a no-op.
If you do issue commits and do open new searchers when the index has
NOT changed, it's worth a
Hi All,
I am working on coming up with a solr architecture layout for my use
case.We are a very write heavy application with no down time tolerance and
have low SLAs on reads when compared with writes.I am looking at around
12K tps with average index size of solr document in the range of
This is described as “write heavy”, so I think that is 12,000 writes/second,
not queries.
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Jan 7, 2015, at 5:16 PM, Shawn Heisey apa...@elyograg.org wrote:
On 1/7/2015 3:29 PM, Nishanth S wrote:
I am working on coming
: It's a single Solr Instance, and in my files, I used 'doc_key' everywhere,
: but I changed it to id in the email I sent out wanting to make it easier
: to read, sorry don't mean to confuse you :)
https://wiki.apache.org/solr/UsingMailingLists
- what version of solr?
- how exactly are you
Thanks Shawn and Walter. Yes, those are 12,000 writes/second. Reads for the
moment would be in the 1,000 reads/second range. I guess finding out the right
number of shards would be my starting point.
Thanks,
Nishanth
On Wed, Jan 7, 2015 at 6:28 PM, Walter Underwood wun...@wunderwood.org
wrote:
This is
1,000 queries/second is not trivial either. My starting point for QPS
is about 50.
But that's entirely a straw man and (as the link Shawn provided indicates)
only testing will determine if that's realistic.
So going for 1,000 queries/second, you're talking 20 replicas for
each shard.
And
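The replica arithmetic above can be sketched as a straw-man capacity estimate, using ~50 QPS per replica as the stated starting point (only load testing can confirm that figure):

```python
import math

def replicas_needed(target_qps, qps_per_replica=50):
    """Straw-man estimate: replicas per shard needed to serve
    target_qps, assuming each replica sustains qps_per_replica.
    The per-replica figure is a starting assumption, not a measurement.
    """
    return math.ceil(target_qps / qps_per_replica)

replicas_needed(1000)  # 1000 / 50 = 20 replicas per shard
```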
Anybody on the list have a feel for how many simultaneous queries Solr can
handle in parallel? Will it be linear WRT the number of CPU cores? Or are
there other bottlenecks or locks in Lucene or Solr such that even with more
CPU cores the Solr server will be saturated with fewer queries than the
On 1/7/2015 7:14 PM, Nishanth S wrote:
Thanks Shawn and Walter. Yes, those are 12,000 writes/second. Reads for the
moment would be in the 1,000 reads/second range. I guess finding out the right
number of shards would be my starting point.
I don't think indexing 12000 docs per second would be too much
Sandy,
Export uses a very different approach than the normal select approach.
Export uses an incremental stream sorting approach that won't run out of
memory when sorting very large result sets. And Export does not use stored
fields to return results, it uses docValues caches to return results.
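A hedged sketch of how the two approaches compare at the request level: `/export` streams the whole sorted result set (sort and `fl` fields need docValues), while deep paging uses `/select` with `cursorMark`. The parameter names are standard Solr; the field names are assumptions:

```python
def export_request(query, sort_fields, fl):
    """Params for Solr's /export handler: streams the entire sorted
    result set. Sort and fl fields must have docValues enabled, since
    /export reads docValues rather than stored fields."""
    return {
        "q": query,
        "sort": sort_fields,
        "fl": fl,
    }

def first_cursor_request(query, sort_with_uniquekey, rows=1000):
    """First page of a cursorMark deep-paging loop on /select.
    The sort must include the uniqueKey field as a tie-breaker;
    each later request passes the nextCursorMark Solr returned."""
    return {
        "q": query,
        "sort": sort_with_uniquekey,
        "rows": rows,
        "cursorMark": "*",
    }
```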
Not sure about AggressiveOpts, but G1 has been working nicely for us.
We've successfully used it with HBase, Hadoop, Elasticsearch, and other
custom Java apps (all still Java 7, but Java 8 should be even better). Not
sure if we are using it on our Solr instances.
e.g. see
See below:
On Wed, Jan 7, 2015 at 1:25 AM, Bram Van Dam bram.van...@intix.eu wrote:
On 01/06/2015 07:54 PM, Erick Erickson wrote:
Have you considered pre-supposing SolrCloud and using the SPLITSHARD
API command?
I think that's the direction we'll probably be going. Index size (at least
I am new to Lucene/Solr. I downloaded Solr 4.10.3 and installed it on Windows
Server 2008.
I tried to start the server following the README in the example DIH template:
java -Dsolr.solr.home=./example-DIH/solr/ -jar start.jar
There is no error message in the command line console.
When I use a browser to
And keep in mind that starving the OS of memory to
give it to the JVM is an anti-pattern, see Uwe's
excellent blog on MMapDirectory here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
Best,
Erick
On Wed, Jan 7, 2015 at 5:55 AM, Shawn Heisey apa...@elyograg.org wrote:
Kinda late to the party on this very interesting thread, but I'm
wondering if anyone has been using SolrCloud with HDFS at large scales?
We really like this capability since our data is inside of Hadoop and we
can run the Solr shards on the same nodes, and we only need to manage
one pool of
No, that’s not mandatory. That is just an example of how a request handler
could spell that out, but those parameters can be (and often are, depending on
the nature of the application) specified per request.
Erik
On Jan 7, 2015, at 1:27 PM, Vishal Swaroop vishal@gmail.com wrote:
: I am exploring faceting in Solr with the collection1 example. Faceting fields are
: defined in solrconfig.xml under the browse request handler, which is used by the
: built-in VelocityResponseWriter
context is everything -- you cut out the key line that would answer
your question...
Hi,
I am exploring faceting in Solr with the collection1 example. Faceting fields are
defined in solrconfig.xml under the browse request handler, which is used by the
built-in VelocityResponseWriter:
<requestHandler name="/browse" class="solr.SearchHandler">
...
<str name="facet">on</str>
<str
bq: I'm wondering if anyone has been using SolrCloud with HDFS at large scales
Absolutely, there are several companies doing this, see Lucidworks and
Cloudera for two instances.
Solr itself has the MapReduceIndexerTool for indexing to Solr
running on HDFS, FWIW.
About needing 3x the memory..
I have implemented an update processor as described above.
On a single Solr instance it works fine.
When I test it on SolrCloud with several nodes and try to index a few
documents, when some of them are incorrect, each instance creates its own
response, but it is not aggregated by the
I had a similar issue, which was caused by
https://issues.apache.org/jira/browse/SOLR-6763. Are you getting long GC
pauses or similar before the leader mismatches occur?
Alan Woodward
www.flax.co.uk
On 7 Jan 2015, at 10:01, Thomas Lamy wrote:
Hi there,
we are running a 3 server cloud
On 1/6/2015 1:10 PM, Abhishek Sharma wrote:
*Q* - I am forced to set Java Xmx as high as 3.5g for my Solr app. If I
keep this low, my CPU hits 100% and response time for indexing increases a
lot. And I have hit OOM errors as well when this value is low.
Is this too high? If so, how can I
Hey Ganesh,
This was not for clustering. I do not think you would need clustering with
SolrCloud. With SolrCloud, when you create a collection from scratch it
creates the data directories under Solr home. Now if your drives are mounted
as (/d/1, /d/2 etc.) you would want to use all the storage