On 1/7/2015 7:14 PM, Nishanth S wrote:
> Thanks Shawn and Walter. Yes, those are 12,000 writes/second. Reads for
> the moment would be in the 1,000 reads/second range. Guess finding out the
> right number of shards would be my starting point.
I don't think indexing 12000 docs per second would be too much
Anybody on the list have a feel for how many simultaneous queries Solr can
handle in parallel? Will it be linear WRT the number of CPU cores? Or are
there other bottlenecks or locks in Lucene or Solr such that even with more
CPU cores the Solr server will be saturated with fewer queries than the
nu
1,000 queries/second is not trivial either. My starting point for QPS
is about 50.
But that's entirely a "straw man", and (as the link Shawn provided indicates)
only testing will determine if that's realistic.
So going for 1,000 queries/second, you're talking 20 replicas for
each shard.
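The arithmetic behind that replica count can be sketched as a one-liner; the 50 QPS-per-replica figure is the "straw man" starting point from this thread, not a measured number.

```python
import math

# Back-of-envelope capacity sketch. 50 QPS per replica is the thread's
# straw-man assumption; only load testing against your own data and
# queries will pin down the real per-replica throughput.
def replicas_needed(target_qps: float, qps_per_replica: float) -> int:
    """Replicas per shard needed to sustain target_qps, assuming queries
    spread evenly and throughput scales roughly linearly with replicas."""
    return math.ceil(target_qps / qps_per_replica)

print(replicas_needed(1000, 50))  # -> 20 replicas per shard
```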
And
Thanks Shawn and Walter. Yes, those are 12,000 writes/second. Reads for
the moment would be in the 1,000 reads/second range. Guess finding out the
right number of shards would be my starting point.
Thanks,
Nishanth
On Wed, Jan 7, 2015 at 6:28 PM, Walter Underwood
wrote:
> This is described as “write
: It's a single Solr Instance, and in my files, I used 'doc_key' everywhere,
: but I changed it to "id" in the email I sent out wanting to make it easier
: to read, sorry don't mean to confuse you :)
https://wiki.apache.org/solr/UsingMailingLists
- what version of solr?
- how exactly are you doi
This is described as “write heavy”, so I think that is 12,000 writes/second,
not queries.
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Jan 7, 2015, at 5:16 PM, Shawn Heisey wrote:
> On 1/7/2015 3:29 PM, Nishanth S wrote:
>> I am working on coming up with a solr a
On 1/7/2015 3:29 PM, Nishanth S wrote:
> I am working on coming up with a solr architecture layout for my use
> case. We are a very write-heavy application with no downtime tolerance and
> have low SLAs on reads compared with writes. I am looking at around
> 12K tps with an average index size
On 1/7/2015 2:26 PM, Joseph Obernberger wrote:
> Thank you Toke - yes - the data is indexed throughout the day. We are
> handling very few searches - probably 50 a day; this is an R&D system.
> Our HDFS cache, I believe, is too small at 10GBytes per shard. This
> comes out to 20GBytes of HDFS cac
Hi All,
I am working on coming up with a solr architecture layout for my use
case. We are a very write-heavy application with no downtime tolerance and
have low SLAs on reads compared with writes. I am looking at around
12K tps with an average Solr document size in the range of 6 kB. I
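The raw ingest volume implied by those figures is worth spelling out; these are the poster's numbers, and on-disk index size will differ depending on analysis and stored fields.

```python
# Rough ingest-volume math for the figures in this message:
# 12,000 docs/second at ~6 kB per document (the poster's numbers;
# actual index size on disk depends on analysis and stored fields).
DOCS_PER_SEC = 12_000
DOC_SIZE_KB = 6

mb_per_sec = DOCS_PER_SEC * DOC_SIZE_KB / 1024      # ~70 MB/s of raw input
tb_per_day = mb_per_sec * 86_400 / (1024 * 1024)    # ~5.8 TB/day sustained
print(f"{mb_per_sec:.1f} MB/s, {tb_per_day:.2f} TB/day")
```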
Indeed, it is all about the numbers. So, Danesh, what are your numbers -
number of tenants and number of documents per tenant. What is the expected
distribution curve of documents per tenant?
The only "limit" I would suggest is that you not have more than "low
hundreds" of cores/tenants.
Will ten
You shouldn't _have_ to keep track of this yourself since Solr 4.4,
see SOLR-4965 and the associated Lucene JIRA. Those are supposed to
make issuing a commit on an index that hasn't changed a no-op.
If you do issue commits and do open new searchers when the index has
NOT changed, it's worth a JIRA
: However the facets I am getting for the date is till last month, say today
: is 24th December and I am getting it till 24th November. How should I
: modify my query to obtain results till today? Tried a few options using HIT
: and TRIAL :) but could not arrive at a solution.
it's not clear what
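When range facets stop a month short like this, the usual culprit is the range end. A hedged sketch of range-facet parameters using Solr's date math; the field name "timestamp" and the one-month window are assumptions, not taken from the original post.

```python
# Hypothetical range-facet parameters using Solr date math. The field
# name "timestamp" is an assumption. The end of a range facet is
# exclusive, so NOW/DAY+1DAY is a common way to make the buckets
# cover all of today instead of stopping at an earlier boundary.
params = {
    "q": "*:*",
    "facet": "true",
    "facet.range": "timestamp",
    "facet.range.start": "NOW/DAY-1MONTH",
    "facet.range.end": "NOW/DAY+1DAY",   # include all of today
    "facet.range.gap": "+1DAY",
}
query_string = "&".join(f"{k}={v}" for k, v in params.items())
print(query_string)
```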
Thank you Toke - yes - the data is indexed throughout the day. We are
handling very few searches - probably 50 a day; this is an R&D system.
Our HDFS cache, I believe, is too small at 10GBytes per shard. This
comes out to 20GBytes of HDFS cache per physical machine plus about 10G
each for the
> Is there a problem with multi-valued fields and distributed queries?
> No. But there are some components that don't do the right thing in
> distributed mode, joins for instance. The list is actually quite small and
> is getting smaller all the time.
Yes, joins is the main one. There used to be
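A minimal sketch of the join query-parser syntax, the main item on the "doesn't work in distributed mode" list above; the field names are illustrative only.

```python
# Sketch of Solr's {!join} query-parser syntax. Field names "parent_id"
# and "id" are illustrative, not from the original thread.
join_q = "{!join from=parent_id to=id}type:parent"
# In a sharded collection the join only sees documents co-located on the
# same shard, which is why joins appear on the distributed-mode caveat
# list in this thread.
print(join_q)
```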
Joseph Obernberger [j...@lovehorsepower.com] wrote:
[HDFS, 9M docs, 2.9TB, 22 shards, 11 bare metal boxes]
> A typical query takes about 7 seconds to run, but we also do faceting
> and clustering. Those can take in the 3 - 5 minute range depends on
> what was queried, but can be as little as 10
I have implemented an update processor as described above.
On a single Solr instance it works fine.
When testing it on SolrCloud with several nodes and trying to index a few
documents, when some of them are incorrect, each instance creates its own
response, but it is not aggregated by the ins
: I am exploring faceting in SOLR in collection1 example Faceting fields are
: defined in solrconfig.xml under browse request handler which is used in
: in-built "VelocityResponseWriter"
context is everything -- you cut out the key line that would answer &
explain your question...
No, that’s not mandatory. That is just an example of how a request handler
could spell that out, but those parameters can be (and often are, depending on
the nature of the application) specified per request.
Erik
> On Jan 7, 2015, at 1:27 PM, Vishal Swaroop wrote:
>
> Hi,
>
> I am e
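To make the point concrete: facet parameters can be sent per request instead of being baked into the /browse handler in solrconfig.xml. A sketch using the "cat" field from the stock collection1 example schema; the URL in the comment is the default example location, an assumption about your setup.

```python
# Per-request facet parameters, equivalent to what the /browse handler
# declares in solrconfig.xml. "cat" is a field from the stock
# collection1 example schema.
params = {
    "q": "*:*",
    "wt": "json",
    "facet": "true",
    "facet.field": "cat",
    "facet.mincount": "1",
}
# e.g. GET http://localhost:8983/solr/collection1/select?<query_string>
query_string = "&".join(f"{k}={v}" for k, v in params.items())
print(query_string)
```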
Hi,
I am exploring faceting in SOLR in collection1 example Faceting fields are
defined in solrconfig.xml under browse request handler which is used in
in-built "VelocityResponseWriter"
...
on
cat
I think it is not at all mandatory to define facet fields in
solrconfig.xml, right?
bq: I'm wondering if anyone has been using SolrCloud with HDFS at large scales
Absolutely, there are several companies doing this, see Lucidworks and
Cloudera for two instances.
Solr itself has the MapReduceIndexerTool for indexing to Solr instances
running on HDFS, FWIW.
About needing 3x the memory.. si
Kinda late to the party on this very interesting thread, but I'm
wondering if anyone has been using SolrCloud with HDFS at large scales?
We really like this capability since our data is inside of Hadoop and we
can run the Solr shards on the same nodes, and we only need to manage
one pool of st
Hey Ganesh,
This was not for clustering. I do not think you would need clustering with
SolrCloud. With SolrCloud, when you create a collection from scratch it
creates the data directories under solr home. Now if your drives are mounted
as (/d/1, /d/2 etc.) you would want to use all the storage ava
Sandy,
Export uses a very different approach than the normal select approach.
Export uses an incremental stream-sorting approach that won't run out of
memory when sorting very large result sets. And Export does not use stored
fields to return results; it uses docValues caches to return results.
T
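A sketch of what an /export request looks like; the URL, collection name, and field names are illustrative assumptions, not from the thread.

```python
# Sketch of a request to Solr's /export handler. The handler requires
# an explicit sort, and every field in fl must have docValues enabled
# (it does not read stored fields). URL and field names are
# illustrative only.
base = "http://localhost:8983/solr/collection1/export"
params = {
    "q": "*:*",
    "sort": "id asc",   # sort is mandatory for /export
    "fl": "id,price",   # every field here must be a docValues field
}
url = base + "?" + "&".join(f"{k}={v}" for k, v in params.items())
print(url)
```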
And keep in mind that starving the OS of memory to
give it to the JVM is an anti-pattern, see Uwe's
excellent blog on MMapDirectory here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
Best,
Erick
On Wed, Jan 7, 2015 at 5:55 AM, Shawn Heisey wrote:
> On 1/6/2015 1:10 PM
See below:
On Wed, Jan 7, 2015 at 1:25 AM, Bram Van Dam wrote:
> On 01/06/2015 07:54 PM, Erick Erickson wrote:
>>
>> Have you considered pre-supposing SolrCloud and using the SPLITSHARD
>> API command?
>
>
> I think that's the direction we'll probably be going. Index size (at least
> for us) can
I am new to Lucene/Solr. I downloaded Solr 4.10.3 and installed it on
Windows Server 2008.
I tried to start the server following README in example template DIH,
java -Dsolr.solr.home="./example-DIH/solr/" -jar start.jar
There is no error message in the command line console.
When I use a browser to
Not sure about AggressiveOpts, but G1 has been working for us nicely.
We've successfully used it with HBase, Hadoop, Elasticsearch, and other
custom Java apps (all still Java 7, but Java 8 should be even better). Not
sure if we are using it on our Solr instances.
e.g. see http://blog.sematext.com
I believe export is streaming and it avoids building various caches,
so it will not blow up Solr's memory on large datasets.
You can read a lot more details in the JIRA that introduced it:
https://issues.apache.org/jira/browse/SOLR-5244
I am not sure how it compares with deep-paging though.
Rega
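For comparison, deep paging uses cursorMark: unlike /export it returns stored fields and works with normal handlers, but fetches one page per request. The sort field names here are illustrative assumptions.

```python
# Sketch of cursorMark deep paging. The sort must end on the uniqueKey
# field as a tie-breaker; "timestamp" and "id" are illustrative.
params = {
    "q": "*:*",
    "sort": "timestamp asc, id asc",  # must end on the uniqueKey field
    "rows": "1000",
    "cursorMark": "*",                # "*" starts the cursor
}
# Each response carries a nextCursorMark; resend it until it stops
# changing, e.g. params["cursorMark"] = response["nextCursorMark"]
print(params["cursorMark"])
```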
I had a similar issue, which was caused by
https://issues.apache.org/jira/browse/SOLR-6763. Are you getting long GC
pauses or similar before the leader mismatches occur?
Alan Woodward
www.flax.co.uk
On 7 Jan 2015, at 10:01, Thomas Lamy wrote:
> Hi there,
>
> we are running a 3 server cloud
On 1/6/2015 1:10 PM, Abhishek Sharma wrote:
> *Q* - I am forced to set Java Xmx as high as 3.5g for my Solr app. If I
> keep this low, my CPU hits 100% and response time for indexing increases a
> lot. And I have hit an OOM error as well when this value is low.
>
> Is this too high? If so, how can
Hi Thomas,
I did not get these split brains (probably our use case is simpler) but we
got the spammed Zk phenomenon.
The easiest way to fix it is to:
1. Shut down all the Solr servers in the failing cluster
2. Connect to zk using its CLI
3. rmr overseer/queue
4. Restart Solr
This is way faster
Hi there,
we are running a 3 server cloud serving a dozen
single-shard/replicate-everywhere collections. The 2 biggest collections
are ~15M docs, and about 13GiB / 2.5GiB size. Solr is 4.10.2, ZK 3.4.5,
Tomcat 7.0.56, Oracle Java 1.7.0_72-b14
10 of the 12 collections (the small ones) get fil
One possibility is to have separate core for each tenant domain.
You could do that, and it's probably the way to go if you have a lot of
data.
However, if you don't have much data, you can achieve multi-tenancy by
adding a filter to all your queries, for instance:
query = userQuery
filterQ
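The filter-query approach above can be fleshed out as a small sketch; the field name "tenant_id" is an assumption. Using fq (rather than folding the tenant clause into q) keeps relevance scoring clean and lets Solr cache the tenant filter independently.

```python
# Hypothetical multi-tenant query builder; "tenant_id" is an assumed
# field name. The tenant restriction goes in fq so it does not affect
# scoring and can be served from Solr's filter cache.
def tenant_params(user_query: str, tenant_id: str) -> dict:
    return {
        "q": user_query,
        "fq": f"tenant_id:{tenant_id}",
    }

print(tenant_params("title:solr", "acme"))
```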
On 01/06/2015 07:54 PM, Erick Erickson wrote:
Have you considered pre-supposing SolrCloud and using the SPLITSHARD
API command?
I think that's the direction we'll probably be going. Index size (at
least for us) can be unpredictable in some cases. Some clients start out
small and then grow exp
Hi all,
I need to use solr for multi-tenant application. What is the best way I
could achieve multi tenancy with solr?
One possibility is to have separate core for each tenant domain.
1. Is it recommended to do it?
2. Are there any issues with have a large number of Solr Cores?
Please sug