Hi,
I have worked with other search solutions before, and cache management is
important in boosting performance. Apart from the cache generated by
users' requests, loading the search index into memory is the first
step after the index is built. This is to ensure search results are
bq: Does Solr automatically load the search index into memory after the index is
built?
No. That's what the autowarm counts on your queryResultCache
and filterCache are intended to facilitate. Also after every commit,
a newSearcher event is fired and any warmup queries you have configured
in the
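For reference, warmup queries are registered on the newSearcher event in solrconfig.xml. A minimal sketch, assuming the standard QuerySenderListener (the query parameters here are placeholders, not from this thread):

```xml
<!-- solrconfig.xml: fired whenever a commit opens a new searcher -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- placeholder warmup queries; use queries typical of your real traffic -->
    <lst><str name="q">*:*</str><str name="sort">id asc</str></lst>
  </arr>
</listener>
```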
Hi,
I am currently investigating the queries with a much smaller index (1M records)
to see the effect of grouping and faceting on performance. This will
allow me to run many tests in a short period of time.
However, it looks like the query is executed much faster the second time.
This is tested
I'd set filterCache and queryResultCache to zero (size and autowarm count)
Leave documentCache alone IMO, as it's used to hold documents in memory
as they pass through various query components and doesn't autowarm anyway.
I'd think taking it out would skew your results because of multiple
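Setting those caches to zero for benchmarking is done in solrconfig.xml. A sketch, assuming the stock cache declarations (these sizes are for testing only, not production):

```xml
<!-- solrconfig.xml: disable caches so each test query hits the index cold -->
<filterCache class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
<!-- documentCache left at its defaults, as suggested above -->
```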
Test_results_round_2.doc
http://lucene.472066.n3.nabble.com/file/n4215016/Test_results_round_2.doc
Hi All,
I did many tests with very consistent test results. Each query was executed
after re-indexing, and only one request was sent to query the index. I
disabled filterCache and queryResultCache for this test based on Erick's
recommendation.
The test document was posted to this email list
bq: The index size is only 1M records. Increasing the record count 10 times (to 10M)
will likely bring the total response time to 1 second
This is an extrapolation you simply cannot make. Plus, you cannot really tell
anything about system performance from just a few queries. In fact you must
disregard
Hmmm, indeed it does. Never mind ;)
I guess the thing I'd be looking at is garbage
collection, here's a very good writeup:
http://lucidworks.com/blog/garbage-collection-bootcamp-1-0/
Kind of a shot in the dark, but it's possible.
Good luck!
Erick
On Thu, Jun 25, 2015 at 3:26 PM, Wenbin Wang
bq: Try not to store fields as much as possible.
Why? Storing fields certainly adds lots of size to the _disk_ files, but has
much less effect on memory requirements than one might think. The
*.fdt and *.fdx files in your index are used for the stored data, and they're
only read for the top N
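For context, whether a field is stored is declared per-field in schema.xml. A sketch (the field names and types here are illustrative placeholders, not taken from the attached schema):

```xml
<!-- schema.xml: stored="true" writes the value to the *.fdt/*.fdx files -->
<field name="DestCode" type="string"       indexed="true" stored="true"/>
<field name="body"     type="text_general" indexed="true" stored="false"/>
```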
schema.xml http://lucene.472066.n3.nabble.com/file/n4213864/schema.xml
solrconfig.xml
http://lucene.472066.n3.nabble.com/file/n4213864/solrconfig.xml
Hi Erick,
The configuration is largely the default one, and I have not made much
change. I am also quite new to Solr although I have a lot of experience in
other search products.
The whole list of fields needs to be retrieved, so I do not have much of a
choice. The total size of the index files
You're missing the point. One of the things that can really affect
response time is too-frequent commits. The fact that the commit
configurations have been commented out indicates that the commits
are happening either manually (curl, HTTP request or the like) _or_
you have, say, a SolrJ client that
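If commits are currently triggered externally, an autoCommit block in solrconfig.xml keeps them at a controlled interval instead. A sketch, assuming the standard updateHandler section (the interval values are placeholders to tune for your load):

```xml
<!-- solrconfig.xml, inside <updateHandler>: hard-commit at most every 60s
     without opening a new searcher; open searchers via less frequent soft commits -->
<autoCommit>
  <maxTime>60000</maxTime>      <!-- ms; placeholder value -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>120000</maxTime>     <!-- ms; placeholder value -->
</autoSoftCommit>
```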
On 6/25/2015 10:27 AM, Wenbin Wang wrote:
To clarify the work:
We are very early in the investigative phase, and the indexing is NOT done
continuously.
I indexed the data once through Admin UI, and test the query. If I need to
index again, I can use curl or through the Admin UI.
The Solr 4.7 seems to have a default setting of
Hi Guys,
I have no problem changing it to 2. However, we are talking about two
different applications.
The Solr 4.7 has two applications: example and example-DIH. The application
example-DIH is the one I started with, since it works with a database.
The example-DIH has a default setting of 4.
1GB is too small to start with. Try setting the same value for both:
-Xms8196m -Xmx8196m
We use 12GB for these on a similar-sized index and it works well.
Send schema.xml and solrconfig.xml.
Try not to store fields as much as possible.
On Wed, Jun 24, 2015 at 8:08 AM, wwang525 wwang...@gmail.com wrote:
Hi All,
I built the Solr index with 14 M records.
I have 20 G RAM in my local machine, and the Solr instance was started
with -Xms1024m -Xmx8196m
The following query:
http://localhost:8983/solr/db-mssql/select?q=*:*&fq=GatewayCode:(YYZ)&fq=DestCode:(CUN)&fq=Duration:(5
OR 6 OR 7 OR
As stated previously, using Field Collapsing (group parameters) tends to
significantly slow down queries. In my experience, search response gets even
worse when:
- Requesting facets, which more often than not I do in my query formulation
- Asking for the facet counts to be on the groups via the
Hi Wenbin,
To me, your instance appears well provisioned. Likewise, your analysis of test
vs. production performance makes a lot of sense. Perhaps your time would be
well spent tuning the query performance for your app before resorting to
sharding?
To that end, what do you see when you
Grouping does tend to be expensive. Our regular queries typically return in
10-15ms while the grouping queries take 60-80ms in a test environment (1M
docs).
This is ok for us, since we wrote our app to take the grouping queries out of
the critical path (async query in parallel with two
I have enough RAM (30G) and hard disk (1000G). It is not I/O bound or
compute bound. In addition, Solr was started with a maximum of 4G for the
JVM, and the index size is 2G. In a typical test, I made sure at least
10G of free RAM was available. I have not tuned any parameter in the configuration,
it
First and most obvious thing to try:
bq: the Solr was started with maximal 4G for JVM, and index size is 2G
Bump your JVM to 8G, perhaps 12G. The size of the index on disk is very
loosely coupled to JVM requirements. It's quite possible that you're spending
all your time in GC cycles. Consider
For now, the index size is 6.5M records, and the performance is good
enough. I will re-build the index with all the records (14M) and test it
again with debug turned on.
Thanks
On Fri, Jun 19, 2015 at 12:10 PM, Erick Erickson erickerick...@gmail.com
wrote:
First and most obvious thing to
Also, since you are tuning for relative times, you can tune on the smaller
index. Surely, you will want to test at scale. But tuning query, analyzer
or schema options is usually easier to do on a smaller index. If you get a 3x
improvement at small scale, it may only be 2.5x at full scale.
Do be aware that turning on debug=query adds a load. I've seen the
debug component
take 90% of the query time (to be fair, it usually takes a much
smaller percentage).
But you'll see a section at the end of the response if you set
debug=all with the time each
component took so you'll have a sense
You've repeated your original statement. Shawn's
observation is that 10M docs is a very small corpus
by Solr standards. You either have very demanding
document/search combinations or you have a poorly
tuned Solr installation.
On reasonable hardware I expect 25-50M documents to have
sub-second
Hi,
We would probably like to shard the data, since the response time for
demanding queries at 10M records is approaching 1 second in a single-request
scenario.
I have not done any data sharding before. What are some recommended ways to
do data sharding? For example, maybe by a criterion with a list
The query without load is still under 1 second. But under load, response time
can be much longer due to queued-up queries.
We would like to shard the data to something like 6M records per shard, which
should still give an under-1-second response time under load.
What are some best practices to shard the
10M doesn't sound too demanding.
How complex are your queries?
How complex is your data - like number of fields and size, like very large
documents?
Are you sure you have enough RAM to fully cache your index?
Are your queries compute-bound or I/O bound? If I/O-bound, get more RAM. If