Hi Kimchy,
I reran the benchmark using ES 1.3 with default settings (only _source and
_all disabled), and performance improved considerably. However, Solr still
outperforms ES 1.3:
Number of different metadata fields | ES | ES with _all/codec bloom filter disabled | ES 1.3 | Solr
Yes, this is the equivalent of using RAMDirectory. Please don't use this.
Mmap is optimized for random access, and if the Lucene index can fit in heap
(to use RAM dir), it can certainly fit in OS RAM, without the implications
of loading it into heap.
On Monday, July 7, 2014 6:26:07 PM UTC+2,
Hi, thanks for running the tests! My tests were capped at 10k fields, and the
improvements target that case. Any more than that, I, and anybody here at
Elasticsearch (+Lucene: Mike/Robert), simply don't recommend and can't really
stand behind when it comes to supporting it.
In Elasticsearch, there is a conscious
Thanks Shay for updating us with perf improvements.
Apart from using the default parameters, should we follow the guideline
listed in
http://elasticsearch-users.115913.n3.nabble.com/Is-ES-es-index-store-type-memory-equivalent-to-Lucene-s-RAMDirectory-td4057417.html
Lucene supports
Heya, I worked a bit on it, and 1.x (upcoming 1.3) has some significant
perf improvements now for this case (including improvements Lucene wise,
that are for now in ES, but will be in Lucene next version). Those include:
6648: https://github.com/elasticsearch/elasticsearch/pull/6648
6714:
Added the Solr benchmark as well:
Number of different metadata fields | ES with _all/codec bloom filter disabled | ES (ingestion and query concurrently) | Solr | Solr (ingestion and query concurrently)
Scenario 0: 1000 | 13 secs - 769 docs/sec; CPU: 23.68%; iowait: 0.01%; Heap: 1.31G; Index Size: 248K
Ingestion
I ran the benchmark with search and ingest running concurrently. The results
are pasted here:
Number of different metadata fields | ES with _all/codec bloom filter disabled | ES with disabled params (ingestion and query concurrently)
Scenario 0: 1000 | 13 secs - 769 docs/sec; CPU: 23.68%; iowait: 0.01%; Heap: 1.31G
Some responses below:
On Tue, Jun 24, 2014 at 7:04 PM, Cindy Hsin cindy.h...@gmail.com wrote:
Looks like the memory usage increased a lot with 10k fields with these two
parameters disabled.
Based on the experiment we have done, it looks like ES has abnormal memory
usage and performance
Hi Jörg,
I reran the benchmark with _all and the codec bloom filter disabled. The
index data size dropped dramatically, but ingestion speed is still similar
to before:
Number of different metadata fields | ES | ES with _all/codec bloom filter disabled
Scenario 0: 1000 | 12 secs -
Looks like the memory usage increased a lot with 10k fields with these two
parameters disabled.
Based on the experiment we have done, it looks like ES has abnormal memory
usage and performance degradation when the number of fields is large (i.e.
10k), whereas Solr memory usage and performance remain
Thanks!
I have asked Maco to re-test ES with these two parameters disabled.
One more question regarding Lucene's capability with a large number of metadata
fields. What is the largest number of metadata fields Lucene supports per index?
What are the different strategies to solve the large metadata fields issue? Do
Hi Cindy,
There isn't a hard limit on the number of fields Lucene supports; it's more
that per field there is highish heap used, added CPU/IO cost for merging,
etc. It's just not a well-tested usage of Lucene, not something the
developers focus on optimizing, etc.
Partitioning by _type won't
Two things to add, to make the Elasticsearch/Solr comparison more fair.
In the ES mapping, you did not disable the _all field.
If you have the _all field enabled, all tokens will be indexed twice, once for
the field and once for _all.
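As an illustration of the point about _all, here is a minimal sketch of an index-creation body that disables it. It assumes the Elasticsearch 1.x mapping syntax; the type name "type" follows the curl command quoted elsewhere in this thread, and the helper name build_mapping is hypothetical.

```python
import json


def build_mapping(type_name="type", disable_all=True, disable_source=True):
    """Build an index-creation body (assumed Elasticsearch 1.x mapping syntax).

    build_mapping is a hypothetical helper used only for illustration.
    """
    type_mapping = {}
    if disable_all:
        # With _all enabled, each token is indexed once for its own field
        # and once more for the catch-all _all field.
        type_mapping["_all"] = {"enabled": False}
    if disable_source:
        # Matches the benchmark setup, which also disables _source.
        type_mapping["_source"] = {"enabled": False}
    return {"mappings": {type_name: type_mapping}}


if __name__ == "__main__":
    # The printed JSON could be sent as the body when creating the index.
    print(json.dumps(build_mapping(), indent=2))
```

The resulting JSON would be posted when the index is created, in the same way as the new_ES_config.sh script does for its templates.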
On Fri, Jun 20, 2014 at 8:00 PM, Cindy Hsin cindy.h...@gmail.com wrote:
Hi, Mike:
Since both ES and Solr use Lucene, do you know why we only see big ingest
performance degradation with ES but not Solr?
I'm not sure why: clearly something is slow with ES as you add more and
more fields. I
I tried your script with setting iwc.setRAMBufferSizeMB(4)/ and 48G
heap size. The speed can be around 430 docs/sec before the first flush and
the final speed is 350 docs/sec. Not sure what configuration Solr uses and
its ingestion speed can be 800 docs/sec.
Maco
On Wednesday, June 18,
On Wed, Jun 18, 2014 at 2:38 AM, Maco Ma mayaohu...@gmail.com wrote:
I tried your script with setting iwc.setRAMBufferSizeMB(4)/ and 48G
heap size. The speed can be around 430 docs/sec before the first flush and
the final speed is 350 docs/sec. Not sure what configuration Solr uses and
Hi,
Could you post the scripts you linked to (new_ES_config.sh,
new_ES_ingest_threads.pl, new_Solr_ingest_threads.pl) inlined? I can't
download them from where you linked.
Optimizing every 10 seconds or 10 minutes is really not a good idea in
general, but I guess if you're doing the same with
I tested roughly your Scenario 2 (100K unique fields, 100 fields per
document) with a straight Lucene test (attached, but not sure if the list
strips attachments). Net/net I see ~100 docs/sec with one thread ... which
is very slow.
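For anyone wanting to reproduce a workload of this shape, a minimal sketch of a document generator follows. This is not the attached Lucene test; the function name make_docs, the field naming scheme, and the value format are all assumptions made for illustration.

```python
import random


def make_docs(num_docs, unique_fields=100_000, fields_per_doc=100, seed=0):
    """Yield synthetic documents that each use a random subset of field names.

    Hypothetical generator: field names are drawn from a pool of
    `unique_fields` names, with `fields_per_doc` distinct names per document.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    for doc_id in range(num_docs):
        # Pick distinct field ids for this document from the shared pool.
        field_ids = rng.sample(range(unique_fields), fields_per_doc)
        yield {f"field_{i}": f"value_{doc_id}_{i}" for i in field_ids}


if __name__ == "__main__":
    docs = list(make_docs(10))
    print(len(docs), "docs,", len(docs[0]), "fields each")
```

Feeding the same generated documents to each engine keeps the comparison independent of how the data was produced.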
Lucene stores quite a lot for each unique indexed field name and
The way we make Solr ingest faster (single document ingest) is by turning off
the engine's soft commit and hard commit and using a client to commit the
changes every 10 seconds.
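The client-side commit pattern described above can be sketched roughly as follows. This is not the actual ingest script; the class name PeriodicCommitter and the add_doc/commit callables are hypothetical stand-ins for whatever client calls send a document and issue a commit.

```python
import time


class PeriodicCommitter:
    """Send documents one at a time and commit at most every `interval_secs`.

    Hypothetical helper: `add_doc` and `commit` are caller-supplied callables
    standing in for the real client calls.
    """

    def __init__(self, add_doc, commit, interval_secs=10.0, clock=time.monotonic):
        self._add_doc = add_doc
        self._commit = commit
        self._interval = interval_secs
        self._clock = clock
        self._last_commit = clock()
        self._pending = 0

    def add(self, doc):
        # Single-document ingest; the engine itself is assumed not to auto-commit.
        self._add_doc(doc)
        self._pending += 1
        if self._clock() - self._last_commit >= self._interval:
            self.flush()

    def flush(self):
        # Commit only if something was added since the last commit.
        if self._pending:
            self._commit()
        self._pending = 0
        self._last_commit = self._clock()
```

Calling flush() once at the end of the run makes sure the last few documents are committed as well.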
Solr ingest speed remains at 800 docs per second, whereas ES ingest speed
drops by half when we increase the fields (i.e. from
Hi Mike,
new_ES_config.sh(define the templates and disable the refresh/flush):
curl -XPOST localhost:9200/doc -d '{
mappings : {
type : {
_source : { enabled : false },
dynamic_templates : [
{t1:{
match :
It's not surprising that the time increases when you have an order of
magnitude more fields.
Are you using the bulk API?
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
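For context on the bulk API question, here is a minimal sketch of how a bulk request body is assembled: newline-delimited JSON, with an action line followed by the document source for each document. The helper name build_bulk_body is hypothetical, and the index and type names are assumptions taken from the curl command quoted elsewhere in this thread.

```python
import json


def build_bulk_body(docs, index="doc", doc_type="type"):
    """Build a newline-delimited bulk body: one action line, then one source line.

    Hypothetical helper; `index` and `doc_type` are assumed names.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body([{"f1": "a"}, {"f2": "b"}]), end="")
```

The resulting string would be posted to the _bulk endpoint, so that many documents travel in one request instead of one curl command per document.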
On 13 June 2014 15:57, Maco Ma
I used the curl command to do the ingestion (one command, one doc) and
flush. I also tried Solr (soft/hard commit disabled, committing from a
client program) with the same data, and its performance did not degrade.
Lucene is used by both of them, and I'm not sure why there is a
Hi, Mark:
We are doing single document ingestion. We did a performance comparison
between Solr and Elasticsearch (ES).
The performance of ES degrades dramatically when we increase the metadata
fields, whereas Solr performance remains the same.
The performance test is done on a very small data set (i.e.
I am trying to measure the performance of ingesting documents that have lots
of fields.
The latest elasticsearch 1.2.1:
Total docs count: 10k (a small set definitely)
ES_HEAP_SIZE: 48G
settings: