Re: ingest performance degrades sharply as documents have more fields

2014-07-08 Thread Maco Ma
Hi Kimchy, I reran the benchmark using ES 1.3 with default settings (just disabling _source and _all) and it shows great progress on performance. However, Solr still outperforms ES 1.3: Number of different metadata fields ES ES with disabled _all/codec bloom filter *ES 1.3 * Solr
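A minimal sketch of the mapping change discussed in this message (disabling `_source` and `_all` at index-creation time, ES 1.x mapping syntax); the index name `docs` and type name `type` are placeholders, not from the thread:

```shell
# Create an index with _source and _all disabled (ES 1.x API).
# "docs" and "type" are hypothetical names for illustration only.
curl -XPUT 'localhost:9200/docs' -d '{
  "mappings": {
    "type": {
      "_source": { "enabled": false },
      "_all":    { "enabled": false }
    }
  }
}'
```

Disabling `_all` avoids indexing every token a second time; disabling `_source` avoids storing the raw document, which is what the benchmark above toggles.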

Re: ingest performance degrades sharply as documents have more fields

2014-07-08 Thread kimchy
Yes, this is the equivalent of using RAMDirectory. Please don't use this: mmap is optimized for random access, and if the Lucene index can fit in heap (to use a RAM dir), it can certainly fit in OS RAM, without the implications of loading it onto the heap. On Monday, July 7, 2014 6:26:07 PM UTC+2,
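The store type Shay is steering people away from is the per-index `index.store.type` setting; a sketch of choosing the mmap-backed store instead of the heap-backed `memory` store (ES 1.x, index name `docs` is a placeholder):

```shell
# Per-index store setting (ES 1.x). "mmapfs" keeps the index in the OS
# page cache rather than on the Java heap; "memory" (the RAMDirectory
# equivalent discouraged above) would load it into heap instead.
curl -XPUT 'localhost:9200/docs' -d '{
  "settings": { "index.store.type": "mmapfs" }
}'
```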

Re: ingest performance degrades sharply as documents have more fields

2014-07-08 Thread kimchy
Hi, thanks for running the tests! My tests were capped at 10k fields and the improvements target that range; anything more than that, I, and anybody here on Elasticsearch (+ Lucene: Mike/Robert), simply don't recommend and can't really stand behind when it comes to supporting it. In Elasticsearch, there is a conscious

Re: ingest performance degrades sharply as documents have more fields

2014-07-07 Thread Mahesh Venkat
Thanks Shay for updating us with perf improvements. Apart from using the default parameters, should we follow the guideline listed in http://elasticsearch-users.115913.n3.nabble.com/Is-ES-es-index-store-type-memory-equivalent-to-Lucene-s-RAMDirectory-td4057417.html Lucene supports

Re: ingest performance degrades sharply as documents have more fields

2014-07-05 Thread kimchy
Heya, I worked a bit on it, and 1.x (the upcoming 1.3) now has some significant perf improvements for this case (including Lucene-side improvements that are in ES for now but will be in the next Lucene version). Those include: 6648: https://github.com/elasticsearch/elasticsearch/pull/6648 6714:

Re: ingest performance degrades sharply as documents have more fields

2014-06-26 Thread Maco Ma
Added the Solr benchmark as well: Number of different metadata fields ES with disabled _all/codec bloom filter ES (Ingestion + Query concurrently) Solr Solr (Ingestion + Query concurrently) Scenario 0: 1000 13 secs - 769 docs/sec CPU: 23.68% iowait: 0.01% Heap: 1.31G Index Size: 248K Ingestion

Re: ingest performance degrades sharply as documents have more fields

2014-06-25 Thread Maco Ma
I ran the benchmark where search and ingest run concurrently. Pasting the results here: Number of different metadata fields ES with disabled _all/codec bloom filter ES with disabled params (Ingestion + Query concurrently) Scenario 0: 1000 13 secs - 769 docs/sec CPU: 23.68% iowait: 0.01% Heap: 1.31G

Re: ingest performance degrades sharply as documents have more fields

2014-06-25 Thread Michael McCandless
Some responses below: On Tue, Jun 24, 2014 at 7:04 PM, Cindy Hsin cindy.h...@gmail.com wrote: Looks like the memory usage increased a lot with 10k fields with these two parameters disabled. Based on the experiments we have done, it looks like ES has abnormal memory usage and performance

Re: ingest performance degrades sharply as documents have more fields

2014-06-24 Thread Maco Ma
Hi Jörg, I reran the benchmark with the _all field and codec bloom filter disabled: the index data size was reduced dramatically, but the ingestion speed is still similar to before: Number of different metadata fields ES ES with disabled _all/codec bloom filter Scenario 0: 1000 12 secs -

Re: ingest performance degrades sharply as documents have more fields

2014-06-24 Thread Cindy Hsin
Looks like the memory usage increased a lot with 10k fields with these two parameters disabled. Based on the experiments we have done, it looks like ES has abnormal memory usage and performance degradation when the number of fields is large (i.e. 10k), whereas Solr memory usage and performance remain

Re: ingest performance degrades sharply as documents have more fields

2014-06-23 Thread Cindy Hsin
Thanks! I have asked Maco to re-test ES with these two parameters disabled. One more question regarding Lucene's capability with a large number of metadata fields. What is the largest number of metadata fields Lucene supports per index? What are the different strategies to solve the large-metadata-fields issue? Do

Re: ingest performance degrades sharply as documents have more fields

2014-06-23 Thread Michael McCandless
Hi Cindy, There isn't a hard limit on the number of fields Lucene supports; it's more that per field there is highish heap usage, added CPU/IO cost for merging, etc. It's just not a well-tested usage of Lucene, not something the developers focus on optimizing, etc. Partitioning by _type won't

Re: ingest performance degrades sharply as documents have more fields

2014-06-22 Thread Jörg Prante
Two things to add to make the Elasticsearch/Solr comparison fairer. In the ES mapping, you did not disable the _all field. If you have the _all field enabled, all tokens will be indexed twice: once for the field and once for _all.

Re: ingest performance degrades sharply as documents have more fields

2014-06-21 Thread Michael McCandless
On Fri, Jun 20, 2014 at 8:00 PM, Cindy Hsin cindy.h...@gmail.com wrote: Hi, Mike: Since both ES and Solr use Lucene, do you know why we only see the big ingest performance degradation with ES but not Solr? I'm not sure why: clearly something in ES is slow as you add more and more fields. I

Re: ingest performance degrades sharply as documents have more fields

2014-06-18 Thread Maco Ma
I tried your script with iwc.setRAMBufferSizeMB(4) and a 48G heap size. The speed can be around 430 docs/sec before the first flush, and the final speed is 350 docs/sec. Not sure what configuration Solr uses such that its ingestion speed can be 800 docs/sec. Maco On Wednesday, June 18,
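For reference, the Lucene `IndexWriterConfig` RAM buffer knob being tested here has a node-level counterpart on the Elasticsearch side, the indexing buffer; a sketch of the elasticsearch.yml fragment, with the value chosen only to mirror the standalone Lucene test:

```yaml
# elasticsearch.yml -- node-level indexing buffer shared by active shards.
# The ES default is a percentage of heap (10%); a fixed value such as 4mb
# would mimic the iwc.setRAMBufferSizeMB(4) used in the Lucene script.
indices.memory.index_buffer_size: 4mb
```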

Re: ingest performance degrades sharply as documents have more fields

2014-06-18 Thread Michael McCandless
On Wed, Jun 18, 2014 at 2:38 AM, Maco Ma mayaohu...@gmail.com wrote: I tried your script with iwc.setRAMBufferSizeMB(4) and a 48G heap size. The speed can be around 430 docs/sec before the first flush, and the final speed is 350 docs/sec. Not sure what configuration Solr uses and

Re: ingest performance degrades sharply as documents have more fields

2014-06-17 Thread Michael McCandless
Hi, Could you post the scripts you linked to (new_ES_config.sh, new_ES_ingest_threads.pl, new_Solr_ingest_threads.pl) inline? I can't download them from where you linked. Optimizing every 10 seconds or 10 minutes is really not a good idea in general, but I guess if you're doing the same with

Re: ingest performance degrades sharply as documents have more fields

2014-06-17 Thread Michael McCandless
I tested roughly your Scenario 2 (100K unique fields, 100 fields per document) with a straight Lucene test (attached, but not sure if the list strips attachments). Net/net I see ~100 docs/sec with one thread ... which is very slow. Lucene stores quite a lot for each unique indexed field name and

Re: ingest performance degrades sharply as documents have more fields

2014-06-17 Thread Cindy Hsin
The way we make Solr ingest faster (single-document ingest) is by turning off the engine's soft commit and hard commit and using a client to commit the changes every 10 seconds. Solr ingest speed remains at 800 docs per second, whereas ES ingest speed drops by half when we increase the fields (i.e. from
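The commit setup described above maps to the Solr update handler configuration; a sketch of the solrconfig.xml fragment, with time-based auto-commits disabled so an external client can issue explicit commits (values illustrative, not taken from the thread):

```xml
<!-- solrconfig.xml sketch: disable time-based hard and soft commits so a
     client program can commit explicitly (e.g. every 10 seconds). -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>-1</maxTime>      <!-- -1 disables automatic hard commits -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>-1</maxTime>      <!-- disable automatic soft commits too -->
  </autoSoftCommit>
</updateHandler>
```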

Re: ingest performance degrades sharply as documents have more fields

2014-06-17 Thread Maco Ma
Hi Mike, new_ES_config.sh (defines the templates and disables the refresh/flush): curl -XPOST localhost:9200/doc -d '{ "mappings": { "type": { "_source": { "enabled": false }, "dynamic_templates": [ { "t1": { "match":

Re: ingest performance degrades sharply as documents have more fields

2014-06-13 Thread Mark Walkom
It's not surprising that the time increases when you have an order of magnitude more fields. Are you using the bulk API? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 13 June 2014 15:57, Maco Ma
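The bulk API Mark asks about batches many documents into one HTTP round trip using newline-delimited action/document pairs; a hedged sketch (index, type, and field names are placeholders):

```shell
# Bulk indexing sketch (ES 1.x _bulk endpoint): each action line is
# followed by its document line; the body must end with a newline.
# "docs", "type", and the field names are hypothetical.
curl -XPOST 'localhost:9200/_bulk' -d '
{"index":{"_index":"docs","_type":"type"}}
{"field_1":"value 1","field_2":"value 2"}
{"index":{"_index":"docs","_type":"type"}}
{"field_1":"value 3","field_2":"value 4"}
'
```

Single-document curl requests (as used in this benchmark) pay a per-request overhead that bulk requests amortize, which is why Mark raises the question.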

Re: ingest performance degrades sharply as documents have more fields

2014-06-13 Thread Maco Ma
I used curl commands to do the ingestion (one command, one doc) and flush. I also tried Solr (disabled the soft/hard commits; did the commit with a client program) with the same data and commands, and its performance did not degrade. Lucene is used by both of them, and I'm not sure why there is a

Re: ingest performance degrades sharply as documents have more fields

2014-06-13 Thread Cindy Hsin
Hi, Mark: We are doing single-document ingestion. We did a performance comparison between Solr and Elasticsearch (ES). The performance of ES degrades dramatically when we increase the metadata fields, whereas Solr performance remains the same. The testing is done on a very small data set (i.e.

ingest performance degrades sharply as documents have more fields

2014-06-12 Thread Maco Ma
I am trying to measure the performance of ingesting documents that have lots of fields. The latest Elasticsearch 1.2.1: Total docs count: 10k (a small set, definitely) ES_HEAP_SIZE: 48G settings: