Yes, an SSD will improve overall performance a lot. Disk drives are the
slowest link in the chain, so this will help. No more low IOPS, so it will
significantly reduce the load on the CPU (fewer I/O waits).
More RAM will not help that much. In fact, more RAM can slow down
persisting; it increases
My use case is bibliographic data indexing for academic and public
libraries. There are ~100m records from various sources that I regularly
extract, transform into JSON-LD, and load into Elasticsearch. Some are
files, some are fetched via JDBC. I have six 32-core servers on site,
organized in 2
Good to know, I will keep this in mind, even though I will try to go for
SSDs, as I personally have had great success with them in the past! When you
say 10-12 MB/sec, is that with doc parsing/processing or just ES index time?
For my humble test on a quad-core laptop, I am pushing 6 MB/sec with
An SSD is the best you can do for the persistence layer. I have such an ES
server at home with 4x SSD RAID0, sustaining an 800 MB/sec write I/O rate.
The servers at my day job are a few years old, from when a few TB of SSD
cost a fortune.
The higher the write rate and IOPS capacity of the drives, the
Jörg,
Just so I understand this: if I were to index 100 MB worth of data in total
with chunk volumes of 5 MB each, that means I have to index 20 times. If I
were to set the bulk size to 20 MB, I would have to index 5 times.
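The arithmetic works out like this (a minimal sketch using the sizes from the question; the class name is just for illustration):

```java
// Number of bulk requests = total data volume / bulk request size
public class BulkCount {
    public static void main(String[] args) {
        long totalBytes = 100L * 1024 * 1024; // 100 MB to index in total
        long chunk5  = 5L  * 1024 * 1024;     // 5 MB bulk size
        long chunk20 = 20L * 1024 * 1024;     // 20 MB bulk size
        System.out.println(totalBytes / chunk5);  // prints 20
        System.out.println(totalBytes / chunk20); // prints 5
    }
}
```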
This is a small data size; imagine I have millions of documents. Are you
Not sure if I understand.
If I had to index a pile of documents, say 15M, I would build bulk requests
of 1000 documents, where each doc is on average ~1 KB, so I end up at ~1 MB.
I would not care about varying doc sizes, as they even out over the total
amount. Then I send this bulk request over the wire.
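The batching scheme described above can be sketched like this (documents are plain JSON strings here, and the actual send over the wire is omitted; `BulkBatcher` and `batch` are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

public class BulkBatcher {
    // ~1 KB per doc x 1000 docs -> roughly 1 MB per bulk request
    static final int BULK_SIZE = 1000;

    // Split the document pile into fixed-count bulk requests
    static List<List<String>> batch(List<String> docs) {
        List<List<String>> bulks = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += BULK_SIZE) {
            bulks.add(docs.subList(i, Math.min(i + BULK_SIZE, docs.size())));
        }
        return bulks;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 15_000; i++) {
            docs.add("{\"id\":" + i + "}"); // stand-in for a real record
        }
        // 15,000 docs at 1000 per request -> 15 bulk requests
        System.out.println(batch(docs).size()); // prints 15
    }
}
```

Varying individual document sizes do not matter here; as the post says, they even out across a batch of 1000.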
Thanks again for clarifying this, I think I understand it now. What I was
referring to in my prior posts was the difference between setting 1000
documents vs. 1 document; I was thinking that a bigger chunk volume
would produce fewer over-the-wire index requests, but I understand your
What do you mean by a default JVM 64 MB limit? Elasticsearch uses a 1 GB
heap by default, not 64 MB. Maybe you have an extra JVM with your bulk
client that uses 64 MB? That is far too little. Use a 4-6 GB heap if your
machine allows it.
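A heap in that range can be set via the `ES_HEAP_SIZE` environment variable, which the stock `bin/elasticsearch` startup script of that era reads (equivalently, you could pass `-Xms`/`-Xmx` to the JVM directly); a sketch:

```shell
# Give Elasticsearch a 4 GB heap before starting it.
# ES_HEAP_SIZE sets both -Xms and -Xmx to the same value,
# which avoids heap resizing at runtime.
export ES_HEAP_SIZE=4g
./bin/elasticsearch
```

A separate bulk-client JVM would need its own `-Xmx` setting; `ES_HEAP_SIZE` only affects the Elasticsearch process.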
Note that JVM 7 (OpenJDK/Oracle), which is recommended, uses 25% of your