I'm trying to seed about 500 million rows from a Spark DataFrame into a clean Ignite database, but I'm running into serious performance issues once Ignite runs out of durable memory. I'm running 4 Ignite nodes on a Kubernetes cluster backed by AWS i3.2xl instances (8 CPUs, 60 GB of memory, and 2 TB of local SSD storage per node). My configuration parameters per node are as follows:

- 40 GB of available memory
- 20 GB of durable memory
- 8 GB on the Java heap
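For reference, the storage side of each node is set up roughly like the simplified Java sketch below (the region name and paths are illustrative, not my exact values):

import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class NodeConfig {
    public static IgniteConfiguration create() {
        // 20 GB off-heap data region with native persistence enabled.
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("default_region")             // illustrative name
            .setMaxSize(20L * 1024 * 1024 * 1024)
            .setPersistenceEnabled(true);

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region)
            .setWalPath("/ignite/wal")             // WAL on its own disk
            .setStoragePath("/ignite/storage")     // data files on the local SSD
            .setWriteThrottlingEnabled(true);

        return new IgniteConfiguration()
            .setDataStorageConfiguration(storage);
    }
}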
While the nodes still have free durable memory, they write to the cache at about 35k-40k rows a second. I expected a performance hit once durable memory ran out, but as soon as persistence kicks in, write throughput drops to about 15k/sec and steadily decreases from there, while the number of writes to disk steadily increases. CPU usage falls off as well, until Ignite is writing fewer than 5k rows a second and using less than half a core. I haven't let it degrade further to see what happens, but it shows no sign of stabilizing.

I've tried all of Ignite's performance suggestions, including:

- splitting the WAL and storage onto different disks
- using local storage instead of EBS
- checkpoint throttling
- write throttling
- adjusting the checkpoint page buffer size
- adjusting the number of threads
- adjusting swappiness

In the end, I get the same results no matter what I do. Is write performance really this bad with persistence? If not, what kind of performance should I expect, and what can I do to improve it? Is there an alternative way to seed data that doesn't rely on the DataStreamers?
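For completeness, the write path boils down to something like the standalone sketch below (simplified; the cache name, tuning values, and config path are illustrative, and in the real job the addData() loop runs inside a Spark foreachPartition over the DataFrame):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class Seeder {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start("client-config.xml"); // illustrative config path

        // With native persistence the cluster starts inactive and must be activated.
        ignite.cluster().active(true);

        ignite.getOrCreateCache("rows"); // illustrative cache name

        try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("rows")) {
            streamer.allowOverwrite(false);        // pure initial load: no read-before-write
            streamer.perNodeBufferSize(1024);      // larger per-node batches
            streamer.perNodeParallelOperations(8); // bound in-flight batches per node

            for (long i = 0; i < 1_000_000L; i++)
                streamer.addData(i, "row-" + i);   // real values come from the DataFrame
        } // close() flushes any remaining buffered entries
    }
}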