I'm trying to seed about 500 million rows from a Spark DataFrame into a clean Ignite database, but I'm running into serious performance issues once Ignite runs out of durable memory. I'm running 4 Ignite nodes on a Kubernetes cluster backed by AWS i3.2xl instances (8 CPUs, 60 GB of memory, and 2 TB of local SSD storage per node). My configuration parameters per node are as follows:

- 40 GB of available memory
- 20 GB of durable memory
- 8 GB on the Java heap
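For reference, the storage side of each node is set up roughly like the simplified Java sketch below (the region name and paths are illustrative, not my exact values):

import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class NodeConfig {
    public static IgniteConfiguration create() {
        // 20 GB off-heap data region with native persistence enabled.
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("default_region")             // illustrative name
            .setMaxSize(20L * 1024 * 1024 * 1024)
            .setPersistenceEnabled(true);

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region)
            .setWalPath("/ignite/wal")             // WAL on its own disk
            .setStoragePath("/ignite/storage")     // data files on the local SSD
            .setWriteThrottlingEnabled(true);

        return new IgniteConfiguration()
            .setDataStorageConfiguration(storage);
    }
}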
While the nodes still have free durable memory, they write to the cache at about 35k-40k rows a second. I expected a performance hit once durable memory ran out, but as soon as persistence kicks in, write throughput drops to about 15k/sec and steadily decreases from there, while the number of writes to disk steadily increases. CPU usage falls off as well, until Ignite is writing fewer than 5k rows a second and using less than half a core. I haven't let it degrade further to see what happens, but it shows no sign of stabilizing.

I've tried all of Ignite's performance suggestions, including:

- splitting the WAL and storage onto different disks
- using local storage instead of EBS
- checkpoint throttling
- write throttling
- adjusting the checkpoint page buffer size
- adjusting the number of threads
- adjusting swappiness

In the end, I get the same results no matter what I do. Is write performance really this bad with persistence? If not, what kind of performance should I expect, and what can I do to improve it? Is there an alternative way to seed data that doesn't rely on the DataStreamers?
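For completeness, the write path boils down to something like the standalone sketch below (simplified; the cache name, tuning values, and config path are illustrative, and in the real job the addData() loop runs inside a Spark foreachPartition over the DataFrame):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class Seeder {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start("client-config.xml"); // illustrative config path

        // With native persistence the cluster starts inactive and must be activated.
        ignite.cluster().active(true);

        ignite.getOrCreateCache("rows"); // illustrative cache name

        try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("rows")) {
            streamer.allowOverwrite(false);        // pure initial load: no read-before-write
            streamer.perNodeBufferSize(1024);      // larger per-node batches
            streamer.perNodeParallelOperations(8); // bound in-flight batches per node

            for (long i = 0; i < 1_000_000L; i++)
                streamer.addData(i, "row-" + i);   // real values come from the DataFrame
        } // close() flushes any remaining buffered entries
    }
}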