On 10/29/13, 12:28 PM, Terry P. wrote:

What are your thoughts on doing an hourly flush of the table in the
shell to ensure entries are flushed to disk more frequently to help
minimize the replay required if connectivity to a node is lost?

If you want to go the route of flushing more frequently, I would probably suggest lowering tserver.walog.max.size from the default of 1G to something smaller (maybe 256M or 512M?).

My gut is telling me that this still isn't going to help you in the end. What does the distribution on your ingest look like?

Looking back at some old emails from you, if you're ingesting UUIDs as the row key, you're most likely ingesting a "small" amount of data to many servers. If this is the case, it's more likely that you're just playing the odds as to whether you happen to catch a flush the exact moment before you lose the N servers that contained your WALs.

Increasing the WAL replication is likely the best solution you can get for yourself. It seems unlikely that your failures would only ever occur after a flush but before you ingest more data. If you still want data flushed more often, reducing the WAL size makes that happen automatically, which beats a manual cron job to flush the table (one less thing to manage).

And, as you likely know, this would all be at the expense of ingest performance.
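
To put rough numbers on the replication point, here is a small back-of-the-envelope sketch. It assumes each WAL block's replicas land on distinct nodes chosen uniformly at random, which is a simplification (real HDFS placement prefers the local node and is rack-aware), and the cluster size and failure count are made-up values for illustration only. The function name is hypothetical, not anything from Accumulo or HDFS.

```python
from math import comb


def prob_all_replicas_lost(num_nodes: int, failed_nodes: int, replication: int) -> float:
    """Chance that every replica of a single WAL block sits on a failed node.

    Assumption: replicas are placed on distinct nodes chosen uniformly at
    random. Real HDFS placement is not uniform, so treat this as a rough guide.
    """
    if not 0 < replication <= num_nodes:
        raise ValueError("replication must be between 1 and num_nodes")
    if not 0 <= failed_nodes <= num_nodes:
        raise ValueError("failed_nodes must be between 0 and num_nodes")
    # Ways to place all replicas on failed nodes, over all possible placements.
    # comb() returns 0 when fewer nodes failed than there are replicas.
    return comb(failed_nodes, replication) / comb(num_nodes, replication)


if __name__ == "__main__":
    # Hypothetical cluster: 20 nodes, 3 of them lost at the same time.
    for replication in (2, 3, 4):
        p = prob_all_replicas_lost(num_nodes=20, failed_nodes=3, replication=replication)
        print(f"replication={replication}: P(block lost) = {p:.6f}")
```

Under these assumptions, losing 3 of 20 nodes takes out a given block with probability 3/190 at replication 2 and 1/1140 at replication 3, and cannot take it out at all at replication 4. That is per block, so the chance of losing at least one WAL across the whole cluster is higher, but the trend is the reason extra replication helps more than hoping a flush lands at the right moment.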
