Why data tripled in size after repair?

Andrey Ilinykh Wed, 26 Sep 2012 09:30:45 -0700

Hello everybody!
I have a 3-node cluster with a replication factor of 3.
Each node has an 800G disk and used to hold 100G of data.
What is strange is that every time I run repair, the data grows to almost
3 times its size (270G); then I run compaction and get back to 100G.
Unfortunately, yesterday I forgot to compact and ran repair again (at
that moment I had around 270G). As a result, I now have 720G on each node.
I ran compaction again and got a lot of warnings like this:

WARN [CompactionExecutor:732] 2012-09-26 16:13:00,745 CompactionTask.java (line 84) insufficient space to compact all requested files

which makes sense, because I'm almost out of disk space.
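A rough back-of-envelope check of why the warning makes sense, assuming (as a simplification, not Cassandra's actual logic) that compaction writes its output before deleting its inputs, and that in the worst case nothing is purged, so the output is as large as the input. The function name is hypothetical.

```python
def can_compact_worst_case(disk_gb: float, used_gb: float, input_gb: float) -> bool:
    """Hypothetical helper: True if free space covers a worst-case compaction.

    Assumption: the new SSTable is written before the old ones are removed,
    and in the worst case it is as large as the data being compacted.
    """
    free_gb = disk_gb - used_gb
    return free_gb >= input_gb


# Numbers from the post: 800G disk, 720G used after the second repair.
print(800 - 720)                                # free space in G
print(can_compact_worst_case(800, 720, 720))    # compacting everything at once
print(can_compact_worst_case(800, 720, 80))     # compacting only 80G of input
```

Under this assumption, only about 80G of input can be compacted safely at a time, which is far less than the 720G on disk.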

So, I have two questions.

1. Why does repair almost triple the data size?

2. How can I compact my data back down to 100G?

Thank you,
  Andrey
