Hello,

We are implementing Ceph as the storage backend for some systems.
Unfortunately, we have to use a POSIX filesystem for storing the data.

To accomplish this we have implemented a solution quite similar to what 
Sebastien Han has described on his blog here 
http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/

Now to our problem. We want to be sure that a write is replicated before we get 
an ack. Therefore we have set the pool size (replica count) to 2, and min_size 
to 2, as we have seen that the sudden removal of one OSD can lead to data loss 
with min_size set to 1.
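For reference, this is roughly how we set it with the standard ceph CLI (the pool name "rbd" below is just an example):

```shell
# Replicate each object to 2 OSDs (pool "size" = replica count)
ceph osd pool set rbd size 2
# Refuse I/O to a PG unless at least 2 replicas are available
ceph osd pool set rbd min_size 2
```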

The problem now is that if one OSD goes down, some PGs end up incomplete, 
and no I/O operations are allowed to the RBD.
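When this happens, the affected PGs can be inspected with the usual commands on a monitor node, e.g.:

```shell
# Overall health, including which PGs are incomplete
ceph health detail
# List PGs stuck in an inactive state
ceph pg dump_stuck inactive
```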

This problem could be solved in a couple of ways:

1) An option could be set so that writes are always made to the full number of 
replicas (size) before the write is acknowledged.
2) If a PG ends up in an incomplete state, Ceph tries to resolve the situation 
by doing a recovery of the PGs in question.

For us, adding a third replica isn't a feasible solution: 1) we have our data 
in two locations, and 2) the cost would be too high.