
I'm trying to understand the peering algorithm based on [1] and [2]. Some
things aren't really clear to me, or I'm not sure I understood them
correctly, so I'd like to ask for clarification on the points below:

1, Is it right that the primary writes an operation to the PG log
immediately upon receiving it?

2, Is it possible that an operation is persisted but never acknowledged?
Imagine this situation: a write arrives for an object, the operation is
copied to the replicas and written to their journals, but the primary
OSD dies before it can acknowledge the write to the user, and it never
recovers. Upon the next peering, this operation will become part of the
authoritative

3, Quote from the second step of the peering algorithm: "generate a list of
past intervals since last epoch started"
If there was no peering failure, then is there exactly one past interval?

4, Quote from the same step: "the subset for which peering could have
completed before the acting set changed to another set of OSDs".
Are the other intervals ignored because we can be sure that no write
operations were allowed during them?

5, At any moment, is the Up set either equal to, or a strict subset of,
the Acting set?

6, When do OSDs re-peer? Only when an OSD goes from in -> out, or also when
an OSD goes down (but has not yet been automatically marked out)?

7, For what reasons can peering fail? If the OSD map changes before
peering completes, does that count as a failure? If the OSD map doesn't
change, is the only reason for failure not being able to contact "at least
one OSD from each of past interval's acting set"?

8, up_thru: as I understand it, this is a per-OSD value in the OSD map,
which is updated for the primary after it has successfully agreed on the
authoritative history, but before peering completes. Is that right, and
what about the secondaries?
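
To make questions 3 and 4 more concrete, here is a minimal sketch of how I
currently picture intervals. It is not Ceph code: the function name
past_intervals and the sample history are hypothetical, and I simplify by
assuming an interval is a maximal run of consecutive OSD map epochs during
which the acting set of a PG does not change (I ignore the up set here).

```python
from itertools import groupby


def past_intervals(history):
    """Group (epoch, acting_set) pairs into maximal runs of epochs
    that share the same acting set.

    `history` must be sorted by epoch. Each result is a tuple
    (first_epoch, last_epoch, acting_set).
    """
    intervals = []
    # groupby only merges *consecutive* items with an equal key, which is
    # what we want: a set that reappears later starts a new interval.
    for acting, run in groupby(history, key=lambda item: item[1]):
        epochs = [epoch for epoch, _ in run]
        intervals.append((epochs[0], epochs[-1], acting))
    return intervals


# Hypothetical history of one PG: (OSD map epoch, acting set as OSD ids).
history = [
    (10, (0, 1, 2)),
    (11, (0, 1, 2)),
    (12, (1, 2)),      # OSD 0 leaves the acting set
    (13, (1, 2)),
    (14, (1, 2, 3)),   # OSD 3 joins; this run is the current interval
]

intervals = past_intervals(history)
for first, last, acting in intervals:
    print(f"epochs {first}-{last}: acting set {acting}")
```

In this picture, if the last epoch started fell inside the run that begins
at epoch 12, the list from step 2 would contain only the intervals from
that point on, and my question 4 is about which of those can be skipped.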

Balázs Kossovics

[1] http://docs.ceph.com/docs/master/dev/peering/
[2] http://docs.ceph.com/docs/master/dev/osd_internals/last_epoch_started/