Re: [SSSD] mmap_cache problem

Michal Židek Tue, 10 Sep 2013 08:18:26 -0700

On 09/10/2013 03:43 PM, Lukas Slebodnik wrote:

On (10/09/13 09:18), Simo Sorce wrote:

On Tue, 2013-09-10 at 12:24 +0200, Lukas Slebodnik wrote:

On (09/09/13 09:22), Simo Sorce wrote:

On Mon, 2013-09-09 at 13:37 +0200, Lukas Slebodnik wrote:

On (07/09/13 13:43), Simo Sorce wrote:

On Fri, 2013-09-06 at 18:55 -0400, Simo Sorce wrote:

On Fri, 2013-09-06 at 19:45 +0200, Lukas Slebodnik wrote:

On (06/09/13 12:48), Simo Sorce wrote:

On Fri, 2013-09-06 at 16:26 +0200, Lukas Slebodnik wrote:

Step 6) Remove record R3(hash_1:4, hash_2:3) stored at slot index 0x43
    a) Remove record R3 from chain starting by hash R3->hash1 (value 4)
hash[1] -> 0x12(R1) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[2] -> 0x12(R1) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[3] -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[4] -> 0x54(R4) -> MC_INVALID_VAL
hash[5] -> 0x54(R4) -> MC_INVALID_VAL


    b) Remove record R3 from chain starting by hash R3->hash2 (value 3)
hash[1] -> 0x12(R1) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[2] -> 0x12(R1) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[3] -> MC_INVALID_VAL
hash[4] -> 0x54(R4) -> MC_INVALID_VAL
hash[5] -> 0x54(R4) -> MC_INVALID_VAL

I do not see how this happens, in hash[3] you have another record after
R3 so hash[3] should be

This behaviour was introduced by patch 4662725ffef62b3b2502481438effa7c8fef9f80
and it is very similar to reinvalidation.


hash[3] -> 0x54(R4) -> MC_INVALID_VAL

however I do see how R3 would remain in hash[1] and hash[2].

And that is a problem, the problem I was trying to wrap my mind around
with the patch you proposed last time. Wouldn't the revalidation idea I
had last time fix this issue ?

Simo.


record R1(hash_1:1, hash_2:2) stored at slot index 0x12
record R2(hash_1:3, hash_2:1) stored at slot index 0x31
record R3(hash_1:4, hash_2:3) stored at slot index 0x43
record R4(hash_1:5, hash_2:4) stored at slot index 0x54

Last state:
-------------------------
hash[1] -> 0x12(R1) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[2] -> 0x12(R1) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[3] -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[4] -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[5] -> 0x54(R4) -> MC_INVALID_VAL

I will try to describe last removing with your idea about reinvalidation.

Step 6) Remove record R3(hash_1:4, hash_2:3) stored at slot index 0x43
     a) Remove record R3 from the 1st chain starting by hash R3->hash1 (value 4)
hash[1] -> 0x12(R1) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[2] -> 0x12(R1) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[3] -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[4] -> 0x54(R4) -> MC_INVALID_VAL
hash[5] -> 0x54(R4) -> MC_INVALID_VAL

     b) Revalidate chain for R3->hash1 (value 4)
        -- chain for hash[4] refers to 0x54(R4) and R4 has hash with value 4

     c) Remove record R3 from the 2nd chain starting by hash R3->hash2 (value 3)
hash[1] -> 0x12(R1) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[2] -> 0x12(R1) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[3] -> 0x54(R4) -> MC_INVALID_VAL
hash[4] -> 0x54(R4) -> MC_INVALID_VAL
hash[5] -> 0x54(R4) -> MC_INVALID_VAL

     d) Revalidate chain for R3->hash2 (value 3)
        -- chain for hash[3] refers to 0x54(R4)
           and R4 *DOES NOT have* hash with value 4
        --hash[3] will refer wo successor of R4, MC_INVALID_VAL
hash[1] -> 0x12(R1) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[2] -> 0x12(R1) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[3] -> MC_INVALID_VAL
hash[4] -> 0x54(R4) -> MC_INVALID_VAL
hash[5] -> 0x54(R4) -> MC_INVALID_VAL

     e) Revalidate all succesors of record R3. R3->next == 0x54 (R4)
        in this case, only record R4(hash_1:5, hash_2:4) is in chain
        -- chain for hash[5] refers to 0x54(R4) and R4 has hash with value 5
        -- chain for hash[4] refers to 0x54(R4) and R4 has hash with value 4

     f) slots for record R2 will be invalidated.
hash[1] -> 0x12(R1) -> INVALIDATED_SLOT ??-> 0x54(R4) -> MC_INVALID_VAL
hash[2] -> 0x12(R1) -> INVALIDATED_SLOT ??-> 0x54(R4) -> MC_INVALID_VAL
hash[3] -> MC_INVALID_VAL
hash[4] -> 0x54(R4) -> MC_INVALID_VAL
hash[5] -> 0x54(R4) -> MC_INVALID_VAL

As you can see the, result is the same as from my previous mail.

This is a special case, but it can happen.
I spent a lot of time with reproducing this case and with analysis.
So I hope this explanation helps to understand this special case.
Is it clear now?


It is clear to me, but what you describe here is not my original
proposal.

Revalidation was the rechecking of chains when you remove an object.
In your example revalidation of the chains when you remove R2 (not R3)
would lead to removal of R3 from hash[1] and hash[2] chains resolving
the problem.

I am only sorry I wasn't more firm on the revalidation idea now.

I will send some examples later.

Simo.


Just a refresher of my original proposal:
After we remove a record we re-validate the 2 chains it belonged to, and
prune any record that does not match the chain record or any of the
chains referenced by the records that precede.


Ok here is the example.

---
This was the problem:
Add record R1(hash_1:1, hash_2:2) stored at slot index 0x12
Add record R2(hash_1:3, hash_2:1) stored at slot index 0x31
Add record R3(hash_1:4, hash_2:3) stored at slot index 0x43
Add record R4(hash_1:5, hash_2:4) stored at slot index 0x54
Remove record R2
Remove record R3

---
This was the state at step 4 of your first email, after everything was
added and before anything is removed:

hash[1] -> 0x12(R1) -> 0x31(R2) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[2] -> 0x12(R1) -> 0x31(R2) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[3] -> 0x31(R2) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[4] -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[5] -> 0x54(R4) -> MC_INVALID_VAL

---
So let's remove R2.

This is the state (as reported by you) after R2 is removed from all its
chains:

hash[1] -> 0x12(R1) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[2] -> 0x12(R1) -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[3] -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[4] -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[5] -> 0x54(R4) -> MC_INVALID_VAL

R2 was in chain 3 and 1, so *now* let's re-validate the chains.

First we do [3]
- First entry is R3 (4, 3), 3 matches so we keep it.
  And now the chains to match are [3, 4] ([3] + (4, 3))
- Second is R4 (5, 4), 4 matches so we keep it.
  And now the chains to match are [3, 4, 5] ([3, 4] + (5, 4))
- We are done, no more elements on hash[3]

Then we do hash[1]
- First entry is R1 (1, 2), 1 matches so we keep it.
  And now the chains to match are [1, 2] ([1] + (1, 2))
- Second entry is R3 (4, 3), it does *NOT* match any, so
   we skip it, which means pointing R1 to the next one (R4)
  Two chains are affected this way [1] and [2]:
  hash[1] -> 0x12(R1) -> 0x54(R4) -> MC_INVALID_VAL
  hash[2] -> 0x12(R1) -> 0x54(R4) -> MC_INVALID_VAL
- Third now is R4 (5, 4) and the matching table is still [1, 2]
  It does *NOT* match once again, so we skip this too.
  R1 is now pointing at MC_INVALID_VAL which is the next value for R4
  Two chains are affected this way [1] and [2]:
  hash[1] -> 0x12(R1) -> MC_INVALID_VAL
  hash[2] -> 0x12(R1) -> MC_INVALID_VAL
- We are done, no more elements on hash[1]

So the final status for step 5 (removal of R2) now is:
hash[1] -> 0x12(R1) -> MC_INVALID_VAL
hash[2] -> 0x12(R1) -> MC_INVALID_VAL
hash[3] -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[4] -> 0x43(R3) -> 0x54(R4) -> MC_INVALID_VAL
hash[5] -> 0x54(R4) -> MC_INVALID_VAL

Now we remove R3 (4, 3), which is a quite simple operation:

hash[1] -> 0x12(R1) -> MC_INVALID_VAL
hash[2] -> 0x12(R1) -> MC_INVALID_VAL
hash[3] -> 0x54(R4) -> MC_INVALID_VAL
hash[4] -> 0x54(R4) -> MC_INVALID_VAL
hash[5] -> 0x54(R4) -> MC_INVALID_VAL

We have to do revalidation for chains [4], [3]

First we do hash[4]
- First entry is R4 (5, 4), 4 matches so we keep it.
  And now the chains to match are [4, 5] ([4] + (5, 4))
- We are done, no more elements on hash[4]

Then we do hash[3]
- First entry is R4 (5, 4), it does *NOT* match any, so
  we skip it, which means pointing the head to MC_INVALID_VAL
  hash[3] -> MC_INVALID_VAL
- We are done, no more elements on hash[3]

Final status after step 6 is:
hash[1] -> 0x12(R1) -> MC_INVALID_VAL
hash[2] -> 0x12(R1) -> MC_INVALID_VAL
hash[3] -> MC_INVALID_VAL
hash[4] -> 0x54(R4) -> MC_INVALID_VAL
hash[5] -> 0x54(R4) -> MC_INVALID_VAL

Which is the correct status.

In this case, It is correct status, but I have a counterexample.

So re-validation of chains solves the problem as I had originally
proposed a few patches ago :)


We have records: R2(2,3) R3(4,3) R4(4,5) R5(5,6) R6(6,7)

Steps: add R4, add R3, add R5, add R2, add R6, rm R3


state after addind recors: add R4, add R3, add R5, add R2, add R6

h_table | chain
   2    -> R2 -> R6
   3    -> R3 -> R5 -> R2 -> R6
   4    -> R4 -> R3 -> R5 -> R2 -> R6
   5    -> R4 -> R3 -> R5 -> R2 -> R6
   6    -> R5 -> R2 -> R6
   7    -> R6

Remove record R3(3,4)
a) remove R3 from chain h_table[4]
b) remove R3 from chain h_table[3]

h_table | chain
   2    -> R2 -> R6
   3    -> R5 -> R2 -> R6
   4    -> R4 -> R5 -> R2 -> R6
   5    -> R4 -> R3 -> R5 -> R2 -> R6
   6    -> R5 -> R2 -> R6
   7    -> R6


This is wrong, removing R3, removes it from chain 5 too (it was and
remains identical to chain 4).

c)Revalidate chain h_table[4]

- First entry is R4 (4, 5), 4 matches so we keep it.
   And now the chains to match are [4, 5] ([4] + (4, 5))
- Second is R5 (5, 6), 5 matches so we keep it.
   And now the chains to match are [4, 5, 6] ([4, 5] + (5, 6))
- Third is R2 (2, 3), no match in  [4, 5, 6]
   so we skip this.
   R5 is now pointing at R6 which is the next value for R2

Current state:

h_table | chain
   2    -> R2 -> R6
   3    -> R5 -> R6
   4    -> R4 -> R5 -> R6
   5    -> R4 -> R3 -> R5 -> R6
   6    -> R5 -> R6
   7    -> R6

R2(2, 3) should be reachable with hash key 3, bu it isn't
It is wrong.


True, when thinking about re-validation I was wondering if we need to
validate both chains before allowing removal of any entry.
I guess we need to.

So what we can do is to simply keep track of all record we would remove

>from the first chain we validate, then parse the second and if any pair

is 'revalidated' we mark it as to not be deleted in the first chanin
either. If at the end of revalidation we have left anything to remove we
do.

The first entry in the chain can always be removed, so we only keep
middle entries.

so we keep the list of links we want to sever and what we want to keep,

>from the first chain, and if any is revalidated by the second one we

scan we skip it.


Situation after removal of R3 and before revalidation:

R2(2,3) R3(4,3) R4(4,5) R5(5,6) R6(6,7)

h_table | chain
  2    -> R2 -> R6
  3    -> R5 -> R2 -> R6
  4    -> R4 -> R5 -> R2 -> R6
  5    -> R4 -> R5 -> R2 -> R6
  6    -> R5 -> R2 -> R6
  7    -> R6

During revalidation of chain 4 we have (R5, R2) as one of the links to
remove after the first chain pass (chain 4), but we do not remove it
yet.
We also have marked as keep (R4, R5), (R5, R6) (note that I skipped R2
here between R5 and R6)

Then we revalidate chain 3:

- first entry is R5(5,6), it doesn't match so we remove it.
- then we check R2(2, 3) which matches, it is now the first of the list
so we do not need to keep track of it.
Check is now for [2, 3] = [3 + (2, 3)]
- now we check R6, it doesn't match so we mark it for removal (R2, R6)
- done

Interim situation where we removed only heads of lists:
h_table | chain
  2    -> R2 -> R6
  3    -> R2 -> R6
  4    -> R4 -> R5 -> R2 -> R6
  5    -> R4 -> R5 -> R2 -> R6
  6    -> R5 -> R2 -> R6
  7    -> R6


ok now we have the following 2 lists:

1. to keep: (R4, R5), (R5, R6)
2. to remove: (R5, R2) (R2, R6)

What we need to do here is check if any of those to remove is in the
list to keep, if it is strike it out from the remove list.

We start with (R5, R2) in the removal-list and check it against the
keep-list.
It doesn't match, so we go on and link R5->(R2->next)

h_table | chain
  2    -> R2 -> R6
  3    -> R2 -> R6
  4    -> R4 -> R5 -> R6
  5    -> R4 -> R5 -> R6
  6    -> R5 -> R6
  7    -> R6

Then we go to the second element (R2, R6), we scan the keep list and we
do not find it either, so we proceed to change R2 -> (R6->next):

h_table | chain
  2    -> R2
  3    -> R2
  4    -> R4 -> R5 -> R6
  5    -> R4 -> R5 -> R6
  6    -> R5 -> R6
  7    -> R6

And we are done, we have no more removal to process and the chains look sane 
and well trimmed down.

Simo.


We have records: R0(0,1); R1(1,2); R2(2,3); R3(3,4); R4(4,5); R5(5,6); R6(6,7)
                  R7(7,8);

Steps: +R4, +R3, +R5, +R2, +R6, +R1, +R0, +R7,
        -R1

state after addind recors: +R4, +R3, +R5, +R2, +R6, +R1, +R0, +R7,

h_table | chain
   0    -> R0 -> R7
   1    -> R1 -> R0 -> R7
   2    -> R2 -> R6 -> R1 -> R0 -> R7
   3    -> R3 -> R5 -> R2 -> R6 -> R1 -> R0 -> R7
   4    -> R4 -> R3 -> R5 -> R2 -> R6 -> R1 -> R0 -> R7
   5    -> R4 -> R3 -> R5 -> R2 -> R6 -> R1 -> R0 -> R7
   6    -> R5 -> R2 -> R6 -> R1 -> R0 -> R7
   7    -> R6 -> R1 -> R0 -> R7
   8    -> R7

As you can see, last added record R7 is part of each
chain. Actually, it is only one chain and few sub-chains.


Remove R1 (it will be devided in few steps

-- remove R1 from chains h_table[1] and h_table[2]
h_table | chain
   0    -> R0 -> R7
   1    -> R0 -> R7
   2    -> R2 -> R6 -> R0 -> R7
   3    -> R3 -> R5 -> R2 -> R6 -> R0 -> R7
   4    -> R4 -> R3 -> R5 -> R2 -> R6 -> R0 -> R7
   5    -> R4 -> R3 -> R5 -> R2 -> R6 -> R0 -> R7
   6    -> R5 -> R2 -> R6 -> R0 -> R7
   7    -> R6 -> R0 -> R7
   8    -> R7

-- there isn't problem with the first entry in chains

-- marking records for removing in chain h_table[1]
Keep :
Remove: (R0, R7)

-- marking records for removing in chain h_table[2]
Keep :
Remove: (R0, R7); (R2, R6); (R6, R0);

-- removing records from chain h_table[1]
What we need to do here is check if any of those to remove is in the
list to keep, if it is strike it out from the remove list.

We start with (R0, R7) in the removal-list and check it against the
keep-list.
It doesn't match, so we go on and link R0->(R7->next)
                                        R0-> MC_INVALID_VAL

Current state:

h_table | chain
   0    -> R0
   1    -> R0
   2    -> R2 -> R6 -> R0
   3    -> R3 -> R5 -> R2 -> R6 -> R0
   4    -> R4 -> R3 -> R5 -> R2 -> R6 -> R0
   5    -> R4 -> R3 -> R5 -> R2 -> R6 -> R0
   6    -> R5 -> R2 -> R6 -> R0
   7    -> R6 -> R0
   8    -> R7

In this state, record R7 is *not* reachable with hash key "7"

Revalidation of chain is very dangerous. The main problem is with single chain
(link-list) for different hash keys and situation is much more complicated
with higher collision of hash keys. This is a reason why I proposed to have
two next pointers in record (rec->next1, rec->next2)


You got the computations wrong, however I found a different way in which
the revalidation fails :-(

I draw examples on paper with pencil.
So, I could do a mistake when I rewrote example into mail.


I was trying to keep one chain list as is to avoid changing the on disk
format, but at this point I agree it's getting complicated beyond what
is reasonable. If you propose a patch that adds 2 next elements I will
review it.

Simo.

At the moment, structure sss_mc_rec looks like
struct sss_mc_rec {
     uint32_t b1;            /* barrier 1 */
     uint32_t len;           /* total record length including record data */
     uint64_t expire;        /* record expiration time (cast to time_t) */
     rel_ptr_t next;         /* ptr of next record rel to data_table */
     uint32_t hash1;         /* val of first hash (usually name of record) */
     uint32_t hash2;         /* val of second hash (usually id of record) */
     uint32_t b2;            /* barrier 2 - 32 bytes mark, fits a slot */
     char data[0];
};

I can see two possible solution,
1) sizeof(struct sss_mc_rec) will increase
    --new member will be added to this tructure

struct sss_mc_rec {
     uint32_t b1;
     uint32_t len;
     uint64_t expire;
     rel_ptr_t next1;    // "next" renamed to next1
     rel_ptr_t next2;    // "next2" is new member of structure
     uint32_t hash1;
     uint32_t hash2;
     uint32_t b2;
     uint32_t padding;   // unused
     char data[0];
};
    -- MC_SLOT_SIZE will be increased from 32 -> to 40
    -- reason: MC_SLOT_SIZE % sizeof(void*) should be 0 on each platform

2) sizeof(struct sss_mc_rec) will not change
struct sss_mc_rec {
     uint32_t b1;
     uint32_t len;
     uint32_t expire_offset;  // type was changed from uint64_t -> uin32_t
     rel_ptr_t next1;    // "next" renamed to next1
     rel_ptr_t next2;    // "next2" is new member of structure
     uint32_t hash1;
     uint32_t hash2;
     uint32_t b2;
     char data[0];
};

    --expire_offset will be time since creation of mmap_cache.
    --creation time of mmap_cache(uint64_t) will be stored in
      "sss_mc_header mcc_ctx"
    --reason: Year 2038 problem

Which one do you prefer?

LS


I would keep the "uint64_t expire" member even if the size of record
grows a little.

Michal
_______________________________________________
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/sssd-devel

Re: [SSSD] mmap_cache problem

Reply via email to