Hi,

Just looking for some pointers on a problem with partition corruption, 
apparently caused by concurrent bind requests.

For context, we’re using ApacheDS embedded within an application - so it’s a 
custom ApacheDS instance, based on your Default directory service, but the 
configuration is all ours (so I have to allow that we’ve just got something 
wrong and need to understand what that is).

We sometimes see the familiar "ERR_554 double get for block 0" error - rarely, 
but often enough to be a concern. The partition has never been repairable; we 
have to restore from backup or start over.

I was able to reproduce the corruption on demand with a quick test case: a 
number of concurrent failed bind requests reliably triggers the issue (it’s 
usually ERR_554, but repeated tests also produced a few other ERR conditions in 
JDBM). I used ldapsearch with an incorrect password to generate the failed bind 
requests, and ran several copies in a loop to generate a little load.
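For anyone who would rather drive the same reproduction from Java than from a shell loop, here is a minimal sketch. It uses the JDK's JNDI LDAP provider to issue simple binds with a wrong password from a fixed thread pool. The URL (ApacheDS's default LDAP port, 10389), the DN "uid=testuser,ou=users,ou=system", and the thread and iteration counts are hypothetical placeholders, not values from our setup.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.naming.AuthenticationException;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.InitialDirContext;

public class ConcurrentBadBinds {

    /** Performs one simple bind; returns true if the server rejected the credentials. */
    public static boolean badBind(String url, String dn, String password) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, dn);
        env.put(Context.SECURITY_CREDENTIALS, password);
        try {
            new InitialDirContext(env).close();
            return false; // bind unexpectedly succeeded
        } catch (AuthenticationException e) {
            return true;  // expected: invalid credentials (LDAP result code 49)
        } catch (NamingException e) {
            throw new IllegalStateException("bind failed for another reason", e);
        }
    }

    /** Runs `attempt` `iterations` times on each of `threads` threads; returns the number of rejections. */
    public static int hammer(int threads, int iterations, Callable<Boolean> attempt) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> {
                    int rejected = 0;
                    for (int i = 0; i < iterations; i++) {
                        if (attempt.call()) {
                            rejected++;
                        }
                    }
                    return rejected;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical placeholders: point these at your own instance and test user.
        String url = "ldap://localhost:10389";
        String dn = "uid=testuser,ou=users,ou=system";
        int rejected = hammer(8, 50, () -> badBind(url, dn, "wrong-password"));
        System.out.println("rejected binds: " + rejected);
    }
}
```

The load generator is separated from the bind itself so the same harness can be pointed at other operations, or exercised without a server.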

After digging around, I concluded that the bind requests are making concurrent 
updates to the JDBM partition because of our use of password policies: login 
failures are tracked in the DIT, and each failed bind request was trying to 
update the DIT while holding only the read lock that the default operation 
manager acquires for the bind operation.
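The crux is that a read lock is shared. The sketch below assumes a java.util.concurrent.locks.ReentrantReadWriteLock (not confirmed to be exactly what the operation manager uses) and shows that several threads can hold its read lock at the same moment, so a DIT update made while holding only the read lock is not serialized against other binds. The class name and thread count are illustrative only.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SharedReadLockDemo {

    /**
     * Starts `threads` threads that each take the read lock and then wait until
     * all of them are inside. Returns true only if every thread got there, i.e.
     * the read lock let them all in at once instead of serializing them.
     */
    public static boolean readersOverlap(int threads) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CyclicBarrier allInside = new CyclicBarrier(threads);
        boolean[] met = new boolean[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                lock.readLock().lock(); // what each "bind" holds in the problem scenario
                try {
                    // With an exclusive lock this await would fail (time out, or
                    // see a broken barrier), since the other threads could never
                    // get past lock() to reach the barrier.
                    allInside.await(2, TimeUnit.SECONDS);
                    met[id] = true; // a DIT write here would race with the other threads
                } catch (Exception e) {
                    met[id] = false;
                } finally {
                    lock.readLock().unlock();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) {
                w.join(); // join() also makes the writes to met[] visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        for (boolean m : met) {
            if (!m) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("readers overlapped: " + readersOverlap(4));
    }
}
```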

I’ve implemented a fix in our instance - I added an interceptor for the bind 
operation, before the AuthenticationInterceptor, that upgrades the read lock to 
a write lock and then forwards the bind to the AuthenticationInterceptor (and 
then downgrades back to a read lock before returning to the operation manager).
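One detail worth spelling out: with java.util.concurrent's ReentrantReadWriteLock, a read lock cannot be upgraded in place (a thread that calls writeLock().lock() while still holding a read hold blocks forever), although downgrading from write to read is permitted. A minimal sketch of the lock handling in such an interceptor, assuming the operation manager's lock is a ReentrantReadWriteLock and using a hypothetical Supplier in place of the real call to the next interceptor, therefore releases the read lock before taking the write lock:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/**
 * Hypothetical helper: runs one bind under the write lock when the caller
 * (e.g. an operation manager) has already taken the read lock.
 */
public class BindLockEscalation {

    private final ReentrantReadWriteLock rwLock;

    public BindLockEscalation(ReentrantReadWriteLock rwLock) {
        this.rwLock = rwLock;
    }

    public <T> T bindUnderWriteLock(Supplier<T> nextBind) {
        // ReentrantReadWriteLock cannot upgrade in place: writeLock().lock()
        // blocks forever while this thread still holds any read hold.
        // So drop every read hold this thread has first.
        int readHolds = rwLock.getReadHoldCount();
        for (int i = 0; i < readHolds; i++) {
            rwLock.readLock().unlock();
        }
        // Other writers may run in this gap before we get the write lock.
        rwLock.writeLock().lock();
        try {
            return nextBind.get(); // stands in for forwarding to the next interceptor
        } finally {
            // Downgrade: re-take the read holds while still holding the write
            // lock (permitted), then release the write lock.
            for (int i = 0; i < readHolds; i++) {
                rwLock.readLock().lock();
            }
            rwLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        BindLockEscalation escalation = new BindLockEscalation(lock);
        lock.readLock().lock(); // stands in for the operation manager's read lock
        try {
            boolean exclusive = escalation.bindUnderWriteLock(lock::isWriteLockedByCurrentThread);
            System.out.println("bind ran under write lock: " + exclusive);
            System.out.println("read holds restored: " + lock.getReadHoldCount());
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Note the window between releasing the read lock and acquiring the write lock: another writer can run in between, so anything the operation read under the original read lock may be stale by the time the bind proceeds.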

It seems a little heavy-handed, but it has fixed the problem in our 
implementation, albeit at the expense of bind scalability, which isn’t an issue 
for us.

Am I missing something in our configuration? I’ve dug through the code, but 
didn’t spot anything that might alter the locking strategy for bind operations; 
yet it seems inevitable that concurrent binds will touch the partition at some 
point when password policies are enabled, so I figure we must be missing 
something.

And I feel I must restate: this is very much a custom ApacheDS instance (not 
generated by Directory Studio; configuration is through code, not LDIF, etc.), 
so whilst it’s close to the default configuration in many respects, I fully 
expect that we’ve just missed something obvious or are otherwise doing it 
wrong.

Cheers, Carl.


