Re: RFR 6913047: SunPKCS11 memory leak

Valerie Peng Wed, 03 Oct 2018 13:37:47 -0700

Hi Martin,

I found the problem causing the one regression test failure and fixed it. The Mach5 run is now clean.


http://cr.openjdk.java.net/~valeriep/6913047Exp/webrev.02

I also made various changes hoping to improve things. You can compare the files in the above webrev with yours to see the differences. The general principle is to minimize the changes, as all new code may introduce regressions, especially with changes of this scale.

One key difference is that in your code, you destroy the native key handle after extracting the native key info in the key constructor, and then re-create the key handle when the reference count is increased. I use a different approach: I keep the key handle in the key constructor, and it is then destroyed when the reference count goes down to 0. From the regression test output that I observed, most keys are created and used once, so keeping the keyID in the constructor seems more efficient. Besides, I also disable native key info extraction for all token keys.
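For illustration only, the handle lifecycle described above can be sketched roughly as follows. This is not the webrev code: FakeToken, RefCountedKey and all of their method names are hypothetical stand-ins, and the native PKCS#11 calls are simulated with a simple counter of live handles.

```java
// Hypothetical stand-in for the native token: hands out handles and
// tracks how many of them are currently live.
class FakeToken {
    private long nextHandle = 1;
    int liveHandles = 0;

    long createKey(byte[] keyInfo) { liveHandles++; return nextHandle++; }
    void destroyKey(long handle)   { liveHandles--; }
}

// Hypothetical sketch of a key that keeps its native handle from
// construction and releases it once the reference count returns to zero.
class RefCountedKey {
    private final FakeToken token;
    private final byte[] keyInfo;   // extracted key info, used to re-create the handle
    private long keyID;             // 0 means "no native handle"
    private int refCount;

    RefCountedKey(FakeToken token, byte[] keyInfo) {
        this.token = token;
        this.keyInfo = keyInfo.clone();
        // The handle created here is kept, not destroyed right away.
        this.keyID = token.createKey(this.keyInfo);
    }

    // Returns a usable handle, re-creating it only if it was released earlier.
    synchronized long acquire() {
        if (keyID == 0) {
            keyID = token.createKey(keyInfo);
        }
        refCount++;
        return keyID;
    }

    // Gives back one reference; the handle is destroyed when none remain.
    synchronized void release() {
        if (refCount == 0) {
            throw new IllegalStateException("unbalanced release");
        }
        if (--refCount == 0) {
            token.destroyKey(keyID);
            keyID = 0;
        }
    }

    synchronized boolean hasNativeHandle() { return keyID != 0; }
}
```

In this sketch, a key that is created and used once costs a single native create and a single destroy; the handle is only re-created if the key is used again after its count has dropped to 0.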

One thing that I am debating is whether we should add some property to disable this. I am aware that this will only be enabled if the key info extraction succeeds. However, could there be cases where the extraction succeeds but the re-creation fails? P11 Key objects are quite widely used; if something goes wrong, the impact may be significant.


Thanks,
Valerie

On 10/1/2018 6:48 PM, Valerie Peng wrote:

Hi Martin,
For the KeyStore case, the keys are mostly token objects, so the key info extraction approach does not apply?
For your changes in p11_keymgmt.c, I ran into a compiler error and SIGBUS errors on two OSes (Mac and Solaris SPARC). I ended up changing the variable initializations as well as the memset(...) calls. With the updated native changes, I adapted and re-tested my prototype changes. For reference, you can find the updated prototype changes at: http://cr.openjdk.java.net/~valeriep/6913047Exp/webrev.01/
Besides making changes in p11_keymgmt.c to get rid of the platform-specific compilation error and SIGBUS errors, I noticed that you hardcoded the key wrapping mechanism in native code for both the getNativeKeyInfo(...) and createNativeKey() methods. It seems better to store the mechanism object on the Java side, i.e. in P11Key and its associated classes, and then pass the object to the JNI code (please also see my webrev.01).
In addition, I switched the reference counting to your model, i.e. increase in init() and decrease in reset(), instead of the try-and-finally model in prototype webrev.00. My earlier comment on the P11Cipher class, that you should not replace the initialize() call with an ensureInitialized() call, applies to all other PKCS11 classes as well.
With this approach, the KeyID field of P11Key should not be freely accessible or directly referenced outside of the P11Key class. Also, the increases and decreases of the reference count must be paired up. Supposedly, the reference count should not go negative, right? If the reference counting isn't correct, the key may be freed prematurely? Lastly, the reference counting is an implementation detail, and I think it's better to keep it inside the P11Key class/file and not expose it, i.e. through method names.
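As a rough illustration of the pairing concern (not code from either webrev), the sketch below uses a hypothetical CountedKey and StatefulOp. The operation takes one reference in init() and gives it back in reset(), and the key rejects any decrement that would make the count negative. All names here are assumptions made for the example.

```java
// Hypothetical key holding a reference count that must never go negative.
class CountedKey {
    private int refCount;

    synchronized void incRef() { refCount++; }

    synchronized void decRef() {
        if (refCount <= 0) {
            // An unpaired decrement would let the key be freed prematurely.
            throw new IllegalStateException("reference count would go negative");
        }
        refCount--;
    }

    synchronized int refCount() { return refCount; }
}

// Hypothetical stateful operation (think of a cipher or signature) that
// holds exactly one key reference between init() and reset().
class StatefulOp {
    private final CountedKey key;
    private boolean initialized;

    StatefulOp(CountedKey key) { this.key = key; }

    void init() {
        if (initialized) {
            return;             // already holds its one reference
        }
        key.incRef();
        initialized = true;
    }

    void reset() {
        if (!initialized) {
            return;             // nothing to give back
        }
        initialized = false;
        key.decRef();
    }
}
```

Tracking the initialized state inside the operation is one way to keep repeated init() or reset() calls from unbalancing the count; whether the real classes need such a guard depends on how their call paths are structured.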
I have spent time verifying my updated prototype changes and tracing the reference counting. All looks fine, except there is one regression test failure (sun/security/tools/keytool/NssTest.java) on linux-x64 which I am still troubleshooting. However, I will be on vacation from 10/4 to 10/21, so I want to update you on what I have so you can continue during my vacation.
Thanks,

Valerie



On 9/18/2018 4:48 PM, Valerie Peng wrote:
Hi Martin,
I am ok with your conservative choice of only applying this when using NSS. If we are only applying this for NSS, we should really refactor the code to minimize the impact on callers and the P11Key class. My prototype code may be on the extreme end of minimizing changes, but the current webrev could use some refactoring as well. With your explanation, I now understand your model better. How about the refactoring in the P11Key class? Is there a reason for not doing this? I did test my prototype code against the existing regression tests (except the KeyStore ones, as more API changes are needed for persistent keys, which I have not covered in the prototype), but I ran into some strange errors in some native p11 calls which I did not touch, so I commented them out and just checked the reference counting part, etc.
I will take a closer look at the KeyStore case and let you know.
Thanks,
Valerie

On 9/18/2018 7:29 AM, Martin Balao wrote:
Hi Valerie,

Thanks for your comments.

Here is Webrev.11:
* http://cr.openjdk.java.net/~mbalao/webrevs/6913047/6913047.webrev.11/
* http://cr.openjdk.java.net/~mbalao/webrevs/6913047/6913047.webrev.11.zip
<src/jdk.crypto.cryptoki/share/classes/sun/security/pkcs11/P11Cipher.java>
L397: That's right. I was trying to simplify the code but missed this. Thanks.
L471: The key reference counter has to be decremented under any exception (P11Key.decNativeKeyRef method call). But, yes, no exception other than PKCS11Exception should be thrown. Reverted this change.
<src/jdk.crypto.cryptoki/share/classes/sun/security/pkcs11/P11Key.java>
L99: Comment changed. It should be better now.
L148-L149: In fact, I'd enforce this and disable the feature for all token keys. Token keys are permanent and extracting them is risky. This criterion was already applied when dealing with key stores (P11KeyStore class).
Yes, this feature is enabled for NSS only because it's the only backend we currently know of that is affected by this memory "leak" issue. If any other software-token backend were affected, we could try this feature there too. HSMs shouldn't have any problem. I prefer to take a more conservative approach and enable the feature only in those cases in which it's really necessary. All other cases default to the previous mechanism for freeing memory.
This does not replace the PhantomReference approach; both work together and are complementary. In cases where the temporary keys feature is disabled or when a temporary key client is not behaving correctly (i.e.: leaking stateful operations like "cipher" or "signature" in an intermediate state with the native key initialized), the PhantomReference approach will be the last chance to free memory. The native key object can be destroyed (C_DestroyObject call) either from the PhantomReference mechanism or from the temporary keys mechanism. There shouldn't be any conflict between them. If it's destroyed through the temporary keys mechanism, then we know that the P11Key object is alive (referenced) and thus PhantomReference destruction won't be taking place at the same time. Once the key is deleted, keyID is set to 0 and session to null. Thus, PhantomReference destruction won't have any effect when executed later. If we think of the other case (when the key is freed by PhantomReference), we have a P11Key object with a native key initialized but with no references to it. Thus, the destroyNativeKey method won't be called and SessionKeyRef.disposeNative is the only method that will delete the key.
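The "no conflict" argument above relies on destruction being a no-op once keyID is 0. A minimal sketch of that idea follows, assuming a hypothetical NativeKeyHolder class; it does not reproduce the real P11Key or SessionKeyRef code, and the C_DestroyObject call is simulated with a counter.

```java
// Hypothetical holder showing why two destruction paths do not conflict:
// whichever path runs first clears keyID, so the other becomes a no-op.
class NativeKeyHolder {
    int destroyCalls = 0;   // counts simulated C_DestroyObject calls
    private long keyID;     // 0 means the native key is already gone

    NativeKeyHolder(long keyID) { this.keyID = keyID; }

    // Path 1: the temporary keys mechanism (object still referenced).
    synchronized void destroyNativeKey() { destroyIfPresent(); }

    // Path 2: the last-chance cleanup (PhantomReference-style disposal).
    synchronized void disposeNative() { destroyIfPresent(); }

    private void destroyIfPresent() {
        if (keyID == 0) {
            return;         // already destroyed by the other path
        }
        destroyCalls++;     // stands in for the native destroy call
        keyID = 0;
    }
}
```

Calling destroyNativeKey() and then disposeNative() on the same holder results in a single simulated native destroy, which mirrors the ordering described in the paragraph above.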
L157: that's right, synchronization has to be at class level. Fixed.
L1343: It's not the same session: this.session was assigned a new value (this.session = session;) before calling addObject.
L1363: removeObject is called for the session inside setKeyIDAndSession: "this.session.removeObject();". The this.session instance variable is set to null after this call.
Regarding the refactorings you proposed, the problem I see with moving the key reference incrementing/decrementing to PKCS11.java is that some operations are stateful, e.g. encryption. When we initialize the operation with C_EncryptInit, the key id is the 3rd parameter. Destroying the key and then calling C_EncryptUpdate sounds incorrect to me. Have you tried the regression test suite after this refactoring? (I see some parts commented out.) Regarding removing the tmpNativeKey parameter (used to explicitly disable the feature for new P11Key objects), how do you handle the P11KeyStore case? We don't want temporary keys there.
Kind regards,
Martin.-
