"...since at the time the industry believed that distributed objects were going to save us from complexity.) Many of the sins of serialization were committed in the desire to get that last .1%, but the cost and benefit of that last .1% are woefully out of balance."

The following are probably non-goals, but something to consider or keep in mind relating to distributed objects:

There are four types of distributed objects:

  1. Immutable value / data Object types.
  2. Shared Mutable Objects.
  3. Unshared Mutable Objects.
  4. Remote Objects / Services (best for managing shared mutable state).

The second type of distributed object causes much pain and should be discouraged. The first three types of distributed objects can have class resolution issues, but these are solvable.

A lot of folks also have problems deserializing objects when class visibility is different at each end; I'm guessing this would be the same for value types.

For example, OSGi folk recommend using primitive parameter types for remote OSGi services.
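
To make that concrete, here's a minimal sketch of a remote service interface restricted to primitives and Strings; the interface and method names are hypothetical, purely for illustration:

    // Hypothetical remote service interface restricted to primitives and
    // Strings, so no application classes need to be resolved at either end.
    public interface TemperatureService {
        // Primitive parameters and return types avoid class resolution
        // entirely; only the interface itself must be visible at both ends.
        double currentTemperature(String stationId);
        void recordReading(String stationId, long timestampMillis, double celsius);
    }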

RMI annotates streams with codebase annotations. Jini Extensible Remote Invocation used to do that too.

The problem with RMI and Jini codebase annotations is that if you resolve your classes locally, you lose the codebase annotations when re-serializing data; and because class visibility can be different at different endpoints, you end up with all sorts of class resolution issues. See "Class Loading Issues in Java™ RMI and Jini™ Network Technology" by Michael Warres:
https://pdfs.semanticscholar.org/143f/468fcbdafd20f2b8c27fe5e0a869913b641a.pdf

The solution of course is simple: ensure that you deserialize into the same module that you serialized from, especially when deserializing in another JVM, so class resolution is identical.
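
One way to pin resolution to a known module is to resolve every class in the stream against that module's ClassLoader. This is only a sketch of the idea, not our actual implementation:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectStreamClass;

    // Sketch: resolve every class in the stream against one controlled
    // ClassLoader (e.g. the loader of the module that serialized the data),
    // so class visibility is identical at both endpoints.
    public class ModuleObjectInputStream extends ObjectInputStream {
        private final ClassLoader moduleLoader;

        public ModuleObjectInputStream(InputStream in, ClassLoader moduleLoader)
                throws IOException {
            super(in);
            this.moduleLoader = moduleLoader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            try {
                return Class.forName(desc.getName(), false, moduleLoader);
            } catch (ClassNotFoundException e) {
                // Primitive descriptors ("int", "double", ...) aren't loadable
                // by name; fall back to the default resolution for those.
                return super.resolveClass(desc);
            }
        }
    }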

We serialize a lot of complex object graphs; none are circular. The module used for serialization should have visibility of the entire graph of object classes.

So if we're using OSGi modules and provide a network / remote service (not to be confused with an OSGi remote service), we ensure the proxies for these services have the same module installed at the client and server endpoints. The service is represented by a Java interface, and the client makes calls on the interface's methods. This interface may be implemented by what is called a smart proxy, which is encapsulated by a module that is dynamically downloaded at runtime, or by a dynamically generated reflection Proxy using an InvocationHandler.
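
For the reflection Proxy flavour, here's a minimal sketch; the service interface and handler body are hypothetical, and a real smart proxy would marshal the call to the remote endpoint and apply security constraints:

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Proxy;

    // Hypothetical service interface; the same interface (same module) must
    // be visible at both the client and server endpoints.
    interface LookupService {
        String find(String name);
    }

    public class ProxyDemo {
        public static void main(String[] args) {
            // The InvocationHandler stands in for the machinery that would
            // marshal the call and send it to the remote endpoint.
            InvocationHandler handler = (proxy, method, methodArgs) ->
                    "stub result for " + method.getName();

            LookupService service = (LookupService) Proxy.newProxyInstance(
                    LookupService.class.getClassLoader(),
                    new Class<?>[] { LookupService.class },
                    handler);

            System.out.println(service.find("printer"));
        }
    }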

We still provide an option for codebase annotations for client parameter objects, where a client subclasses parameter types and passes them to the service, but this is discouraged; it is provided for backward compatibility only. Where the parameters are also interfaces, the client can implement a remote object and pass it as a parameter instead; in our system, this causes a module identical to the client's to be loaded in the server to resolve the remote object classes, without using stream codebase annotations.

Incidentally, if you're curious how this happens: a proxy is sent (I guess you can call it a serialization proxy :) ) and authenticated by the remote end, with security constraints applied; then the remote end asks the proxy for a codebase URL, which is loaded into a ClassLoader with controlled visibility (this is extensible using a ServiceProvider or OSGi service); then the smart proxy is deserialized into this loader by calling a method on the serialization proxy.
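
A rough sketch of that sequence, with every name illustrative rather than the actual JGDMS API:

    import java.net.URL;
    import java.net.URLClassLoader;

    // Illustrative sketch of the bootstrap sequence described above:
    // authenticate the serialization proxy, ask it for its codebase, build a
    // ClassLoader with controlled visibility, then deserialize into it.
    public final class SmartProxyBootstrap {

        // Hypothetical stand-in for the small trusted object sent first.
        interface BootstrapProxy {
            void authenticate();                        // apply security constraints
            URL codebase();                             // where the classes live
            Object deserializeProxy(ClassLoader loader) throws Exception;
        }

        public static Object load(BootstrapProxy bootstrap, ClassLoader parent)
                throws Exception {
            bootstrap.authenticate();             // verify the remote end first
            URL codebase = bootstrap.codebase();  // then ask for its codebase
            // Controlled visibility: only this codebase plus the chosen parent.
            ClassLoader loader =
                    new URLClassLoader(new URL[] { codebase }, parent);
            return bootstrap.deserializeProxy(loader); // proxy resolves here
        }
    }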

By limiting scope, we can still have 99% of the benefits of distributed objects, without the pain.

Incidentally, apart from the complexity of class resolution, what really limited distributed computing was IPv4. IPv6 removes the network addressing limitations placed on distributed computing.

So I'd make the following qualifications:

  1. Use only primitive types when serializing between different languages.
  2. Serialize Java language Object types and primitives only between
     JVMs when class visibility is uncontrolled.
  3. When serializing other object types, ensure they are immutable if
     shared and that class visibility is identical and managed at both
     endpoints.
  4. Do not serialize objects whose classes may not be resolvable
     (when you need to depend on annotated streams and uncontrolled
     class resolution, for example); find another way to solve the problem.

We've had 20 years to iron out the wrinkles. :)

Regards,

Peter.

On 23/08/2019 7:36 AM, Peter Firmstone wrote:
Hi Sean,

Regarding the section entitled "Why not write a new serialization library?", unlike the serialization libraries listed, our purpose was to be able to securely deserialize untrusted data, while maintaining backward serial form compatibility with Java Serialization, provided it didn't compromise security.

We don't use blacklists or whitelists; we grant DeserializationPermission instead. It doesn't have the granularity of whitelists, but then classes that implement @AtomicSerial are supposed to be hardened implementations in any case.
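
For illustration only, assuming DeserializationPermission behaves like a standard java.security.BasicPermission (the real JGDMS class and its target names may differ):

    import java.security.BasicPermission;

    // Sketch only: a BasicPermission-style DeserializationPermission checked
    // before a class may take part in deserialization.
    public class DeserializationPermission extends BasicPermission {

        public DeserializationPermission(String target) {
            super(target);
        }

        // How a deserializer might gate an untrusted stream: no grant,
        // no deserialization, regardless of blacklists or whitelists.
        static void checkAllowed(String target) {
            SecurityManager sm = System.getSecurityManager();
            if (sm != null) {
                sm.checkPermission(new DeserializationPermission(target));
            }
        }
    }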

If it can be of use, feel free to experiment with it, hopefully it might help with some of your design decisions:

https://github.com/pfirmstone/JGDMS/tree/trunk/JGDMS/jgdms-platform/src/main/java/org/apache/river/api/io

Much of the code on this site provides implementation examples as well.

Regards,

Peter.

On 20/08/2019 7:55 AM, Sean Mullan wrote:
Brian Goetz (copied) has done a lot of thinking in the serialization area, so I have copied him. Not sure if you have seen it but he recently posted a document about some of his ideas and possible future directions for serialization: http://cr.openjdk.java.net/~briangoetz/amber/serialization.html

--Sean

On 8/17/19 10:22 PM, Peter Firmstone wrote:
Thanks Sean,

You've gone to some trouble to answer my question, which demonstrates you have considered it.

I donate some time to help maintain Apache River, derived from Sun's Jini. Once Jini depended on RMI; today, not so much. It still has some dependencies on some RMI interfaces, but doesn't utilise JRMP, although it provides some backward compatibility to enable it.

But my point is, we heavily utilise Java Serialization, and have an independent implementation of a subset of Java Serialization (originating from Apache Harmony). We do this for security, as we use an annotated serialization constructor. Serial form is unchanged; we have Serializers for commonly used Java library objects. For example, we have a "PermissionSerializer", but we don't have a "PermissionCollectionSerializer" or "PermissionsSerializer" (for java.security.Permissions). Incidentally, we have found we do not need the ability to serialize circular object graphs. Throwable is an object that has a circular object graph, but that circular object graph can be linked up after deserialization.
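
As an illustration of the Serializer idea, a hypothetical PermissionSerializer might capture the permission's class name, target name and actions, and rebuild it through the public (String, String) constructor most Permission classes define. This is a sketch, not our actual code, and a real implementation must validate these strings:

    import java.io.InvalidObjectException;
    import java.io.ObjectStreamException;
    import java.io.Serializable;
    import java.lang.reflect.Constructor;
    import java.security.Permission;

    // Hypothetical sketch of a Permission serializer: capture the permission
    // class, target name and actions, then rebuild on deserialization.
    public final class PermissionSerializer implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String type;    // permission class name
        private final String name;    // target name
        private final String actions; // actions, may be empty or null

        public PermissionSerializer(Permission p) {
            this.type = p.getClass().getName();
            this.name = p.getName();
            this.actions = p.getActions();
        }

        // Rebuild via the (String, String) constructor most Permission
        // classes define; resolve the class in a controlled loader.
        private Object readResolve() throws ObjectStreamException {
            try {
                Class<?> cl = Class.forName(type);
                Constructor<?> ctor = cl.getConstructor(String.class, String.class);
                return ctor.newInstance(name, actions);
            } catch (ReflectiveOperationException e) {
                throw new InvalidObjectException("cannot recreate permission: " + e);
            }
        }
    }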

Permission implementing Serializable is probably not too much of a threat, as these objects are effectively immutable after lazy initialization.

ProtectionDomain calls java.security.Permissions::setReadOnly during its construction.

ProtectionDomain::getPermissions returns its internal java.security.Permissions. If this is serialized, the readOnly internal state can be written to, as the internal object references are accessible from within the stream.

Admittedly, the attacker would already need to have some privilege, to have access to a ProtectionDomain, so it's a path of privilege escalation. I'm not talking about gadget attacks and deserialization of untrusted data, I'm talking about breaking encapsulation.

Even though we are heavily dependent on Java Serialization, we are very careful when we implement it, and avoid implementing it when possible. Hindsight is 20:20, but given we are now seeing some Java SE backward compatibility breakages, perhaps it might be worth considering breaking serialization. I don't mean we necessarily need to break object serial form; rather, making the Java serialization API explicit, with a subset of existing API features, would make long-term maintenance and security less of a burden, along with removing support for serialization of some objects where it is seldom used, perhaps via a JEP that requests developers to consider which library objects actually need to be serializable.

Something we do in our Java Serialization API is require that mutable deserialized objects are defensively copied during object construction (serial fields are deserialized before an object is constructed; the deserialized fields are accessible via a parameter passed in during construction). We have tools that assist developers to check that deserialized Java Collections contain the expected object types, for example, so during object construction the developer has to replace the Collection with a new instance and copy the contents to the new Collection after checking the type of each object contained therein. Also, we don't actually serialize Java Collections; we have standard serial forms for List, Set and Map, so these serial forms are equal, similar to the List, Set and Map contracts. By doing this, Collections don't actually need to implement Serializable at all, as a Serializer becomes responsible for their serialization. This also means that all Collections must be accessed by interfaces, rather than implementation classes, so the deserialization constructor must defensively copy them into their preferred Collection instance. It's a bit like dependency injection.
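
Here's a minimal sketch of that defensive-copy pattern; the class and field names are made up, and in our API the untrusted List would arrive via the deserialization constructor's argument rather than a plain parameter:

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of the defensive-copy pattern: the deserialized List arrives as
    // an untrusted interface-typed argument; the constructor checks each
    // element's type and copies into its preferred implementation before use.
    public final class NameRegistry {
        private final List<String> names;

        public NameRegistry(List<?> untrusted) {
            List<String> copy = new ArrayList<>(untrusted.size());
            for (Object o : untrusted) {
                if (!(o instanceof String)) {
                    throw new IllegalArgumentException(
                            "unexpected element type: " + o.getClass());
                }
                copy.add((String) o); // defensive copy, element by element
            }
            this.names = copy;
        }
    }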

I know it would take time, and there would be some pain, but long term it would save a lot of maintenance developer time.

Regards,

Peter.

On 17/08/2019 12:50 AM, Sean Mullan wrote:
On 8/15/19 8:18 PM, Peter Firmstone wrote:
Hi Roger,

+1 for writeReplace

Personally I'd like to see some security classes break backward compatibility and remove support for serialization, as it allows someone to get references to internal objects, especially since these classes are cached by the JVM. This makes PermissionCollection.setReadOnly() very easy to bypass: just add permissions to the internal collections once you have a reference to them.

Does anyone have any use cases for serializing these objects?

These objects are easy to re-create by sending or receiving and parsing strings, because they are built from text-based policy files, and when you do that, you are validating input, so I never did fully understand why they were made serializable.

This is briefly explained on page 61 in the "Inside Java 2 Platform Security" book [1]:

"The Permission class implements two interfaces: java.security.Guard and java.io.Serializable. For the latter, the intention is that Permission objects may be transported to remote machines, such as via Remote Method Invocation (RMI), and thus a Serializable representation is useful."

The Permission class was introduced in Java SE 1.2 so there were different motivations back then :)

--Sean

[1] https://www.oracle.com/technetwork/java/javaee/index-141918.html


