I probably should have vetted this before hitting send... let me know if
you need any clarifications.
Cheers,
Peter.
On 23/08/2019 12:59 PM, Peter Firmstone wrote:
"...since at the time the industry believed that distributed objects
were going to save us from complexity.) Many of the sins of
serialization were committed in the desire to get that last .1%, but
the cost and benefit of that last .1% are woefully out of balance."
The following are probably non-goals, but something to consider or
keep in mind, relating to distributed objects:
There are four types of distributed objects:
1. Immutable value / data Object types.
2. Shared Mutable Objects.
3. Unshared Mutable Objects.
4. Remote Objects / Services (best for managing shared mutable state).
The second type of distributed object causes much pain and should be
discouraged. The first three types of distributed objects can have
class resolution issues, but these are solvable.
A lot of folks also have problems deserializing Objects when
class visibility is different at both ends; I'm guessing this would be
the same for value types.
For example OSGi folk recommend using primitive parameter types for
remote OSGi services.
RMI annotates streams with codebase annotations. Jini Extensible
Remote Invocation used to do that too.
The problem with RMI codebase and Jini codebase annotations is that if
you resolve your classes locally, you lose the codebase annotations when
re-serializing data, and because class visibility can be different at
different endpoints, you end up with all sorts of class resolution
issues. See "Class Loading Issues in Java™ RMI and Jini™ Network
Technology" by Michael Warres:
https://pdfs.semanticscholar.org/143f/468fcbdafd20f2b8c27fe5e0a869913b641a.pdf
The solution of course is simple: ensure that you deserialize into the
same module that you serialized from, especially when deserializing in
another JVM, so class resolution is identical.
We serialize a lot of complex object graphs, none are circular. The
module used for serialization should have visibility of the entire
graph of object classes.
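To illustrate the idea of deserializing into one known module, here is a minimal sketch using only standard java.io classes. It is not the JGDMS implementation; the names LoaderBoundObjectInputStream and LoaderBoundDemo are hypothetical, and a plain ClassLoader stands in for "the module". Every class in the stream is resolved through the one loader supplied by the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;

/** Hypothetical helper: resolves stream classes through one fixed ClassLoader. */
class LoaderBoundObjectInputStream extends ObjectInputStream {
    private final ClassLoader loader;

    LoaderBoundObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
        super(in);
        this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            // Look the class up in the chosen loader, without initializing it.
            return Class.forName(desc.getName(), false, loader);
        } catch (ClassNotFoundException e) {
            // Fall back to the default lookup, which also handles primitive
            // type names such as "int"; a stricter version would rethrow instead.
            return super.resolveClass(desc);
        }
    }
}

/** Hypothetical demo: serialize a value, then read it back through the given loader. */
class LoaderBoundDemo {
    static Object roundTrip(Object value, ClassLoader loader) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new LoaderBoundObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()), loader)) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

In a real deployment the loader passed in would be the one belonging to the module that has visibility of the whole object graph, rather than whatever loader happens to be on the call stack.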
So if we're using OSGi modules, and provide a network / remote service
(not to be confused with an OSGi remote service) we ensure the proxies
for these services have the same module installed at the client and
server endpoints. The service is represented by a Java interface and
the client makes calls on the interface's methods. This interface may
be implemented by what is called a smart proxy, which is encapsulated
by a module which is dynamically downloaded at runtime, or a
reflection Proxy using an InvocationHandler that is generated
dynamically.
We still provide an option for codebase annotations for client
parameter objects, where a client subclasses parameter types and passes
them to the service, but this is discouraged; it is provided for
backward compatibility only. Where the parameters are also
interfaces, the client can implement a remote object and pass it as a
parameter instead. In our system, this will cause a module to be
loaded in the server identical to that at the client to resolve the
remote object classes, without using stream codebase annotations.
Incidentally, if you're curious how this happens, a proxy is sent {I
guess you can call it a serialization proxy :) } and authenticated by
the remote end, security constraints are applied, then the remote end
asks the proxy for a codebase URL, which is loaded into a ClassLoader
with controlled visibility (this is extensible using a ServiceProvider
or OSGi service), then the proxy is deserialized into this by calling a
method on the serialization proxy.
By limiting scope, we can still have 99% of the benefits of
distributed objects, without the pain.
Incidentally apart from the complexity of class resolution, what
really limited distributed computing was IPv4. IPv6 removes the
network addressing limitations placed on distributed computing.
So I'd make the following qualifications:
1. Use only primitive types when serializing between different
languages.
2. Serialize only Java language Object types and primitives between
JVMs when class visibility is uncontrolled.
3. When serializing other object types, ensure they are immutable if
shared and that class visibility is identical and managed at both
endpoints.
4. Do not serialize objects whose classes may not be resolvable
(when you need to depend on annotated streams and uncontrolled
class resolution for example), find another way to solve the
problem.
We've had 20 years to iron out the wrinkles. :)
Regards,
Peter.
On 23/08/2019 7:36 AM, Peter Firmstone wrote:
Hi Sean,
Regarding the section entitled "Why not write a new serialization
library?", unlike the serialization libraries listed, our purpose was
to be able to securely deserialize untrusted data, while maintaining
backward serial form compatibility with Java Serialization, provided
it didn't compromise security.
We don't use blacklists or whitelists, we use permissions to grant
DeserializationPermission, it doesn't have the granularity of white
lists, but then, classes that implement @AtomicSerial are supposed to
be hardened implementations in any case.
If it can be of use, feel free to experiment with it, hopefully it
might help with some of your design decisions:
https://github.com/pfirmstone/JGDMS/tree/trunk/JGDMS/jgdms-platform/src/main/java/org/apache/river/api/io
Much of the code on this site provides implementation examples as well.
Regards,
Peter.
On 20/08/2019 7:55 AM, Sean Mullan wrote:
Brian Goetz (copied) has done a lot of thinking in the serialization
area, so I have copied him. Not sure if you have seen it but he
recently posted a document about some of his ideas and possible
future directions for serialization:
http://cr.openjdk.java.net/~briangoetz/amber/serialization.html
--Sean
On 8/17/19 10:22 PM, Peter Firmstone wrote:
Thanks Sean,
You've gone to some trouble to answer my question, which
demonstrates you have considered it.
I donate some time to help maintain Apache River, derived from
Sun's Jini. Once Jini depended on RMI; today, not so much. It
still has some dependencies on some RMI interfaces, but doesn't
utilise JRMP, although it provides some backward compatibility to
enable it.
But my point is, we heavily utilise Java Serialization, and have an
independent implementation of a subset of Java Serialization
(originating from Apache Harmony). We do this for security as we
use an annotated serialization constructor. Serial form is
unchanged, we have Serializers for commonly used java library
objects, for example, we have a "PermissionSerializer", but we
don't have a "PermissionCollectionSerializer" or
"PermissionsSerializer" (for java.security.Permissions).
Incidentally, we have found we do not need the ability to serialize
circular object graphs. Throwable is an object that has a
circular object graph, but that circular object graph can be linked
up after deserialization.
Permission implementing Serializable is probably not too much of a
threat, as these objects are effectively immutable after lazy
initialization.
ProtectionDomain calls java.security.Permissions::setReadOnly
during its construction.
ProtectionDomain::getPermissions returns internal
java.security.Permissions. If this is serialized, then the
readOnly internal state can be written to as the internal object
references are accessible from within the stream.
Admittedly, the attacker would already need to have some privilege,
to have access to a ProtectionDomain, so it's a path of privilege
escalation. I'm not talking about gadget attacks and
deserialization of untrusted data, I'm talking about breaking
encapsulation.
Even though we are heavily dependent on Java Serialization, we are
very careful when we implement it, and avoid implementing it when
possible. Hindsight is 20:20, but given we are now seeing some Java
SE backward compatibility breakages, perhaps it might be worth
considering breaking serialization. I don't mean we need to
necessarily break object serial form, but rather make the Java
serialization API explicit, with a subset of existing API features
that makes long term maintenance and security less of a burden, and
remove support for Serialization of some objects where it is
seldom used, perhaps using a JEP that requests developers to
consider which library objects actually need to be serializable.
Something we do in our Java Serialization API is require that
mutable deserialized objects are defensively copied during object
construction (serial fields are deserialized before an object is
constructed, and the deserialized fields are accessible via a parameter
passed in during construction). We have tools that assist
developers to check that deserialized Java Collections contain the
expected object types, for example, so during object construction
the developer has to replace the Collection with a new instance and
copy the contents to the new Collection after checking the type of
each object contained therein. Also we don't actually serialize
Java Collections, we have standard serial forms for List, Set and
Map, so these serial forms are equal, similar to the List, Set and
Map contracts. By doing this, Collections don't actually need to
implement Serializable at all, as a Serializer becomes responsible
for their serialization. This also means that all Collections
must be accessed by interfaces, rather than implementation classes,
so the deserialization constructor must defensively copy them into
their preferred Collection instance. It's a bit like dependency
injection.
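As a rough illustration of the defensive copy and type check described above, here is a minimal sketch in plain Java. It does not use our actual API; Checked, Roster and the constructor shape are hypothetical stand-ins, and the Collection<?> argument represents whatever the stream produced before construction.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical utility for validating deserialized collections. */
final class Checked {
    private Checked() {}

    /**
     * Copies src into a new ArrayList, rejecting any element (including
     * null) that is not an instance of the expected type.
     */
    static <T> List<T> copyOf(Collection<?> src, Class<T> type) {
        List<T> copy = new ArrayList<>(src.size());
        for (Object element : src) {
            if (!type.isInstance(element)) {
                throw new ClassCastException("unexpected element: " + element);
            }
            copy.add(type.cast(element));
        }
        return copy;
    }
}

/** Hypothetical class whose constructor plays the role of a deserialization constructor. */
final class Roster {
    private final List<String> names;

    /** 'untrusted' is accessed only through the Collection interface and never retained. */
    Roster(Collection<?> untrusted) {
        // Replace the incoming Collection with our preferred implementation.
        this.names = Collections.unmodifiableList(Checked.copyOf(untrusted, String.class));
    }

    List<String> names() {
        return names;
    }
}
```

Because the constructor keeps only its own copy, a reference to the original collection held elsewhere in the stream cannot be used to alter the object's state afterwards.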
I know it would take time, and there would be some pain, but long
term it would save a lot of maintenance developer time.
Regards,
Peter.
On 17/08/2019 12:50 AM, Sean Mullan wrote:
On 8/15/19 8:18 PM, Peter Firmstone wrote:
Hi Roger,
+1 for writeReplace
Personally I'd like to see some security classes break backward
compatibility and remove support for serialization as it allows
someone to get references to internal objects, especially since
these classes are cached by the JVM. Which makes
PermissionCollection.setReadOnly() very easy to bypass, by adding
permissions to internal collections once you have a reference to
them.
Does anyone have any use cases for serializing these objects?
These objects are easy to re-create by sending or receiving and
parsing strings, because they are built from text based policy
files, and when you do that, you are validating input, so I never
did fully understand why they were made serializable.
This is briefly explained on page 61 in the "Inside Java 2
Platform Security" book [1]:
"The Permission class implements two interfaces:
java.security.Guard and java.io.Serializable. For the latter, the
intention is that Permission objects may be transported to remote
machines, such as via Remote Method Invocation (RMI), and thus a
Serializable representation is useful."
The Permission class was introduced in Java SE 1.2 so there were
different motivations back then :)
--Sean
[1] https://www.oracle.com/technetwork/java/javaee/index-141918.html