Brian,

Thanks for picking up on my frustration ;)

I have something in mind for Serializable2 to address cyclic data structures and the possibility of independent evolution of super and child classes, while retaining a relatively clean public API, with one optional private method. The methods and interfaces proposed are suitable for any alternative ObjectInput and ObjectOutput implementation.

Apache River has an interface called Startable, with one method:

public void start() throws Exception;

It's called by a framework to allow an Object to start threads, publish "this" or throw an exception after construction. The intent is to allow an object to be immutable with final fields and be provided with a thread of execution after construction and before publication.
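
To make the Startable idea concrete, here is a minimal sketch. The Heartbeat class and its helper methods are hypothetical names I made up for illustration; only the start() signature comes from the interface above. The constructor assigns final fields and creates a thread without starting it, and start() is where the thread actually runs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Same shape as the Startable interface quoted above.
interface Startable {
    void start() throws Exception;
}

// Hypothetical example class: every field is final, and the constructor
// neither starts a thread nor publishes "this".
final class Heartbeat implements Startable {
    private final String name;
    private final CountDownLatch ran;
    private final Thread worker;

    Heartbeat(String name) {
        CountDownLatch latch = new CountDownLatch(1);
        this.name = name;
        this.ran = latch;
        // The thread is created here but deliberately not started.
        this.worker = new Thread(latch::countDown, name + "-worker");
    }

    // Called by the framework after construction and before publication.
    // The override narrows the throws clause, which Java permits.
    @Override
    public void start() {
        worker.start();
    }

    String name() {
        return name;
    }

    boolean hasRun() {
        return ran.getCount() == 0;
    }

    // Waits up to the given time for the worker; false on timeout or interrupt.
    boolean awaitRun(long millis) {
        try {
            return ran.await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A framework would call new Heartbeat("hb") first and start() second, so nothing can observe the object before its constructor has returned.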

Something similar can be used to wire up circular relations, let me explain:

Every class that implements Serializable has one thing in common: the Serialization protocol. Every Object instance of a Serializable class has an arbitrary serial form.

I propose a final class, SerialForm, representing the serial form of an object. It cannot be extended, requires privilege to instantiate, and performs method guard security checks for all callers, with the exception of a calling class reading or writing its own serial form. Each field entry in SerialForm needs a key whose identity is represented by the calling Class, the method name and the field's Class type; this key would be used for both writing and retrieving the entry. SerialForm will also provide a method to advise whether a field key contains a circular relation. Any field entry in SerialForm that would contain a circular relation is not populated until after construction of the current object is complete.
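
As a rough sketch of the keying idea only: the names FieldKey, put, get, markCircular and isCircular below are my assumptions, not a settled API, and the privilege and guard checks are left out entirely. The key takes an owning Class, a String name and the field's Class type, so two classes in one hierarchy can use the same name without colliding.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical key: owning class + name + declared field type.
final class FieldKey {
    private final Class<?> owner;
    private final String name;
    private final Class<?> type;

    FieldKey(Class<?> owner, String name, Class<?> type) {
        this.owner = Objects.requireNonNull(owner);
        this.name = Objects.requireNonNull(name);
        this.type = Objects.requireNonNull(type);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FieldKey)) return false;
        FieldKey k = (FieldKey) o;
        return owner == k.owner && name.equals(k.name) && type == k.type;
    }

    @Override
    public int hashCode() {
        return Objects.hash(owner, name, type);
    }
}

// Sketch of SerialForm without any of the proposed security checks.
final class SerialForm {
    private final Map<FieldKey, Object> entries = new HashMap<>();
    private final Set<FieldKey> circular = new HashSet<>();

    <T> void put(Class<?> owner, String name, Class<T> type, T value) {
        entries.put(new FieldKey(owner, name, type), value);
    }

    // Returns null when the entry is absent or not yet populated.
    <T> T get(Class<?> owner, String name, Class<T> type) {
        return type.cast(entries.get(new FieldKey(owner, name, type)));
    }

    // Assumed to be called by the framework, not by implementers.
    void markCircular(Class<?> owner, String name, Class<?> type) {
        circular.add(new FieldKey(owner, name, type));
    }

    boolean isCircular(Class<?> owner, String name, Class<?> type) {
        return circular.contains(new FieldKey(owner, name, type));
    }
}
```

Including the type in the key means a lookup with the wrong type simply finds nothing, rather than failing with a ClassCastException.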

An arbitrary Serializable2 Object instance may be composed of a hierarchy of classes, each belonging to a separate ProtectionDomain.

For the following interface:

public interface Serializable2 {

    void writeObject(SerialForm serial) throws IOException;

}

Implementers of Serializable2 must:

  1. Implement writeObject
  2. Implement a constructor with the signature:  (SerialForm serial).

Implementers that need to check invariants, delay throwing an Exception, publish "this" or set a circular reference after construction should:

  3. Implement: private void readObjectNoData() throws
     InvalidObjectException;

Child class implementations should:

  4. Call their superclass writeObject method and superclass
     constructor, but may call any superclass constructor or methods.
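
The requirements above can be sketched as follows. Account, SavingsAccount and RoundTrip are hypothetical classes for illustration, and the SerialForm here is a reduced stand-in (string keys built from the owning class and a name, no security checks), not the proposed class itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Reduced stand-in for the proposed SerialForm: no privilege or guard checks.
final class SerialForm {
    private final Map<String, Object> entries = new HashMap<>();

    void put(Class<?> owner, String name, Object value) {
        entries.put(owner.getName() + '#' + name, value);
    }

    Object get(Class<?> owner, String name) {
        return entries.get(owner.getName() + '#' + name);
    }
}

interface Serializable2 {
    void writeObject(SerialForm serial) throws IOException;
}

class Account implements Serializable2 {
    private final String id;

    Account(String id) {
        if (id == null) throw new IllegalArgumentException("id is required");
        this.id = id;
    }

    // Deserialization constructor: invariants are checked during
    // construction, so a bad serial form never yields an object.
    Account(SerialForm serial) {
        this((String) serial.get(Account.class, "id"));
    }

    @Override
    public void writeObject(SerialForm serial) throws IOException {
        serial.put(Account.class, "id", id);
    }

    String id() {
        return id;
    }
}

class SavingsAccount extends Account {
    private final Integer rate; // null is accepted if the entry is absent

    SavingsAccount(String id, Integer rate) {
        super(id);
        this.rate = rate;
    }

    SavingsAccount(SerialForm serial) {
        super(serial); // superclass reads its own entries
        this.rate = (Integer) serial.get(SavingsAccount.class, "rate");
    }

    @Override
    public void writeObject(SerialForm serial) throws IOException {
        super.writeObject(serial); // superclass writes its own entries
        serial.put(SavingsAccount.class, "rate", rate);
    }

    Integer rate() {
        return rate;
    }
}

// Stand-in for a stream implementation: write to a SerialForm, then rebuild.
final class RoundTrip {
    static SavingsAccount copy(SavingsAccount original) {
        SerialForm form = new SerialForm();
        try {
            original.writeObject(form);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new SavingsAccount(form);
    }
}
```

Each class only touches entries keyed by its own Class, which is what lets the superclass and child evolve independently.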

Compatibility and Evolution:

  1. Fields can be included or omitted from SerialForm, by an
     implementation, without breaking compatibility, provided a null
     reference is accepted during deserialization.
  2. Child classes in a hierarchy: all Serializable2 implementing
     superclass constructors have the same signature, so the
     superclass implementation can be substituted without breaking
     child class deserialization (provided this is the constructor
     used by the child class).
  3. There is no serialVersionUID.
  4. Child class Serializable2 implementations can extend a superclass
     without a zero arg constructor that doesn't itself implement
     Serializable2.
  5. Child classes that do not override writeObject will not be
     serialized, so can effectively opt out.
  6. Because implementations are required to implement public methods,
     there is no "Magic".
  7. Serializable2 shouldn't extend Serializable, allowing classes to
     implement both interfaces for a period of time (for that reason
     the signature for readObjectNoData may need to be changed for
     Serializable2).
  8. ObjectInputStream and ObjectOutputStream can be extended to
     support both implementations for compatibility, however
     alternative stream implementations would be preferable for
     Serializable2 to avoid Serializable security issues.  The new
     implementations should be possible to substitute because both
     types would use the same Stream Protocol, provided the classes
     being deserialized implement Serializable2.


My reasoning for retaining readObjectNoData() and for updating field entries in SerialForm that contain circular relations after construction, is:

  1. An object reference for the object currently being deserialized
     can be passed to another object's constructor (via a SerialForm
     instance) after the current Object's constructor completes,
     allowing safe publication of final field freezes that occur at the
     end of construction.
  2. When the Serialization2 Framework becomes aware of an object that
     contains a circular relationship while that object is in the
     process of being deserialized, the second object will not be
     instantiated until after the constructor of the first object in
     the relationship completes.  Data read in from the stream can be
     stored in a SerialForm without requiring object instantiation.
  3. After construction completes, the object that has just been
     deserialized can retain a copy of its SerialForm and look up the
     field containing a circular relationship; the Serialization
     framework will update its SerialForm with the new object that
     holds a circular relationship, prior to calling readObjectNoData()
     on the first object.
  4. If the developer of the implementing class is not aware of the
     possibility of a circular relationship, then the worst consequence
     is that a field will be set to null during construction; "this"
     will not escape.
  5. The second Object, holding a link to an object that appears
     earlier in the stream, may not be aware that the object it holds a
     reference to also needs a reference to it.  The first object will
     not obtain a reference to the second until both Object
     constructors have completed.  The second object may not need to
     implement readObjectNoData().
  6. readObjectNoData() needs to be called on every class belonging to
     a single Object's inheritance hierarchy, when defined, after all
     constructors have completed.  It should be called in the order of
     superclass to child class.
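
The ordering in the points above can be sketched like this. First, Second and FrameworkSketch are hypothetical names, the SerialForm is a reduced stand-in with plain string keys, and readObjectNoData() is package-private here (rather than private, as proposed) only so the stand-in framework can call it without reflection.

```java
import java.io.InvalidObjectException;
import java.util.HashMap;
import java.util.Map;

// Reduced stand-in for SerialForm: string keys, no security checks.
final class SerialForm {
    private final Map<String, Object> entries = new HashMap<>();

    void put(String key, Object value) {
        entries.put(key, value);
    }

    Object get(String key) {
        return entries.get(key);
    }
}

// Appears first in the stream and holds the circular link.
final class First {
    private final String name;
    private final SerialForm retained; // kept for the later lookup
    private volatile Second second;    // circular link, so it cannot be final

    First(SerialForm serial) {
        this.name = (String) serial.get("name");
        this.second = (Second) serial.get("second"); // null while circular
        this.retained = serial;
    }

    // Called by the framework after all constructors have completed.
    void readObjectNoData() throws InvalidObjectException {
        Second s = (Second) retained.get("second");
        if (s == null) throw new InvalidObjectException("second is missing");
        this.second = s;
    }

    String name() {
        return name;
    }

    Second second() {
        return second;
    }
}

// Appears second; it only sees a fully constructed First.
final class Second {
    private final First first;

    Second(SerialForm serial) {
        this.first = (First) serial.get("first");
    }

    First first() {
        return first;
    }
}

// Stand-in for the deserialization framework, showing only the ordering.
final class FrameworkSketch {
    static First readPair() {
        SerialForm firstForm = new SerialForm();
        firstForm.put("name", "first");
        // The "second" entry is left unpopulated because it is circular.
        First first = new First(firstForm);

        SerialForm secondForm = new SerialForm();
        secondForm.put("first", first); // safe: First's constructor is done
        Second second = new Second(secondForm);

        firstForm.put("second", second); // update the retained SerialForm
        try {
            first.readObjectNoData();
        } catch (InvalidObjectException e) {
            throw new IllegalStateException(e);
        }
        return first;
    }
}
```

Neither constructor ever sees a partially constructed object; the only field assigned late is the non-final circular link in First.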

Thoughts?

Regards,

Peter.

On 10/08/2014 3:20 AM, Brian Goetz wrote:
I've noticed there's not much interest in improving Serialization on
these lists.  This makes me wonder if Java Serialization has lost
relevance in recent years with the rise of protocol buffers, Apache
Thrift and other means of data transfer over byte streams.

I sense your frustration, but I think you may be reaching the wrong conclusion. The lack of response is probably not evidence that there's no interest in fixing serialization; it's that fixing serialization, with all the constraints that "fix" entails, is just really really hard, and it's much easier to complain about it (and even say "let's just get rid of it") than to fix it.

Should Serializable eventually be deprecated? Should Serialization be
disabled by default? Should a new mechanism be developed? If a new
mechanism is developed, what about circular object relationships?

As I delved into my own explorations of serialization, I started to realize why such a horrible approach was the one that was ultimately chosen; while serialization is horrible and awful and leaky and insecure and complex and brittle, it does address problems like cyclic data structures and independent evolution of subclass and superclass better than the "clean" models.

My conclusion is, at best, a new mechanism would have to live side-by-side with the old one, since it could only handle 95% of the cases. It might handle those 95% much better -- more cleanly, securely, and allowing easier schema evolution -- but the hard cases are still there. Still, reducing the use of the horrible old mechanism may still be a worthy goal, even if it can't be killed outright.

