The future of Serialization

Peter Firmstone peter.firmstone at
Mon Aug 11 10:12:14 UTC 2014


Thanks for picking up on my frustration ;)

I have something in mind for Serializable2 to address cyclic data 
structures and the possibility of independent evolution of super and 
child classes, while retaining a relatively clean public API, with one 
optional private method.  The methods and interfaces proposed are 
suitable for any alternative ObjectInput and ObjectOutput implementation.

An interface exists in Apache River called Startable; it has one method:

public void start() throws Exception;

It's called by a framework to allow an Object to start threads, publish 
"this" or throw an exception after construction.  The intent is to allow 
an object to be immutable with final fields and be provided with a 
thread of execution after construction and before publication.
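
As a rough illustration of that pattern, the sketch below declares a local stand-in for Startable (it is not the Apache River type itself) and a hypothetical Heartbeat class, invented here for illustration.  All fields are final and the worker thread is only created in the constructor; it is not started until the framework calls start().

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Local stand-in with the same single method as the Startable interface described above.
interface Startable {
    void start() throws Exception;
}

// Hypothetical class: every field is final and nothing runs during construction.
final class Heartbeat implements Startable {
    private final String name;
    private final AtomicBoolean ran;
    private final Thread worker;

    Heartbeat(String name) {
        AtomicBoolean flag = new AtomicBoolean(false);
        this.name = name;
        this.ran = flag;
        // The thread is created but not started, so no other thread can
        // observe this object before its constructor has completed.
        this.worker = new Thread(() -> flag.set(true), name + "-worker");
    }

    // Called by the framework after construction and before publication.
    @Override
    public void start() throws Exception {
        worker.start();
    }

    // Waits for the worker to finish and reports whether it ran.
    boolean awaitRun() throws InterruptedException {
        worker.join();
        return ran.get();
    }

    String name() {
        return name;
    }
}
```

A framework would construct the object first and call start() as a separate step, which is where an exception can also be thrown without leaving a half-built object behind.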

Something similar can be used to wire up circular relations; let me explain.

Every class that implements Serializable has one thing in common: the 
Serialization protocol, and every Object instance of a Serializable class 
has an arbitrary serial form.

I propose a final class representing the SerialForm of an object.  It 
cannot be extended, requires privilege to instantiate, and performs 
method guard security checks for all callers, with the exception of a 
calling class reading or writing its own serial form.  SerialForm needs 
a field key whose identity is represented by the calling Class, the 
method name and the field's Class type; this key would be used for both 
writing and retrieving a field entry in SerialForm.  SerialForm will also 
provide a method to advise if a field key contains a circular relation; 
any field entry in SerialForm that would contain a circular relation is 
not populated until after construction of the current object is complete.
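
To make the shape of such a class more concrete, here is a minimal sketch.  Everything in it is an assumption made for illustration: the FieldKey helper, the method names put, get, deferCircular and isCircular, and the use of a plain String for the middle component of the key.  The privilege and method guard checks are left out entirely.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical key: the calling class, a name, and the field's class type.
final class FieldKey {
    final Class<?> owner;
    final String name;
    final Class<?> type;

    FieldKey(Class<?> owner, String name, Class<?> type) {
        this.owner = Objects.requireNonNull(owner);
        this.name = Objects.requireNonNull(name);
        this.type = Objects.requireNonNull(type);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FieldKey)) return false;
        FieldKey k = (FieldKey) o;
        return owner == k.owner && name.equals(k.name) && type == k.type;
    }

    @Override
    public int hashCode() {
        return Objects.hash(owner, name, type);
    }
}

// Sketch of SerialForm: final, so it cannot be extended. Security checks omitted.
final class SerialForm {
    private final Map<FieldKey, Object> entries = new HashMap<>();
    private final Set<FieldKey> circular = new HashSet<>();

    // The same key is used for writing and for retrieving an entry.
    <T> void put(Class<?> owner, String name, Class<T> type, T value) {
        entries.put(new FieldKey(owner, name, type), value);
    }

    // Returns null when the entry is absent or not yet populated.
    <T> T get(Class<?> owner, String name, Class<T> type) {
        return type.cast(entries.get(new FieldKey(owner, name, type)));
    }

    // Marks an entry as part of a circular relation; it stays empty for now.
    void deferCircular(Class<?> owner, String name, Class<?> type) {
        circular.add(new FieldKey(owner, name, type));
    }

    // Advises whether the entry for this key contains a circular relation.
    boolean isCircular(Class<?> owner, String name, Class<?> type) {
        return circular.contains(new FieldKey(owner, name, type));
    }
}
```

Because the calling class is part of the key, a superclass and a child class can each store an entry under the same name without clashing.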

An arbitrary Serializable2 Object instance may be composed of a 
hierarchy of classes, each belonging to a separate ProtectionDomain.

For the following interface:

public interface Serializable2 {

     void writeObject(SerialForm serial) throws IOException;

}

Implementers of Serializable2 must:

   1. Implement writeObject
   2. Implement a constructor with the signature:  (SerialForm serial).

Implementers that need to check invariants, delay throwing an Exception, 
publish "this" or set a circular reference after construction should:

   4. Implement: private void readObjectNoData() throws

Child class implementations should:

   5. Call their super class writeObject method and superclass
      constructor, but may call any super class constructor or methods.
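
The requirements above might look as follows in practice.  This is only a sketch: the SerialForm here is a simplified stand-in keyed by declaring class and name, and Point and LabeledPoint are hypothetical classes invented for the example.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for SerialForm (no security checks, no type in the key).
final class SerialForm {
    private final Map<String, Object> entries = new HashMap<>();

    void put(Class<?> owner, String name, Object value) {
        entries.put(owner.getName() + '#' + name, value);
    }

    Object get(Class<?> owner, String name) {
        return entries.get(owner.getName() + '#' + name);
    }
}

interface Serializable2 {
    void writeObject(SerialForm serial) throws IOException;
}

// Hypothetical superclass with final fields.
class Point implements Serializable2 {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Constructor with the (SerialForm serial) signature.
    Point(SerialForm serial) {
        this((Integer) serial.get(Point.class, "x"),
             (Integer) serial.get(Point.class, "y"));
    }

    @Override
    public void writeObject(SerialForm serial) throws IOException {
        serial.put(Point.class, "x", x);
        serial.put(Point.class, "y", y);
    }

    int x() { return x; }
    int y() { return y; }
}

// Hypothetical child class: calls the superclass constructor and writeObject.
final class LabeledPoint extends Point {
    private final String label;

    LabeledPoint(int x, int y, String label) {
        super(x, y);
        this.label = label;
    }

    LabeledPoint(SerialForm serial) {
        super(serial);
        this.label = (String) serial.get(LabeledPoint.class, "label");
    }

    @Override
    public void writeObject(SerialForm serial) throws IOException {
        super.writeObject(serial);
        serial.put(LabeledPoint.class, "label", label);
    }

    String label() { return label; }
}
```

Each class reads and writes only the entries it owns, so all fields can stay final and invariants can be checked in an ordinary constructor.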

Compatibility and Evolution:

   1. Fields can be included or omitted from SerialForm, by an
      implementation, without breaking compatibility, provided a null
      reference is accepted during deserialization.
   2. Child classes in a hierarchy: all Serializable2 implementing
      superclass constructors have the same signature, so the superclass
      implementation can be substituted without breaking child class
      deserialization (provided this is the constructor used by the
      child class).
   3. There is no serialVersionUID.
   4. Child class Serializable2 implementations can extend a superclass
      without a zero arg constructor that doesn't itself implement
   5. Child classes that do not override writeObject will not be
      serialized, so can effectively opt out.
   6. Because implementations are required to implement public methods,
      there is no "Magic".
   7. Serializable2 shouldn't extend Serializable, allowing classes to
      implement both interfaces for a period of time (for that reason
      the signature for readObjectNoData may need to be changed for
   8. ObjectInputStream and ObjectOutputStream can be extended to
      support both implementations for compatibility, however
      alternative stream implementations would be preferable for
      Serializable2 to avoid Serializable security issues.  The new
      implementations should be possible to substitute because both
      types would use the same Stream Protocol, provided the classes
      being deserialized implement Serializable2.
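
As a small illustration of the first point in the list above, the sketch below shows a hypothetical Account class whose "nickname" entry was added in a later version.  The SerialForm is again a simplified stand-in, and the choice of default value is an assumption; the point is only that a missing entry arrives as null and the constructor accepts it.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for SerialForm, keyed by declaring class and name.
final class SerialForm {
    private final Map<String, Object> entries = new HashMap<>();

    void put(Class<?> owner, String name, Object value) {
        entries.put(owner.getName() + '#' + name, value);
    }

    Object get(Class<?> owner, String name) {
        return entries.get(owner.getName() + '#' + name);
    }
}

interface Serializable2 {
    void writeObject(SerialForm serial) throws IOException;
}

// Hypothetical class: "nickname" did not exist in the first version.
final class Account implements Serializable2 {
    private final String id;
    private final String nickname;

    Account(String id, String nickname) {
        this.id = id;
        this.nickname = nickname;
    }

    Account(SerialForm serial) {
        this.id = (String) serial.get(Account.class, "id");
        // An older writer omitted this entry, so null must be accepted.
        Object n = serial.get(Account.class, "nickname");
        this.nickname = (n != null) ? (String) n : this.id;
    }

    @Override
    public void writeObject(SerialForm serial) throws IOException {
        serial.put(Account.class, "id", id);
        serial.put(Account.class, "nickname", nickname);
    }

    String id() { return id; }
    String nickname() { return nickname; }
}
```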

My reasoning for retaining readObjectNoData(), and for updating field 
entries in SerialForm that contain circular relations after 
construction, is:

   1. An object reference for the object currently being deserialized
      can be passed to another object's constructor (via a SerialForm
      instance) after the current Object's constructor completes,
      allowing safe publication of final field freezes that occur at the
      end of construction.
   2. When the Serialization2 Framework becomes aware of an object that
      contains a circular relationship while that object is in the
      process of being deserialized, the second object will not be
      instantiated until after the constructor of the first object in
      the relationship completes.  Data read in from the stream can be
      stored in a SerialForm without requiring object instantiation.
   3. After construction completes, the object that has just been
      deserialized can retain a copy of its SerialForm and look up the
      field containing a circular relationship; the Serialization
      framework will update its SerialForm with the new object that
      holds a circular relationship, prior to calling readObjectNoData()
      on the first object.
   4. If the developer of the implementing class is not aware of the
      possibility of a circular relationship, then the worst consequence
      is that a field will be set to null during construction; "this"
      will not escape.
   5. The second Object, holding a link to an object that appears earlier
      in the stream, may not be aware that the object it holds a
      reference to also needs a reference to it.  The first object will
      not obtain a reference to the second until both Object
      constructors have completed.  The second object may not need to
      implement readObjectNoData().
   6. readObjectNoData() needs to be called on every class belonging to
      a single Object's inheritance hierarchy, when defined, after all
      constructors have completed; it should be called in order from
      superclass to child class.
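
The sequence described in the points above can be sketched roughly as follows.  All names here are hypothetical (Entity, Owner, Pet, Framework), the SerialForm is a stand-in keyed by name only, and the framework is reduced to two static methods that use reflection to call the private hook.  The stream itself is not modelled.

```java
import java.lang.reflect.Method;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for SerialForm, keyed by name only.
final class SerialForm {
    private final Map<String, Object> entries = new HashMap<>();
    void put(String name, Object value) { entries.put(name, value); }
    Object get(String name) { return entries.get(name); }
}

// Superclass with its own hook; the list records the order of hook calls.
class Entity {
    final List<String> hookOrder = new ArrayList<>();
    private void readObjectNoData() { hookOrder.add("Entity"); }
}

// First object in the stream; its "pet" entry is a circular relation.
final class Owner extends Entity {
    private final String name;
    private final SerialForm form;   // retained so the deferred entry can be read later
    private volatile Pet pet;        // null until the hook runs

    Owner(SerialForm serial) {
        this.name = (String) serial.get("name");
        this.form = serial;          // "pet" is not populated yet; "this" does not escape
    }

    private void readObjectNoData() {
        this.pet = (Pet) form.get("pet");
        hookOrder.add("Owner");
    }

    String name() { return name; }
    Pet pet() { return pet; }
}

// Second object: it only ever sees a fully constructed Owner.
final class Pet {
    private final Owner owner;
    Pet(SerialForm serial) { this.owner = (Owner) serial.get("owner"); }
    Owner owner() { return owner; }
}

final class Framework {
    // Calls readObjectNoData() on each class that declares it, superclass first.
    static void runHooks(Object obj) throws ReflectiveOperationException {
        Deque<Class<?>> chain = new ArrayDeque<>();
        for (Class<?> c = obj.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            chain.push(c);           // the topmost superclass ends up at the head
        }
        for (Class<?> c : chain) {
            try {
                Method hook = c.getDeclaredMethod("readObjectNoData");
                hook.setAccessible(true);
                hook.invoke(obj);
            } catch (NoSuchMethodException e) {
                // the hook is optional, so a class without it is skipped
            }
        }
    }

    static Owner readOwner() throws ReflectiveOperationException {
        SerialForm ownerForm = new SerialForm();
        ownerForm.put("name", "Ann");
        Owner owner = new Owner(ownerForm);   // first constructor completes
        SerialForm petForm = new SerialForm();
        petForm.put("owner", owner);          // constructed owner passed via SerialForm
        Pet pet = new Pet(petForm);           // second object instantiated afterwards
        ownerForm.put("pet", pet);            // deferred circular entry is filled in
        runHooks(owner);                      // hooks run after all constructors
        return owner;
    }
}
```

In this sketch the circular field cannot be final, since it is assigned after construction; it is declared volatile so that the assignment made in the hook is visible to other threads.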




On 10/08/2014 3:20 AM, Brian Goetz wrote:
>> I've noticed there's not much interest in improving Serialization on
>> these lists.  This makes me wonder if Java Serialization has lost
>> relevance in recent years with the rise of Protocol Buffers, Apache
>> Thrift and other means of data transfer over byte streams.
> I sense your frustration, but I think you may be reaching the wrong 
> conclusion.  The lack of response is probably not evidence that 
> there's no interest in fixing serialization; it's that fixing 
> serialization, with all the constraints that "fix" entails, is just 
> really really hard, and it's much easier to complain about it (and even 
> say "let's just get rid of it") than to fix it.
>> Should Serializable eventually be deprecated? Should Serialization be
>> disabled by default? Should a new mechanism be developed? If a new
>> mechanism is developed, what about circular object relationships?
> As I delved into my own explorations of serialization, I started to 
> realize why such a horrible approach was the one that was ultimately 
> chosen; while serialization is horrible and awful and leaky and 
> insecure and complex and brittle, it does address problems like cyclic 
> data structures and independent evolution of subclass and superclass 
> better than the "clean" models.
> My conclusion is, at best, a new mechanism would have to live 
> side-by-side with the old one, since it could only handle 95% of the 
> cases.  It might handle those 95% much better -- more cleanly, 
> securely, and allowing easier schema evolution -- but the hard cases 
> are still there.  Still, reducing the use of the horrible old 
> mechanism may still be a worthy goal, even if it can't be killed 
> outright.

More information about the core-libs-dev mailing list