Improve Java serialization with APPCDS (as a fast path)
timberonce at gmail.com
Thu May 24 14:32:33 UTC 2018
I am targeting JVMs both on the same machine and across the network.
To solve the problem where different JVMs may use different archives,
we can introduce a new JVM option.
For example, a possible option could be
-XX:FastSerializationFile=/tmp/a.jsa. With it, the JVM will only
use a.jsa for serialization. In addition, the serializer can encode a
hash of the contents of a.jsa into the serialized bytes, so that only
JVMs that use a.jsa for fast serialization can deserialize them
successfully. A failed hash check will cause a runtime exception.
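The hash check could look roughly like the sketch below. It is only an
illustration under several assumptions: the class and method names are
hypothetical, SHA-256 is an arbitrary choice of hash, and the payload is
an opaque byte array rather than a real serialized object graph. Nothing
here reflects an existing JVM API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical sketch: prefix serialized bytes with a hash of the archive
// and reject the bytes on the reading side if the local archive differs.
final class ArchiveHashCheck {

    // Assumption: SHA-256 over the archive contents identifies the archive.
    static byte[] archiveHash(byte[] archiveContent) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(archiveContent);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Layout: [hash length][hash][payload length][payload]
    static byte[] write(byte[] archiveHash, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(8 + archiveHash.length + payload.length);
        buf.putInt(archiveHash.length).put(archiveHash);
        buf.putInt(payload.length).put(payload);
        return buf.array();
    }

    // Throws a runtime exception when the writer used a different archive.
    static byte[] read(byte[] localArchiveHash, byte[] serialized) {
        ByteBuffer buf = ByteBuffer.wrap(serialized);
        byte[] writerHash = new byte[buf.getInt()];
        buf.get(writerHash);
        if (!Arrays.equals(writerHash, localArchiveHash)) {
            throw new IllegalStateException("serialized with a different archive");
        }
        byte[] payload = new byte[buf.getInt()];
        buf.get(payload);
        return payload;
    }

    public static void main(String[] args) {
        byte[] hash = archiveHash("contents of a.jsa".getBytes(StandardCharsets.UTF_8));
        byte[] wire = write(hash, "payload".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(read(hash, wire), StandardCharsets.UTF_8));
    }
}
```

In a real implementation the hash would presumably be computed once when
the archive is mapped, not on every serialization call.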
Even though these settings may cause exceptions, I think that since
users are aware that they are using APPCDS, they can naturally extend
their use of -XX:SharedArchiveFile to -XX:FastSerializationFile (which
is much easier than using an application-level serializer) to avoid
unnecessary exceptions. Besides, since APPCDS is mainly designed to save
memory space and class loading time for multiple JVMs, fast
serialization can also reduce the communication traffic among them.
In my opinion, this option will not be hard for users to understand.
On 5/23/18 9:17 PM, Ioi Lam wrote:
> Hi Mingyu,
> I think this is a very interesting idea. Are you thinking about
> serializing/deserializing with the same JVM on the same host, or with
> different JVMs across the network?
> Currently different JVM installations can have different CDS archives.
> You can even use different archives for the same JVM by running with
> $JAVA_HOME/bin/java -XX:SharedArchiveFile=/tmp/a.jsa
> $JAVA_HOME/bin/java -XX:SharedArchiveFile=/tmp/b.jsa
> So there's no guarantee that the same class will have the same ID in
> these 2 JVM processes.
> - Ioi
> On 5/23/18 7:55 PM, Mingyu Wu wrote:
> > Hi all,
> > APPCDS is a very interesting optimization aimed at reducing memory
> > footprint and class loading overhead for multiple Java processes. In my
> > opinion, it can also be used in other areas such as serialization.
> > Currently, the Java serializer is slow and induces a large footprint
> > (compared to application-level serializers). A major problem is that the
> > serializer must write the descriptions of classes into the serialized
> > bytes, which increases the total memory consumption.
> > On the other hand, application-level (or 3rd-party) serializers like Kryo
> > can reduce the memory footprint by requiring users to assign IDs to certain
> > classes manually.
> > This assignment step must be done very carefully to avoid
> > inconsistencies among different JVMs, so application-level
> > serializers are not that easy to use.
> > However, we can actually borrow the idea from application-level serializers
> > when APPCDS (or even CDS) is enabled. Suppose we have already dumped the
> > class list below:
> > java/lang/Object
> > java/lang/String
> > ......
> > We can assign IDs directly to those classes according to the order in the
> > class list:
> > java/lang/Object 0
> > java/lang/String 1
> > ......
> > Since multiple JVMs will share the same APPCDS archive, which corresponds
> > to the class list, those JVMs can directly use IDs to serialize/deserialize
> > the classes stored in the archive. This avoids writing class descriptions
> > into the serialized bytes and simplifies the serialization/deserialization
> > phase. Furthermore, it also saves users from manually assigning IDs to
> > classes.
> > Note that APPCDS only provides a fast path for ser/deser. If a class is not
> > on the class list (and in the archive), the serializer falls back to writing
> > the class description. However, the fast path can become more efficient with
> > more advanced features, such as supporting custom classloaders.
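The ID assignment and the fallback described above can be sketched as
follows. This is a minimal illustration and not how HotSpot or the Java
serializer works today: the class name ClassListIds, the "#<id>" token
format, and the use of the plain class name as a stand-in for a full
class description are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: derive class IDs from the order of the class list,
// and fall back to a longer form for classes that are not on the list.
final class ClassListIds {
    private final List<String> nameById;
    private final Map<String, Integer> idByName = new HashMap<>();

    ClassListIds(List<String> classList) {
        this.nameById = new ArrayList<>(classList);
        for (int id = 0; id < nameById.size(); id++) {
            // The position in the class list is the ID.
            idByName.putIfAbsent(nameById.get(id), id);
        }
    }

    // Fast path: a short "#<id>" token. Fallback: the class name itself,
    // standing in for the full class description.
    String encode(String className) {
        Integer id = idByName.get(className);
        return id != null ? "#" + id : className;
    }

    String decode(String token) {
        if (token.startsWith("#")) {
            return nameById.get(Integer.parseInt(token.substring(1)));
        }
        return token;
    }

    public static void main(String[] args) {
        ClassListIds ids = new ClassListIds(
                Arrays.asList("java/lang/Object", "java/lang/String"));
        System.out.println(ids.encode("java/lang/String"));
        System.out.println(ids.encode("com/example/NotArchived"));
    }
}
```

Both sides must build the table from the same class list for the IDs to
agree, which is why the archive identity matters.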
> > Anyway, I think APPCDS is a good fit to improve Java serialization.
> > I am willing to take suggestions!
> > Mingyu
Institute of Parallel and Distributed Systems
School of Software Engineering
Shanghai Jiao Tong University
More information about the hotspot-runtime-dev