JDK9 project: XML/JAXP Approachability / Ease of Use

huizhe wang huizhe.wang at oracle.com
Fri Jun 27 05:39:40 UTC 2014

Thanks Michael! And, welcome to core-libs-dev! :-)

On 6/26/2014 4:02 AM, Michael Kay wrote:
> Here are some quick thoughts about the state of XML support in the JDK:
> 1. XML Parser. The version of Xerces in the JDK has long been buggy, and no-one has been fixing the bugs. It needs to be replaced with a more recent version of Apache Xerces if that hasn't already been done.

Yes, as Alan pointed out, we do have a project going on at the moment. 
The goal is to upgrade to the current version of Xerces, 2.11.0.  Also, 
we made updates during JDK 7 development, bringing in all of the 
blockers, critical fixes and half of the major fixes along the way.

> 2. DOM. From a usability perspective DOM is awful. There are much better alternatives available, for example JDOM2 or XOM. The only reason anyone uses DOM rather than these alternatives is either (a) because they aren't aware of the alternatives, or (b) because of some kind of perception that DOM is "more standard". If we want to address the usability of XML processing in the JDK then an alternative/replacement for DOM would seem to be a high priority. If someone wants a summary of the badness of DOM then I'll address that in a separate thread.

I agree that DOM is not particularly user/developer friendly. I don't 
have data on how popular DOM is, but since it has been a "standard" and 
we value compatibility so much, the first goal in the proposal is to let 
users get to DOM objects quickly and continue using their existing code 
to process them.

When we get to the lower level, what I would propose is for us to take 
a step back from the existing technologies/standards such as 
DOM/SAX/StAX and think like a developer would. For example, as a 
developer, maybe all I want is to search an XML file and find a piece of 
information. I don't necessarily need to know whether it's DOM or SAX 
underneath, just as I don't need to know what technology is behind Google.
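
To make the "find a piece of information" case concrete, here is a 
minimal sketch of what it takes with the existing JAXP classes. The class 
name FindInXml, the helper find() and the sample document are made up for 
illustration; only the javax.xml and org.w3c.dom calls are real API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
public class FindInXml {

    // Returns the string value of the first node selected by the XPath expression.
    static String find(String xml, String expression) {
        try {
            // Step 1: obtain and configure a DOM parser through the factory lookup.
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();

            // Step 2: build a DOM tree from the input.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // Step 3: obtain an XPath evaluator through a second, separate factory.
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Step 4: evaluate the expression against the tree.
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            // Parser configuration, SAX, I/O and XPath failures are all checked exceptions.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><book id='b1'><title>XML Basics</title></book></catalog>";
        System.out.println(find(xml, "/catalog/book[@id='b1']/title")); // prints "XML Basics"
    }
}
```

The caller only cares about the last line of main(); everything inside 
find() is plumbing that a friendlier API could hide.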

> 3. JAXP factory mechanism. While the goal of allowing multiple implementations of core interfaces such as DOM, XPath, and XSLT is laudable, the current mechanism has many failings. It's hideously expensive because of the classpath search; it's fragile because the APIs are so weakly defined that the implementations aren't 100% interoperable (for example you have no way of knowing whether the XPath factory will give you an XPath 1.0 or 2.0 processor, and no way of finding out which you have got); so in practice we see lots of problems where applications get a processor that doesn't work as the application expects, just because it happens to be lying around on the classpath.

Agreed. It's an important mechanism that gives users the freedom to 
choose the impls they prefer, but it has room to improve. In the case of 
XPath, should we start a separate thread to discuss how we can improve it?
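
As a small illustration of the lookup issue: about the only thing an 
application can easily inspect today is the class name of whatever 
factory the lookup happened to return. A minimal sketch, where the class 
name WhichImpl and its two helpers are made up for illustration:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;

// Hypothetical diagnostic helper, for illustration only.
public class WhichImpl {

    // Class name of the XPathFactory chosen by the JAXP lookup on this classpath.
    static String xpathFactoryImpl() {
        return XPathFactory.newInstance().getClass().getName();
    }

    // Class name of the DocumentBuilderFactory chosen by the JAXP lookup.
    static String domFactoryImpl() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The output depends entirely on what is on the classpath, which is
        // exactly the fragility described above.
        System.out.println("XPath factory: " + xpathFactoryImpl());
        System.out.println("DOM factory:   " + domFactoryImpl());
    }
}
```

The class name tells you who supplied the processor, but nothing about 
which XPath version it implements.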

> 4. XQJ. The XQJ spec never found its way into the JDK, so there is no out-of-the-box XQuery support. The licensing terms for XQJ are also unsatisfactory (the license doesn't allow modification, which purists say means it's not a true open source license).

True, it's in the DB line of products. I'm not familiar with the 
licensing terms for the spec.

> 5. General lack of integration across the whole XML scene, e.g, separate and potentially incompatible factories for different services;

We can explore this further. The example in the proposal is a possible 
case for parser & XPath integration.

> a lack of clarity as to whether the XPath API is supposed to handle object models other than DOM, etc;

The spec requires impls to support the default object model (DOM) and 
leaves them free to introduce others.
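
For reference, this is roughly how an application can probe for object 
model support with the existing API. It is only a sketch: the class name 
ObjectModelCheck and the second URI are made up for illustration, while 
DEFAULT_OBJECT_MODEL_URI, isObjectModelSupported() and 
newInstance(String) are the real XPathFactory members.

```java
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;

// Hypothetical helper, for illustration only.
public class ObjectModelCheck {

    // Made-up URI standing in for a non-DOM object model.
    static final String HYPOTHETICAL_MODEL = "http://example.org/hypothetical-object-model";

    // Asks the default factory whether it supports the default (DOM) object model.
    static boolean supportsDefaultModel() {
        return XPathFactory.newInstance()
                .isObjectModelSupported(XPathFactory.DEFAULT_OBJECT_MODEL_URI);
    }

    // Tries to locate a factory for the given object model URI.
    static boolean canLoad(String objectModelUri) {
        try {
            XPathFactory.newInstance(objectModelUri);
            return true;
        } catch (XPathFactoryConfigurationException e) {
            // No factory on the classpath claims this object model.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("default model supported: " + supportsDefaultModel());
        System.out.println("hypothetical model available: " + canLoad(HYPOTHETICAL_MODEL));
    }
}
```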

> weak typing of interfaces such as setParameter() in order to accommodate variation, at the cost of interoperability.

StAX did better in this regard, with a list of specified properties. In 
the case of setParameter(), it seems to me that the authors wanted to 
give impls room to specify their own parameters.
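
The contrast can be sketched as follows, assuming the 
Transformer.setParameter(String, Object) method is the one meant above. 
The class name ParamTyping, the helper names and the tiny stylesheet are 
made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class, for illustration only.
public class ParamTyping {

    // Minimal stylesheet that writes out the value of its "greeting" parameter.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:param name='greeting'/>"
        + "<xsl:template match='/'><xsl:value-of select='$greeting'/></xsl:template>"
        + "</xsl:stylesheet>";

    // setParameter() accepts any Object; which Java types are understood, and
    // how they map to XPath values, is left to the implementation.
    static String transformWithParam(Object value) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setParameter("greeting", value);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // StAX names its standard properties as constants on XMLInputFactory and
    // documents the expected value type (Boolean here).
    static boolean coalescingEnabled() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        return Boolean.TRUE.equals(factory.getProperty(XMLInputFactory.IS_COALESCING));
    }

    public static void main(String[] args) {
        System.out.println(transformWithParam("hello"));
        System.out.println(coalescingEnabled());
    }
}
```

A String parameter is the safe case; passing other Java types is where 
implementations may differ.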

> 6. Failure to keep up to date with the W3C specs; if you want support for recent versions of XSLT or XPath then you need to go to third-party products. Even at the DOM level, namespaces are a bolt-on optional extra rather than an intrinsic capability.

We can discuss this in a separate thread as well.

> 7. Inconsistent policy on concrete classes versus interfaces.

Could you provide a few examples?

> Is this project attempting to address the fundamental problems, or only to paper over the cracks?

The goal of the project is to improve usability and make the API more 
approachable: easy tasks should be easy to do. It is not meant to 
completely redesign all the features, but to focus on common use cases 
and provide better APIs to handle them.


>> We're planning on a jaxp project to address usability issues in the JAXP
>> API. One of the complaints about the JAXP API is the number of lines of
>> code that are needed to implement a simple task. Tasks that should take
>> one or two lines often take ten or twelve lines instead.
> Michael Kay
> Saxonica
> mike at saxonica.com
> +44 (0118) 946 5893
