RFR: 8043592: The basic XML parser based on UKit fails to read XML files encoded in UTF-16BE or LE

huizhe wang huizhe.wang at oracle.com
Thu May 22 16:30:01 UTC 2014

While verifying and testing 8042889, we noticed that the tiny XML parser
failed on input encoded in UTF-16BE or UTF-16LE. The cause of the failure
was that the parser was implemented to abide by the XML specification,
which requires entities encoded in UTF-16 to begin with a BOM. The test
we used sent a byte array to the parser without a BOM, thus causing the
failure.

Since it's not uncommon for an XML document to lack a BOM, I borrowed
the technique used in Xerces to add an additional check for UTF-16
encoding. Please review.
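
For illustration only, here is a minimal sketch of the kind of check
described above. It is not the actual patch and not the Xerces code. It
assumes the document starts with "<?", so that without a BOM the first
four bytes are 00 3C 00 3F (big-endian) or 3C 00 3F 00 (little-endian).
The class and method names (Utf16Sniffer, sniff) are hypothetical, and
UCS-4 detection is left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK or of the patch under review.
final class Utf16Sniffer {

    private Utf16Sniffer() {}

    /**
     * Guesses a UTF-16 byte order from the leading bytes of an XML entity.
     * Returns null when neither a BOM nor a "<?" pattern is recognized.
     * Note: UCS-4 / UTF-32 patterns are deliberately not handled here.
     */
    static Charset sniff(byte[] b) {
        if (b == null || b.length < 2) {
            return null;
        }
        int b0 = b[0] & 0xFF;
        int b1 = b[1] & 0xFF;

        // Case 1: a BOM is present (what the XML specification asks for).
        if (b0 == 0xFE && b1 == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (b0 == 0xFF && b1 == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }

        // Case 2: no BOM; look for "<?" encoded as two 16-bit code units.
        if (b.length < 4) {
            return null;
        }
        int b2 = b[2] & 0xFF;
        int b3 = b[3] & 0xFF;
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) {
            return StandardCharsets.UTF_16BE;
        }
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) {
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }
}
```

A caller would fall back to its default handling (for example UTF-8, or
the encoding named in the XML declaration) when sniff returns null.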


