<i18n dev> RFR: 8261418: Reduce decoder creation overheads for sun.nio.cs.ext Charsets
naoto at openjdk.java.net
Tue Feb 9 19:24:40 UTC 2021
On Tue, 9 Feb 2021 12:54:12 GMT, Claes Redestad <redestad at openjdk.org> wrote:
> This refactors some `sun.nio.cs.ext` charsets, such as ISO-2022-CN-GB, ISO-2022-CN-CNS, ISO-2022-KR and a few others, to use static rather than per-instance auxiliary decoders. Doing so reduces the overhead of calling `charset.newDecoder()`. This reduces or eliminates regressions on `new String(byte[], String)` operations due to the removal of thread-local decoder caching in [JDK-8259842](https://bugs.openjdk.java.net/browse/JDK-8259842)
> Most ISO-2022 Charsets define a specialized decoder already. The `ISO2022.Decoder` class was only used by `ISO2022_KR`, so folding it into that implementation and simplifying the code brings a rather significant speed-up, both to decoder creation and on actual decoding.
> Testing: tier1-3, manual runs of sun.nio.cs tests
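For readers following along, the static versus per-instance trade-off described above can be sketched as follows. This is a minimal illustration, not the code in the patch: `TableDecoder`, `PerInstanceDecoder`, `SharedDecoder` and `helpersBuiltFor` are hypothetical names, and the sketch assumes the auxiliary helper is read-only after construction, so that sharing it between decoder instances is safe.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class DecoderSharingSketch {
    /** Hypothetical auxiliary decoder: costly to build, read-only afterwards. */
    static final class TableDecoder {
        static final AtomicInteger CREATED = new AtomicInteger();
        private final char[] table = new char[256];

        TableDecoder() {
            CREATED.incrementAndGet();
            for (int i = 0; i < table.length; i++) {
                table[i] = (char) i; // identity mapping, for illustration only
            }
        }

        char decodeSingle(int b) {
            return table[b & 0xff];
        }
    }

    /** Builds a fresh helper every time a decoder is created. */
    static final class PerInstanceDecoder {
        private final TableDecoder helper = new TableDecoder();

        char decode(byte b) {
            return helper.decodeSingle(b);
        }
    }

    /** Reuses one helper, built once when the class is initialized. */
    static final class SharedDecoder {
        private static final TableDecoder HELPER = new TableDecoder();

        char decode(byte b) {
            return HELPER.decodeSingle(b);
        }
    }

    /** Returns how many helpers were constructed while creating n decoders. */
    static int helpersBuiltFor(int n, boolean shared) {
        int before = TableDecoder.CREATED.get();
        for (int i = 0; i < n; i++) {
            char c = shared ? new SharedDecoder().decode((byte) 'A')
                            : new PerInstanceDecoder().decode((byte) 'A');
            if (c != 'A') {
                throw new AssertionError("unexpected mapping");
            }
        }
        return TableDecoder.CREATED.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("per-instance: " + helpersBuiltFor(1000, false));
        System.out.println("shared:       " + helpersBuiltFor(1000, true));
    }
}
```

In the per-instance variant the helper count grows with the number of decoders created, while the shared variant builds at most one helper, on first use of the class.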
Thanks, Claes, for making the performance improvement. I have a few comments/questions.
src/jdk.charsets/share/classes/sun/nio/cs/ext/EUC_JP.java.template line 116:
> 114: int sp = src.arrayOffset() + src.position();
> 115: int sl = src.arrayOffset() + src.limit();
I see these were removed from the encode/decodeArrayLoop methods. What is the reason for that?
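For context, the two quoted lines translate buffer-relative positions into indices of the backing array. A minimal sketch of that computation using only the public `java.nio.ByteBuffer` API (the class and method names `ArrayOffsetSketch`, `sumRemaining` and `slicedExample` are hypothetical, and this is not the template code itself):

```java
import java.nio.ByteBuffer;

final class ArrayOffsetSketch {
    /** Sums the remaining bytes by indexing the backing array directly. */
    static int sumRemaining(ByteBuffer src) {
        byte[] sa = src.array();
        // Buffer positions are relative to arrayOffset(), which is non-zero
        // for buffers produced by slice().
        int sp = src.arrayOffset() + src.position();
        int sl = src.arrayOffset() + src.limit();
        int sum = 0;
        while (sp < sl) {
            sum += sa[sp++];
        }
        return sum;
    }

    /** A sliced view over {30, 40, 50} whose arrayOffset() is 2. */
    static ByteBuffer slicedExample() {
        byte[] backing = {10, 20, 30, 40, 50};
        return ByteBuffer.wrap(backing, 2, 3).slice();
    }

    public static void main(String[] args) {
        ByteBuffer buf = slicedExample();
        System.out.println("arrayOffset = " + buf.arrayOffset());
        System.out.println("sum = " + sumRemaining(buf));
    }
}
```

Without the `arrayOffset()` term, a loop over a sliced buffer like the one above would read from the start of the backing array rather than from the start of the view.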
src/jdk.charsets/share/classes/sun/nio/cs/ext/ISO2022.java line 64:
> 63: protected final byte maximumDesignatorLength = 4;
This implementation was moved to the KR concrete implementation class. IIUC, this is the default implementation for the generic ISO 2022 spec, so I believe keeping the code here is more reasonable. Is there any performance gain from moving it to the KR-specific class?
src/jdk.charsets/share/classes/sun/nio/cs/ext/ISO2022_CN.java line 88:
> 86: private byte currentSODesig;
> 88: private static final DoubleByte.Decoder gb2312Decoder =
Since this is a shared constant, I'd use an all-caps field name.