fastpath for StringCoding.java
Xueming.Shen at Sun.COM
Sat Mar 21 07:03:15 UTC 2009
The charsets being benchmarked here are all single-byte charsets built
on top of SingleByte.Decoder/Encoder (excluding ASCII and 8859-1). They
all share exactly the same lookup mechanism: a b2c table for decoding
and a c2bIndex + c2b table pair for encoding. For these charsets, which
codepoint(s) are used to benchmark performance is irrelevant as long as
they are "mappable", so the benchmark collects only "mappable"
codepoints (those for which canEncode() returns true) as the test data.

The rationale behind this is that the "most likely" real-life scenario
for decoding/encoding is one where the characters going in and out are
all "mappable", and that is the case where performance really matters.
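
To illustrate the data collection described above, here is a minimal
sketch. It is not the actual benchmark code; the class name
MappableData, the helper names collectMappable and testData, the fixed
seed, and the choice of ISO-8859-2 are all assumptions made for the
example.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.Random;

// Hypothetical helper, not part of the webrev or the real benchmark.
public class MappableData {

    // Collect every BMP char that the charset's encoder reports as mappable.
    static char[] collectMappable(Charset cs) {
        CharsetEncoder enc = cs.newEncoder();
        StringBuilder sb = new StringBuilder();
        for (int c = 0; c <= 0xFFFF; c++) {
            char ch = (char) c;
            // A lone surrogate is never encodable on its own; skip it.
            if (Character.isSurrogate(ch)) {
                continue;
            }
            if (enc.canEncode(ch)) {
                sb.append(ch);
            }
        }
        return sb.toString().toCharArray();
    }

    // Build a test string of the given length from randomly chosen
    // mappable chars; the seed makes the data reproducible.
    static String testData(char[] mappable, int length, long seed) {
        Random rnd = new Random(seed);
        char[] out = new char[length];
        for (int i = 0; i < length; i++) {
            out[i] = mappable[rnd.nextInt(mappable.length)];
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // ISO-8859-2 is only an example of a single-byte charset.
        Charset cs = Charset.forName("ISO-8859-2");
        char[] mappable = collectMappable(cs);
        String s = testData(mappable, 2048, 42L);
        byte[] encoded = s.getBytes(cs);
        String decoded = new String(encoded, cs);
        System.out.println("mappable chars: " + mappable.length);
        System.out.println("round trip ok:  " + decoded.equals(s));
    }
}
```

Because every char in the data is mappable, encoding followed by
decoding should return the original string, so the timed loops never
take the unmappable/replacement path.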
Ulf Zibis wrote:
> On 20.03.2009 00:37, Xueming Shen wrote:
>> Webrev has been updated to address your comments. I'm building the
>> binaries and will run the benchmark again
>> (up to 2048 this time) tonight.
> As I see it, you take a seed from Random to create the test patterns.
> This approach would be good for testing the stability of the coders,
> but for a benchmark it would not take into account the differing
> probability distribution of the code points in the particular charsets.
> For example, see TIS-620. The code points from U+0E01 to U+0E5B would
> probably be used more than the Latin chars and the unmappables.
> I accommodate this by different coders. See:
More information about the core-libs-dev mailing list