Lower overhead String encoding/decoding

Richard Warburton richard.warburton at gmail.com
Sun Oct 19 22:13:14 UTC 2014


Hi Richard, couple comments after a quick scan.

Thanks for your comments - only just had a post Javaone chance to look at
this stuff. I've pushed an update to:


(1)  #474, the IOBE probably is no longer needed in case of ByteBuffer.
> parameter.

Fixed in new push.

(2) for methods that have the ByteBuffer as input, it would be desirable to
> specify clearly that
>      the bytes are read from "position" to "limit", and whether or not the
> "position" will be advanced
>      after decoding/encoding (important).

Added Javadoc in new push.

(3) getBytes(byte[], offset, charset)
>      while I understand it might be useful to have such a method in
> certain circumstance, it is
>      usually complicated to make it right/easy for user to actually use
> it. Consider the fact that
>      the returned number of bytes encoded has no straightforward
> connection to how many
>      underlying chars have been encoded (so the user of this method really
> have no idea how
>      many underlying "chars" have been encoded into the dest buffer, is
> that enough? how big
>      the buffer need to be to encode the whole string? ....) especially
> the possibility that the last
>      couple byte space might be short of encoding a "char". Not like the
> getChars(), which has
>      a easy, clear and direct link between the out going chars and
> underlying chars. I would
>      suggest it might be better to leave it out.

I think you raise some good points here. In my mind this is how it works,
let me know what you think.

The client code of this API call should be able to infer the state of the
byte[] - the offset parameters is in terms of bytes and refers to the
destination buffer. If you want to know how many bytes have been used by
the encoding, the method returns an int that shows the length.

You can query the length of the String, which gives you the number of
outbound chars.

(4) StringCoding.decode() #239  "remaining()" should be used to return
> limit - position?

Fixed in new push.

(5) in case of "untrusted", it might be more straightforward to get all
> "bytes" out of the buffer
>      first (you are allocating a byte buffer here anyway, I don;t see
> obvious benefit to get a
>      direct buffer, btw) and then pass it to the original/existing
> byte[]->char[] decoding
>      implementation. We probably will take a deep look at the
> implementation later when
>      the public api settled.

Yes - I was hoping to remove the additional bytebuffer allocation within
the method. I'm still not sure of the security implications around removing
this code though.


  Richard Warburton

  @RichardWarburto <http://twitter.com/richardwarburto>

More information about the core-libs-dev mailing list