RFR 8025003: Base64 should be less strict with padding
xueming.shen at oracle.com
Wed Nov 13 16:28:20 UTC 2013
On 11/12/13 11:44 PM, Bill Shannon wrote:
> Xueming Shen wrote on 11/12/2013 09:24 PM:
>> On 11/12/13 8:21 PM, Bill Shannon wrote:
>>> Xueming Shen wrote on 11/12/2013 04:25 PM:
>>>> On 11/12/2013 03:32 PM, Bill Shannon wrote:
>>>>> This still seems like an inconsistent, and inconvenient, approach to me.
>>>>> You've decided that some encoding errors (i.e., missing pad characters)
>>>>> can be ignored. You're willing to assume that the missing characters aren't
>>>>> missing data but just missing padding. But if you find a padding character
>>>>> where you don't expect it you won't assume that the missing data is zero.
>>>> "missing pad characters" in theory is not an encoding errors. As the RFC
>>>> suggested, the
>>>> use of padding in base64 data is not required or used. They mainly serve the
>>>> purpose of
>>>> providing the indication of "end of the data". This is why the padding
>>>> character(s) is not
>>>> required (optional) by our decoder at first place. However, if the padding
>>>> character(s) is
>>>> present, they need to be correctly encoded, otherwise, it's a malformed base64
>>> I think we're interpreting the spec differently.
>> I meant to say "The RFC says the use of padding in base64 data is not required
>> nor used, in some circumstances".
>> I interpret it as the padding is optional in some circumstances.
> It's never optional. There's two specific cases in which it's required
> and one specific case in which it is not present.
My apologies, it appears we are not talking about the same thing. What I'm
trying to say is that whether or not to USE the padding character "=" is
optional for base encoding "IN SOME CIRCUMSTANCES". Maybe it's clearer to
just cite the original:
In some circumstances, the use of padding ("=") in base encoded data
is not required nor used. In the general case, when assumptions on
size of transported data cannot be made, padding is required to yield
correct decoded data.
Implementations MUST include appropriate pad characters at the end of
encoded data unless the specification referring to this document
explicitly states otherwise.
My interpretation is that, for some types/styles of Base64, generating the
"padding" character at the end is optional, though the RFC requires that a
spec which omits the padding character state this explicitly.
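
To illustrate the encoder-side choice described above, here is a minimal
sketch. It assumes the java.util.Base64 API as released in JDK 8
(Base64.getEncoder() and Encoder.withoutPadding()); the class name
EncoderPaddingSketch and its helper methods are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; only java.util.Base64 is a real API here.
final class EncoderPaddingSketch {

    // Default behaviour: the final unit is padded with '=' when needed.
    static String padded(String text) {
        return Base64.getEncoder()
                .encodeToString(text.getBytes(StandardCharsets.US_ASCII));
    }

    // Opt-out: same alphabet, but no '=' is appended to the final unit.
    static String unpadded(String text) {
        return Base64.getEncoder().withoutPadding()
                .encodeToString(text.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        // 1, 2 and 3 input bytes give the xx==, xxx= and xxxx shapes.
        for (String s : new String[] {"A", "AB", "ABC"}) {
            System.out.println(s + " -> " + padded(s) + " / " + unpadded(s));
        }
    }
}
```

One input byte produces the xx== shape, two bytes produce xxx=, and three
bytes fill a whole unit, so the padded and unpadded forms only differ in the
first two cases.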
When encoding, the existing implementation by default always adds the
padding characters at the end of the encoded stream, if needed (for xx==,
xxx=). The decoder tries to be "liberal"/lenient in what it accepts (on the
assumption that the encoded data may come from some encoder that does not
generate the padding characters), so it accepts data with padding and data
without padding. However, it requires that if padding characters are used,
they be CORRECTLY encoded. That was the original specification and
implementation.
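
The "accept with or without padding" behaviour can be sketched as follows,
again assuming the java.util.Base64 basic decoder as released in JDK 8. The
class LenientDecodeSketch and its decode helper are hypothetical names used
only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; only java.util.Base64 is a real API here.
final class LenientDecodeSketch {

    // Decodes with the basic decoder and returns the bytes as ASCII text.
    static String decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        return new String(raw, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The same data, once with and once without the trailing padding.
        System.out.println(decode("QQ=="));
        System.out.println(decode("QQ"));
        System.out.println(decode("QUI="));
        System.out.println(decode("QUI"));
    }
}
```

Both spellings of each input decode to the same bytes, which is the
leniency described above: the padding only marks the end of the data.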
At your original request, I made the compromise of giving the MIME type a
spec/implementation for "incorrect" padding character combinations.
Possible patterns of an incorrectly padded final base64 unit are:
xxxx =      unnecessary padding character at the end of the encoded stream
xxxx xx=    missing the last padding character
xxxx xx=y   missing the last padding character, with a non-padding char instead
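
For reference, the three shapes listed above can be fed to the basic
(non-MIME) decoder. This is only a sketch: it assumes the java.util.Base64
basic decoder as released in JDK 8, which to my understanding rejects all
three with an IllegalArgumentException. It deliberately does not show the
MIME decoder, since the MIME behaviour is exactly what is under discussion
here. MalformedPaddingSketch and rejected are hypothetical names.

```java
import java.util.Base64;

// Hypothetical demo class; only java.util.Base64 is a real API here.
final class MalformedPaddingSketch {

    // Returns true if the basic decoder refuses the input.
    static boolean rejected(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejected("QUJD="));     // xxxx =
        System.out.println(rejected("QUJDQQ="));   // xxxx xx=
        System.out.println(rejected("QUJDQQ=Q"));  // xxxx xx=y
        System.out.println(rejected("QUJDQQ=="));  // correctly padded
    }
}
```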
Now it appears this compromise has become part of your complaint.
Our difference is that I believe the "padding character" is not part of the
data, so we can be "liberal"/lenient about it. But "x===" (or simply a lone
"x") is missing part of the original data for decoding, and I'm concerned
about being liberal in guessing what is missing.
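
The distinction between missing padding and missing data comes down to bit
counting: each base64 character carries 6 bits, and a decoded byte needs 8.
The small sketch below works through that arithmetic; DanglingUnitSketch and
its methods are hypothetical names, and no library behaviour is assumed.

```java
// Hypothetical helper class illustrating the 6-bits-per-character arithmetic.
final class DanglingUnitSketch {

    // Whole bytes that can be recovered from the given number of characters.
    static int wholeBytes(int chars) {
        return (chars * 6) / 8;
    }

    // Bits left over after the whole bytes have been taken out.
    static int leftoverBits(int chars) {
        return (chars * 6) % 8;
    }

    public static void main(String[] args) {
        for (int chars = 1; chars <= 4; chars++) {
            System.out.println(chars + " char(s): " + wholeBytes(chars)
                    + " byte(s), " + leftoverBits(chars) + " bit(s) left over");
        }
    }
}
```

With two or three characters the leftover bits are just filler and the
bytes are complete, so only the "=" markers are absent. With a single
character there are 6 bits and no complete byte, so a decoder would have to
invent at least 2 bits of real data to produce any output.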