# Thoughts on unified integer literal improvements

Joseph D. Darcy Joe.Darcy at Sun.COM
Wed Jul 15 18:46:32 PDT 2009

```
Catching up on email...

Bruce Chapman wrote:
> Joe Darcy wrote:
>> Hello.
>>
>> On the set of improved integer literal features, I think combining
>> the underscores as separators and binary literals is straightforward
>> given separately correct grammars for each change.
>>
>> As an alternative to "y" and "s" suffixes, I suggest considering a
>> "u" suffix to mean unsigned.  Literals with a trailing "u" would have
>> type int; widening conversions of such literals would 0 extend and
>> narrowing conversions would range check on the width of set bits.
>> For example,
>>
> All,
>
> I have spent some time considering Joe's suggestions.
>
> While I really like the aesthetics of "u" means "unsigned" compared
> with "y" suffix for "byte", I am also aware of the considerable extra
> complexity of defining a new primitive type in the JLS. From a first
> scan I have identified the most obvious change points which are
> recorded in the google document

Hi Bruce.

Thanks for starting a more detailed analysis and comparison of the "u"
and "y" proposals.

Let's see, from the top in 3.10.1 "u" and "U" can be added to the list
of Integer Type Suffixes with some explanatory text like:
"An integer literal is of type int if it is suffixed with "u" or "U";
the trailing "u" or "U" indicates an unsigned conversion process
occurs.  Unsigned literals are converted as if int were an unsigned
32-bit 2's complement format and different widening and narrowing
primitive conversion rules are applied to unsigned literals.  For
example, 2147483648u (2^31) is equal to -2147483648 and 2147483649u
(2^31 + 1) is equal to -2147483647."

In terms of conversions, there are 11 categories of conversions and 5
conversion contexts.

In 5.1.2, there could be three new widening primitive conversions:

Unsigned int literal to long, float, or double

with appropriate rules to preserve the sign of the result: "A widening
conversion of an unsigned int literal to long zero-extends the converted
int value, meaning the low-order 32 bits of the long are equal to the
int value and the high-order 32 bits of the long are zero.  Widening
conversion of an unsigned int to float or double acts as if the value
first went through a widening conversion of unsigned int to long and
then a widening conversion of long to float or double, respectively."

And in 5.1.3, there could be three new narrowing primitive conversions:

Unsigned int to byte, short, or char

These would actually be the same as the current narrowing conversions on
int; just grab the low-order n bits.  The real help would come in
section 5.2, Assignment Conversion, to redefine what "a constant
expression is representable in the type of the variable" for unsigned
literals to allow things like "byte b = 0xFFu;"

The text of the method invocation conversion would remain unchanged;
with the defined widening conversion, given the method declaration

public void foo(long ell) { System.out.println(ell);}

the call

foo(0xFFFFFFFFu)

would do the right thing and print 4294967295.  Narrowing conversions
do not take place in a method invocation context.  So to call

public void bar(byte b) {...}

the call

bar(0xFFu) // no bar(int)

would not work, but

bar((byte)0xFFu)

would.

Unary numeric promotion (5.6.1) is not affected by unsigned literals
since only values narrower than int are promoted.  Given the earlier
definitions of widening primitive conversion, I don't think any explicit
changes are needed in binary numeric promotion (5.6.2).

There are some potential puzzlers here.  Signed and unsigned division do
not give the same bit-wise result on 2's complement values, unlike add,
subtract, and multiply, so "unsignedLiteral1 / unsignedLiteral2" may give a
surprising answer and should be a compiler warning.  Another limitation
of this proposal is that only unsigned literals are regarded as
unsigned.  There is no analog of an "unsigned expression" to parallel a
"constant expression" (15.28).  That means

long value1 = 0xFFFFFFFFu;
long value2 = 0xFFFFFFFEu + 1u;

will give different results: value1 is an unsigned literal and
zero-extends to 4294967295, while the sum in value2 is an ordinary int
expression and sign-extends to -1.  It would be possible to define a
subset of the constant expressions to be unsigned constant expressions
(starting with unsigned literals, operations on them with unary
operations, binary +, -, and * (but not / ), shifts, bitwise logical
operations, etc.).

Perhaps unsigned constant expressions would need to be defined to reduce
the "gotcha" factor of some of the current results.

-Joe

```
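The conversion rules discussed in the message can be approximated in today's Java, which may make the puzzlers easier to see. The sketch below is only an illustration under stated assumptions: the "u" suffix does not exist, so each unsigned literal is stood in for by an ordinary int with the same 32 bits. The class and method names are hypothetical, and `fitsInByteUnsigned` is one possible reading of the "range check on the width of set bits" idea, not wording from the proposal. `Integer.divideUnsigned` requires Java 8 or later.

```java
// Sketch only: the "u" suffix is hypothetical, so an unsigned literal is
// emulated here by a plain int carrying the same 32-bit pattern.
class UnsignedLiteralSketch {

    // Proposed widening to long: zero-extend rather than sign-extend.
    static long widenUnsigned(int bits) {
        return bits & 0xFFFFFFFFL;
    }

    // Proposed widening to double: go through the zero-extended long first.
    static double widenUnsignedToDouble(int bits) {
        return (double) widenUnsigned(bits);
    }

    // Assumed reading of the assignment rule for "byte b = 0xFFu;":
    // accept the literal when no bits are set above the low 8 bits.
    static boolean fitsInByteUnsigned(int bits) {
        return (bits & ~0xFF) == 0;
    }

    public static void main(String[] args) {
        int allOnes = 0xFFFFFFFF;                      // stands in for 0xFFFFFFFFu

        // foo(0xFFFFFFFFu) with a long parameter: zero-extension vs. today's rule.
        System.out.println(widenUnsigned(allOnes));    // 4294967295
        System.out.println((long) allOnes);            // -1

        // value2 from the message: the sum is a plain int expression, so it
        // would still be sign-extended when assigned to a long.
        long value2 = 0xFFFFFFFE + 1;
        System.out.println(value2);                    // -1

        // Narrowing keeps the low-order 8 bits, as with any int.
        System.out.println(fitsInByteUnsigned(0xFF));  // true
        System.out.println((byte) 0xFF);               // -1

        // Division is where the signed and unsigned bit patterns diverge.
        int x = 0xFFFFFFFE;                            // stands in for 0xFFFFFFFEu
        System.out.println(x / 2);                        // -1
        System.out.println(Integer.divideUnsigned(x, 2)); // 2147483647
    }
}
```

The last two lines show why a warning on "unsignedLiteral1 / unsignedLiteral2" seems reasonable: the signed quotient and the unsigned quotient differ in their bits, whereas the addition in value2 produces the same bits under either interpretation and only goes wrong at the widening step.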