# JDK 7u-dev review request 8024356: Double.parseDouble() is slow for long Strings

Thu Sep 12 05:17:00 UTC 2013

The short answer: because it is a backport to JDK 7 of part of this JDK 8
change:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-June/017958.html

The long answer is the theory behind the change.
Java conversion from decimal string to double is specified as if it happens
in two steps:
1) Convert decimal string to mathematical real number exactly;
2) Round the real number to nearest Java double number. If real number is
in the middle between
two adjacent Java double numbers then round to double with zero last bit.
The step between adjacent real numbers is called ULP (unit in the last
place or unit of least precision).
Each ULP segment between adjacent real numbers can be split in two half-ULP
segements.
All reals in each half-ULP segment are rounded to same Java double number.
So we can safely patch a long decimal string
when we sure that the real value of patched string lies in the same
half-ulp segment as the real value of the original string.

The above patch keeps first MAX_NDIGIT of long decimal string, discards
tail and repalces it by a single non-zero decimal digit ('1').
It is correct when the half of binary ULP of double number is a multiple of
the decimal ULP of truncated decimal string with MAX_NDIGITS kept.

When I prepared the patch I didn't try to find the optimal MAX_NDIGITS, but
I was sure that MAX_NDIGITS = 1100 is enough.
Here is an expanation:
1) If value of decimal string is larger than pow(10,309) then it is
converted to positive infinity before the test of nDigits;
2) If value of decimal string is smaller than pow(10,309) and larger than
pow(2,53) then it rounds to Java double with ulp >= 2.
Half-ulp >= 1.
The patch is correct when decimal ULP of kept digits is 1 or less. It is
true beacuse MAX_NDIGITS = 1100 > 309;
3) If value of decimal string is smaller than pow(2,53) but larger than
0.5*Doulle.MIN_VALUE = pow(2,-1022-52-1), than
binary ULP is a multiple of pow(2,-1074).
Half-ULP is a multiple of pow(2,-1075).
The patch is correct when decimal ULP of kept digits is pow(10,-1075) of
less.
pow(2,53) has 17 decimal decimal digits. MAX_NDIGITS = 1100 > 1075 + 17 .
4) If value of decimal string is smaller than 0.5*Double.MIN_VALUE then it
is converted to zero before the test of nDigits.

You can treat replacement of 1075 + 17 by 1100 as my paranoia.
It still was interesting for me what is the optimal truncation of decimal
string.
I made an explration after the change was accepted to JDK 8, but I want to
postpone this refinement to JDK 9.

The exploration was accompanied by a formal proof.
I used ACL2 - both a programming language and a proof tool.
http://www.cs.utexas.edu/~moore/acl2/
It's language is Lisp-like.
Here you can find definitions and the proof:
http://cr.openjdk.java.net/~bpb/nDigits/acl2/
The snipped portion of this:
<<<<

; This function returns such "e" that all decimal digits below "e" can be
; be safely ignnored.
; Informally, 10^e must divide binary half-ulp

(defun find-exp (dec-exponent p q)
(declare (xargs :guard (and (integerp dec-exponent)
(integerp p) (> p 1)
(integerp q) (> q 0))))
(let* ((m (expt 10 (- dec-exponent 1)))
(e (expo m)))
(if (>= e p) 0 (- (ulp-e m p q) 1)))
)

<<snipped>>

; Final theorem. All numbers in ( m*h, (m+1)*h ) rounds to the same
floating-point number
; if h = 10^(find-exp decexponent)

(defthmd ulp-rnd-rat-near-eq
(implies (and (integerp p) (> p 1)
(integerp q) (> q 0)
(rationalp h) (> h 0)
(integerp dec-exponent)
(rationalp x)
(<= (expt 10 (- dec-exponent 1)) x)
(rationalp y)
(<= (expt 10 (- dec-exponent 1)) y)
(= h (expt 10 (find-exp dec-exponent p q)))
(not (integerp (/ x h)))
(not (integerp (/ y h)))
(= (fl (/ x h)) (fl (/ y h))))
(= (rnd-rat x 'near p q)
(rnd-rat y 'near p q)))
:hints (
("Goal" :in-theory (disable rnd-rat find-exp ulp))
("Goal'" :use (:instance ulp-rnd-rat-near+-h-find-exp (x x)))
("Goal''" :use (:instance ulp-rnd-rat-near+-h-find-exp (x y)))
)
)

>>>>

A function find-exp is defined above in the snipplet.
It has three arguments:
find-exp(dec-exponent,p,q) .
p and q are parameters of floating-point format.
p is number of bits in significand, q is number of bits in exponent.
Java double: p = 53, q = 11 .
Java float: p = 24, q = 8.
dec-exponent specifies a range of real numbers with the same decimal
exponent:
pow(10,dec-exponent-1) <= x < pow(10,dec-exponent)
The function returns the position of lowest decimal digit to be kept.
The algorithm is "take m - minimal value of the range, and return binary
exponent of minimum of its binary half-ulp and 1".

The theorem says that
for any binary floating-point format with parameters p and q.
for any dec-exponent
for step h=pow(10, find-exp(dec-exponent,p,q))
for any pair of rational x and y withing dec-exponent range
and such that x and y are in the same open segment ( m*h, (m+1)*h )
we can state that x and y are rounded-to-nearest to the same floating-point
number.
The are also hint to the theorem prover how to search the proof.

We might want to explore find-exp graphicaly.
The plot of sufficient nDigits as function of dec-exponent is here:
http://cr.openjdk.java.net/~bpb/nDigits/java/double.pdf
and here:
http://cr.openjdk.java.net/~bpb/nDigits/java/float.pdf
You can see that 1100 for double is conservative but not optimal.
The optimal constant for double conversion could be 768 ,
the optimal constant for float conversion could be 142,
but I leave this optimization to JDK 9.

On Wed, Sep 11, 2013 at 9:48 PM, Aleksey Shipilev <
aleksey.shipilev at oracle.com> wrote:

> On 09/11/2013 12:58 AM, Brian Burkhalter wrote:
> >
> > Issue:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8024356
> > Webrev:               http://cr.openjdk.java.net/~bpb/8024356/
>
> "MAX_NDIGITS = 1100"
>  Stupid question. Why exactly this magic number?
>
> -Aleksey.
>