JDK 9 Build 111 seems to miss some locale data, Lucene tests fail with Farsi and Thai language

Uwe Schindler uschindler at apache.org
Sat Mar 26 19:11:11 UTC 2016

Hi Alan, hi Robert, Hi Lucene developers,

I was able to reproduce the bug in isolation. The reason why you and Robert did not see it is quite simple; two conditions are needed:
- A security manager must be enabled
- All available locales must be listed beforehand

When you print the class name of the returned break iterator, with Java 8 or Java 9 b110 it returns: "class sun.util.locale.provider.DictionaryBasedBreakIterator"
With build 111 and no security manager, it prints: "class sun.util.locale.provider.DictionaryBasedBreakIterator" (all fine).
With build 111 and security manager enabled, it prints: "class sun.util.locale.provider.RuleBasedBreakIterator" (which is the wrong one for Thai).

Here is my test code:

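import java.text.BreakIterator;
import java.util.*;
public class Test {
  public static void main(String... args) throws Exception {
    // Listing all locales first is needed to trigger the bug. The tail of this statement was
    // cut off in the archived mail; the mapping to language tags is an assumption from the name.
    String[] availableLanguageTags = Arrays.stream(Locale.getAvailableLocales())
        .map(Locale::toLanguageTag).toArray(String[]::new);
    BreakIterator iterator = BreakIterator.getWordInstance(new Locale("th"));
    System.out.println(iterator.getClass()); // print the class of the returned break iterator
  }
}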

The availableLanguageTags statement is what our test framework runs before each test. It is needed to trigger the bug.
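
For anyone who wants to check the class name and the boundary detection in one go, here is a slightly extended probe. It is only a sketch: the class ThaiBreakIteratorProbe and its method names are made up for illustration, Locale.forLanguageTag("th") is used in place of new Locale("th"), and the boundary check is the one from the Lucene static initializer quoted further down (the Thai text is written as Unicode escapes so the file compiles regardless of source encoding).

```java
import java.text.BreakIterator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Lucene or the JDK.
final class ThaiBreakIteratorProbe {

  // Enumerate all available locales first, as the test framework does before a test.
  static String[] listLanguageTags() {
    return Arrays.stream(Locale.getAvailableLocales())
        .map(Locale::toLanguageTag)
        .sorted()
        .toArray(String[]::new);
  }

  // Class name of the word break iterator the runtime returns for the given locale.
  static String wordIteratorClass(Locale locale) {
    return BreakIterator.getWordInstance(locale).getClass().getName();
  }

  // Same check as the quoted Lucene initializer: is offset 4 of the Thai sample a boundary?
  static boolean detectsThaiBoundary() {
    BreakIterator it = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
    it.setText("\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22");
    return it.isBoundary(4);
  }

  public static void main(String... args) {
    System.out.println("locales listed:      " + listLanguageTags().length);
    System.out.println("iterator class:      " + wordIteratorClass(Locale.forLanguageTag("th")));
    System.out.println("Thai boundary found: " + detectsThaiBoundary());
  }
}
```

Run it once normally and once with a security manager installed (on these JDK 9 builds, for example, by passing -Djava.security.manager) and compare the output. No expected output is given here because it depends on the build and on the policy in use.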

The other problem, with Farsi, has the same cause: without a security manager everything passes; with a security manager it fails. The Collator returned is just a default Collator, not the one for Arabic/Farsi text.
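
The Collator symptom can be probed in a similar way. The sketch below compares the collator returned for a locale with the root collator. CollatorProbe and looksLikeDefault are hypothetical names, and comparing getRules() is just one heuristic for "this is only a default Collator"; it is not how the Lucene tests detect the failure.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Lucene or the JDK.
final class CollatorProbe {

  // True when the collator for the locale cannot be told apart from the root collator.
  static boolean looksLikeDefault(Locale locale) {
    Collator root = Collator.getInstance(Locale.ROOT);
    Collator localized = Collator.getInstance(locale);
    if (root instanceof RuleBasedCollator && localized instanceof RuleBasedCollator) {
      // A locale-specific tailoring shows up as a different rule string.
      String rootRules = ((RuleBasedCollator) root).getRules();
      String localRules = ((RuleBasedCollator) localized).getRules();
      return rootRules.equals(localRules);
    }
    // Fall back to Collator.equals when the rules are not accessible.
    return root.equals(localized);
  }

  public static void main(String... args) {
    Locale arabic = Locale.forLanguageTag("ar");
    System.out.println("collator class:     " + Collator.getInstance(arabic).getClass().getName());
    System.out.println("looks like default: " + looksLikeDefault(arabic));
  }
}
```

As with the break iterator, the interesting comparison is between a run with and a run without a security manager, after all locales have been listed.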

So it looks like the initialization code for locales is missing a doPrivileged() call somewhere. Maybe it was lost during the merge.
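
To make the suspicion concrete, this is the general pattern I mean. It is a minimal sketch and not the JDK's actual locale provider code: PrivilegedLookupSketch and readProperty are invented names, and reading a system property merely stands in for whatever permission-guarded lookup the locale initialization performs. With a security manager installed, a permission check considers every frame on the call stack, so library code called from less-privileged test code has to wrap such lookups in doPrivileged(); otherwise the lookup can fail, and a caller that swallows the failure may silently fall back to a default implementation.

```java
import java.security.AccessController;
import java.security.PrivilegedAction;

// Hypothetical illustration of the doPrivileged() pattern; not JDK source code.
// AccessController is deprecated in recent JDKs, hence the suppression.
@SuppressWarnings("removal")
final class PrivilegedLookupSketch {

  // Stand-in for a permission-guarded lookup done on behalf of an unprivileged caller.
  static String readProperty(String key) {
    // Inside doPrivileged, only this class's permissions are checked,
    // not those of the frames further up the call stack.
    return AccessController.doPrivileged(
        (PrivilegedAction<String>) () -> System.getProperty(key));
  }

  public static void main(String... args) {
    System.out.println("java.version = " + readProperty("java.version"));
  }
}
```

If a wrapper like this were dropped from one of the locale data lookups, the behaviour would match what I see: fine without a security manager, wrong implementation with one. That is a guess, though, until somebody finds the actual spot.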


Uwe Schindler
uschindler at apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany

> -----Original Message-----
> From: Alan Bateman [mailto:Alan.Bateman at oracle.com]
> Sent: Saturday, March 26, 2016 3:10 PM
> To: Uwe Schindler <uschindler at apache.org>
> Cc: 'Rory O'Donnell' <rory.odonnell at oracle.com>; 'Core-Libs-Dev' <core-libs-dev at openjdk.java.net>; 'Robert Muir' <rcmuir at gmail.com>
> Subject: Re: JDK 9 Build 111 seems to miss some locale data, Lucene tests fail with Farsi and Thai language
> On 26/03/2016 11:56, Uwe Schindler wrote:
> > Hi,
> >
> > After also testing the separate "Jigsaw" build on jdk9.java.net I see the same problems. So both build 111 variants are wrong.
> >
> > To me it looks like the Unicode data files are missing some information - which could again be a packaging bug. As said before, build 110 does not have this problem, so it seems to be a side-effect of Jigsaw merging.
> >
> > The following stuff does not work:
> >
> > (1) The Thai locale does not have a working dictionary-based BreakIterator available. The following "check" in Lucene for this fails, because it cannot detect a boundary correctly:
> >
> >    /**
> >     * True if the JRE supports a working dictionary-based breakiterator for Thai.
> >     * If this is false, this tokenizer will not work at all!
> >     */
> >    public static final boolean DBBI_AVAILABLE;
> >    private static final BreakIterator proto = BreakIterator.getWordInstance(new Locale("th"));
> >    static {
> >      // check that we have a working dictionary-based break iterator for thai
> >      proto.setText("ภาษาไทย");
> >      DBBI_AVAILABLE = proto.isBoundary(4);
> >    }
> >
> > After this static initializer, DBBI_AVAILABLE is false. This causes some tests to be ignored, but 2 fail because of this (which might be an oversight on our side). Nevertheless, this is a bug in build 111.
> I just tried to duplicate this on OSX and Linux without success. The log
> you linked to suggests this is Linux, is that right? Is this the JDK
> bundle? I haven't checked the JRE bundle but would be surprised if anything
> is missing. The JDK has several tests for Thai so if it was completely
> broken then I would have expected it would have been seen. I've no doubt
> that it is not working in your environment, we just need to figure out
> what is different.
> >
> > (2) The collator for Arabic (Farsi) language fails to work correctly. This also looks like missing data.
> >
> > Collator collator = Collator.getInstance(new Locale("ar"));
> >
> Are there any exceptions or anything here? Or maybe it tests the
> collator with compare?
> -Alan
