<i18n dev> Converting tr18 "strong recommendations" into RFEs

Tom Christiansen tchrist at perl.com
Tue Jan 25 10:02:35 PST 2011


Since you're looking through my messages for potential RFEs,
I thought I would point out a pair of low-hanging fruit for you.

tr18 contains two distinct strong recommendations, both of which should
be quite easy to convert into RFEs.  As recommendations, even strong
ones, they are of course not requirements--as such.  But they are
obviously considered important enough for usability to merit not
just a recommendation, but a strong recommendation.

The first strong recommendation is in tr18 section 1.2:

    The recommended names for UCD properties and property values are in
    PropertyAliases.txt [Prop] and PropertyValueAliases.txt [PropValue].
    There are both abbreviated names and longer, more descriptive names.
    It is strongly recommended that both names be recognized, and that
    loose matching of property names be used, whereby the case
    distinctions, whitespace, hyphens, and underbar are ignored.

Because that strong recommendation governs two separate independent
clauses, it might translate into two different RFEs:

    It is strongly recommended that both the abbreviated and the
    longer names of UCD properties and property values be recognized
    per their definitions in PropertyAliases.txt and
    PropertyValueAliases.txt.

    It is strongly recommended that loose matching of property
    names be used, whereby the case distinctions, whitespace,
    hyphens, and underbar are ignored.
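
To make the two clauses concrete, here is a minimal sketch in Python of
what recognizing both names under loose matching could look like.  The
alias table is a tiny hand-copied subset for illustration only (a real
implementation would load PropertyAliases.txt), and the names
loose_key and resolve_property are hypothetical.  It ignores only
what the quoted text lists: case, whitespace, hyphens, and underbar.

```python
import re

# Illustrative subset only; the full data lives in PropertyAliases.txt.
PROPERTY_ALIASES = {
    "gc": "General_Category",
    "General_Category": "General_Category",
    "sc": "Script",
    "Script": "Script",
}

def loose_key(name):
    """Drop whitespace, hyphens, and underbars, then fold case."""
    return re.sub(r"[\s_-]+", "", name).casefold()

# Index every alias, short or long, under its loose-matching key.
LOOSE_INDEX = {loose_key(alias): canonical
               for alias, canonical in PROPERTY_ALIASES.items()}

def resolve_property(name):
    """Return the long property name, or None if nothing matches."""
    return LOOSE_INDEX.get(loose_key(name))
```

With this, "GC", "general category", and "General-Category" would all
resolve to the same property as "General_Category".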

This strong recommendation from tr18 is repeated in tr44,
so it's clear that they mean for it to be given due attention
if at all possible:

    5.7 Matching Rules

    When matching Unicode character property names and values,
    it is strongly recommended that all Property and Property Value
    Aliases be recognized.  For best results in matching, rather
    than using exact binary comparisons, the following loose
    matching rules should be observed.

Mentioning the same thing as a strong recommendation not once but twice
in The Unicode Standard seems to run as close as you can get to making it
a requirement without actually doing so.  I know it isn't entirely trivial,
but I really do think it ought to be done: it greatly aids usability.

The second strong recommendation in tr18 is from its section
1.6 Line Boundaries, where it reads:

    It is strongly recommended that there be a regular expression
    meta-character, such as "\R", for matching all line ending
    characters and sequences listed above (e.g. in #1). It would
    thus be shorthand for:

      ( \u000D\u000A | [\u000A\u000B\u000C\u000D\u0085\u2028\u2029] )

(My regex rewriting code for Level 1 conformance, and especially
 for RL1.2a, also handles \R, amongst many other things.  Yes,
 I know that I still owe you mail regarding that code, its causes
 and its consequences.)
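
For an engine that lacks \R, the shorthand quoted above can be spelled
out by hand.  This is a minimal sketch in Python, whose re module has no
\R of its own; the names LINEBREAK and split_lines are hypothetical,
and this is not the rewriting code mentioned above.  The two-character
CR LF alternative has to come first so that it is consumed as a single
line ending.

```python
import re

# Expansion of the tr18 shorthand: CR LF first, then each single
# line-ending character (LF, VT, FF, CR, NEL, LS, PS).
LINEBREAK = re.compile(r"\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")

def split_lines(text):
    """Split text at every line ending matched by LINEBREAK."""
    return LINEBREAK.split(text)
```

Putting the character class first would instead match the CR and the LF
of a CR LF pair separately and report an empty line between them.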

I figure that because one can cite exact language for these two strong
recommendations straight out of The Unicode Standard, it should be
significantly easier to get the two or three associated RFEs looked
into for possible implementation than it would be if they were just
somebody's pie-in-the-sky desiderata.

