<i18n dev> Converting tr18 "strong recommendations" into RFEs
tchrist at perl.com
Tue Jan 25 10:02:35 PST 2011
Since you're looking through my messages for potential RFEs,
I thought I would point out a pair of low-hanging fruit for you.
tr18 contains two distinct strong recommendations, both of which should
be quite easy to convert into RFEs. As recommendations, even strong
ones, they are of course not requirements--as such. But they are
obviously considered important enough for usability as to merit not
just a recommendation, but a strong recommendation.
The first strong recommendation is in tr18 section 1.2:
    The recommended names for UCD properties and property values are in
    PropertyAliases.txt [Prop] and PropertyValueAliases.txt [PropValue].
    There are both abbreviated names and longer, more descriptive names.
    It is strongly recommended that both names be recognized, and that
    loose matching of property names be used, whereby the case
    distinctions, whitespace, hyphens, and underbar are ignored.
Because that strong recommendation governs two separate independent
clauses, it might translate into two different RFEs:
    It is strongly recommended that both versions of UCD properties
    and property aliases be recognized per their definitions in
    PropertyAliases.txt and PropertyValueAliases.txt.

    It is strongly recommended that loose matching of property
    names be used, whereby the case distinctions, whitespace,
    hyphens, and underbar are ignored.
This strong recommendation from tr18 is repeated in tr44,
so it's clear that they mean for it to be given due attention
if at all possible:
    5.7 Matching Rules

    When matching Unicode character property names and values,
    it is strongly recommended that all Property and Property Value
    Aliases be recognized. For best results in matching, rather
    than using exact binary comparisons, the following loose
    matching rules should be observed.
Mentioning the same thing as a strong recommendation not once but twice
in The Unicode Standard seems to run as close as you can get to making it
a requirement without actually doing so. I know it isn't entirely trivial,
but I really do think it ought to be done: it greatly aids usability.
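To make the loose-matching half concrete, here is a minimal sketch in Python of what the tr18 wording quoted above asks for. It covers only the rules in that quotation (case, whitespace, hyphens, and underbar ignored); tr44 may specify more than this. The ALIASES table is a tiny hand-picked excerpt standing in for PropertyAliases.txt, and the names loose_key and resolve_property are made up for illustration.

```python
import re

# Hypothetical stand-in for PropertyAliases.txt: a few short and long
# names mapped to one canonical long name. A real implementation would
# load the whole file rather than hard-code an excerpt.
ALIASES = {
    "sc": "Script",
    "Script": "Script",
    "gc": "General_Category",
    "General_Category": "General_Category",
    "WSpace": "White_Space",
    "White_Space": "White_Space",
}


def loose_key(name):
    """Drop whitespace, hyphens, and underscores, then fold case."""
    return re.sub(r"[\s\-_]+", "", name).casefold()


# Index the alias table by loose key, so lookups ignore the same
# distinctions that the quoted recommendation says to ignore.
LOOSE_ALIASES = {loose_key(alias): canon for alias, canon in ALIASES.items()}


def resolve_property(name):
    """Return the canonical property name, or None if it is unknown."""
    return LOOSE_ALIASES.get(loose_key(name))
```

With this, "general category", "General-Category", and "GC" all resolve to General_Category, which is the usability gain the recommendation is after.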
The second strong recommendation in tr18 is from its section
1.6 Line Boundaries, where it reads:
    It is strongly recommended that there be a regular expression
    meta-character, such as "\R", for matching all line ending
    characters and sequences listed above (e.g. in #1). It would
    thus be shorthand for:

        ( \u000D\u000A | [\u000A\u000B\u000C\u000D\u0085\u2028\u2029] )
(My regex rewriting code for Level 1 conformance, and especially
for RL1.2a, also handles \R, among many other things. Yes,
I know that I still owe you mail regarding that code, its causes
and its consequences.)
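For illustration only, and not the rewriting code mentioned above, here is a rough sketch in Python of how \R could be expanded into the tr18 shorthand before a pattern is handed to an engine that lacks it. The names LINE_BREAK and expand_R are invented for this example, and the sketch assumes \R never appears inside a bracketed character class, which it makes no attempt to detect.

```python
import re

# The expansion quoted from tr18 section 1.6. CRLF comes first so that
# it is consumed as one line ending rather than as two.
LINE_BREAK = r"(?:\u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029])"


def expand_R(pattern):
    """Replace each unescaped \\R in a pattern with LINE_BREAK.

    Escapes are scanned pairwise from left to right, so an escaped
    backslash followed by a literal R is left untouched.
    """
    def swap(match):
        return LINE_BREAK if match.group(1) == "R" else match.group(0)

    return re.sub(r"\\(.)", swap, pattern, flags=re.DOTALL)


# Example use: one line ending between two letters.
one_break = re.compile(expand_R(r"a\Rb"))
```

Here one_break.fullmatch("a\r\nb") succeeds because CRLF counts as a single line ending, while "a\n\rb" does not match, since that is two line endings in a row.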
I figure that because one can cite exact language for these two strong
recommendations straight out of The Unicode Standard, it should
be significantly easier to get the two or three associated RFEs looked
into for possible implementation than it would be if they were just
somebody's pie-in-the-sky desiderata.