Unix paths as bytes
pjenvey at underboss.org
Tue May 5 19:14:28 PDT 2009
On May 4, 2009, at 11:24 PM, Martin Buchholz wrote:
>> There's no case where 2 different sets of bytes would convert to
>> the same
> I don't understand this. There are many locales with encodings with
> non-unique representations. Until the UTF-8 security reform,
> even UTF-8 had non-unique representations.
> The Python PEP seems designed to be used with
> any system encoding, not just UTF-8.
Ok, like ISO-2022-JP, ShiftJIS. These did come up in the PEP
discussion on the python-dev ML.
They weren't highly regarded as they're pretty broken as Unix locales.
The POSIX spec describes these "locking shift encodings" as fishy/invalid
for its character set, and they're incompatible with ASCII.
Red Hat, Debian and others disable them as locales by default.
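
To make the problem concrete, here is a small sketch of how a locking shift
encoding breaks both uniqueness and ASCII compatibility. It uses Python's
built-in "iso2022_jp" codec as a stand-in for such a locale; the variable
names are only for illustration.

```python
# Illustration only: Python's built-in "iso2022_jp" codec stands in for a
# locking-shift locale encoding. All names below are hypothetical.

ESC = b"\x1b"
TO_ASCII = ESC + b"(B"      # escape sequence: switch to ASCII
TO_JISX0208 = ESC + b"$B"   # escape sequence: switch to JIS X 0208

plain = b"A"
redundant = TO_ASCII + b"A"  # re-selects ASCII, which is already active

# Two different byte strings, one decoded string.
same_text = plain.decode("iso2022_jp") == redundant.decode("iso2022_jp")

# After a shift, bytes in the ASCII range stop meaning ASCII:
# 0x24 0x22 is '$"' in ASCII but HIRAGANA LETTER A in JIS X 0208.
shifted = TO_JISX0208 + b'$"' + TO_ASCII
hiragana_a = shifted.decode("iso2022_jp")

print(same_text, hiragana_a == "\u3042")
```

So a filename decoded from such a locale can't be mapped back to its
original bytes in general, which is the non-uniqueness Martin is describing.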
These are indeed problematic; I guess they just weren't a deal breaker
for the simpler scheme -- designed to be used with any system encoding
that isn't annoying. The PEP mentions:
"Encodings that are not compatible with ASCII are not supported by
this specification; bytes in the ASCII range that fail to decode will
cause an exception. It is widely agreed that such encodings should not
be used as locale charsets."
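
For an ASCII-compatible encoding, the round trip can be sketched as below.
This assumes the PEP in question is PEP 383 and uses its "surrogateescape"
error handler with UTF-8 as the locale encoding; decode_name and encode_name
are hypothetical helpers, not anything from the PEP itself.

```python
# Minimal sketch, assuming UTF-8 as the locale encoding and the
# "surrogateescape" error handler from PEP 383.
# decode_name/encode_name are hypothetical helper names.

def decode_name(raw: bytes) -> str:
    # Undecodable bytes 0x80..0xFF become lone surrogates U+DC80..U+DCFF.
    return raw.decode("utf-8", errors="surrogateescape")

def encode_name(name: str) -> bytes:
    # Lone surrogates U+DC80..U+DCFF are turned back into the original bytes.
    return name.encode("utf-8", errors="surrogateescape")

latin1_name = b"caf\xe9.txt"      # not valid UTF-8
utf8_name = b"caf\xc3\xa9.txt"    # valid UTF-8 for the same visible name
overlong_slash = b"\xc0\xaf"      # pre-reform "overlong" encoding of '/'

for raw in (latin1_name, utf8_name, overlong_slash):
    # Every byte string survives the trip through str unchanged.
    assert encode_name(decode_name(raw)) == raw

# Different bytes give different strings, so nothing collides.
assert decode_name(latin1_name) != decode_name(utf8_name)

# The overlong form is escaped byte by byte rather than read as '/'.
assert decode_name(overlong_slash) == "\udcc0\udcaf"
```

Under these assumptions, distinct byte sequences decode to distinct strings,
because anything the strict decoder rejects is escaped instead of normalized.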