Unix paths as bytes
pjenvey at underboss.org
Mon May 4 00:20:04 PDT 2009
On May 3, 2009, at 5:02 PM, Martin Buchholz wrote:
> The python proposal is interesting,
> but also does not provide real access to the underlying bytes,
> and appears to have round-trip preservation problems.
Python does provide direct access to paths as bytes through separate
APIs. Byte versions of the environment and the command-line arguments
have been discussed and may happen in the future, even alongside PEP 383.
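For reference, a minimal sketch of what "paths as bytes via different APIs" means in Python 3: passing a bytes path to os.listdir() returns the raw entry names as bytes, while passing a str returns decoded str names. The file name "report.txt" is a hypothetical example.

```python
import os
import tempfile

def list_raw_names(directory):
    # Bytes in, bytes out: the entry names are returned undecoded.
    return sorted(os.listdir(directory))

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, kept ASCII so the sketch runs on any filesystem.
    open(os.path.join(tmp, "report.txt"), "w").close()
    raw = list_raw_names(os.fsencode(tmp))  # bytes API
    text = sorted(os.listdir(tmp))          # str API
    print(raw)   # [b'report.txt']
    print(text)  # ['report.txt']
```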
I mention this new PEP because it's designed for the general case:
working with strings and expecting strings back from these APIs. Our
UNIX APIs will encode these paths back to their original bytes using
the filesystem's encoding together with the PEP's new encoder error
handler, and Python code can encode them back to bytes the same way.
There are no round-trip preservation issues.
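A minimal sketch of that round trip, assuming a UTF-8 filesystem encoding. Python 3 exposes the PEP's error handler under the name "surrogateescape"; the file name below is a hypothetical example containing a Latin-1 byte that is not valid UTF-8.

```python
raw_name = b"caf\xe9.txt"  # 0xE9 followed by '.' is invalid UTF-8

# Decode: each undecodable byte becomes a lone surrogate (U+DC80..U+DCFF).
name = raw_name.decode("utf-8", "surrogateescape")
print(repr(name))  # 'caf\udce9.txt'

# Encode with the same codec and handler to recover the original bytes.
restored = name.encode("utf-8", "surrogateescape")
print(restored == raw_name)  # True
```

Because the escape is a pure byte-to-code-point mapping, any string produced by the decoder can be encoded back without loss.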
This scheme is similar to what Mono does: http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding
The difference is that Mono represents each undecodable byte as a NUL
followed by the offending byte as a char, whereas PEP 383 maps it to a
lone surrogate. The Mono scheme wasn't chosen because external libs
(e.g. PyGTK) would truncate the strings at the NUL upon rendering or
reject them entirely, which is worse than just rendering lone
surrogates as garbage.
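To make the truncation problem concrete, here is a rough sketch. The error handler "nul_escape_sketch" is hypothetical: it only approximates the Mono scheme as described above (NUL plus the offending byte as a char) and is not Mono's actual implementation. The c_string_view() helper is likewise hypothetical and simulates a C library that stops at the first NUL.

```python
import codecs

def _nul_escape(err):
    # Hypothetical Mono-like escape: U+0000 followed by the raw byte as a char.
    bad = err.object[err.start]
    return ("\x00" + chr(bad), err.start + 1)

codecs.register_error("nul_escape_sketch", _nul_escape)

def c_string_view(s):
    # Simulate a consumer that treats the string as NUL-terminated.
    return s.split("\x00", 1)[0]

raw_name = b"caf\xe9.txt"
mono_style = raw_name.decode("utf-8", "nul_escape_sketch")
pep383 = raw_name.decode("utf-8", "surrogateescape")

print(repr(c_string_view(mono_style)))  # 'caf'
print(repr(c_string_view(pep383)))      # 'caf\udce9.txt'
```

The NUL-based string loses everything after the first bad byte, while the surrogate-based string survives intact (if rendered oddly).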
> The Paths API seems to be parallel to the environment variable API
> in that it catches most of the places where file names would be
> corrupted by round-trip encoding/decoding, but it is easy to
> construct sample code where the abstraction is leaky,
> E.g. if you try to construct a file name from the concatenation of
> an existing file name and a suffix defined in Java code as a string.
> (Correct me if I'm wrong)
This example does work for paths as long as you're concatenating via
Path objects (and the value of suffix is valid according to
file.encoding). In other JVM languages we just don't have the luxury
of always representing paths with nio's Path objects.
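For comparison, a sketch of the same concatenation case under PEP 383 semantics. The helper with_suffix() is hypothetical. An existing (possibly undecodable) name is decoded, extended with a suffix given as an ordinary string, and encoded back. As with the file.encoding caveat for Path objects, the suffix itself must be encodable in the filesystem encoding.

```python
def with_suffix(raw_name, suffix, encoding):
    # Hypothetical helper: decode with the PEP 383 handler, concatenate as text.
    name = raw_name.decode(encoding, "surrogateescape")
    # Escaped bytes travel as lone surrogates and are restored on encode.
    return (name + suffix).encode(encoding, "surrogateescape")

print(with_suffix(b"caf\xe9", ".bak", "utf-8"))  # b'caf\xe9.bak'

# A suffix that is not representable in the encoding still fails.
try:
    with_suffix(b"caf\xe9", ".\u00e9t\u00e9", "ascii")
except UnicodeEncodeError as exc:
    print("suffix not encodable:", exc.reason)
```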
We'd also love access to environment variables as bytes but that's a
whole different story.