funny characters in identifiers?
per at bothner.com
Fri Dec 31 13:25:38 PST 2010
On 12/28/2010 01:58 PM, Charles Oliver Nutter wrote:
> On Tue, Dec 28, 2010 at 12:21 PM, Per Bothner<per at bothner.com> wrote:
>> Is there a plan/consensus for how to handle "illegal" characters
>> in identifiers? I'm primarily interested in the bytecode level,
>> not the Java source level. For example identifiers like '/'
>> used for division in Scheme. It would be good to have a standard
>> way to deal with this.
> See John Rose's post on this here:
> We have implemented it in JRuby, and it works well. The down side is
> that Java backtraces can be a little hard to read when there's lots of
> symbolic identifiers.
A problem with this mangling is that it isn't "safe" for class names,
or at least not for class file names. Using '\' in a file name is obviously
problematic, especially on Windows. On Posix-based file systems the
funny characters are in principle allowed, but will of course be awkward
to access from shells and other tools.
Windows disallows the following in file names:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
(And of course we have problems with case-insensitive file systems.)
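To make the class-file problem concrete, here is a minimal Java sketch of a check for names Windows would reject. The class and method names (WindowsNames, isValidWindowsFileName) are hypothetical; the reserved set is the one listed above plus control characters below 32. It does not check Windows' reserved device names such as CON or NUL, nor collisions on case-insensitive file systems.

```java
// Hypothetical helper: would Windows accept this string as a file name?
public class WindowsNames {
    // Characters Windows reserves in file names: < > : " / \ | ? *
    private static final String RESERVED = "<>:\"/\\|?*";

    static boolean isValidWindowsFileName(String name) {
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            // Control characters (code points below 32) are also disallowed.
            if (c < 32 || RESERVED.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidWindowsFileName("Foo.class"));  // true
        System.out.println(isValidWindowsFileName("a\\b.class")); // false: contains '\'
        System.out.println(isValidWindowsFileName("~s.class"));   // true
    }
}
```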
Now of course we can use an annotation to specify the source class name
in case the source class name is invalid - but then we still need to
mangle the class name somehow.
I think a better prefix character would be '%'. It's not reserved
by Posix, Windows, or the JVM, yet it is not a valid Java identifier character.
Even better might be '~' or '!', since those are also unreserved in URIs
(where '%' introduces an escape).
I will assume '~' in the following.
If we want names that are "safe for filenames" or even "safe for URIs",
then the problem is that there are too many unsafe characters to
encode each as '~' followed by a safe non-alphanumeric character. This
means we need to use '~' followed by a *letter*:
'/' -> '~s' (mnemonic: slash)
'.' -> '~d' (dot)
'<' -> '~l' (less)
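A minimal Java sketch of the '~' + letter scheme, assuming only the three mappings above plus one hypothetical addition, '~' -> '~t', so that the escape character itself round-trips. TildeMangler, mangle and demangle are made-up names, and a real table would need a letter for every other unsafe character.

```java
import java.util.Map;

// Sketch of the '~' + letter escaping. The '/', '.', '<' entries come from the
// proposal; '~' -> "~t" is a hypothetical addition so '~' itself round-trips.
public class TildeMangler {
    private static final Map<Character, Character> ENCODE =
            Map.of('/', 's', '.', 'd', '<', 'l', '~', 't');
    private static final Map<Character, Character> DECODE =
            Map.of('s', '/', 'd', '.', 'l', '<', 't', '~');

    static String mangle(String id) {
        StringBuilder sb = new StringBuilder(id.length());
        for (int i = 0; i < id.length(); i++) {
            char c = id.charAt(i);
            Character letter = ENCODE.get(c);
            if (letter == null) {
                sb.append(c);                  // safe character: copy as-is
            } else {
                sb.append('~').append(letter.charValue());
            }
        }
        return sb.toString();
    }

    static String demangle(String mangled) {
        StringBuilder sb = new StringBuilder(mangled.length());
        for (int i = 0; i < mangled.length(); i++) {
            char c = mangled.charAt(i);
            if (c != '~') {
                sb.append(c);
                continue;
            }
            if (i + 1 >= mangled.length()) {
                throw new IllegalArgumentException("dangling '~' at end of name");
            }
            char letter = mangled.charAt(++i); // consume the escape letter
            Character orig = DECODE.get(letter);
            if (orig == null) {
                throw new IllegalArgumentException("unknown escape ~" + letter);
            }
            sb.append(orig.charValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mangle("/"));               // ~s
        System.out.println(mangle("a.b<c"));           // a~db~lc
        System.out.println(demangle(mangle("x~y/z"))); // x~y/z
    }
}
```

Because every literal '~' is itself escaped, any '~' in a mangled name unambiguously starts an escape, so demangling is a single left-to-right scan.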
What about non-ASCII characters? I don't know enough to say whether
such characters might cause a problem, but I don't know of any reason they would.
They might technically be disallowed in URIs, but my impression is that
%-encoding of them is handled fairly universally and semi-transparently.
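If non-ASCII characters did turn out to be a problem, one option would be to %-encode just those, as UTF-8 bytes, after the '~' mangling. This is a hypothetical Java sketch (NonAsciiEncoder and encodeNonAscii are made-up names); it also encodes a literal '%' as %25 so the result stays reversible.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical post-pass: %-encode every UTF-8 byte of a non-ASCII character,
// and any literal '%', leaving all other ASCII characters alone.
public class NonAsciiEncoder {
    static String encodeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;                       // treat the byte as unsigned
            if (v < 0x80 && v != '%') {
                sb.append((char) v);                // plain ASCII: copy
            } else {
                sb.append(String.format("%%%02X", v)); // e.g. 0xCE -> "%CE"
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+03BB (Greek small lambda) is CE BB in UTF-8.
        System.out.println(encodeNonAscii("\u03BB~s")); // %CE%BB~s
        System.out.println(encodeNonAscii("50%"));      // 50%25
    }
}
```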
per at bothner.com http://per.bothner.com/