Groovy identifiers play by the rules of Calvinball
Today, we’ll be talking about how Groovy doesn’t seem to have a consistent idea of what an identifier is. It’s not a very large topic, and it really isn’t a problem, but it is a topic that nevertheless merits mention, and it allows me to introduce…
The Base Project, and a brief intro to Gradle
Here you can find the “base project”, which is a
Gradle project which I’ll use to demonstrate Groovyisms as we move along. After
you download the file, extract it to yield a directory named base-project.
Most examples will provide a Main.java or Main.groovy to use in this
project. This should be placed under src/main/groovy/gl/lin, replacing the
Main.java file already there.
All you need to run the project is Java 6 (it probably works on 7, too) and a relatively modern Unix environment. You can build the project by running ./gradlew build from within the project directory. Running ./gradlew run will execute the Main class, which in the bare base project is just hello world.
:compileJava UP-TO-DATE
:compileGroovy
:processResources UP-TO-DATE
:classes
:run
hello world

BUILD SUCCESSFUL

Total time: 5.17 secs
If this is your first experience with Gradle, take a moment to contemplate just
how long — and how many resources — it takes Gradle to build hello world. It
takes 5 seconds on my 4.2 GHz desktop, with the JVM already in cache and
without needing to download anything. My netbook can compile 10,000 lines of C
in that time. (Actually, it can compile Java that quickly if the Java compiler
is invoked directly.) At the same time, it requires a 300 MB working set of
memory. What is Gradle doing in all that time and space? At this point it is
twice as heavy as Windows XP. Just to print hello world.
Those already familiar with Gradle might notice that the gradlew script has
been lightly modified. That’s because it normally has /bin/bash hardwired,
which is about as portable as expecting the first character of uname to be
‘L’.
Groovy Identifiers
Groovy’s identifiers are just like Java’s, at least if you stick to ASCII. Which you should — there’s no reason to ever use non-ASCII characters in your identifiers, so any issues therewith aren’t really a problem with Groovy in my book.
Java, in a decision far more sane than allowing Unicode characters in identifiers, does ban the non-breaking space character. For example, the following file will not compile.
Nbsp.java
package gl.lin;

public class Nbsp {
    public static int foo bar; // non-breaking space
}
Nbsp.java:5: illegal character: \160
 public static int foo bar;
 ^
Nbsp.java:5: <identifier> expected
 public static int foo bar;
 ^
2 errors
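The compiler's verdict can be cross-checked against the standard library. As a small sketch (the class NbspCheck and its helper are made up for this illustration), the following asks java.lang.Character whether the non-breaking space, U+00A0, may appear in an identifier, and shows that the \160 in the error message is simply its decimal code:

```java
final class NbspCheck {
    // U+00A0 NO-BREAK SPACE, written as an escape so it stays visible in the source.
    static final char NBSP = '\u00A0';

    // Hypothetical helper: true if c may appear inside a Java identifier.
    static boolean allowedInIdentifier(char c) {
        return Character.isJavaIdentifierPart(c);
    }

    public static void main(String[] args) {
        System.out.println((int) NBSP);                // prints 160
        System.out.println(allowedInIdentifier(NBSP)); // prints false
        System.out.println(allowedInIdentifier('_'));  // prints true
    }
}
```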
You’ll get a similar error if you try to use a “small non-breaking space” (I’ll
call it SNBSP from here on out). Groovy, of course, also forbids these
characters. Except that they seem to have forgotten about the non-breaking
space characters other than the non-breaking space character, but only in
some cases. The following file will compile and run with Groovy. You can use
it to replace Main.java in the base project.
Main.groovy
package gl.lin;

class Main {
    static class Hello World Printer {
        def print hello world(String whom) {
            println("hello $whom");
        }
    }

    public static void main(String[] command line arguments) {
        new Hello World Printer().print hello world("world");
    }
}
Those apparent spaces between identifiers are SNBSPs (handily typable with
Super+Hyper+Space on the Neo2 keyboard). You may notice, however, that I didn’t
actually reference the variable command line arguments in main(). That’s
because the SNBSP character is illegal for identifiers referencing the variable
namespace which occur in expressions. Try it!
Main2.groovy
package gl.lin;

class Main2 {
    static class Hello World Printer {
        def print hello world(String whom) {
            println("hello $whom");
        }
    }

    public static void main(String[] command line arguments) {
        new Hello World Printer().print hello world(command line arguments[0]);
    }
}
Trying to compile this results in
src/main/groovy/gl/lin/Main2.groovy: 11: Invalid variable name. Invalid character at position: 8 of value:   in name: command line arguments. At [11:49] @ line 11, column 49.
 d Printer().print hello world(command li
 ^

1 error
This happens with other characters, too, like the rather esoteric Capital Eszett, ‘ẞ’. In Groovy’s defense, Java considers the character punctuation. On the other hand, this means that Groovy sometimes allows punctuation in identifiers.
Main3.groovy
package gl.lin;

class Main3 {
    static class Hello World Printer {
        def print hello world(String whom) {
            println("hello $whom");
        }
    }

    public static void main(String[] ẞ) {
        new Hello World Printer().print hello world(ẞ[0]);
    }
}
src/main/groovy/gl/lin/Main.groovy: 11: Invalid variable name. Must start with a letter but was: ẞ. At [11:49] @ line 11, column 49.
 d Printer().print hello world(ẞ[0]);
 ^

1 error
Notice how, again, Groovy only complains when we try to use our variable named ‘ẞ’.
Even more fun:
Bidi.groovy
package gl.lin;

class Bidi {
    def ‮(x) { return x==0 || !‭(x-1); }
    def ‭(y) { return y==1 || !‮(y-1); }
}
Yes, the file compiles without complaint. Drop it into the base project if you need to see for yourself. In case you aren’t Unicode-privy, this file defines a function whose name is the right-to-left override, and another whose name is the left-to-right override, and the two call each other. Groovy is perfectly OK with this.
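Since the overrides are invisible, one way to see what a name really contains is to print its code points. Here is a minimal sketch in plain Java; the class CodePointDump and its dump method are invented for this illustration and are not part of Groovy or the base project. It also confirms that Java files the right-to-left override (U+202E) under the "format" category rather than treating it as a letter.

```java
final class CodePointDump {
    // Renders every code point of s as U+XXXX, separated by single spaces.
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A name consisting solely of the right-to-left override.
        System.out.println(dump("\u202E"));   // prints U+202E
        // An override hidden between two ordinary letters.
        System.out.println(dump("x\u202Ey")); // prints U+0078 U+202E U+0079
        // The override is a format character, not a letter.
        System.out.println(Character.getType('\u202E') == Character.FORMAT); // prints true
        System.out.println(Character.isLetter('\u202E'));                    // prints false
    }
}
```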
This honestly has to be one of the strangest, but most harmless, Groovyisms I’ve encountered so far. While it is partly explainable by the fact that the Java standard library doesn’t consider any non-breaking space character to be a blank, it’s really weird that the handling of these characters is different based on how the identifier is being used. It would appear that identifier validation is performed by the parser, rather than by the lexer as would be done in any sane compiler.
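For contrast, here is a rough sketch of what position-independent validation could look like: one function, applied to every identifier no matter where it appears. It leans on Java's own Character.isJavaIdentifierStart and Character.isJavaIdentifierPart. The class and method names are hypothetical, reserved words are ignored, and this is not how Groovy's compiler is actually implemented.

```java
final class IdentifierCheck {
    // Returns true if every position of name satisfies Java's identifier rules.
    // Reserved words are deliberately not checked in this sketch.
    static boolean isValidIdentifier(String name) {
        if (name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidIdentifier("commandLineArguments")); // prints true
        System.out.println(isValidIdentifier("command\u00A0line"));    // prints false
        System.out.println(isValidIdentifier("command\u202Fline"));    // prints false
        // Neither non-breaking space counts as whitespace for Java,
        // so a tokenizer splitting on isWhitespace would not split on them.
        System.out.println(Character.isWhitespace('\u00A0'));          // prints false
        System.out.println(Character.isWhitespace('\u202F'));          // prints false
    }
}
```

Because the same check runs on declarations and on references alike, a name is either accepted everywhere or rejected everywhere.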