Groovy identifiers play by the rules of Calvinball

2013-10-13 in groovy gradle java
Wat

Today, we’ll be talking about how Groovy doesn’t seem to have a consistent idea of what an identifier is. It’s not a very large topic, and it really isn’t a problem, but it is a topic that nevertheless merits mention, and it allows me to introduce…

The Base Project, and a brief intro to Gradle

Here you can find the “base project”, a Gradle project that I’ll use to demonstrate Groovyisms as we move along. After you download the file, extract it to yield a directory named base-project.

Most examples will provide a Main.java or Main.groovy to use in this project. This should be placed under src/main/groovy/gl/lin, replacing the Main.java file already there.

All you need to run the project is Java 6 (it probably works on 7, too) and a relatively modern Unix environment. You can build the project by running

  ./gradlew build

from within the project directory. Running

  ./gradlew run

will execute the Main class, which in the bare base project is just hello world.

  :compileJava UP-TO-DATE
  :compileGroovy
  :processResources UP-TO-DATE
  :classes
  :run
  hello world

  BUILD SUCCESSFUL

  Total time: 5.17 secs

If this is your first experience with Gradle, take a moment to contemplate just how long — and how many resources — it takes Gradle to build hello world. It takes 5 seconds on my 4.2 GHz desktop, with the JVM already in cache and without needing to download anything. My netbook can compile 10,000 lines of C in that time. (Actually, it can compile Java that quickly if the Java compiler is invoked directly.) At the same time, it requires a 300 MB working set of memory. What is Gradle doing in all that time and space? At this point it is twice as heavy as Windows XP. Just to print hello world.

Those already familiar with Gradle might notice that the gradlew script has been lightly modified. That’s because it normally has /bin/bash hardwired, which is about as portable as expecting the first character of uname to be ‘L’.

Groovy Identifiers

Groovy’s identifiers are just like Java’s, at least if you stick to ASCII. Which you should — there’s no reason to ever use non-ASCII characters in your identifiers, so any issues therewith aren’t really a problem with Groovy in my book.

Java, in a decision far more sane than allowing Unicode characters in identifiers, does ban the non-breaking-space character. For example, the following file will not compile.

Nbsp.java

  package gl.lin;

  public class Nbsp {
    public static int foo bar; // non-breaking space
  }

javac rejects it:

  Nbsp.java:5: illegal character: \160
    public static int foo bar;
                         ^
  Nbsp.java:5: <identifier> expected
    public static int foo bar;
                             ^
  2 errors
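The \160 in that error is just the decimal value of U+00A0. As a rough cross-check, here is a minimal sketch of how java.lang.Character classifies that code point. The class name NbspCheck and the helper allowedInIdentifier are made up for this illustration, and the sketch only shows what the standard library reports, not how javac is implemented internally.

```java
// Sketch: how java.lang.Character classifies U+00A0 (NO-BREAK SPACE).
// NbspCheck and allowedInIdentifier are hypothetical names for this example.
final class NbspCheck {
    static final int NBSP = 0x00A0; // decimal 160, the "\160" in the javac error

    /** True if the standard library allows the code point inside a Java identifier. */
    static boolean allowedInIdentifier(int codePoint) {
        return Character.isJavaIdentifierPart(codePoint);
    }

    public static void main(String[] args) {
        System.out.println("decimal value:        " + NBSP);
        System.out.println("isJavaIdentifierPart: " + allowedInIdentifier(NBSP));
        // A space separator according to Unicode ...
        System.out.println("isSpaceChar:          " + Character.isSpaceChar(NBSP));
        // ... but not whitespace according to Java.
        System.out.println("isWhitespace:         " + Character.isWhitespace(NBSP));
    }
}
```

Run directly, this should report 160, false, true and false, in that order: the character is neither usable in an identifier nor treated as whitespace, which is why javac has nothing better to do with it than call it illegal.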

You’ll get a similar error if you try to use a “small non-breaking space” (U+202F, which Unicode calls NARROW NO-BREAK SPACE; I’ll call it SNBSP from here on out). Groovy, of course, also forbids these characters. Except that it seems to have forgotten about the non-breaking spaces other than the plain non-breaking space, and even then only in some contexts. The following file will compile and run with Groovy. You can use it to replace Main.java in the base project.

Main.groovy

  package gl.lin;

  class Main {
    static class Hello World Printer { // the gaps inside identifiers are SNBSPs (U+202F)
      def print hello world(String whom) {
        println("hello $whom");
      }
    }

    public static void main(String[] command line arguments) {
      new Hello World Printer().print hello world("world");
    }
  }

Those apparent spaces inside the identifiers are SNBSPs (handily typable with Super+Hyper+Space on the Neo2 keyboard). You may notice, however, that I didn’t actually reference the variable command line arguments in main(). That’s because Groovy rejects the SNBSP in an identifier that refers to a variable inside an expression, even though it accepts the same character in declarations and in class and method names. Try it!

Main2.groovy

  package gl.lin;

  class Main2 {
    static class Hello World Printer { // the gaps inside identifiers are SNBSPs (U+202F)
      def print hello world(String whom) {
        println("hello $whom");
      }
    }

    public static void main(String[] command line arguments) {
      new Hello World Printer().print hello world(command line arguments[0]);
    }
  }

Trying to compile this results in

  src/main/groovy/gl/lin/Main2.groovy: 11: Invalid variable name. Invalid character at position: 8 of value:    in name: command line arguments. At [11:49]  @ line 11, column 49.
     d Printer().print hello world(command li
                                   ^

  1 error

This happens with other characters, too, like the rather esoteric Capital Eszett, ‘ẞ’. In Groovy’s defense, Java considers the character punctuation. On the other hand, this means that Groovy sometimes allows punctuation in identifiers.

Main3.groovy

  package gl.lin;

  class Main3 {
    static class Hello World Printer { // the gaps inside identifiers are SNBSPs (U+202F)
      def print hello world(String whom) {
        println("hello $whom");
      }
    }

    public static void main(String[] ẞ) {
      new Hello World Printer().print hello world(ẞ[0]);
    }
  }

Trying to compile this results in

  src/main/groovy/gl/lin/Main.groovy: 11: Invalid variable name. Must start with a letter but was: ẞ. At [11:49]  @ line 11, column 49.
     d Printer().print hello world(ẞ[0]);
                                   ^

  1 error

Notice how, again, Groovy only complains when we try to use our variable named ‘ẞ’.

Even more fun:

Bidi.groovy

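  package gl.lin;

  class Bidi {
    def (x) { return x==0 || !(x-1);  } // name is the invisible U+202E RIGHT-TO-LEFT OVERRIDE
    def (y) { return y==1 || !(y-1); } // name is the invisible U+202D LEFT-TO-RIGHT OVERRIDE
  }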

Yes, the file compiles without complaint. Drop it into the base project if you need to see for yourself. In case you aren’t Unicode-privy, this file defines a function whose name is the right-to-left override (U+202E), and another whose name is the left-to-right override (U+202D), and the two call each other. Groovy is perfectly OK with this.
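For comparison, here is a small sketch of what the Java standard library has to say about the two override characters. BidiCheck and describe are hypothetical names invented for this example, and nothing here claims that Groovy consults these methods; it only shows Java's own classification.

```java
// Sketch: java.lang.Character's view of the bidi override characters.
// BidiCheck and describe are hypothetical names for this example.
final class BidiCheck {
    static final int LRO = 0x202D; // LEFT-TO-RIGHT OVERRIDE
    static final int RLO = 0x202E; // RIGHT-TO-LEFT OVERRIDE

    /** Summarises the identifier-related properties of one code point. */
    static String describe(int codePoint) {
        return String.format("U+%04X format=%b ignorable=%b part=%b start=%b",
                codePoint,
                Character.getType(codePoint) == Character.FORMAT,
                Character.isIdentifierIgnorable(codePoint),
                Character.isJavaIdentifierPart(codePoint),
                Character.isJavaIdentifierStart(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe(LRO));
        System.out.println(describe(RLO));
    }
}
```

Both characters are format characters, which Java treats as ignorable inside an identifier but never as a valid first character. So even by Java's rather permissive rules, a name consisting of nothing but an override character should not pass.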

This honestly has to be one of the strangest, but most harmless, Groovyisms I’ve encountered so far. While it is partly explainable by the fact that the Java standard library doesn’t consider any non-breaking space character to be a blank, it’s really weird that the handling of these characters is different based on how the identifier is being used. It would appear that identifier validation is performed by the parser, rather than by the lexer as would be done in any sane compiler.
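To make that last point concrete, here is a minimal sketch of a single validity check that a lexer could apply to every identifier token, no matter where it appears. IdentifierCheck and isValidIdentifier are hypothetical names, the rules are Java's (via java.lang.Character) rather than anything taken from Groovy's source, and keywords are deliberately ignored.

```java
// Sketch: one uniform identifier check, as a lexer might apply it to every token.
// IdentifierCheck and isValidIdentifier are hypothetical; the rules are Java's,
// not Groovy's, and reserved words are not considered.
final class IdentifierCheck {
    static boolean isValidIdentifier(String name) {
        if (name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Walk the rest of the string one code point at a time.
        for (int i = Character.charCount(first); i < name.length(); ) {
            int codePoint = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(codePoint)) {
                return false;
            }
            i += Character.charCount(codePoint);
        }
        return true;
    }

    public static void main(String[] args) {
        // Plain ASCII name: accepted.
        System.out.println(isValidIdentifier("commandlinearguments"));
        // Same name with SNBSPs (U+202F) between the words: rejected.
        System.out.println(isValidIdentifier("command\u202Fline\u202Farguments"));
        // A lone right-to-left override (U+202E): rejected.
        System.out.println(isValidIdentifier("\u202E"));
    }
}
```

Because the same function would run on class names, method names, parameter declarations and variable references alike, a name is either an identifier everywhere or nowhere, which is exactly the consistency the examples above show Groovy to be missing.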