Groovy's idea of “private” rivals the NSA's

2014-01-05 in groovy java
A fractal of bad implementation

The private access modifier is arguably the most important one for software engineering in Java, and even more so in Groovy, where everything is public (sort of) by default. By declaring a member private, the programmer can be certain that no code outside the enclosing top-level class can see it. There will therefore never be any external dependencies on its presence or behaviour, and it can be changed or removed without widespread consequences. One might expect Groovy to merely parse and discard the private keyword, silently substituting its “implicit-public” access instead. Fortunately, it doesn’t do that.

It does something much worse.

Take a look at the following pure-Java class.

Example.java

package gl.lin;

class Example {
  private int var;
  private void doSomething() {
    System.out.println("Doing something!");
  }
}

Anyone who knows Java could instantly tell you that

  • var is never accessed and can safely be removed.

  • doSomething() is dead code and can be removed.

In other words, the class could be simplified to class Example {}.
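That certainty comes from a hard rule: the Java Language Specification (§6.6.1) permits access to a private member only from within the body of the top-level class that encloses its declaration. The sketch below uses hypothetical classes Outer and Inner to show where that boundary lies. A nested class may read its enclosing class’s private field, while the same expression in any other top-level class fails to compile with an error along the lines of “secret has private access in Outer”.

```java
// Hypothetical classes for illustration; not part of the original project.
class Outer {
    private int secret = 7;

    // Inner is declared inside Outer's body, so JLS 6.6.1 lets it
    // read Outer's private field.
    static class Inner {
        static int peek(Outer o) {
            return o.secret;
        }
    }

    public static void main(String[] args) {
        System.out.println(Inner.peek(new Outer())); // prints 7
    }
}
```

Since Example.java above has no nested classes, nothing outside its own body can legally reach var or doSomething().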

Now, let’s introduce Groovy to the picture.

Main.groovy

package gl.lin;

public class Main {
  static main(args) {
    Example ex = new Example();
    ex.var = 42;
    println("ex.var=${ex.var}");
    ex.doSomething();
  }
}

If you compile and run these two files, you may be surprised by the output:

$ ./gradlew run

...

ex.var=42
Doing something!

Groovy does emit bytecode that correctly declares these members private, but when it accesses members at run-time, it goes through a reflection path that bypasses access modifiers and summarily ignores them. Thus, our Groovy program is able to poke at the seemingly dead parts of the Example class.
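Groovy’s actual dispatch machinery is more involved, but its effect can be approximated in plain Java with the reflection API, where Field.setAccessible(true) and Method.setAccessible(true) switch off the language-level access check. The sketch below reproduces the Groovy program’s output against a copy of the Example class; the helper class ReflectionBypass and its method names are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Hypothetical helper that mimics, in plain Java, what the Groovy program did.
class ReflectionBypass {
    public static void main(String[] args) {
        Example ex = new Example();
        setVar(ex, 42);
        System.out.println("ex.var=" + getVar(ex)); // ex.var=42
        callDoSomething(ex);                        // Doing something!
    }

    // Writes the private field "var" after disabling the access check.
    static void setVar(Example ex, int value) {
        try {
            Field f = Example.class.getDeclaredField("var");
            f.setAccessible(true);
            f.setInt(ex, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the private field "var" the same way.
    static int getVar(Example ex) {
        try {
            Field f = Example.class.getDeclaredField("var");
            f.setAccessible(true);
            return f.getInt(ex);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes the private, supposedly dead method doSomething().
    static void callDoSomething(Example ex) {
        try {
            Method m = Example.class.getDeclaredMethod("doSomething");
            m.setAccessible(true);
            m.invoke(ex);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Same members as Example.java above, minus the package declaration.
class Example {
    private int var;

    private void doSomething() {
        System.out.println("Doing something!");
    }
}
```

The difference is that Java code has to ask for this explicitly, whereas in Groovy an ordinary-looking ex.var is enough.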

This issue mostly affects code within a single project, where programmers tend to read other parts’ source code instead of proper documentation. In a large code-base, however, it greatly impedes refactoring, because any member, not just the hopefully-intentionally-public ones, could be a dependency point for other parts of the project. Since Groovy code is generally untyped, one cannot even rely on static analysis to find the dependencies of a supposed-to-be-private member, especially if its name happens to be shared with other identifiers in the project. The only real option is to rename the member and see what breaks at run-time.
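Because a dynamic call site identifies its target by name at run-time, a rename breaks it only when that code actually executes. As a rough Java analogy (Widget and RenameDemo are hypothetical names), a reflective lookup by string compiles no matter what the string says and fails only when it runs.

```java
// Hypothetical names for illustration.
class RenameDemo {
    // Resolves a no-argument method by its string name, as a dynamic call must.
    static boolean hasMethod(Class<?> type, String name) {
        try {
            type.getDeclaredMethod(name);
            return true;
        } catch (NoSuchMethodException e) {
            return false; // the stale reference is only discovered here, at run-time
        }
    }

    public static void main(String[] args) {
        System.out.println(hasMethod(Widget.class, "doSomethingElse")); // true
        // Still compiles after the rename; nothing flags the old name.
        System.out.println(hasMethod(Widget.class, "doSomething"));     // false
    }
}

class Widget {
    // Suppose a refactoring renamed this from doSomething().
    private void doSomethingElse() {
        System.out.println("Doing something else!");
    }
}
```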

Of course, this can also affect code across projects, as the following (contrived) example shows.

HelloWorld.java

package gl.lin;

public class HelloWorld {
  public static void main(String[] args) {
    Auditing.showingToUser("Hello World");
    System.out.println("Hello World");
  }
}

Auditing.groovy

package gl.lin;

class Auditing {
  public static showingToUser(String what) {
    /* Auditing specs require all strings to be logged in uppercase */

    for (int i = 0; i < what.value.length; ++i)
      /* Do not try this at home */
      what.value[i] = Character.toUpperCase(what.value[i]);

    System.out.println("SHOWING STRING TO USER: $what");
  }
}

Running these files produces the output

SHOWING STRING TO USER: HELLO WORLD
HELLO WORLD

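Yes, our Groovy code was able to modify a seemingly separate string literal in the Java code. It’s reminiscent of some old Fortran compilers, where passing a literal constant to a subroutine that assigned to its parameter could change the value of that constant for the rest of the program. I’d say that nobody would ever write something like that in real code, but then I remember the programmers who are the reason this blog exists.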
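The change leaks into HelloWorld because Java interns string literals. Per the Java Language Specification (§3.10.5), identical literals, even in different classes and packages, refer to the same String object, so mutating one mutates “both”. The Auditing trick additionally depends on String being backed by a private char[] named value, as it was in that era of the JDK; Java 9 switched String to a byte[] representation. The sketch below (class names hypothetical) demonstrates only the sharing, not the mutation.

```java
// Hypothetical classes for illustration.
class InternDemo {
    public static void main(String[] args) {
        System.out.println(sameLiteral());    // true
        System.out.println(copyIsDistinct()); // true
    }

    // The literal here and the one in OtherClass are the same pooled object.
    static boolean sameLiteral() {
        return OtherClass.GREETING == "Hello World";
    }

    // new String(...) creates a separate object with equal contents;
    // intern() maps it back to the shared pooled instance.
    static boolean copyIsDistinct() {
        String copy = new String("Hello World");
        return copy != "Hello World" && copy.intern() == "Hello World";
    }
}

class OtherClass {
    // Deliberately not final, so the value is read from this class at run-time
    // instead of being inlined as a compile-time constant.
    static String GREETING = "Hello World";
}
```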

And we’re still not done! Groovy’s disregard for access modifiers also interacts with method overloading. Yes, we’re back to that topic again. If an object has both public and private methods with the same name, which easily happens with sub-classing, Groovy may elect to call the private one instead of the public one.

HashCodePrinter.java

package gl.lin;

public class HashCodePrinter {
  public void printHashCode(Object o) {
    printHashCode(""+o.hashCode());
  }

  private void printHashCode(String i) {
    System.out.println("hash code = " + i);
  }
}

Overloading.groovy

package gl.lin;

class Overloading {
  static void main(args) {
    HashCodePrinter printer = new HashCodePrinter();
    printer.printHashCode("not an integer"); // !
  }
}

Running it prints

hash code = not an integer

In this particular case, the run-time type of the argument (String) was a closer match for the private overload, so Groovy called it instead of the only public method with that name, and as a result we get nonsensical output.
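For contrast, plain Java picks an overload at compile time, using the static types of the arguments and considering only methods that are accessible at the call site (JLS §15.12). The sketch below uses a hypothetical variant of HashCodePrinter, renamed HashCodeFormatter, that returns its text instead of printing it. Called from Java with the same argument, it goes through the public overload and produces an actual hash code.

```java
// Hypothetical caller; from here the private overload is not accessible.
class OverloadDemo {
    public static void main(String[] args) {
        HashCodeFormatter formatter = new HashCodeFormatter();
        // Only format(Object) is a candidate, even though the argument is a String.
        System.out.println(formatter.format("not an integer"));
    }
}

// Hypothetical variant of HashCodePrinter that returns the text it would print.
class HashCodeFormatter {
    public String format(Object o) {
        // Inside the class, ""+o.hashCode() has static type String,
        // so javac binds this call to the private overload below.
        return format("" + o.hashCode());
    }

    private String format(String i) {
        return "hash code = " + i;
    }
}
```

Java therefore prints “hash code = ” followed by the string’s integer hash code, which is what the author of HashCodePrinter intended.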

By failing to honour Java access modifiers, Groovy entirely undermines the encapsulation guarantees of the Java object system. Only final variables are safe from being written: any variable can be read, any non-final variable written, and any method called, by any Groovy code anywhere. It could be likened to poking around in objects’ private area in C++, except that Groovy not only makes it easy, but also makes it possible to do so by accident, or even to reach a private member while trying to access a public member of the same name.

Because of this issue, it is impossible to tell at a glance whether Groovy code is actually correct from an access-control standpoint. Usage of private APIs, whether deliberate or accidental, goes undetected except through a thorough audit or, far more commonly, when the code breaks because those APIs change.