“Groovy Strings” are here to make puns and cause chaos

2014-07-10 in groovy grails java
And their only pun is horrible.

Mundane Java has two types of character-type literals: character literals, enclosed in single-quotes, and string literals, enclosed in double-quotes.

  /* Character literal, enclosed in single-quotes */
  char ch = 'c';
  /* String literal, enclosed in double-quotes */
  String s = "This is a string.";

One of the first things not backwards-compatible with Java that a new Groovy user may notice is that single-quotes now delimit strings as well.

  /* Single-quotes now delimit a String instead of a character. */
  String str = 'foobar';
  /* 'a' is still a String; Groovy implicitly converts a one-character String to a char */
  char ch = 'a';
  /* Double-quotes still work as in Java... or do they? */
  String s = "This is a string.";

Note that the char line above does in fact work at run-time: Groovy keeps up the Java-compatibility theatre by implicitly converting single-character Strings to chars. (A longer String is not truncated; the conversion fails with a cast exception instead.) At least that isn’t too surprising, and it’s fairly hard to shoot yourself in the foot with. However, the fact that what Java treats as a character literal has type String does still cause problems when porting Java to Groovy: an early preview of what comes further down that staircase.
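One concrete way the String-typed "character literal" bites when porting is arithmetic. The following is a minimal sketch, assuming it is run as a plain dynamically-typed Groovy script; the variable names are only illustrative.

```groovy
// In Java, 'a' + 1 is char arithmetic and yields the int 98.
// In Groovy, 'a' is a String, so + is string concatenation.
def sum = 'a' + 1
assert sum instanceof String
assert sum == 'a1'

// To get the Java result, convert to char explicitly first.
char ch = 'a'
assert ((int) ch) + 1 == 98
assert (('a' as char) + 1) == 98
```

Nothing here fails loudly, which is rather the point: ported Java code keeps compiling and quietly computes something else.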

But why have two types of string literals? In double-quoted strings, Groovy supports a feature known as string interpolation, in which values can be concisely stringified and inserted into a string literal. This is accomplished by writing a dollar sign in the string followed by either a bare variable name or an arbitrary expression enclosed in braces.

InterpExample.groovy

  package gl.lin;

  public class InterpExample {
    public static void main(args) {
      long millis = System.currentTimeMillis();
      println("This program was invoked at $millis (${new Date()}).");
    }
  }

  This program was invoked at 1404960473511 (Wed Jul 09 20:47:53 MDT 2014).

In most languages with string interpolation (such as Tcl or Bourne Shell), this functionality is fundamentally built into the language, and simply results in building a constant string containing the interpolated values at the point where the literal occurs. That is, one might expect the code

  System.out.println("Hello $target!");

to be the equivalent of the Java

  System.out.println("Hello " + target + "!");

How wrong you’d be.

First off, the type of a double-quoted string literal that contains an interpolation is not actually a String. It is a GString, a totally professional name that has never been used as a pun in any joke before. (Yes, this name was intentional.) A double-quoted literal with nothing to interpolate is still a plain String.

But names aren’t that important here. GString implicitly casts to a String, so most programmers just pretend the two are equal until the type system explodes as a result, which happens quite rarely. The real problem is something confusingly worse.
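To make "pretend the two are equal" concrete, here is a small sketch of where the pretence holds and where it stops. It assumes a plain Groovy script; `target`, `plain`, `interpolated` and `rendered` are illustrative names.

```groovy
def target = 'world'

def plain = "Hello world!"          // no placeholder: a java.lang.String
def interpolated = "Hello $target!" // has a placeholder: a groovy.lang.GString

assert plain instanceof String
assert interpolated instanceof GString
assert !(interpolated instanceof String)

// Groovy's == papers over the difference...
assert interpolated == plain
// ...but Java-level equals() and hashCode() do not.
assert !interpolated.equals(plain)
assert interpolated.hashCode() != plain.hashCode()

// Assigning to a String-typed variable renders the text once.
String rendered = interpolated
assert rendered.equals(plain)
```

Anything written in Java that receives the GString as an Object (collections, for instance) sees the second set of answers, not the first.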

For a baseline for a language that does string interpolation reasonably, I’ve provided a simple (but useless) Tcl program that builds a list of strings, each of which is the stringification of a list of integers.

list-string-list.tcl

  #! /usr/bin/env tclsh8.5

  set accum {} ;# NB empty list

  proc accumulate {i} {
    global accum

    lappend accum $i ;# NB add $i to the accum list
    return $accum
  }

  set strings {}
  for {set i 0} {$i < 5} {incr i} {
    lappend strings "Accum $i: [accumulate $i]"
  }

  puts $strings

  {Accum 0: 0} {Accum 1: 0 1} {Accum 2: 0 1 2} {Accum 3: 0 1 2 3} {Accum 4: 0 1 2 3 4}

The output shouldn’t be too surprising; each iteration just adds one element to the list, then splices that list into a string which it saves for later. At the end, it dumps all of them to output.

Below is a seemingly-equivalent Groovy program.

ListStringList.groovy

  package gl.lin;

  public class ListStringList {
    static accum = [];
    static accumulate(i) {
      accum << i;
      return accum;
    }

    public static void main(args) {
      def strings = [];
      for (int i = 0; i < 5; ++i) {
        strings << "Accum = #${accumulate(i)}";
      }

      println(strings);
    }
  }

  [Accum = #[0, 1, 2, 3, 4], Accum = #[0, 1, 2, 3, 4], Accum = #[0, 1, 2, 3, 4], Accum = #[0, 1, 2, 3, 4], Accum = #[0, 1, 2, 3, 4]]

What happened here? It turns out that, while Groovy does evaluate the contents of ${} when the literal is encountered, it doesn’t actually bother stringifying the resulting value until the GString itself is forced to a String, or until some other method that requires the string contents is invoked. And, in fact, it restringifies the values every time this is necessary. Of course, toString() is supposed to be side-effect free and for human consumption only, so this is far from the worst thing that happens.
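The same effect can be reduced to a few lines. This is a minimal sketch, assuming a plain Groovy script; `items`, `lazy` and `snapshot` are illustrative names. It also shows the usual workaround of forcing the GString to a String at the point of creation.

```groovy
def items = [0]
def lazy = "items = ${items}"          // GString: holds a reference to the list
String snapshot = "items = ${items}"   // String: rendered once, right here

assert lazy.toString() == 'items = [0]'

items << 1   // mutate the interpolated value after the literal was evaluated

assert lazy.toString() == 'items = [0, 1]'   // re-rendered from the current list
assert snapshot == 'items = [0]'             // unaffected by the mutation
```

Declaring the variable as String (or calling toString() explicitly) is what the ListStringList example would need in order to match the Tcl output.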

The true consequences can be hilariously bad.

Let’s say we’re building a highly-concurrent data storage system in Groovy (despite having seen how bad an idea that is), wherein each piece of data is associated with a name and a global version number. The below might look like a reasonable implementation of the in-memory storage, if you know nothing about concurrency.

Gratabase.groovy

  package gl.lin;

  /* This is contrived, but some kind of mutable state was needed. */
  class NonAtomicInteger {
    int next = 0;

    int incrementAndGet() {
      return ++next;
    }

    String toString() {
      return ""+ next;
    }
  }

  class Gratabase {
    /* Pretend we're actually using a concurrent map here. A basic groovy map is
     * used for brevity's sake.
     */
    def data = [:]; /* NB empty map literal */
    def nextId = new NonAtomicInteger();

    def put(name, value) {
      synchronized (nextId) {
        int id = nextId.incrementAndGet();
        data.put("$name:$nextId", value); /* Here be dragons */
        return id;
      }
    }

    def get(name, id) {
      return data["$name:$id"];
    }
  }

Let’s try using it.

Main.groovy

  package gl.lin;

  import java.util.concurrent.atomic.AtomicInteger;

  public class Main {
    static main(args) {
      Gratabase gb = new Gratabase();
      def v0 = gb.put("f", "foo");
      def v1 = gb.put("b", "bar");
      def v2 = gb.put("f", "fum");

      println("gb[f,0] = " + gb.get("f", v0));
      println("gb[b,1] = " + gb.get("b", v1));
      println("gb[f,2] = " + gb.get("f", v2));
    }
  }

Running this, we get the perhaps disappointing output

  gb[f,0] = null
  gb[b,1] = null
  gb[f,2] = null

It’s as if we never added those items to the map! That’s because, as it turns out, we didn’t. Hash maps (such as those produced by the [:] literal) are highly dependent on the keys having consistent hash codes and equality. (In fact, all maps with any kind of reasonable efficiency must necessarily depend on the stability of their keys.) However, GString’s hash code, equality, etc, functions are calculated by computing toString() at that moment and then doing the desired calculation upon that result.

From the map’s perspective, we create the following associations:

  • f:1 → foo
  • b:2 → bar
  • f:3 → fum

However, once these calls complete, the keys themselves have effectively mutated; the map now contains

  • f:3 → foo
  • b:3 → bar
  • f:3 → fum

Note in particular that we have duplicate keys; because we violated the invariant of stable keys, anything goes. The map might have stumbled upon the first “f:3” if that string happens to hash into the same bucket as “f:1”, but this is unlikely. Also note that the mutated keys will not typically be locatable by their new values, since the correct hash bucket for their new value is almost certainly different from the hash bucket for the original value.
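The key mutation can be reproduced in isolation. The sketch below assumes a plain Groovy script and uses a StringBuilder as a stand-in for any mutable interpolated value; all names are illustrative. Explicit get() and put() calls are used throughout, so the GString keys reach the HashMap unchanged.

```groovy
def id = new StringBuilder('1')   // stands in for any mutable interpolated value
def map = new HashMap<Object, Object>()

map.put("f:${id}", 'foo')   // key is a GString, hashed while it renders as "f:1"

def one = '1'
def two = '2'
assert map.get("f:${one}") == 'foo'   // found: same rendering, same hash

id.setLength(0)
id.append('2')              // the stored key now renders as "f:2"

assert map.get("f:${one}") == null   // hash matches, but equals() now fails
assert map.get("f:${two}") == null   // equals() would pass, but the hash does not match
assert map.size() == 1               // the entry is still in there, just unreachable
assert map.keySet().first().toString() == 'f:2'
```

The entry has not gone anywhere; it is simply filed under a hash that no longer corresponds to what the key claims to be.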

However, the last key didn’t actually change; it’s still “f:3”, right? You might have noticed how elements are inserted into the map via an explicit put() call, but the array syntax sugar was used for retrieving them.

If the latter is changed to an explicit get() call:

  return data.get("$name:$id");

we do in fact get the final key/value pair, but the first two are still missing. If you instead do the opposite, and change the .put() call to use the syntax sugar:

  data["$name:$nextId"] = value;

everything works as it looks like it should.

It turns out that the “syntax sugar” of using array accessors actually does something else: If the subscript’s runtime type happens to be a GString, it is implicitly toString()ed first before being passed to the underlying method, in an apparent attempt to put a band-aid on the brokenness of GString.

As a result, Groovy Strings have some of the most surprising behaviour of any of the fundamental Groovy data-types, at least when string interpolation is used. The “lazy” evaluation, which may happen any number of times, makes it dangerous to interpolate any mutable value into a string that is used for anything other than an ephemeral purpose such as a log statement. The fact that a mutable-via-indirection data-type is defined with a syntax customarily used for an immutable string value in virtually every other language makes it all the more surprising and counter-intuitive.
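For completeness, here is one way the storage class could sidestep the problem: interpolate only the immutable int, and force the key to a String before it goes anywhere near the map. This is a sketch under my own assumptions, not a recommendation for how to build a data store; `SaferGratabase` and its members are hypothetical names, and it is written as a plain Groovy script.

```groovy
class SaferGratabase {
  private final Map<String, Object> data = [:]
  private int nextId = 0

  synchronized int put(String name, Object value) {
    int id = ++nextId
    String key = "$name:$id"   // String-typed: rendered once, here
    data.put(key, value)
    return id
  }

  synchronized Object get(String name, int id) {
    String key = "$name:$id"
    return data.get(key)
  }
}

def gb = new SaferGratabase()
def v0 = gb.put('f', 'foo')
def v1 = gb.put('b', 'bar')
def v2 = gb.put('f', 'fum')

assert gb.get('f', v0) == 'foo'
assert gb.get('b', v1) == 'bar'
assert gb.get('f', v2) == 'fum'
```

Typing the map as Map<String, Object> does not enforce anything at run-time in dynamic Groovy; it is the String-typed local variable that does the work.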

Bonus Content — Grails Configuration

I haven’t talked much about Grails here, mainly because setting up a meaningful Grails project is rather… unpleasant. I have not prepared an actual example Grails project for this section. I have not thoroughly tested this either; it might be somewhat sensitive to certain contexts, or somewhat incorrect.

Grails configuration, which a typical project has a tonne of, is written in a domain-specific language built atop Groovy; this includes front-ends for even more nightmarish configurations such as that of the Spring Framework. A (generally positive) side-effect of being based on Groovy, though, is that arbitrary Groovy code can be used, which allows for niceties such as variables, which can then be interpolated into other strings. So far, this is a relatively sane set-up; most of Groovy’s problems don’t manifest in configuration. If your configuration contains enough logic (i.e., non-zero) to run into a true Groovyism, you’re Doing It Wrong anyway, and Groovy is the least of your problems.

The insanity is that Grails does its own form of substitution on configured string values, using the exact same syntax as Groovy string interpolation. That is, if a configured string is added with the value foo-$bar, Grails will replace $bar with the contents of the configuration named bar.

This means that the difference between single- and double-quoted strings is massive. The code '${foo}' refers to a config named “foo”, whereas "${foo}" refers to a variable named “foo”. Have fun if you need to refer to both a variable and a config in the same string — you’ll need to do something like "${foo}-\${bar}" to let Grails handle the config interpolation. It is unclear what happens if an interpolation by Groovy produces a potential interpolation for Grails; it is possible Grails knows about GStrings and won’t (effectively) recurse into substitutions.
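The Groovy half of that distinction is easy to check in a plain script; the Grails half is not shown here, in keeping with the caveat above. The sketch below only demonstrates which text survives Groovy's own interpolation and would therefore be left over for any later substitution pass. The variable names are illustrative.

```groovy
def foo = 'groovy-value'

def single = '${foo}'          // no interpolation: the six characters $ { f o o }
def dbl = "${foo}"             // interpolated by Groovy
def mixed = "${foo}-\${bar}"   // Groovy fills in foo; \$ leaves ${bar} as literal text

assert single.length() == 6
assert dbl == 'groovy-value'
assert mixed == 'groovy-value-${bar}'
assert mixed instanceof GString
```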