Groovy method calls involve lock contention

2014-01-26 in groovy java
A fractal of bad implementation

Suppose that we have a stateless but computationally-intensive piece of code involving a lot of function calls, which is for some reason written in Groovy.

SlowFib.groovy

1package gl.lin
2
3class SlowFib {
4  static int fib(int n) {
5    n <= 1? n : fib(n-1) + fib(n-2)
6  }
7}

Obviously, the algorithm here is pretty much the worst one possible — it has exponential run-time, but calculating the Fibonacci sequence can be calculated in linear time. But imagine for the time being that this is real code. The important thing is that this code makes lots of function calls. Also important is how the function is stateless — it accesses no data outside of its locals, and is non-virtual due to its staticness (ie, it could be called with one instruction were this native code).

Now say we want to write a program that maps a sequence of input integers to the corresponding Fibonacci number. A simple single-threaded implementation is below.

Main.java

 1package gl.lin;
 2
 3public class Main {
 4  private static final int N = 4096;
 5
 6  public static void main(String args[]) {
 7    int inputs[] = new int[N];
 8    for (int i = 0; i < inputs.length; ++i)
 9      inputs[i] = (int)(Math.random() * 37);
10
11    int outputs[] = new int[inputs.length];
12    for (int i = 0; i < inputs.length; ++i)
13      outputs[i] = SlowFib.fib(inputs[i]);
14  }
15}

Since this is just for testing performance, we generate the input randomly and discard the outputs. Running the program produces less than satisfactory results in terms of run-time, however. Obviously what you might see varies greatly based on your processor, but on my system I get

1$ ./gradlew run
2
3...
4
5Total time: 52.13 secs

Definitely less than stellar performance. But we might note that the program is of the “embarassingly parallel” category — it has no interdependent outputs, so, barring memory bandwidth limits, it should parallelise perfectly. Let’s do that in a rather hackish way.

MainConcurrent.java

 1package gl.lin;
 2
 3public class MainConcurrent extends Thread {
 4  private static final int N = 4096;
 5  private static final int THREADS = 4;
 6
 7  public static void main(String args[]) {
 8    for (int i = 0; i < THREADS; ++i)
 9      /* These are non-daemon threads, so Java won't exit till they
10       * complete. start() does not block, so all four will run in parallel.
11       */
12      new MainConcurrent().start();
13  }
14
15  @Override
16  public void run() {
17    int inputs[] = new int[N/THREADS];
18    for (int i = 0; i < inputs.length; ++i)
19      inputs[i] = (int)(Math.random() * 37);
20
21    int outputs[] = new int[inputs.length];
22    for (int i = 0; i < inputs.length; ++i)
23      outputs[i] = SlowFib.fib(inputs[i]);
24  }
25}

Note how we start four separate threads, each of which does ¼ the work of the single thread in the original program. In theory, the program should now take about 16 seconds (assuming Gradle incurred 10 seconds of overhead in the original run) as long as the host system has at least 4 processors.

1$ ./gradlew run
2
3...
4
5Total time: 3 mins 22.743 secs

Well, as we see, it’s actually over four times slower.

The problem appears to be rooted in the “Meta Class” system. Since these objects must be consulted to determine which methods exist, and they may be modified at run-time, Groovy takes a lock on the “Meta Class” every time it issues a method call. This means that multiple threads attempting to call methods in that happen to be in the same class will slow each other down tremendously, as only one may perform method resolution at a time.

The problem is at least as bad on non-static members. In many cases, it is worse, due to the incredibly convoluted means by which Groovy performs member resolution.

For reference, the same programs with SlowFib translated to Java take 30 seconds for single-threaded and 12 seconds for quadruple-threaded, including several seconds of Gradle overhead in each.