Groovy method calls involve lock contention
Suppose that we have a stateless but computationally-intensive piece of code involving a lot of function calls, which is for some reason written in Groovy.
SlowFib.groovy
1package gl.lin
2
3class SlowFib {
4 static int fib(int n) {
5 n <= 1? n : fib(n-1) + fib(n-2)
6 }
7}
Obviously, the algorithm here is pretty much the worst one possible — it has
exponential run-time, but calculating the Fibonacci sequence can be calculated
in linear time. But imagine for the time being that this is real code. The
important thing is that this code makes lots of function calls. Also
important is how the function is stateless — it accesses no data outside of
its locals, and is non-virtual due to its static
ness (ie, it could be called
with one instruction were this native code).
Now say we want to write a program that maps a sequence of input integers to the corresponding Fibonacci number. A simple single-threaded implementation is below.
Main.java
1package gl.lin;
2
3public class Main {
4 private static final int N = 4096;
5
6 public static void main(String args[]) {
7 int inputs[] = new int[N];
8 for (int i = 0; i < inputs.length; ++i)
9 inputs[i] = (int)(Math.random() * 37);
10
11 int outputs[] = new int[inputs.length];
12 for (int i = 0; i < inputs.length; ++i)
13 outputs[i] = SlowFib.fib(inputs[i]);
14 }
15}
Since this is just for testing performance, we generate the input randomly and discard the outputs. Running the program produces less than satisfactory results in terms of run-time, however. Obviously what you might see varies greatly based on your processor, but on my system I get
1$ ./gradlew run
2
3...
4
5Total time: 52.13 secs
Definitely less than stellar performance. But we might note that the program is of the “embarassingly parallel” category — it has no interdependent outputs, so, barring memory bandwidth limits, it should parallelise perfectly. Let’s do that in a rather hackish way.
MainConcurrent.java
1package gl.lin;
2
3public class MainConcurrent extends Thread {
4 private static final int N = 4096;
5 private static final int THREADS = 4;
6
7 public static void main(String args[]) {
8 for (int i = 0; i < THREADS; ++i)
9 /* These are non-daemon threads, so Java won't exit till they
10 * complete. start() does not block, so all four will run in parallel.
11 */
12 new MainConcurrent().start();
13 }
14
15 @Override
16 public void run() {
17 int inputs[] = new int[N/THREADS];
18 for (int i = 0; i < inputs.length; ++i)
19 inputs[i] = (int)(Math.random() * 37);
20
21 int outputs[] = new int[inputs.length];
22 for (int i = 0; i < inputs.length; ++i)
23 outputs[i] = SlowFib.fib(inputs[i]);
24 }
25}
Note how we start four separate threads, each of which does ¼ the work of the single thread in the original program. In theory, the program should now take about 16 seconds (assuming Gradle incurred 10 seconds of overhead in the original run) as long as the host system has at least 4 processors.
1$ ./gradlew run
2
3...
4
5Total time: 3 mins 22.743 secs
Well, as we see, it’s actually over four times slower.
The problem appears to be rooted in the “Meta Class” system. Since these objects must be consulted to determine which methods exist, and they may be modified at run-time, Groovy takes a lock on the “Meta Class” every time it issues a method call. This means that multiple threads attempting to call methods in that happen to be in the same class will slow each other down tremendously, as only one may perform method resolution at a time.
The problem is at least as bad on non-static members. In many cases, it is worse, due to the incredibly convoluted means by which Groovy performs member resolution.
For reference, the same programs with SlowFib
translated to Java take 30
seconds for single-threaded and 12 seconds for quadruple-threaded, including
several seconds of Gradle overhead in each.