Java Anti-Pattern: Inconvenient Public Variables

2017-07-01 in java anti-patterns
The worst of both worlds

Encapsulation, in one form or another, is nearly universal in large-scale programming across languages. Isolating the code which can access certain state or data makes it easier to maintain and easier to reason about what impact changes to that state could have. In object-oriented languages, the unit of encapsulation is the class, and encapsulation is accomplished by making instance variables private and exposing a set of well-defined operations on that state in the public API.

JavaBean properties (a.k.a. “bean properties” or simply “getters and setters”) arose when the point about making instance variables private was applied universally without paying attention to the “well-defined operations” point.

Bean properties, particularly within an internal code-base or narrowly deployed library, are nothing more than inconvenient public variables.

To review, bean properties are a particular pattern/convention on Java classes to abstractly represent a “property” (read: field) of that class with a “getter” and a “setter” method on that class. For example, the property foo would correspond to SomeType getFoo() and void setFoo(SomeType) methods on the class.

There are valid use-cases for doing this. Read-only properties (those with no setter) may be conveniences to compute the value from other properties on the object. The object may actually be a bag of properties with non-trivial accessing semantics. A class may be used by third parties and need to preserve backwards-compatibility at all costs. This post is not about these cases.

I’m talking about human-macro-expander code like this:

Point.java

 1 /* package-private */ class Point {
 2   private int x;
 3   private int y;
 4 
 5   public int getX() {
 6     return x;
 7   }
 8 
 9   public void setX(int x) {
10     this.x = x;
11   }
12 
13   public int getY() {
14     return y;
15   }
16 
17   public void setY(int y) {
18     this.y = y;
19   }
20 }

Using properties makes it easy to change the underlying representation.

True, but in these cases you’re not going to be changing the representation. If you’re not writing a data structures library or something, 99% of classes only admit one obvious representation, at least as far as things backing your getters and setters goes. You’re not going to change the Point class to use polar coordinates internally. If you are writing a data structures library, you’re not going to have getters and setters, but rather meaningful operations on your structures.

Well, I really am sure I’m going to change the representation this time.

You probably can’t. You’ve already sold the farm by handing out direct access to the class’s fields. If you have both getter and setter, to maintain compatibility with a wide install-base, the new internal representation covering a particular property needs to be both covariant (because of the setter) and contravariant (because of the getter) with the old representation. I.e., you can’t make any changes other than obfuscation.

Let’s look at the Point class. Obviously you can’t narrow the ints to shorts because it would break code passing ints to the setters. You also can’t switch to polar coordinates with floating-point values, because there could be code expecting a getX() call after a setX() to return the value passed to setX(). Both of these fail covariance.

Then there’s contravariance. What if you want Point to hold String values for some reason? Unless literally all of these Strings are integers (in which case, why strings?), you now have “getters” which can fail which used not to, which both breaks compatibility and is extremely unusual.

My new representation is perfectly compatible with the old one.

This is pretty unusual. If this is a widely installed API shared with code you don’t control, maybe it was worth it.

If we’re talking about something internal to your organisation, you’ve literally created a non-trivial backwards-compatibility shim for code you already control. This doesn’t help anyone; now there’s just two ways to do the same thing and everyone reading the client code needs to be aware of both and how they interact.

Oh, validation! I want to add validation to my properties’ setters!

You literally can’t. There’s likely already code which is relying on the fact that the setter was trivial. This doesn’t mean that that code was broken before anyway by virtue of constructing invalid objects; it may have been constructing invalid objects but then not using them.

I’ll be super careful and add validation when I write the setter.

Most people won’t because they use their IDE to auto-generate trivial setters since almost nobody can stomach writing them by hand.

But OK, let’s say you’re perfect at writing validation, never forgetting to add it, never missing an aspect that needs to be validated, and never making bugs in the validation code, and you have an iron fist ensuring that collaborators (if any) do the same.

It’s still a losing proposition the moment some aspect requires consulting more than that one field. That validation either needs to happen somewhere else, doesn’t happen at all, or even worse, causes field initialisation order to matter.

When this kind of validation is needed, it’s better to just have a constructor on an immutable class and verify that the object is created in a valid state, or to do validation when the object as a whole crosses the boundary into the library code that uses it.

So bean properties aren’t a panacea, but what harm do they cause?

Lots.

The traditional argument here is that there’s a lot of boilerplate in writing the getters and setters, to which the IDE users reply that their IDE can auto-generate them, presumably while they check their compiled binaries into the repository and send canned texts to their friends. Lombok adds generation of getters and setters (plus a lot more) to the language itself and renders this argument largely moot.

The additional typing is annoying, but if it were just that, it wouldn’t be that big of a deal; for example, I have emacs set up to be able to write foo.getBar() as foo.barä and foo.setBar(x) as foo.baröx), so it’s usually just one extra keystroke anyway.

My beef with this pattern is that it tremendously obfuscates the client code in two ways:

  • Everything is a method call. Seeing identifier() in, e.g., Rust, indicates that something non-trivial is happening. In Java, you have to consider the identifier itself, and possibly use some knowledge of the API in particular. This is made worse by exceptions; foo.bar can’t throw if foo != null, but foo.getBar() might.

  • There’s so much more syntax. A getter involves 5 more characters of space per invocation compared with a normal field access, two of which are loud punctuation. Setters result in an extra level of nesting versus a naturally-flowing operator. Compound assignments are not available at all.

The payoff for all this is… almost nothing. This pattern simply produces inconvenient public variables. Trivial getters and setters provide direct access to a field of the class with exactly the same semantics of field accesses but with much less convenient syntax. You still get all the drawbacks of public variables — representational coupling, lack of abstraction, piecemeal mutability, etc, but in a package that takes longer to write and is much harder to read.

It is especially irksome when people use this in internal code, for example a class which exists solely to hold the configuration for a single application, or a struct to store intermediate state within a method. Literally none of the arguments in favour of using getters and setters even remotely apply.

What to use instead?

When you are hungry, eat; when you are thirsty, drink; when you need a public variable, just use a public variable. This applies particularly if you control all the client code (especially if it’s all in one project or something), which makes compatibility/extensibility concerns completely moot.

For “POJOs” which exist merely to hold deserialised values (e.g., from JSON via the Jackson 2 library), there’s literally no point in hiding anything. Everything on these types is public because anyone can just create a string to deserialise to the state they want anyway.

Then there are “bag of properties” classes, for example a SomethingRequest structure in an internal class. Here, there is also no use in hiding anything. Validation can be performed when the request is executed; compatibility shims can take place at that stage too. This approach does rely on nothing else consuming these structures, but in my opinion the fact that this makes consuming another API’s input structure uncomfortable is a plus. For output structures, validation is moot and compatibility can be achieved by simply populating the fields as needed. In both cases, using constructors/builders and making the classes immutable is often viable and even preferable. In these cases the fields would be public final.

A general exception to the above is in APIs that will be used by third parties. Even though the chance of most object’s structure changing is pretty low, it is sometimes still worth contortions to preserve backwards compatibility.

Other cases where trivial getters/setters arise are usually indicative of deeper problems like mutable resources or lack of abstraction. There also exist frameworks which somehow understand bean properties but not fields; on the occasion where this comes up, one can use Lombok to generate the getters and setters and then pretend they do not exist.

The pervasive mutability in Java is often perceived as a complication to making fields public. If you have a public final List<Widget> widgets field, you can’t stop anyone from mutating the List itself even though the variable is final. Trivial getters do not help here; a public List<Widget> getWidgets() can just as easily be used to gain access to the List and proceed to mutate it. Non-trivial getters that return an unmodifiable view, etc, are still useful to work around this limitation.

In closing, there are plenty of cases where getters and/or setters are the right solution. But applying getters and setters to literally every field simply devalues the legitimate cases and obfuscates the code that uses it.