The Generations of Java Style

2017-03-13 in java rambling

In the two decades of its existence, the Java programming language has gone through quite a few changes. Even more so, however, have the overall approaches to design and architecture in the Java ecosystem. I have found that these can by and large be separated into four generations which typically cut over relatively abruptly, immediately giving large code-bases of prior generations an archaic and poorly-designed feel.

This isn’t hugely useful information per se, but certain artefacts of this historical development can help explain current situations, and being able to reference overarching style by generation is sometimes useful.

Generation I — Pure, naïve objects

The first generation began with Java itself and mostly died out between Java 1.2 and 1.3. The approaches used in this generation are largely how the Java language was originally designed to be used.

There is relatively little first-generation code in shared libraries, partly because the first generation died out so long ago and partly because shared libraries were not common in Java at the time. Most surviving examples reside in the JDK; JavaMail is another.

First-generation Java is typified by three significant properties:

A naïve interpretation of object-orientation

First-generation Java code generally views objects at a much higher level than most modern code. Constructors generally construct the object’s dependencies implicitly; classes often aim to provide all high-level functionality anyone could ever need through their own methods, and may omit access to the actual primitives of the abstraction; methods of a superclass are often overridden with implementations that have additional side effects.

This creates a lot of problems for modern code.

  • The way constructors hide dependencies means it is nearly impossible to write tests which use live instances of a first-generation class; instead, one often simply adds another layer so it can be mocked away entirely.

  • The dependency hiding also often requires a lot of hidden state, such as taking configuration via Java properties (essentially stringly-typed global variables) or needing to share heavyweight handles implicitly.

  • The high-level operations exposed by the API become bloated and may eventually become incorrect due to age.

  • The lack of access to the abstraction’s actual primitives causes the API to be self-circumscribing. That is, it may be impossible to write client code to do some otherwise natural thing with the API but which would be reasonably easy to do by modifying the implementation.

  • Side-effecting overrides, especially of methods not normally expected to have side-effects, cause many surprises and often break the contract of the method being overridden.

Perhaps the best common examples of these problems can be found in the incestuous family of URL, HttpURLConnection, and the other URLConnection subclasses. (A full discussion of all the problems of these classes will need to wait for another post, though.)

First we have URL. Many programmers make the mistake of trying to use this class when they want to do something simple like parse a URL. But this class is quickly found to have a lot of problems.

  • It is used as the gateway to create URLConnections, and so it refuses to parse any URL with a scheme that has not been registered in a global table.

  • Most people would expect URLs to be compared on the basis of literal equality. URL, on the other hand, tries to compare based on what content it references. It overrides Object.equals() and Object.hashCode() to call into DNS and compare IP addresses. Besides being something that literally nobody expects (both in terms of semantics and because now putting stuff into a Hashtable can result in network operations), it’s also amusingly wrong now, since HTTP/1.1 introduced vhosts (so the same IP address can serve different host names) and load balancers are now common (so many IP addresses could serve the same data). The implementation for the equals() method also contains a comment lamenting that it isn’t practical to actually fetch the data from the URL to compare that instead.
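
A minimal sketch of how surprising this is (the host names are illustrative; the result depends entirely on what DNS returns at the moment of comparison):

import java.net.URL;

public class UrlEqualsSurprise {
  public static void main(String[] args) throws Exception {
    URL a = new URL("http://example.com/index.html");
    URL b = new URL("http://www.example.com/index.html");

    // equals() resolves both host names and compares the resulting IP
    // addresses, so this innocent-looking comparison performs DNS lookups
    // and may print true or false depending on the network.
    System.out.println(a.equals(b));
  }
}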

One does not construct an HttpURLConnection directly. Instead, a URL is created, its openConnection() method is called, and the result is downcast to HttpURLConnection on the assumption that an http:// URL will cause the method to return that class. At this point we’ve already gone through mutable global state and made an assumption about it. There’s also no way to write tests that use URL.openConnection() other than mutating this global state, but even then, trying to substitute a new class will likely just break whatever was blindly downcasting to HttpURLConnection.
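
The usual dance looks roughly like the following sketch (error handling omitted; the address is illustrative):

import java.net.HttpURLConnection;
import java.net.URL;

public class OpenConnectionDance {
  public static int statusOf(String address) throws Exception {
    URL url = new URL(address);  // consults the global scheme handler table
    // openConnection() returns a URLConnection; we blindly assume the handler
    // registered for "http" hands back an HttpURLConnection.
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    try {
      return connection.getResponseCode();
    } finally {
      connection.disconnect();
    }
  }
}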

Construction method aside, HttpURLConnection is still a problematic class. It has hard-wired but undocumented opinions of how various HTTP status codes are interpreted. It supports a myriad of authentication mechanisms, but has no API to define new ones. It takes a String for the HTTP method name, but restricts the method to a closed set of names which may be further constricted by mutable runtime state — no WebDAV or coffee for you.
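
For instance, the closed set of method names means a perfectly valid HTTP method is rejected at runtime, as in this small sketch:

import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URL;

public class WebDavAttempt {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://example.com/dav/");
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    try {
      // WebDAV's PROPFIND is not in HttpURLConnection's allowed set.
      connection.setRequestMethod("PROPFIND");
    } catch (ProtocolException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}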

Arrays are the one true data structure

Until Java 1.2, the JDK’s collections were quite sparse and inconvenient, besides being completely type-unsafe. This led to a lot of code using arrays directly for various purposes, particularly in APIs that would nowadays return collections. This can be seen quite extensively in, for example, the JavaMail API.

This use of direct arrays is not in practice a huge problem for client code, though it does cause inconvenience to modern code, since the Java language has largely left arrays behind.
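
To keep the sketch self-contained, here is the same habit in a JDK API of that vintage rather than JavaMail itself: File.listFiles() returns a raw array (or null), which modern callers typically bridge into a stream or collection straight away.

import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class ArrayReturningApis {
  public static List<String> textFileNames(File directory) {
    File[] files = directory.listFiles();  // array-returning API; null on I/O error
    if (files == null) {
      return Collections.emptyList();
    }
    return Arrays.stream(files)
        .map(File::getName)
        .filter(name -> name.endsWith(".txt"))
        .collect(Collectors.toList());
  }
}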

In implementation, however, some other artefacts crop up, such as a tendency to implement data structures in-situ with arrays instead of using some shared implementation.
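
Such in-situ structures tend to look something like this sketch (the class and field names are illustrative), doing by hand what Vector or ArrayList would otherwise do:

public class RecipientList {
  private String[] recipients = new String[4];
  private int count;

  // Manual capacity management mixed into otherwise unrelated code.
  public void add(String recipient) {
    if (count == recipients.length) {
      String[] bigger = new String[recipients.length * 2];
      System.arraycopy(recipients, 0, bigger, 0, count);
      recipients = bigger;
    }
    recipients[count++] = recipient;
  }

  // Returning a defensively-copied array, in keeping with the style of the era.
  public String[] getRecipients() {
    String[] copy = new String[count];
    System.arraycopy(recipients, 0, copy, 0, count);
    return copy;
  }
}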

Make everything thread-safe

The synchronized keyword seems to have been originally seen as a panacea for thread-related issues, which led to a large number of classes being made fully thread-safe where this was neither necessary nor beneficial.

For example, many of the original data structures provided by the JDK, such as StringBuffer, Hashtable, and Vector, make all their methods synchronized.

This was later understood to be a mistake, since it adds overhead to all code even though these types are rarely modified concurrently, and when they are, the synchronisation is almost always required at a higher level (such as around a series of mutations or around iteration).
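
A minimal sketch of why per-method synchronisation is not enough (the class and method names are illustrative):

import java.util.Vector;

public class SynchronisedButRacy {
  private final Vector<String> names = new Vector<>();

  // Every Vector method is synchronized, yet this check-then-act sequence is
  // still a race: two threads can both see the name as absent and both add it.
  public void addIfAbsent(String name) {
    if (!names.contains(name)) {  // lock is released between the two calls
      names.add(name);
    }
  }

  // The caller has to synchronise around the whole compound operation anyway.
  public void addIfAbsentSafely(String name) {
    synchronized (names) {
      if (!names.contains(name)) {
        names.add(name);
      }
    }
  }
}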

Generation II — All aboard the Enterprise

The second generation likely began in Java’s early days, but rose to prominence around Java 1.2 and lasted through 1.5. The second generation poisoned the culture surrounding the Java language and is largely responsible for Java’s reputation as an obscenely verbose language. A number of anti-patterns were introduced in this era as “best practices”, some of which still persist today.

The second generation is defined by several components and patterns:

Design patterns for the sake of design patterns

Early Java’s lack of expressiveness led to a general embrace of “design patterns”, i.e., the practice of working around the language’s limitations by becoming a human macro expander.

The design patterns are not in and of themselves a problem. The problem was a growing practice of using a design pattern seemingly for the sake of having used the pattern.

The most well-known form of this is the factory, which is now sufficiently notorious that even non-Java programmers recoil at the term. A major surviving example of this misuse is the Java XML framework. For example, to construct an XMLGregorianCalendar, one must call a factory method on DatatypeFactory. However, to get a DatatypeFactory, one must call the static newInstance() method on the DatatypeFactory class itself — the DatatypeFactoryFactory, if you will. This for a pure-data class that is literally a handful of integer fields.
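
In code, the two-step construction looks like this (a small sketch; the lexical value is just an example):

import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

public class CalendarConstruction {
  public static XMLGregorianCalendar newYear()
      throws DatatypeConfigurationException {
    // Step 1: ask the factory class for a factory instance.
    DatatypeFactory factory = DatatypeFactory.newInstance();
    // Step 2: ask the factory instance for the actual data object.
    return factory.newXMLGregorianCalendar("2017-01-01T00:00:00");
  }
}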

Useless singletons also come up a lot in code from this era. The JDK itself has thankfully few instances of this. One example, though from before the second generation, is the Runtime class, which has absolutely no reason to exist as an instance rather than a set of static methods.

XML

The second generation coincided with a period in which most of the world had a shared delusion that a regularisation of the format used to mark up text in a webpage would also be a good way to represent data. As a result, second generation code likes to use XML for a wide variety of things.

While not solely a hallmark of the second generation, as XML has continued to bleed into newer things, this is ultimately where ideas like using XML for configuration, using it as a wire format in protocols and files, and making it an intrinsic data type in certain relational databases originated.

Ultimate extensibility

During this generation also arose a certain culture of making things “extensible” in odd ways. This is different from versatility or the object-oriented concept of class extension; instead, it often involved contorted approaches to dependency injection to dynamically determine how the parts of an application should fit together, even though there is usually only one reasonable way for them to fit together.

The most extreme examples of this manifest as “soft coding”, wherein what is essentially programming is moved into another layer, such as configuration.

A certain baroque framework also ties this factor into XML, RPC, and beans.

Beans

In object-oriented culture up through the first generation, it had generally been agreed that public variables were a Bad Thing, that state should be exposed in a controlled manner through meaningful methods, and that state should be initialised through constructors. Most people still consider this good engineering practice.

However, at some point, most of the Java world turned this into a “Thou shalt not use public variables” commandment. Apparently carrying it out in spirit was too hard, so instead we got the JavaBeans specification, which allows following the commandment to the letter without worrying about good engineering practice.

In general, a JavaBean refers to a class fulfilling a couple points:

  • It has a no-args constructor. If your class needs input to be constructed, too bad, just make an invalid instance and make the caller deal with it.

  • Direct access is provided to the fields (which are private, of course) of the class through public getter and setter methods.

With this in place, you no longer have to make any of your fields public, but you also don’t need to think about abstractions and can write essentially the same code as with public variables.

Beans alone massively increase Java’s verbosity, especially for pure-data classes. First, a look at what I consider a sane 2D point class. (This example is mutable since immutable classes had not caught on in this era yet.)

Point.java

public class Point {
  public int x;
  public int y;

  public Point() { }
  public Point(int x, int y) {
    this.x = x;
    this.y = y;
  }
}

Yes, it’s still an order of magnitude more verbose than C (especially if we wanted to implement equals()), but it’s not horrible. Now let’s make it a bean:

PointBean.java

public class PointBean {
  private int x;
  private int y;

  public PointBean() { }
  // It's actually somewhat counter-convention to have any meaningful
  // constructors, but most people would make an exception for this class.
  public PointBean(int x, int y) {
    this.x = x;
    this.y = y;
  }

  public int getX() {
    return x;
  }

  public void setX(int x) {
    this.x = x;
  }

  public int getY() {
    return y;
  }

  public void setY(int y) {
    this.y = y;
  }
}

With the JavaBeans specification came a bunch of APIs to work with beans, including a serialisation library whose deserialiser is literally a Java scripting language expressed in XML. Nobody uses these anymore. However, the bean properties (a.k.a. “inconvenient public variables”) anti-pattern is still in common use. I will cover this in more detail in a later post.
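
For the curious, the serialisation machinery mentioned above survives in the JDK as java.beans.XMLEncoder and XMLDecoder; a small sketch using the PointBean class above:

import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class BeanPersistence {
  public static void main(String[] args) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Serialisation relies entirely on the no-args constructor and the
    // getter/setter pairs; the output is an XML "program" that rebuilds the bean.
    try (XMLEncoder encoder = new XMLEncoder(out)) {
      encoder.writeObject(new PointBean(1, 2));
    }

    try (XMLDecoder decoder =
        new XMLDecoder(new ByteArrayInputStream(out.toByteArray()))) {
      PointBean copy = (PointBean) decoder.readObject();
      System.out.println(copy.getX() + "," + copy.getY());
    }
  }
}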

Enterprise Beans

I’ve had multiple people laugh when first presented with the term Enterprise JavaBeans because they thought “enterprise” was being used pejoratively to mock the inherent verbosity of JavaBeans. On the contrary, Enterprise JavaBeans is perhaps one of the reasons “enterprise” is associated with obscenely verbose code.

I myself have not worked with EJB proper, and not at all with anything that existed during the second generation, so my knowledge here is kind of fuzzy.

EJB aimed to be the be-all and end-all of application frameworks. Website backend? EJB. Desktop application? EJB. Server daemon? EJB. Monitoring the vitals of your astronauts? EJB.

The culmination of all problems inherent in the second generation, EJB is an astoundingly massive framework. It provides:

  • A bean-based object-relational mapper (entity beans with container-managed persistence; later JPA).
  • A bean-based message publish/subscribe system (JMS).
  • Bean-based session management.
  • Bean-based XML processing (JAXB).
  • Bean-based service management (JMX).
  • Bean-based dependency injection with a healthy dose of XML.
  • Bean-based remote procedure calls.
  • Probably a lot more.

It might be non-obvious at first, but EJB places a lot of emphasis on beans. But these aren’t any old beans, they’re Enterprise Beans. In the original EJB versions, Enterprise Beans put normal JavaBeans to shame in terms of verbosity.

(EJB is apparently less bad today. The content below is about the 1.x versions of EJB.)

The dependency injection system is the first problem. The EJB framework was responsible for constructing bean objects and wiring them to their dependencies via their bean properties. This was specified in a massive XML “configuration” file which explicitly indicated how to populate each property of each class using fully-qualified class names (e.g., com.example.applicationx.model.bean.AbstractSingletonProxyFactoryBean).

The bigger problem is RPC. No, not “it’s really verbose if you happen to want to use EJB to do RPC”. It’s really verbose because you need to use RPC even if your whole application runs in one process. Not only do you need to define your bean class and wire it with XML, you also need to define not one, but two interfaces to accompany it. The “home” interface is used for operations that inherently happen in the local process (such as creating new instances) and a “remote” interface used for everything else. So let’s make Point into an Enterprise Bean!

EnterprisePointBean.java

/* Imports and the mandatory EJB lifecycle callbacks (ejbActivate,
   ejbPassivate, ejbRemove, setSessionContext) elided */

public class EnterprisePointBean implements SessionBean {
  private int x;
  private int y;

  public void ejbCreate() { }
  public void ejbCreate(int x, int y) {
    this.x = x;
    this.y = y;
  }

  public int getX() {
    return x;
  }

  public void setX(int x) {
    this.x = x;
  }

  public int getY() {
    return y;
  }

  public void setY(int y) {
    this.y = y;
  }
}

EnterprisePointHome.java

/* Imports elided */

public interface EnterprisePointHome extends EJBHome {
  public EnterprisePointRemote create()
  throws RemoteException, CreateException;

  public EnterprisePointRemote create(int x, int y)
  throws RemoteException, CreateException;
}

EnterprisePointRemote.java

/* Imports elided */

public interface EnterprisePointRemote extends EJBObject {
  public int getX() throws RemoteException;
  public void setX(int x) throws RemoteException;
  public int getY() throws RemoteException;
  public void setY(int y) throws RemoteException;
}

Generation III — Conventional Wisdom

Java 1.5 introduced a sorely missing feature to the language: generics. While Java’s generics were designed so they could be retrofitted onto existing APIs, the possibilities they opened created enough churn in common libraries to allow less “enterprisey” alternatives to gain traction.

The third generation is primarily defined by growing use of “convention over configuration” methods to reduce boilerplate, as well as near complete abandonment of the RPC mechanisms built in to Java (RMI and CORBA). The ecosystem also fragmented into less overbearing frameworks that were still similar in concept to EJB and its ilk.

Major libraries of this generation include Hibernate (an ORM), Tomcat (a servlet container), and Guice (a dependency injection framework). While these certainly have their problems, they are all less problematic than EJB.

On the darker side, this generation also brought forth Spring (with wonderful classes such as AbstractSingletonProxyFactoryBean and AnnotationAwareAspectJAutoProxyCreator) and Grails.

All in all, there isn’t too much to say about this generation. It’s less bad than the previous but still carries a lot of its problems.

Generation IV — Renaissance

The fourth generation started perhaps as early as Java 1.6, but gained much more traction with the advent of lambda expressions and other functional support in Java 1.8.

The fourth generation shakes off a lot of the problems introduced by the prior generations — but sadly, not all; some things, like the inconvenient public variables anti-pattern, persist.

Of course, it presumably has its problems too. As a Java programmer who uses the fourth-generation approach, I presumably simply haven’t discovered these problems yet, and a few years down the line may well end up mocking this article. But for now, fourth-generation is the bee’s knees.

The fourth generation stands in stark contrast to the prior generations, not only because of the things it adds, but because of the things it casts away.

Rejection of reuse by extension

Subclassing happens infrequently compared to the prior generations because the practice of reusing an implementation by extending the class containing it has almost completely fallen out of favour. It does still happen, as subclassing is occasionally the right tool for the job. But gone are the deep, convoluted class hierarchies of yore. Instead, reuse more often takes place by delegation or functional decomposition.
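
A minimal sketch of reuse by delegation (the names are illustrative): rather than extending ArrayList to count additions, wrap a list and forward to it.

import java.util.ArrayList;
import java.util.List;

public class AuditedNames {
  private final List<String> names = new ArrayList<>();  // owned, not inherited
  private int additions;

  public void add(String name) {
    additions++;
    names.add(name);
  }

  public int additionCount() {
    return additions;
  }

  public List<String> snapshot() {
    return new ArrayList<>(names);
  }
}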

The term “composition over inheritance” unfortunately appears to have been muddied by a number of approaches which implement composition using inheritance, but there’s a decent amount of material explaining the general benefits of this approach if one can see past the Design Patterns fallout.

Within the Java ecosystem, the largest benefit is perhaps the elimination of the “ultimate extensibility” mentality in which classes have various hook points and other awkward mechanisms to facilitate potential subclasses and a precarious amount of state involved in the extension machinery.

Rejection of deep mutability

Large architectures in the prior generations were often built around extensive object graphs with pervasive mutability. Even when controlled with proper object-oriented methods, this practice introduces many windows for concurrency bugs and makes it hard to reason about when and how values may change.

The fourth generation sees the rise of immutable data models and shallow resource graphs. In many cases, mutability is confined to local variables and things uniquely referenced by local variables.
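
A minimal sketch of such an immutable value type, returning to the Point example from earlier: final fields, no setters, and “with” methods that return modified copies.

public final class ImmutablePoint {
  private final int x;
  private final int y;

  public ImmutablePoint(int x, int y) {
    this.x = x;
    this.y = y;
  }

  public int x() { return x; }
  public int y() { return y; }

  // Mutation is expressed as construction of a new value.
  public ImmutablePoint withX(int newX) { return new ImmutablePoint(newX, y); }
  public ImmutablePoint withY(int newY) { return new ImmutablePoint(x, newY); }
}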

There are still some major exceptions, such as caches or objects which track the (inherently mutable) state of the outside world.

Rejection of design patterns

Singletons are rare now, except when implementing an interface, and even then nobody calls them that. The singleton-ness is usually solely there to avoid heap allocations.

Designing the factory pattern into an API is now uncommon. It still has its uses, but factories prefer to go by the name Supplier nowadays.

Strategies are rarely an explicit concept; most people just use functions.

The list can be extended to the less common patterns as well. The important point is that code is less verbose and APIs less labyrinthine.
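
A small sketch of how two of these now tend to look (the method names are illustrative): a Supplier standing in for a factory, and a lambda standing in for a strategy.

import java.util.Comparator;
import java.util.List;
import java.util.function.Supplier;

public class PatternsAsFunctions {
  // Where older code would define a Factory interface plus an implementing
  // class, a Supplier (or any functional interface) suffices.
  static <T> T firstNonNull(Supplier<T> preferred, Supplier<T> fallback) {
    T value = preferred.get();
    return value != null ? value : fallback.get();
  }

  // Where older code would define a Strategy class hierarchy, a lambda or
  // method reference passed as a Comparator does the job.
  static void sortByLength(List<String> words) {
    words.sort(Comparator.comparingInt(String::length));
  }
}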

Rejection of “magic”

The fourth-generation culture largely considers many of the third-generation frameworks to be unnecessarily “magical” and so these have fallen out of favour.

A notable example is Guice, a dependency injection framework. Guice attempts to keep verbosity down with implicit rules about how most things are bound, requiring explicit configuration only in unusual cases.
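
A minimal sketch of that implicit style (the class names are illustrative): concrete classes with @Inject constructors need no explicit bindings at all.

import com.google.inject.Guice;
import com.google.inject.Inject;
import com.google.inject.Injector;

class Database { }

class UserService {
  private final Database database;

  @Inject
  UserService(Database database) {  // Guice discovers and satisfies this itself
    this.database = database;
  }
}

public class GuiceWiring {
  public static void main(String[] args) {
    // No module and no bindings: the object graph is assembled implicitly.
    Injector injector = Guice.createInjector();
    UserService service = injector.getInstance(UserService.class);
  }
}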

However, even when this works perfectly, it ultimately only saves a small amount of logic-free code (maybe a hundred lines in a large application) in exchange for a lot of complexity and mental overhead.

In the circles I work in, Guice is increasingly seen as counter-productive, and is generally replaced by direct dependency injection (i.e., passing dependencies into a constructor) with a single logic-free class responsible for constructing all the resources at once.
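
A sketch of what that replacement looks like, reusing the illustrative Database and UserService classes from the previous sketch plus a hypothetical HttpFrontend: one boring class builds the whole graph by hand.

class HttpFrontend {
  private final UserService users;

  HttpFrontend(UserService users) {
    this.users = users;
  }

  void run() {
    // Serve requests using the injected service.
  }
}

public class Application {
  // The single logic-free wiring point: construct every resource once and pass
  // each dependency explicitly through a constructor.
  public static void main(String[] args) {
    Database database = new Database();
    UserService userService = new UserService(database);
    HttpFrontend frontend = new HttpFrontend(userService);
    frontend.run();
  }
}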

The rise of functional approaches

Functional programming in Java had a small cult following for a while, with a very small minority of programmers willing to deal with all the nested anonymous classes that result.

// What the Java 8 Streams API would look like with Java 7 syntax
public static Map<String, Integer> wordCount(String text) {
  return Arrays.stream(text.split("\\s+"))
    .filter(new Predicate<String>() {
      @Override
      public boolean test(String word) {
        return !word.isEmpty();
      }
    }).map(new Function<String, String>() {
      @Override
      public String apply(String word) {
        return word.toLowerCase();
      }
    }).collect(Collectors.toMap(
      new Function<String,String>() {
        @Override
        public String apply(String word) {
          return word;
        }
      },
      new Function<String,Integer>() {
        @Override
        public Integer apply(String __) {
          return 1;
        }
      },
      new BinaryOperator<Integer>() {
        @Override
        public Integer apply(Integer a, Integer b) {
          return a + b;
        }
      }));
}

Java 1.8 changed that by adding lambda expressions. The above code can now be written as:

public static Map<String, Integer> wordCount(String text) {
  return Arrays.stream(text.split("\\s+"))
    .filter(s -> !s.isEmpty())
    .map(String::toLowerCase)
    .collect(Collectors.toMap(s -> s, __ -> 1, (a, b) -> a + b));
}

Because of this, functional programming has enjoyed much higher popularity, particularly with the first-party streams and Optional APIs embracing functional composition.
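
The Optional API composes in the same style (a small sketch; the method is illustrative):

import java.util.Map;
import java.util.Optional;

public class WordLookup {
  // Looks a word up in a count map with no explicit null checks.
  static int countFor(Map<String, Integer> counts, String word) {
    return Optional.ofNullable(word)
        .map(String::toLowerCase)
        .map(counts::get)
        .orElse(0);
  }
}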

Perhaps non-obviously, this has also likely contributed to the prior points. Strategies and factories are in many cases simply replaced by functions, which do not need to be given elaborate names. Composition is also easier as there is less boilerplate involved in passing function-like things around.