Java Dependency Analysis and Modularization
Written by Jens Dietrich   
Monday, 03 October 2011
Everybody agrees that modularization is good, but how do we go about transforming a big ball of mud architecture to something like OSGi?


Most experienced developers have encountered some form of system rot: the quality of a program deteriorates over time, and it becomes more and more expensive to update and maintain. This is often caused by poorly managed dependencies.

Dependencies between software artifacts (classes, packages, functions, ...) are created when one artifact references another, for instance when a method of a class invokes a method defined in another class. These dependencies then propagate to larger units of code: a package depends on another package if a class in the first package depends on a class in the second, and so on. Dependencies become problematic when they are created to solve short-term problems but bypass rules defined as part of the system architecture, such as “the persistency layer should not depend on the presentation layer”. Such dependencies are technical debt, and that debt piles up.
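The propagation from class dependencies to package dependencies described above can be sketched in a few lines of Java. This is a minimal illustration, not the tooling used for the study: the class `PackageDependencies`, its methods and the `shop.*` names are hypothetical, and class names are assumed to be fully qualified binary names so that everything before the last dot is the package.

```java
import java.util.*;

// Hypothetical helper: lifts class-level dependencies to package-level ones.
final class PackageDependencies {

    // Returns the package part of a fully qualified class name
    // ("" for a class in the default package).
    static String packageOf(String className) {
        int lastDot = className.lastIndexOf('.');
        return lastDot < 0 ? "" : className.substring(0, lastDot);
    }

    // classDeps maps a class name to the names of the classes it references.
    // The result maps each package to the other packages it depends on.
    static Map<String, Set<String>> lift(Map<String, Set<String>> classDeps) {
        Map<String, Set<String>> packageDeps = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : classDeps.entrySet()) {
            String sourcePackage = packageOf(entry.getKey());
            for (String target : entry.getValue()) {
                String targetPackage = packageOf(target);
                // References within the same package create no package dependency.
                if (!sourcePackage.equals(targetPackage)) {
                    packageDeps.computeIfAbsent(sourcePackage, k -> new TreeSet<>())
                               .add(targetPackage);
                }
            }
        }
        return packageDeps;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> classDeps = new HashMap<>();
        // A persistence class that references a UI class: the kind of shortcut
        // that violates a layering rule.
        classDeps.put("shop.persistence.OrderDao",
                new HashSet<>(Arrays.asList("shop.persistence.Db", "shop.ui.OrderDialog")));
        System.out.println(lift(classDeps)); // prints {shop.persistence=[shop.ui]}
    }
}
```

A single class-level reference is enough to create the package-level edge, which is why one shortcut can violate an architectural rule for the whole layer.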

And eventually, this will become a problem.

Remember the late 90s, when everybody wanted to port their applications to the web? That was easy for applications with a clear separation between user interface and logic layer, and difficult or even impossible for applications where the logic depended on a particular user interface (usually a desktop UI).

Now Java programmers face a similar situation. Many use cases require modularity: creating plugin ecosystems around products, building product lines and making incremental updates, to name a few. Several great platforms for modularity are available, in particular OSGi and its extensions (Eclipse, Declarative Services, Spring Dynamic Modules). But all of these platforms have strict requirements when it comes to dependencies.

A common theme is that these frameworks have containers to manage dependencies automatically. This requires that programmers adhere to the following two principles:

  • package separability - dependencies between different packages should be minimised so that packages can be deployed in different modules. In particular, there should be no circular dependencies between packages.  
  • interface separability - dependencies between abstract classes and interfaces and their implementing concrete types should be minimised, so that abstract types and implementation types can be part of different modules. This facilitates the compatibility of different implementations and makes it easier to replace a particular implementation within an application.
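To make the "no circular dependencies between packages" requirement concrete, here is a minimal sketch of a cycle check on a package dependency graph using depth-first search. The class `PackageCycleCheck` and the `app.*` package names are hypothetical. The sketch only answers whether any cycle exists and assumes the graph is small enough for a recursive traversal.

```java
import java.util.*;

// Hypothetical checker: reports whether a package dependency graph contains a cycle.
final class PackageCycleCheck {

    // graph maps a package to the packages it depends on.
    static boolean hasCycle(Map<String, Set<String>> graph) {
        Set<String> finished = new HashSet<>(); // fully explored, no cycle reachable
        Set<String> onPath = new HashSet<>();   // packages on the current DFS path
        for (String start : graph.keySet()) {
            if (visit(start, graph, finished, onPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String node, Map<String, Set<String>> graph,
                                 Set<String> finished, Set<String> onPath) {
        if (onPath.contains(node)) {
            return true;  // reached a package that is already on the current path
        }
        if (finished.contains(node)) {
            return false; // explored earlier without finding a cycle
        }
        onPath.add(node);
        for (String next : graph.getOrDefault(node, Collections.emptySet())) {
            if (visit(next, graph, finished, onPath)) {
                return true;
            }
        }
        onPath.remove(node);
        finished.add(node);
        return false;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = new HashMap<>();
        graph.put("app.ui", Collections.singleton("app.logic"));
        graph.put("app.logic", Collections.singleton("app.persistence"));
        System.out.println(hasCycle(graph)); // prints false

        // A shortcut from the persistence layer back to the UI closes the cycle.
        graph.put("app.persistence", Collections.singleton("app.ui"));
        System.out.println(hasCycle(graph)); // prints true
    }
}
```

Packages on a cycle cannot be assigned to separate modules without those modules depending on each other, so a check like this is a natural first gate before attempting to split an application.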

This raises the question of how existing applications can be refactored to modular designs based on one of these platforms.

The State of Affairs

To answer this question, we investigated a large set of open-source Java programs (the Qualitas corpus) to find out how many of them suffer from dependency-related problems. The short answer is: almost all of them. In this experiment, we checked the dependency graph extracted from each program for instances of the following antipatterns, which compromise package and interface separability, respectively:

  1. strong circular dependencies between packages (CD): dependency chains starting in a package A, traversing some other packages and then returning to A. This is a strong version of circular dependency because a single reference chain creates the package dependencies. In particular, this pattern cannot be broken by splitting packages.
  2. strong circular dependencies between jars (CDC): dependency chains starting in a jar A, traversing some other jars and then returning to A.
  3. subtype knowledge (STK): supertypes (classes or interfaces) referencing their own subtypes, directly or indirectly.
  4. abstraction without decoupling (AWD): classes referencing both abstract types and their implementation types.
  5. degenerated inheritance (DEGINH): multiple paths from subtypes to supertypes (in Java, this is possible because interfaces support multiple inheritance).
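The three type-level patterns in the list above (STK, AWD and DEGINH) are easiest to recognise in code. The following toy types are hypothetical and are not taken from the analysed programs. They are a sketch of what each pattern can look like, based on the short definitions given here, and real instances are often indirect and span several classes.

```java
import java.util.ArrayList;
import java.util.List;

// All type names below are hypothetical and exist only to illustrate the patterns.

// STK: the supertype Shape references its own subtype Square.
abstract class Shape {
    abstract double area();

    static Shape unitSquare() {
        return new Square(1.0); // supertype -> subtype reference
    }
}

final class Square extends Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    double area() {
        return side * side;
    }
}

// AWD: Inventory works with the abstract type List but also names the
// implementation type ArrayList, so it is tied to that implementation.
final class Inventory {
    private final List<String> items = new ArrayList<>();

    void add(String item) {
        items.add(item);
    }

    int size() {
        return items.size();
    }
}

// DEGINH: there are two inheritance paths from Customer to Named,
// a direct one and one via Entity.
interface Named {
    String name();
}

interface Entity extends Named {
}

final class Customer implements Entity, Named {
    @Override
    public String name() {
        return "customer";
    }
}

final class AntipatternDemo {
    public static void main(String[] args) {
        System.out.println(Shape.unitSquare().area()); // prints 1.0
        Inventory inventory = new Inventory();
        inventory.add("widget");
        System.out.println(inventory.size());          // prints 1
        Named named = new Customer();
        System.out.println(named.name());              // prints customer
    }
}
```

None of these snippets is wrong as Java. The cost only shows up when `Shape` and `Square`, or a client and its implementation type, are supposed to live in different modules.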

Surprisingly, almost all programs analysed were rife with instances of these patterns.

Here are some examples. In tomcat-7.0.2, there is the following circular dependency between jars (CDC):

  1. org.apache.catalina.ha.context.ReplicatedContext in tomcat-catalina-ha.jar
  2. depends on org.apache.catalina.core.ApplicationContext in tomcat-catalina.jar
  3. depends on org.apache.catalina.Service in tomcat-catalina.jar
  4. depends on org.apache.catalina.startup.Catalina in tomcat-catalina.jar
  5. depends on org.apache.catalina.ha.ClusterRuleSet in tomcat-catalina-ha.jar

According to the Tomcat documentation, catalina is the servlet container, and the ha package/jar contains cluster functionality. This means that Tomcat, even when used without clustering, depends on cluster functionality being available.
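The chain above can be checked mechanically once each class is mapped to its jar. The sketch below is a hypothetical helper, not the tool used in the study. It takes the chain and the class-to-jar mapping exactly as listed above and tests whether the chain starts in one jar, passes through another, and ends in the starting jar.

```java
import java.util.*;

// Hypothetical helper: checks whether a class dependency chain leaves a jar and returns to it.
final class JarChainCheck {

    // classChain lists classes in dependency order; jarOfClass maps each class to its jar.
    static boolean leavesAndReturns(List<String> classChain, Map<String, String> jarOfClass) {
        if (classChain.size() < 3) {
            return false; // needs a start, at least one class in between, and an end
        }
        String startJar = jarOfClass.get(classChain.get(0));
        String endJar = jarOfClass.get(classChain.get(classChain.size() - 1));
        if (startJar == null || !startJar.equals(endJar)) {
            return false; // the chain does not end where it started
        }
        for (String className : classChain) {
            if (!startJar.equals(jarOfClass.get(className))) {
                return true; // the chain passes through a different jar
            }
        }
        return false; // the whole chain stays inside one jar
    }

    public static void main(String[] args) {
        // Classes and jars as listed in the Tomcat example above.
        Map<String, String> jarOfClass = new LinkedHashMap<>();
        jarOfClass.put("org.apache.catalina.ha.context.ReplicatedContext", "tomcat-catalina-ha.jar");
        jarOfClass.put("org.apache.catalina.core.ApplicationContext", "tomcat-catalina.jar");
        jarOfClass.put("org.apache.catalina.Service", "tomcat-catalina.jar");
        jarOfClass.put("org.apache.catalina.startup.Catalina", "tomcat-catalina.jar");
        jarOfClass.put("org.apache.catalina.ha.ClusterRuleSet", "tomcat-catalina-ha.jar");

        // LinkedHashMap preserves insertion order, so the key order is the chain order.
        List<String> chain = new ArrayList<>(jarOfClass.keySet());
        System.out.println(leavesAndReturns(chain, jarOfClass)); // prints true
    }
}
```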



Figure: Tomcat jars and their relationships

Dependency chains traversing several packages are even more abundant. For instance, the OpenJDK (both versions 6 and 7) contains such a chain linking AWT (java.awt) and Swing (javax.swing). The critical edge is a reference to javax.swing.JComponent in java.awt.Component. This tightly couples the two alternative toolkits and makes it impossible to deploy them separately.

This implies that an application that only uses the older AWT will also need Swing to run!

Last Updated ( Monday, 03 October 2011 )