Code Maat and my first public Dockerfile

Last year I read the excellent book “Your Code as a Crime Scene” by Adam Tornhill. It draws on research into how to find weak spots in a code base by mining version history, in order to fix broken designs, maintenance issues, and team productivity bottlenecks. I can highly recommend it to everyone.

In addition to the book, Adam wrote an open source tool, Code Maat, in Clojure, which he uses in the book to explain the theories and show in a practical way how to find the hot spots. With Code Maat you can analyse version history log files from Git, Perforce, Subversion and Mercurial. To run Code Maat you need to install and set up a couple of things on your developer PC. Since I finally started working practically with Docker, I created a Dockerfile. You can find it, together with instructions on how to use it, in my GitHub repository.
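For reference, this is roughly how I run it. A sketch, assuming the image has been built locally from the Dockerfile under a hypothetical tag (code-maat) and that the container's entrypoint forwards its arguments to Code Maat; the git log format is the one Code Maat's documentation describes for its git2 parser:

```shell
# Produce a version-history log in the format Code Maat's git2 parser expects
git log --all --numstat --date=short \
    --pretty=format:'--%h--%ad--%aN' --no-renames > git.log

# Run Code Maat from the container, mounting the current directory,
# and ask for a summary analysis of the log
docker run --rm -v "$PWD":/data code-maat \
    -l /data/git.log -c git2 -a summary
```

From there you can swap `-a summary` for other analyses, such as coupling or authors, to start hunting for hot spots.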


Java 9 and modularity

“The module system: a seat belt, not a jetpack”

I visited JavaOne (for the first time) in October this year with the main goal of getting more information about Java 9 and the support for modularity, better known as Project Jigsaw. Java 9 was initially targeted for September 2016, but after listening and talking to others at JavaOne I realised that it would be hard to reach that target, and hence it was no surprise when Mark Reinhold recently moved the target date to March 2017 in order to get everything in place.

This is not a post about how the new modularity support in Java 9 will be implemented and how to use it. It is more of a general reflection, and on why we should use it. If you want to find out more about Jigsaw I suggest downloading or watching the presentations from JavaOne. I can recommend “Prepare for JDK 9”, “Introduction to Modular Development”, “Advanced Modular Development”, “Project Jigsaw: Under the Hood” and “Project Jigsaw Hack Session”.

Modularity is certainly not a new thing, which is how Bert Ertman began his talk “Building Modular Cloud Applications with OSGi” (I can recommend his book with the same name), where he refers to Dijkstra, who addressed the need for “separation of concerns” back in the 70s. This design principle of letting each module have a single functional responsibility is well understood, but hard to achieve, or at least to preserve over time. I can see that at the company where I have been Chief Architect for the last years. One of the products was built with its own modularity system. The design goals were good, but from the start the concept of a concern was misunderstood: the modules were designed from a technical aspect instead of a functional concern aspect.



A module exposes some kind of interface (external) that reflects the concern and responsibility of the module. The implementation (internal) part implements the contract and hides from the outer world how this is done. Another useful pair of concepts that describes this is cohesion and coupling: a well-designed system should strive for low coupling between modules and high cohesion within them. Easy to write, hard to achieve.



Each module defines its module definition in the module-info.java file. Consider two different modules: one declaring its dependency on another module (extenda.a -> extenda.b) and one declaring its external API (the package com.extenda.api). Declaring the modules in Java files rather than meta-inf files gives, for example, compile-time support. I am also fond of exporting packages rather than single interfaces and classes as in OSGi. It makes life easier for the developer: you simply place the class/interface in the exported package instead of having to add every class to a module file.
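As a sketch, those two module declarations could look like this. The module and package names are taken from the text above; each declaration lives in its own module-info.java at the root of the module's source tree (Jigsaw syntax as of the early-access builds):

```java
// module-info.java for extenda.a — declares its dependency on extenda.b
module extenda.a {
    requires extenda.b;
}

// module-info.java for extenda.b — only the API package is visible to other modules
module extenda.b {
    exports com.extenda.api;
}
```

Everything that is not exported (the internal implementation packages) stays inaccessible to other modules, checked at compile time as well as at runtime.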

So far there has been no Java language support for modularity apart from the package concept. The most well-known module support is OSGi, which has been both criticised and proven in production. I tried Apache Karaf a couple of years ago; at that time it was easy to use and seemed to work pretty well. I only made a small prototype, and in the end we selected a non-OSGi framework to implement the new integration product. But I like the concept of OSGi.
Maven (and Gradle/Ivy) also provides a kind of module system at build time that is easy to understand and use.

So why do I want to start using modules? Modules will not solve everything, but I see three things that will help us:


Standard language support
OSGi and Jigsaw have a lot of similarities. As I see it, OSGi has the advantage of being more mature and battle-proven. If you need to swap modules at runtime you should go for OSGi.
We don’t need to swap modules at runtime, and OSGi is still not part of the Java standard. Working at a product company, my role is to stick to the standard as much as possible, since the products live for a long time. Selecting a GUI framework for a web or mobile client is not a fun thing, since there is a new cool framework and language every month; that is why we have chosen JavaFX for our clients wherever possible.
In the same way, using an official language feature makes it more future-proof in many aspects.

Tooling support
What we lack in our home-grown modular system is the language and tooling support that gives compile-time checks. We waste a lot of time because errors don’t surface until runtime. Jigsaw will help us with that.

Explicit external API
Other products of ours don’t explicitly define an external API, and hence all classes and interfaces are effectively part of the external API.


The products are meant to be customized for each customer where needed. The problem we face is that developers tend to forget that we have a responsibility to avoid breaking backward compatibility. If we had explicitly defined what is external, we could create rules that verify this for every build. Another advantage of starting to use modules is that it preserves the separation of concerns of each current “module”.

Runtime images with JDK 9

The JDK has itself been refactored and modularised, and for the first time deprecated classes will probably become unavailable in JDK 9 and 10. The modules are “packaged” in three different profiles (compact1, compact2 and compact3) which contain different numbers of modules. With this in mind, the new tool for creating my own runtime image with just the modules I need is a feature that we will use in the future. Embedding the JRE and other components, such as the database, will make life easier for the company and our customers.
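The tool in question is jlink, which ships with the JDK 9 builds. A minimal sketch, assuming an application module named extenda.a (as in the Jigsaw discussion earlier) compiled into a mods directory, with a hypothetical main class:

```shell
# Link a trimmed runtime image containing only the modules the application needs
jlink --module-path $JAVA_HOME/jmods:mods \
      --add-modules extenda.a \
      --output myimage

# The resulting image carries its own stripped-down java launcher
myimage/bin/java -m extenda.a/com.extenda.Main
```

The image contains only the JDK modules that extenda.a transitively requires, which is exactly what makes embedding the JRE with a product practical.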


Even though Java 9 is not scheduled until Q1 2017, it is time to start designing and implementing for Jigsaw. Download the Jigsaw alpha release, design your modules, create your module-info.java files and set up a nightly build to verify your design from now on. Make sure you download the Jigsaw alpha JDK release and not the regular JDK 9 release.

A look at Java 8

So I finally had some time to look at Java 8 and its new features. I picked the excellent book “Java 8 in Action: Lambdas, streams and functional-style programming” to update myself. As always, Manning’s books are easy to read and follow on a new subject, yet still in-depth.


As the title indicates, one big part is about lambdas and the functional style of design they bring to the world of Java. They will for sure lead to more concise code, which is almost always better, until the code gets too implicit and a developer without the skills to understand it misreads it.

The possibility to use and pass functions around is really good and will surely lead to better design. I liked the way the Java language designers used the new concept of default methods to add methods to existing interfaces. It will be interesting to see how we will use default methods in the future.
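As a small illustration of both ideas — passing functions around and default methods — here is a sketch with a hypothetical Greeter interface. Since it has a single abstract method, a lambda can implement it, and the default method comes along for free:

```java
// Hypothetical example: a default method adds behavior to an interface
// without breaking existing implementations.
interface Greeter {
    String name();

    default String greet() {           // every implementor gets this for free
        return "Hello, " + name();
    }
}

public class DefaultMethodDemo {
    public static void main(String[] args) {
        Greeter g = () -> "Kalle";     // a lambda implements the single abstract method
        System.out.println(g.greet()); // prints "Hello, Kalle"
    }
}
```

This is the same mechanism the language designers used to add methods like List.sort and Iterable.forEach without breaking every existing implementation.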


The second theme of the book covers streams, which look really good as well. Streams allow us to handle information the way we are used to handling it in relational databases (hopefully we don’t forget to keep doing that too, instead of using the streams API for everything…). A lot of iterators and data-handling logic can be reduced to much less, and more efficient, code.
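A sketch of the idea, with a hypothetical list of names; the pipeline reads almost like a SQL query over a table:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamDemo {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Kalle", "Anna", "Karl", "Eva");

        // filter ~ WHERE, map ~ SELECT, sorted ~ ORDER BY
        List<String> result = names.stream()
                .filter(n -> n.startsWith("K"))
                .map(String::toUpperCase)
                .sorted()
                .collect(Collectors.toList());

        System.out.println(result); // prints [KALLE, KARL]
    }
}
```

The hand-written loop with a mutable accumulator list disappears, and switching to parallelStream() would parallelise the same pipeline without further changes.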

Streams, lambdas and the improvements around Future and other async features will let us deal better with multi-core processors, now and in the future, without resorting to raw threads, synchronisation and other not-so-recommended ways of programming multi-core applications.
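The main addition here is CompletableFuture. A sketch with two hypothetical independent computations that run in parallel on the common pool and are combined without any explicit thread handling:

```java
import java.util.concurrent.CompletableFuture;

public class AsyncDemo {
    public static void main(String[] args) {
        // Both suppliers run asynchronously on the common fork-join pool
        CompletableFuture<Integer> price = CompletableFuture.supplyAsync(() -> 40);
        CompletableFuture<Integer> tax   = CompletableFuture.supplyAsync(() -> 2);

        // Combine the results when both are done; join() waits for the outcome
        int total = price.thenCombine(tax, Integer::sum).join();
        System.out.println(total); // prints 42
    }
}
```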


Mondrian in Action – a great introduction to data warehouse modeling (and Mondrian of course)

I am a newbie to data warehouse modeling and business intelligence, but thanks to the book “Mondrian in Action” (ISBN 978-1-61729-098-5) I now have a thorough understanding of the concepts and what to think about. I am far from an expert, of course, but I am now on my way to creating a business intelligence model to analyze the domain of software development. Using the Pentaho platform, which includes Mondrian, makes it possible to create a model for visualizing the history and drawing conclusions about the evolution of the software, in order to make better software in the future.

The authors also recommend “The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition” if you want to dig into the world of data warehouse modeling. It contains more in-depth knowledge once you get started, but “Mondrian in Action” gave me a better introduction to the subject.

In coming posts I will add my findings around data warehouse modeling for software repository mining.

Testing regular expressions

I used to use RegexBuddy, which is a really good tool for developing and testing regular expressions, and I really miss it. But since I am now using an OS that RegexBuddy does not support, I can’t use it any more. I have tried different online services, and this week I found one which is not as good as RegexBuddy, but still decent. It gives you the ability to test against different languages, with the specific features of each, and to do match and replace testing. It does not have the colored syntax matching which I really liked in RegexBuddy.
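When no dedicated tool is at hand, a few lines of Java are often enough to verify both matching and replacing. A sketch with a hypothetical ISO-date pattern:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCheck {
    public static void main(String[] args) {
        // Match: find an ISO date inside a longer string
        Pattern iso = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");
        Matcher m = iso.matcher("Released on 2014-05-17, updated later.");
        System.out.println(m.find());  // prints true
        System.out.println(m.group()); // prints 2014-05-17

        // Replace: reorder the date using capture groups
        String us = "2014-05-17".replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$2/$3/$1");
        System.out.println(us);        // prints 05/17/2014
    }
}
```

Not as convenient as a dedicated tool, but it tests the exact regex flavor your production code will use.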


Why should we document?

As a reaction to large, overwhelming (enterprise) architecture frameworks such as TOGAF and RUP, the agile manifesto stated a number of things in its ambition to produce better software in a more efficient way. One of the things stated was

  • Working software over comprehensive documentation

As many have said before me, this does not mean we should omit documentation, but I think many developers read this statement as

  • Working software over comprehensive documentation

Many developers want to write code, and so do I. It is fun to write and run the code and see it working. We don’t have time to write that documentation, since we are on our way to writing the next masterpiece of code, or we think we will document once we are done. Well, probably the most common reason is lack of time to finish the system or product at all. But aren’t we just fooling ourselves in the long run? Like most other human beings I tend to forget things very fast, even the code I have written myself.

People are leaving the company from time to time


People tend to quit their jobs after a time, some sooner than others, and in the IT industry people tend to change jobs more often than in other industries. When a person leaves, a lot of knowledge walks out of the door, and it takes time to recover from that loss. But even those of us who stay tend to forget and lose knowledge and information after a while. It is a human defect (or a blessing) to forget things that have happened or been said. In the IT industry we are kind of sloppy about taking notes and documenting our work properly, compared to more mature professions such as architecture, physics and law.

Yes, it is true that we might lose time documenting, since the design will change. But I strongly agree with Andy Hunt when he states in “The Pragmatic Programmer” that “Perhaps the most important is to write/visualize”. It is the same phenomenon as when we explain a problem to a colleague (or a rubber duck): during the explanation we find the solution ourselves. By forcing ourselves to document, we get better and faster at writing and drawing, and we also find a lot of bugs and design errors during the writing/drawing.

Old documentation
“The worst thing I know is outdated documentation. It is better with no documentation in those cases”

This is a rather commonly stated opinion. I don’t agree. I would say “it is better with outdated documentation than no documentation”. For me, all kinds of documentation of code, design and architecture give you an idea of what the purpose of the design and code was when the information was discussed and formulated. If you keep the documentation in the SCM, you can check out the code as it looked when the documentation was written, and you are then able to see the evolution of the software being developed.

Writing unit tests using BDD style

Well-written tests can be seen as part of good documentation, but rather often they are of no help at all. Take for example this test:

@Test
public void testOne() {
    String a = "Kalle";
    int b = someService.someMethod(a);
    assertEquals(2, b);
}
Seeing this code, or its test report, does not give us a clue what the test does or what the code under test should do. There are better ways, and I tend to favor structuring unit tests in the Behavior-driven development (BDD) style.
First of all, it is important to give the test method a name that is as close to self-explanatory as possible about what is being tested and the expected result. Don’t forget to remove the “test” prefix from the method name, since it is obvious this is a test. The method is annotated, right? Secondly, it is possible to use the given-when-then BDD style in a unit test as well.

@Test
public void whenKalleIsUsedAsUserNameAndSomeMethodIsCalledThereShouldBeTwoOfThem() {
    // Given a first name "Kalle"
    String a = "Kalle";
    // When the service is asked for the number of users with Kalle as first name
    int b = someService.someMethod(a);
    // Then there should be two of them
    assertEquals(2, b);
}

This might be a simplistic example, but using this structure makes it more obvious to the reader what the test is about, and a proper method name makes it easier to spot what went wrong when reading the error report of your build.

My first contribution to the Open Source community

Open Source

For the last decade I have used a lot of great, and sometimes less great, Open Source software in my attempts to build systems and software products. Most of it is published to be used under no obligations, the rest under restricted license terms. I am so grateful for all this work that others have done to make my life easier.


So I decided it’s payback time, and created my first Open Source project. I decided to use GitHub as the repository for the software. Since my current main employer is Extenda, I created an Organization account.

Software Package Data Exchange®

The first project is really about Open Source itself. At Extenda we use Open Source libraries from different sources, and we think we do comply with the different licenses regarding paying license fees, providing license texts, source code etc. However, we really want to be 100% sure, so we have looked at different software and services and discussed with lawyers to find a way to improve our process and assurance. During that process I discovered the Software Package Data Exchange® (SPDX®) standard, whose goal is to provide a standard way of describing the different licenses a product/system is using, with an RDF report.


Looking at their site, there were no real community tools to generate such a report, only commercial ones. Since Extenda uses maven site to generate product documentation, including the list of libraries and licenses we are using, it felt natural to create a maven site plugin that, as part of the site generation phase, creates an SPDX report file. The result is the spdx-maven-plugin, published and pushed to GitHub under Apache License 2.0. The current version is 0.0.1 and it still needs a lot of work before it complies with the standard. I need to understand more about how to write tests for maven plugins of this kind. Stay tuned…

Github Pages

I felt that would not be enough, and since GitHub provides GitHub Pages I created a “site” for the Extenda organization and for the plugin project. Even though the documentation is rather straightforward, I managed to misunderstand it, partly because of my lack of practical experience with Git and GitHub. But finally I got it right, so you should be able to do it as well :).


The maven plugin is of course built using maven, and so far it has a very basic test suite (“Hello World”, more or less). The goal is to create a proper set of tests, but I need to learn how to create those maven plugin tests.
To get external verification, I wanted to make sure the build was successful in a different environment than my own. Therefore I created an account at CloudBees’ service BuildHive in order to build the project and run its junit tests.

GitHub and BuildHive works well together

BuildHive is free to use for public projects on GitHub, and it is really easy to set up a build for a repository there. Once you have verified the connection between BuildHive and GitHub, using your GitHub credentials, you just select “Add Git Repository”

Add Git Repository

and you will see the list of your personal and your organization’s repositories on GitHub.

List of GitHub repositories

To add a build select Enable and the Jenkins project will automatically be created for you.

Enabled BuildHive project

If you select the link you will be redirected to the Jenkins build. You will probably notice that the project has already been built, and probably has failed. BuildHive tries to figure out how to build the project, but you will probably need to configure it yourself. Select Configuration and enter the Shell Script for your project. Since spdx-maven-plugin is built using maven, I entered mvn package (other build management solutions are supported as well – magic).

Enter mvn package as Shell Script

Select Save, and now the project builds successfully.

Successful build

So far BuildHive has no support for deploying the artifact(s) to a repository of some kind; it is primarily a build and verification service. An option is to create an account at CloudBees (DEV@Cloud), a paid service with a free entry level where you can build your open source projects; it allows you to use private settings to build and deploy your project. BuildHive (and CloudBees) uses web hooks to get notified once a new push has been made to the repository on GitHub, and a new build is then started automatically.

It is fascinating how easy it is with these integrated services to set up a version-controlled project, create a project site and connect it to a build server.

Finalized the online course “Data Mining with Weka”







Tonight I finished the online course “Data Mining with Weka”, provided by the authors of the book “Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition” and of the tool Weka. This was my first online course, and it was really easy to follow the videos, which introduced each concept, and the use of the Weka tool, at a steady pace. Each week a new class was put online, and each class had 6 lessons with some questions to answer to verify your understanding of the lesson’s concept. In addition there were mid-term and post-course assessments that you could take as well.

Classification Pipeline by Ariel Kleiner






The Classification Pipeline is a good image summarizing what machine learning using classification is all about. If you don’t want to take the course or read the Data Mining book, I can recommend watching Thomas Oldervoll’s talk “Machine Learning for Java Developers” at JavaZone this year.

I will post my results from mining the software repositories I have access to, combined with research by others. Stay tuned (hopefully, if I get some results to talk about :)).

Why SonarSource favors SonarQube Java rules over Checkstyle and PMD

Since I have started to dig a little deeper into the world of SonarQube, I have noticed the SonarQube Java rules engine. I recently found a blog post by SonarSource where they explain why they favor their own engine over Checkstyle and PMD. The reasons they provide sound fair, even though my initial thought was that you commit yourself to the SonarQube platform. But does it matter? We like it, and there is more that can be developed and shared as a community.