Pentaho integration testing with Spring Boot

Tomasz Kaszuba edited this page Oct 30, 2018 · 11 revisions

Background

During any serious enterprise application development, sooner or later it becomes clear that some form of integration testing is required to proceed in a sane, stable way. Pentaho provides many examples of how to write unit tests, but because they rely on mocking they aren't very useful for true integration tests. During a recent project it was necessary to integrate PDI (Pentaho Data Integration) with Spring Boot so that true integration testing could be performed.

Integration

There are several ways to integrate Pentaho with Spring.

Embedding

The first attempt was to embed PDI natively in Spring, coupling the two tightly. This seemed like a good idea at the time, but it turned out to be a lost cause for the reasons below.

Singletons

The major reason is that the PDI libraries use a lot of pure Java singletons, and Spring does not get along with pure Java singletons because they can't be managed within a Spring context. Embedding PDI made it necessary to rewrite a lot of the core PDI code to make the classes non-static.

Environment variables

PDI variable expansion uses environment variables in the background. Because that state is shared by the whole process, tests that set different values can't be run in parallel. It is not clear why the developers went this route, but since the code is rather old I am guessing that it was a choice of convenience at the time.
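The clash can be illustrated with a minimal sketch. It does not touch PDI at all: JVM system properties stand in for the process-wide variables, and the key `demo.INPUT_DIR` and the class name are hypothetical. The two writes are shown in sequence to make the interleaving deterministic; in a real parallel run the order would be arbitrary.

```java
// Stand-in for process-wide variable state; not PDI code.
final class SharedVariableDemo {
    // Hypothetical variable name used only for this illustration.
    static final String KEY = "demo.INPUT_DIR";

    private SharedVariableDemo() {}

    // Simulates test A setting its value, test B overwriting it,
    // and test A then reading the variable back.
    static String valueSeenByTestA() {
        System.setProperty(KEY, "/data/test-a"); // test A configures its input
        System.setProperty(KEY, "/data/test-b"); // test B, running concurrently, overwrites it
        try {
            return System.getProperty(KEY);      // test A now reads test B's value
        } finally {
            System.clearProperty(KEY);           // leave no global state behind
        }
    }
}
```

Calling `SharedVariableDemo.valueSeenByTestA()` returns test B's directory, which is the kind of cross-talk that forces such tests to run one at a time.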

Library dependencies

Embedding PDI requires its libraries to be included in the POM. This bloats the POM considerably, as PDI is a very big product. Another downside is that a missed library only surfaces later, when the functionality that depends on it is first needed. A good precaution is therefore to include everything, which in turn creates version collisions with libraries that Spring also uses.

Load time

PDI relies on Karaf to manage plugin and library loading at startup. Embedding PDI means that this facility is unavailable, which greatly slows the startup time.

Microservices

The good news is that there is an easy solution to the above problems. PDI comes with Carte, a built-in REST API microservice running on a Jetty web server. A loosely coupled web service brings several benefits: configuration for different environments is easy to manage because each Carte server can run on its own port, load times are quick, Spring needs no PDI library dependencies, and PDI's handling of variable expansion and singletons is no longer a pain.

HTTP Client

Accessing the Carte server requires an HTTP client. Creating an HTTP request/response object is trivial and there are many good examples on the web, so it won't be discussed here. Carte offers only limited authorization options, so if security is a concern some work might be in order to put it behind a reverse proxy. The Carte API returns its responses in XML, which allows logging from PDI to be integrated seamlessly into Spring Boot. Overrides to Kettle properties can also be passed as URL GET parameters, so it is very flexible and configurable.
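For the Carte-specific part of such a client, a minimal sketch might look as follows. It only builds the request and reads a reply; it is not the code used in the project. The `/kettle/executeTrans/` path with a `trans` parameter, HTTP Basic authentication, and a `<webresult>` reply containing a `<result>` element are assumptions to verify against the Carte documentation for the PDI version in use. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;

// Hypothetical helper; endpoint path and reply shape are assumptions.
final class CarteRequests {
    private CarteRequests() {}

    // Builds the GET URI; each override becomes an extra query parameter.
    static URI executeTransUri(String baseUrl, String transPath, Map<String, String> overrides) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("/kettle/executeTrans/?trans=").append(encode(transPath));
        overrides.forEach((name, value) ->
                url.append('&').append(encode(name)).append('=').append(encode(value)));
        return URI.create(url.toString());
    }

    // Wraps the URI in a GET request carrying an HTTP Basic Authorization header.
    static HttpRequest authenticatedGet(URI uri, String user, String password) {
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Basic " + token)
                .GET()
                .build();
    }

    // Extracts the text of the first <result> element from an XML reply.
    static String resultOf(String xml) {
        try {
            Node result = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("result").item(0);
            if (result == null) {
                throw new IllegalArgumentException("No <result> element in reply");
            }
            return result.getTextContent().trim();
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse Carte reply", e);
        }
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and the body handed to `resultOf`. Keeping URI construction separate from sending makes the parameter handling easy to check without a running server.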

Unit Tests

Carte Servers

Having the HTTP client is enough to access the Carte server and start ETL jobs or transformations. When running integration tests through SpringBootTest, however, the required Carte servers must be guaranteed to be up before any test runs; otherwise starting them becomes an external dependency of the test suite. Bootstrapping Carte from Spring Boot tests requires a launcher that runs carte.sh to start the servers. It is not really viable to bring up and tear down a Carte server for each test, as this is too time-intensive. The best way is to bring up the Carte server once, when the Spring context is loaded, and tear it down at the end of the tests.

The way to do this is to annotate the launcher's initialisation method with PostConstruct. Spring calls it once the launcher bean has been created while the context loads, which fits well into a testing framework. The same goes for PreDestroy for teardown, which is called when the context closes. Implementing these two methods is enough to start up and destroy the Carte servers when running SpringBootTest, making the bootstrapping seamless.
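A launcher along these lines could be sketched as below. This is an illustration, not the project's actual launcher: the class name, the script path passed in, the 60-second timeout, and the readiness check (waiting until the port accepts TCP connections) are all assumptions. The class is kept free of Spring imports so it compiles on its own; the comments mark where the PostConstruct and PreDestroy annotations would go.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical launcher; script location and timeouts are assumptions.
final class CarteLauncher {
    private final String carteScript;
    private final String host;
    private final int port;
    private Process process;

    CarteLauncher(String carteScript, String host, int port) {
        this.carteScript = carteScript;
        this.host = host;
        this.port = port;
    }

    // Command line in the form: carte.sh <hostname> <port>
    List<String> command() {
        return List.of(carteScript, host, String.valueOf(port));
    }

    // In Spring this method would carry @PostConstruct.
    synchronized void start() {
        if (isRunning()) return;
        try {
            process = new ProcessBuilder(command()).inheritIO().start();
            waitUntilReachable(Duration.ofSeconds(60));
        } catch (IOException e) {
            stop();
            throw new IllegalStateException("Could not start Carte", e);
        }
    }

    // In Spring this method would carry @PreDestroy.
    synchronized void stop() {
        if (process == null) return;
        // The script may spawn a separate JVM, so ask child processes to stop too.
        process.descendants().forEach(ProcessHandle::destroy);
        process.destroy();
        try {
            if (!process.waitFor(10, TimeUnit.SECONDS)) process.destroyForcibly();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            process.destroyForcibly();
        }
        process = null;
    }

    synchronized boolean isRunning() {
        return process != null && process.isAlive();
    }

    // Polls the port until it accepts a TCP connection or the timeout expires.
    private void waitUntilReachable(Duration timeout) throws IOException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() < deadline) {
            if (!process.isAlive()) throw new IOException("Carte exited before opening port " + port);
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 500);
                return;
            } catch (IOException notListeningYet) {
                try {
                    Thread.sleep(500);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IOException("Interrupted while waiting for Carte", e);
                }
            }
        }
        throw new IOException("Carte did not open port " + port + " in time");
    }
}
```

Registered as a bean in the test configuration, one instance per Carte server, the launcher starts once with the context and stops when the context closes. If annotating the class is not desirable, Spring's `@Bean(initMethod = "start", destroyMethod = "stop")` achieves the same lifecycle hooks.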

Composition