Pentaho integration testing with Spring Boot
During any serious enterprise application development, sooner or later it becomes clear that some sort of integration testing is required to proceed in a sane, stable way. Pentaho provides many examples of how to write unit tests, but since they rely on mocking they aren't very useful for true integration tests. During a recent project it was necessary to integrate PDI with Spring Boot so that true integration testing could be performed.
There are several ways to integrate Pentaho with Spring.
The first attempt at integration was to embed PDI directly inside Spring, tightly coupling the two. This seemed like a good idea at the time, but it turned out to be a lost cause.
The major reason is that the PDI libraries rely heavily on pure Java singletons, and Spring cannot manage pure Java singletons within its application context. Embedding PDI therefore meant rewriting a lot of core PDI code to make those classes non-static.
PDI variable expansion uses environment variables in the background, which means tests cannot safely be run in parallel: one test's variables leak into another's. It is not clear why the developers went this route, but since the code is rather old it was probably a choice of convenience at the time.
Embedding PDI requires its libraries to be declared in the POM, and since PDI is a very big product the POM grows rapidly. Another downside is that if a library is missed, the gap only shows up later when that functionality is needed, so the safe precaution is to include everything, which in turn creates collisions with some of the same libraries Spring already brings in.
PDI uses Karaf to manage plugin and library loading, which already adds considerably to startup time. When PDI is embedded this facility is unavailable, slowing startup even further.
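The singleton and shared-variable problems above can be illustrated with a generic sketch. This is not actual PDI code; the class below is a stand-in for any PDI-style static singleton that holds variable state. Spring cannot inject, proxy, or replace such an object in a test context, and two tests running in parallel mutate the same state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Generic stand-in for a PDI-style static singleton; not PDI code.
final class GlobalVariables {
    private static final GlobalVariables INSTANCE = new GlobalVariables();
    private final Map<String, String> vars = new ConcurrentHashMap<>();

    private GlobalVariables() {}

    // Spring cannot manage this instance: it exists outside any context.
    static GlobalVariables getInstance() { return INSTANCE; }

    void set(String key, String value) { vars.put(key, value); }
    String get(String key) { return vars.get(key); }
}

public class SingletonDemo {
    public static void main(String[] args) {
        // "Test A" sets up its variable space...
        GlobalVariables.getInstance().set("ETL_DATE", "2020-01-01");
        // ...then "Test B", running in parallel, overwrites the same key.
        GlobalVariables.getInstance().set("ETL_DATE", "2020-06-30");
        // Test A now reads Test B's value: cross-test pollution.
        System.out.println(GlobalVariables.getInstance().get("ETL_DATE"));
    }
}
```

Because the state lives in a static field rather than a Spring bean, there is no per-test isolation to reset between runs, which is why parallel test execution is off the table with this design.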
The good news is that there is an easy solution to all of the above. PDI ships with a built-in REST API microservice called Carte, running on an embedded Jetty web server. A loosely coupled web service makes configuration management across environments easy: each Carte server can run on a different port, load times are quick, no PDI libraries need to appear in the Spring POM, and PDI's handling of variable expansion and singletons is no longer a pain.
Accessing the Carte server requires an HTTP client. Creating an HTTP request/response object is trivial and there are many good examples on the web, so it won't be discussed here. Carte offers only limited authorization options, so if security is a concern some work may be needed to put it behind a reverse proxy. The Carte API returns responses in XML format, which allows logging from PDI to be integrated seamlessly into Spring Boot. Overrides to kettle properties can also be supplied through URL GET parameters, so it is very flexible and configurable.
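As a concrete sketch, here is how such a client request can be built with the JDK's own `java.net.http.HttpClient`. The host, port, and the `cluster`/`cluster` credentials are Carte's usual defaults but are assumptions here; verify them against your Carte configuration. The `/kettle/status/?xml=Y` endpoint shown returns the server status as XML.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CarteClient {

    // Build a GET request against Carte's status endpoint (/kettle/status/?xml=Y).
    // Host, port and the default cluster/cluster credentials are assumptions;
    // check your Carte configuration.
    static HttpRequest statusRequest(String host, int port, String user, String password) {
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + host + ":" + port + "/kettle/status/?xml=Y"))
                .header("Authorization", "Basic " + auth)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = statusRequest("localhost", 8081, "cluster", "cluster");
        System.out.println(request.uri());
        // With a Carte server running, the XML status document is fetched like this:
        // HttpClient client = java.net.http.HttpClient.newHttpClient();
        // HttpResponse<String> response =
        //         client.send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
        // System.out.println(response.body()); // XML, including PDI log output
    }
}
```

The same pattern applies to the endpoints that run jobs and transformations; the XML body that comes back carries the PDI log lines that can then be forwarded into Spring Boot's logging.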
Having the HTTP client is enough to access the Carte server and start ETL jobs or transformations, but when running integration tests we need to guarantee that the required Carte servers are up before the tests run through SpringBootTest; otherwise starting the Carte servers becomes an external dependency. Bootstrapping Carte from Spring Boot tests requires a launcher that runs carte.sh to start the servers. The launcher alone is not quite enough, though: bringing a Carte server up and tearing it down for every single test is too time-intensive. The best approach is to bring a Carte server up once when the Spring context loads and tear it down when the tests finish. The way to do this is to annotate the launcher's initialisation method with PostConstruct, which fits a unit testing framework really well because it is called right after the Spring context is loaded; PreDestroy does the same for teardown. Implementing these two methods is enough to start up and destroy the Carte servers when running SpringBootTest, making the bootstrapping seamless.
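A minimal sketch of such a launcher follows. In a real Spring `@Component` the `start()` method would carry `@PostConstruct` and `stop()` would carry `@PreDestroy` (shown as comments so the sketch stays self-contained); the carte.sh path, host, and port are placeholders, and a harmless sleeping shell process stands in for carte.sh so the lifecycle is visible.

```java
import java.util.concurrent.TimeUnit;

// Sketch of a Carte launcher for Spring Boot tests. In a real @Component,
// start() is annotated @PostConstruct and stop() @PreDestroy, so Carte comes
// up once when the Spring context loads and is torn down after the last test.
public class CarteLauncher {

    private Process carte;

    // @PostConstruct in a Spring context
    public void start() throws Exception {
        // Real invocation (placeholder path -- adjust to your PDI install):
        //   new ProcessBuilder("/opt/pentaho/carte.sh", "localhost", "8081")
        // A sleeping shell process stands in for carte.sh in this sketch.
        carte = new ProcessBuilder("sh", "-c", "sleep 60").start();
    }

    // @PreDestroy in a Spring context
    public void stop() throws Exception {
        if (carte != null) {
            carte.destroy();
            carte.waitFor(10, TimeUnit.SECONDS);
        }
    }

    public boolean isRunning() {
        return carte != null && carte.isAlive();
    }

    public static void main(String[] args) throws Exception {
        CarteLauncher launcher = new CarteLauncher();
        launcher.start();
        System.out.println("running: " + launcher.isRunning());
        launcher.stop();
        System.out.println("running: " + launcher.isRunning());
    }
}
```

In the real launcher, `start()` should also poll Carte's status endpoint until the server answers before letting the tests proceed, since carte.sh returning does not mean Jetty is ready to accept requests.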