Merge pull request #20 from rtdi/master
docs
wernerdaehn authored Jan 26, 2025
2 parents e2c81bf + 829ca3a commit bbf29a1
Showing 2 changed files with 26 additions and 6 deletions.
30 changes: 25 additions & 5 deletions README.md
@@ -6,19 +6,34 @@ Source code available here: [github](https://github.com/rtdi/RTDIRulesService)

Docker image here: [dockerhub](https://hub.docker.com/r/rtdi/rulesservice)

## Table of Contents

* [Design Thinking goal](#b1)
* [Requirements](#b2)
* [Installation and testing](#b3)
* [Rules](#b4)
* [Licensing](#b5)
* [Data protection and privacy](#b6)


<a name="b1"/>

## Design Thinking goal

* As a business user, I would like to validate the incoming data and cleanse it in real time
* Consumers have the choice to read the raw or the cleansed data
* Operational dashboards using the rule results provide information about the data quality
* Different types of rules should be supported: validation rules, cleansing rules, data augmentation, standardization rules, and so on

<a name="b2"/>

## Requirements

* Payload (value) in Avro Format
* Apache Kafka connection with the permissions to run as a KStream
* Schema Registry connection to read (and write) schema definitions
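
As a rough illustration of the connection requirements listed above, the sketch below collects the usual client settings in a `java.util.Properties` object. The keys `application.id` and `bootstrap.servers` are standard Kafka Streams settings, and `schema.registry.url` is the key read by Confluent-style Avro serializers. Whether this service uses these exact keys is an assumption; the class name, the application id, and the URLs in the usage example are hypothetical placeholders.

```java
import java.util.Properties;

/** Hypothetical sketch of generic Kafka client settings; not this service's actual configuration. */
final class KafkaConnectionSketch {

    /** Builds the connection settings a KStream application with Avro payloads typically needs. */
    static Properties streamsProperties(String bootstrapServers, String schemaRegistryUrl) {
        Properties props = new Properties();
        // Identifies the Kafka Streams application; the value is a made-up example.
        props.setProperty("application.id", "rulesservice-example");
        // Kafka brokers to connect to.
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Schema Registry endpoint used to read (and write) Avro schema definitions.
        props.setProperty("schema.registry.url", schemaRegistryUrl);
        return props;
    }
}
```

For a local test setup this might be called as `KafkaConnectionSketch.streamsProperties("localhost:9092", "http://localhost:8081")`.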

<a name="b3"/>

## Installation and testing

@@ -77,7 +92,7 @@ To simplify entering rules, sample values can be entered and the result be recal

Once a rule file is complete, it must be copied from the `inactive` to the `active` directory. The `Activate` button does that. This two-stage approach lets users save intermediate definitions without affecting the currently running service.
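
The activation step described above amounts to a file copy between two sibling directories. A minimal sketch of that idea follows; only the directory names `inactive` and `active` come from this README, while the class name, the method, and the assumption that both directories sit under one per-subject folder are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of what activating a rule file could look like on disk. */
final class RuleActivation {

    /**
     * Copies a rule file from subjectDir/inactive to subjectDir/active.
     * The inactive copy is left in place so it can still be edited.
     */
    static Path activate(Path subjectDir, String ruleFileName) throws IOException {
        Path source = subjectDir.resolve("inactive").resolve(ruleFileName);
        Path targetDir = subjectDir.resolve("active");
        Files.createDirectories(targetDir);
        // Overwrite a previously activated version of the same rule file.
        return Files.copy(source, targetDir.resolve(ruleFileName),
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Copying rather than moving the file keeps the working copy editable while the running service continues to read only the activated version.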

<img src="docs/media/Rule.png" width="50%">
<img src="https://github.com/rtdi/RulesService/blob/main/docs/media/Rule.png" width="50%">


### Step 3: Topics
@@ -87,7 +102,7 @@ Scaling is achieved by increasing the number of KStream instances used for this

The screen also allows copying the rule files in use into the active folder, which is simpler than activating each one from the rule file dialog.

<img src="docs/media/Topics.png" width="50%">
<img src="https://github.com/rtdi/RulesService/blob/main/docs/media/Topics.png" width="50%">

### Result

@@ -100,7 +115,7 @@ Querying this data allows detailed reporting which records were processed by wha

The exact Avro schema field definition can be found [here](docs/audit-schema.md)

<img src="docs/media/RuleResult.png" width="50%">
<img src="https://github.com/rtdi/RulesService/blob/main/docs/media/RuleResult.png" width="50%">


### Sample files
@@ -111,7 +126,10 @@ The found messages are streamed in chunks into the screen and can be saved, eith
The files are stored in the directory `/apps/rulesservice/definitions/<subjectname>/sampledata/`.
If no file name is specified, the name will be `partition_<partition>_offset_<offset>.json`.
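
The naming convention above can be sketched as a small helper. The directory layout and the default name pattern are taken from this README; the class and method names are hypothetical, and the sketch assumes a user-supplied file name is used exactly as given.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper illustrating where a sample file would be stored. */
final class SampleDataPaths {

    // Root directory as stated in the README.
    static final Path DEFINITIONS_ROOT = Paths.get("/apps/rulesservice/definitions");

    /** Default file name used when no name is specified. */
    static String defaultFileName(int partition, long offset) {
        return "partition_" + partition + "_offset_" + offset + ".json";
    }

    /** Resolves the target file for a sample message of the given subject. */
    static Path sampleFile(String subjectName, String fileName, int partition, long offset) {
        String name = (fileName == null || fileName.isBlank())
                ? defaultFileName(partition, offset)
                : fileName;
        return DEFINITIONS_ROOT.resolve(subjectName).resolve("sampledata").resolve(name);
    }
}
```

For a made-up subject `orders-value`, partition 3 and offset 1200, `SampleDataPaths.sampleFile("orders-value", null, 3, 1200L)` resolves to `/apps/rulesservice/definitions/orders-value/sampledata/partition_3_offset_1200.json`.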

<img src="docs/media/SampleData.png" width="50%">
<img src="https://github.com/rtdi/RulesService/blob/main/docs/media/SampleData.png" width="50%">


<a name="b4"/>

## Rules

@@ -182,11 +200,13 @@ For more examples [see](docs/rule-syntax.md)

* Can a new output column be created via a formula? No, the output schema is always derived from the input schema, for two reasons. First, if adding fields were possible, an added field might collide with the input schema when the input subject evolves to a new version. Second, performance: a new output message would have to be created from scratch, copying the majority of the data even if nothing has changed, which would be too expensive. So the only option is to add the column to the input schema first.

<a name="b5"/>

## Licensing

This application is provided as dual license. For all users with less than 100'000 messages processed per month, the application can be used free of charge and the code falls under a Gnu Public License. Users with more than 100'000 messages per month are asked to get a ![commercial license](LICENSE_COMMERCIAL) to support further development of this solution. The commercial license is on a monthly pay-per-use basis.
This application is provided under a dual license. For all users with fewer than 100'000 messages processed per month, the application can be used free of charge and the code falls under the GNU General Public License. Users with more than 100'000 messages per month are asked to get a [commercial license](LICENSE_COMMERCIAL) to support further development of this solution. The commercial license is on a monthly pay-per-use basis.

<a name="b6"/>

## Data protection and privacy

2 changes: 1 addition & 1 deletion pom.xml
@@ -8,7 +8,7 @@
<packaging>war</packaging>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<revision>0.9.27</revision>
<revision>0.9.28</revision>
<log4j.version>2.24.1</log4j.version>
<kafka.version>3.8.0</kafka.version>
<jersey>3.1.8</jersey>
