Commit 7183174

Merge pull request #10 from CDOT-CV/jpo-deduplicator-removal
Jpo deduplicator removal
2 parents 2596542 + 551e09f commit 7183174

59 files changed: +11 -4143 lines changed

.github/workflows/ci.yml

-28
This file was deleted.

.github/workflows/docker.yml

+1 -21
@@ -4,27 +4,7 @@ on:
   pull_request:
     types: [opened, synchronize, reopened]
 
-jobs:
-  jpo-deduplicator:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v3
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v2
-      - name: Build
-        uses: docker/build-push-action@v3
-        with:
-          context: jpo-deduplicator
-          build-args: |
-            MAVEN_GITHUB_TOKEN_NAME=${{ vars.MAVEN_GITHUB_TOKEN_NAME }}
-            MAVEN_GITHUB_TOKEN=${{ secrets.MAVEN_GITHUB_TOKEN }}
-            MAVEN_GITHUB_ORG=${{ github.repository_owner }}
-          secrets: |
-            MAVEN_GITHUB_TOKEN: ${{ secrets.MAVEN_GITHUB_TOKEN }}
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
-
+jobs:
   jpo-jikkou:
     runs-on: ubuntu-latest
     steps:

.github/workflows/dockerhub.yml

+1 -33
@@ -7,39 +7,7 @@ on:
       - "master"
       - "release/*"
 
-jobs:
-  dockerhub-jpo-deduplicator:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v3
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v2
-      - name: Login to DockerHub
-        uses: docker/login-action@v2
-        with:
-          username: ${{ secrets.DOCKERHUB_USERNAME }}
-          password: ${{ secrets.DOCKERHUB_TOKEN }}
-
-      - name: Replace Docker tag
-        id: set_tag
-        run: echo "TAG=$(echo ${GITHUB_REF##*/} | sed 's/\//-/g')" >> $GITHUB_ENV
-
-      - name: Build
-        uses: docker/build-push-action@v3
-        with:
-          context: jpo-deduplicator
-          push: true
-          tags: usdotjpoode/jpo-deduplicator:${{ env.TAG }}
-          build-args: |
-            MAVEN_GITHUB_TOKEN_NAME=${{ vars.MAVEN_GITHUB_TOKEN_NAME }}
-            MAVEN_GITHUB_TOKEN=${{ secrets.MAVEN_GITHUB_TOKEN }}
-            MAVEN_GITHUB_ORG=${{ github.repository_owner }}
-          secrets: |
-            MAVEN_GITHUB_TOKEN: ${{ secrets.MAVEN_GITHUB_TOKEN }}
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
-
+jobs:
   dockerhub-jpo-jikkou:
     runs-on: ubuntu-latest
     steps:
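
For reference, the removed `Replace Docker tag` step can be mirrored in a few lines of Python. This is only an illustrative sketch of what the shell expression computes; the function name `docker_tag` and the example branch `release/1.0` are hypothetical and not part of the repository. Because `${GITHUB_REF##*/}` already strips everything up to the last slash, the `sed` substitution appears to have nothing left to replace.

```python
def docker_tag(github_ref: str) -> str:
    # ${GITHUB_REF##*/} drops the longest prefix ending in "/",
    # i.e. everything up to and including the last slash.
    last_segment = github_ref.rsplit("/", 1)[-1]
    # sed 's/\//-/g' replaces any remaining "/" with "-"; after the
    # step above there are none, so this is effectively a no-op.
    return last_segment.replace("/", "-")


print(docker_tag("refs/heads/master"))       # master
print(docker_tag("refs/heads/release/1.0"))  # 1.0
```

Under this reading, a branch matching `release/*` would have been tagged with only its final path segment.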

README.md

-68
@@ -21,10 +21,6 @@ The JPO ITS utilities repository serves as a central location for deploying open
   - [Configuration](#configuration)
   - [Configure Kafka Connector Creation](#configure-kafka-connector-creation)
   - [Quick Run](#quick-run-2)
-- [5. jpo-deduplicator](#5-jpo-deduplicator)
-  - [Deduplication Config](#deduplication-config)
-  - [Generate a Github Token](#generate-a-github-token)
-  - [Quick Run](#quick-run-3)
 - [Security Notice](#security-notice)
 
 
@@ -190,70 +186,6 @@ The following environment variables can be used to configure Kafka Connectors:
    3. Click `OdeBsmJson`, and now you should see your message!
 8. Feel free to test this with other topics or by producing to these topics using the [ODE](https://github.com/usdot-jpo-ode/jpo-ode)
 
-
-<a name="deduplicator"></a>
-
-## 5. jpo-deduplicator
-The JPO-Deduplicator is a Kafka Java spring-boot application designed to reduce the number of messages stored and processed in the ODE system. This is done by reading in messages from an input topic (such as topic.ProcessedMap) and outputting a subset of those messages on a related output topic (topic.DeduplicatedProcessedMap). Functionally, this is done by removing deduplicate messages from the input topic and only passing on unique messages. In addition, each topic will pass on at least 1 message per hour even if the message is a duplicate. This behavior helps ensure messages are still flowing through the system. The following topics currently support deduplication.
-
-- topic.ProcessedMap -> topic.DeduplicatedProcessedMap
-- topic.ProcessedMapWKT -> topic.DeduplicatedProcessedMapWKT
-- topic.OdeMapJson -> topic.DeduplicatedOdeMapJson
-- topic.OdeTimJson -> topic.DeduplicatedOdeTimJson
-- topic.OdeRawEncodedTIMJson -> topic.DeduplicatedOdeRawEncodedTIMJson
-- topic.OdeBsmJson -> topic.DeduplicatedOdeBsmJson
-- topic.ProcessedSpat -> topic.DeduplicatedProcessedSpat
-
-### Deduplication Config
-
-When running the jpo-deduplication as a submodule in jpo-utils, the deduplicator will automatically configure an algorithm as enabled or disabled depending on if the corresponding subcomponent is also active. For example if the KAFKA_TOPIC_CREATE_GEOJSONCONVERTER environment variable is set to true, the deduplicator will start performing deduplication for ProcessedMap, ProcessedMapWKT, and ProcessedSpat data. If the KAFKA_TOPIC_CREATE_GEOJSONCONVERTER is set to false, the deduplicator will disable deduplication for those same topics. To manually configure deduplication for a topic, the following environment variables can also be used. If no value is passed for a given environment variable, the corresponding deduplication algorithm will default to enabled.
-
-| Environment Variable | Description |
-|---|---|
-| `ENABLE_PROCESSED_MAP_DEDUPLICATION` | `true` / `false` - Enable ProcessedMap message Deduplication |
-| `ENABLE_PROCESSED_MAP_WKT_DEDUPLICATION` | `true` / `false` - Enable ProcessedMap WKT message Deduplication |
-| `ENABLE_ODE_MAP_DEDUPLICATION` | `true` / `false` - Enable ODE MAP message Deduplication |
-| `ENABLE_ODE_TIM_DEDUPLICATION` | `true` / `false` - Enable ODE TIM message Deduplication |
-| `ENABLE_ODE_RAW_ENCODED_TIM_DEDUPLICATION` | `true` / `false` - Enable ODE Raw Encoded TIM Deduplication |
-| `ENABLE_PROCESSED_SPAT_DEDUPLICATION` | `true` / `false` - Enable ProcessedSpat Deduplication |
-| `ENABLE_ODE_BSM_DEDUPLICATION` | `true` / `false` - Enable ODE BSM Deduplication |
-
-### Generate a Github Token
-
-A GitHub token is required to pull artifacts from GitHub repositories. This is required to obtain the jpo-deduplicator jars and must be done before attempting to build this repository.
-
-1. Log into GitHub.
-2. Navigate to Settings -> Developer settings -> Personal access tokens.
-3. Click "New personal access token (classic)".
-   1. As of now, GitHub does not support `Fine-grained tokens` for obtaining packages.
-4. Provide a name and expiration for the token.
-5. Select the `read:packages` scope.
-6. Click "Generate token" and copy the token.
-7. Copy the token name and token value into your `.env` file.
-
-For local development the following steps are also required
-8. Create a copy of [settings.xml](jpo-deduplicator/jpo-deduplicator/settings.xml) and save it to `~/.m2/settings.xml`
-9. Update the variables in your `~/.m2/settings.xml` with the token value and target jpo-ode organization.
-
-### Quick Run
-1. Create a copy of `sample.env` and rename it to `.env`.
-2. Update the variable `MAVEN_GITHUB_TOKEN` to a github token used for downloading jar file dependencies. For full instructions on how to generate a token please see here:
-3. Set the password for `MONGO_ADMIN_DB_PASS` and `MONGO_READ_WRITE_PASS` environmental variables to a secure password.
-4. Set the `COMPOSE_PROFILES` variable to: `kafka,kafka_ui,kafka_setup, jpo-deduplicator`
-5. Navigate back to the root directory and run the following command: `docker compose up -d`
-6. Produce a sample message to one of the sink topics by using `kafka_ui` by:
-   1. Go to `localhost:8001`
-   2. Click local -> Topics
-   3. Select `topic.OdeMapJson`
-   4. Select `Produce Message`
-   5. Copy in sample JSON for a Map Message
-   6. Click `Produce Message` multiple times
-7. View the synced message in `kafka_ui` by:
-   1. Go to `localhost:8001`
-   2. Click local -> Topics
-   3. Select `topic.DeduplicatedOdeMapJson`
-   4. You should now see only one copy of the map message sent.
-
 [Back to top](#toc)
 
 ## Security Notice
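
The removed README section describes the deduplicator's behavior only in prose. As a reading aid, here is a minimal Python sketch of that behavior under stated assumptions: the actual service is a Java Spring Boot Kafka application, and its real comparison logic is not shown in this commit. The `Deduplicator` class, the `dedup_enabled` helper, the per-key state, and the use of plain payload equality as the duplicate test are all hypothetical. Only the one-hour pass-through rule and the default-to-enabled rule come from the README text.

```python
import os
from datetime import datetime, timedelta


def dedup_enabled(var_name, env=None):
    # Per the README, an unset variable defaults to enabled.
    # Parsing "true"/"false" case-insensitively is an assumption.
    env = os.environ if env is None else env
    return env.get(var_name, "true").strip().lower() == "true"


class Deduplicator:
    """Hypothetical sketch: forward unique messages, plus at least one per hour."""

    def __init__(self, max_silence=timedelta(hours=1)):
        self.max_silence = max_silence
        self._last = {}  # key -> (payload, time it was last forwarded)

    def should_forward(self, key, payload, now):
        previous = self._last.get(key)
        if previous is not None:
            last_payload, last_time = previous
            # Drop the message only if it repeats the last forwarded payload
            # and less than max_silence has passed since that forward.
            if payload == last_payload and now - last_time < self.max_silence:
                return False
        self._last[key] = (payload, now)
        return True


if __name__ == "__main__":
    dedup = Deduplicator()
    start = datetime(2024, 1, 1)
    for minutes in (0, 30, 60):
        now = start + timedelta(minutes=minutes)
        print(minutes, dedup.should_forward("intersection-1", "same MAP", now))
```

In this sketch the repeat at 30 minutes is dropped, while the repeat at 60 minutes is forwarded again because a full hour has elapsed since the last forwarded copy.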

docker-compose-deduplicator.yml

-41
This file was deleted.

docker-compose.yml

+1 -2
@@ -1,5 +1,4 @@
 include:
   - docker-compose-connect.yml
   - docker-compose-mongo.yml
-  - docker-compose-kafka.yml
-  - docker-compose-deduplicator.yml
+  - docker-compose-kafka.yml

docs/Release_notes.md

+2 -1
@@ -40,4 +40,5 @@ USDOT PR 23: Adding missing ode topics
 USDOT PR 24: Index updates
 CDOT PR 6: Adding MEC Deposit Resources
 USDOT PR 25: Updating version for kafka ui to latest release
-CDOT PR 7: Tim compatibility and CI updates
+CDOT PR 7: Tim compatibility and CI updates
+CDOT PR 8: Jpo deduplicator removal

jpo-deduplicator/.dockerignore

-1
This file was deleted.

jpo-deduplicator/Dockerfile

-48
This file was deleted.
