-- [Generate a Github Token](#generate-a-github-token)
-- [Quick Run](#quick-run-3)
 - [Security Notice](#security-notice)

@@ -190,70 +186,6 @@ The following environment variables can be used to configure Kafka Connectors:
 3. Click `OdeBsmJson`, and now you should see your message!
 8. Feel free to test this with other topics or by producing to these topics using the [ODE](https://github.com/usdot-jpo-ode/jpo-ode)

-
-<a name="deduplicator"></a>
-
-## 5. jpo-deduplicator
-
-The JPO-Deduplicator is a Kafka Java Spring Boot application designed to reduce the number of messages stored and processed in the ODE system. It reads messages from an input topic (such as topic.ProcessedMap) and outputs a subset of those messages on a related output topic (topic.DeduplicatedProcessedMap). Functionally, it removes duplicate messages from the input topic and passes on only unique messages. In addition, each topic will pass on at least 1 message per hour even if the message is a duplicate. This behavior helps ensure messages are still flowing through the system. The following topics currently support deduplication.
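The filtering behavior described above (drop repeats, but still forward at least one message per hour) can be illustrated with a minimal Python sketch. This is not the application's Java implementation: the `Deduplicator` class, the whole-payload hash comparison, and the sample message fields are assumptions made for illustration, and the real deduplicator may compare only selected fields.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Assumption: one hour, matching the "at least 1 message per hour" rule.
MAX_SILENCE_SECONDS = 3600


@dataclass
class Deduplicator:
    """Tracks, per key, the hash and time of the last forwarded message."""

    last_forwarded: dict = field(default_factory=dict)

    def should_forward(self, key: str, message: dict, now: float) -> bool:
        # Hash a canonical JSON encoding so field order does not matter.
        digest = hashlib.sha256(
            json.dumps(message, sort_keys=True).encode("utf-8")
        ).hexdigest()
        previous = self.last_forwarded.get(key)
        if previous is not None:
            same_content = previous[0] == digest
            within_hour = now - previous[1] < MAX_SILENCE_SECONDS
            if same_content and within_hour:
                return False  # duplicate seen recently: drop it
        # New content, or an hour has passed: forward and remember it.
        self.last_forwarded[key] = (digest, now)
        return True


# Hypothetical message; the field names are illustrative only.
dedup = Deduplicator()
msg = {"intersectionId": 12109, "revision": 2}
results = [dedup.should_forward("12109", msg, t) for t in (0, 10, 3599, 3600)]
print(results)
```

In this sketch the repeats at 10 and 3599 seconds are dropped, while the repeat at 3600 seconds is forwarded again so the output topic never stays silent for more than an hour.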
-When running the jpo-deduplicator as a submodule in jpo-utils, the deduplicator will automatically enable or disable each algorithm depending on whether the corresponding subcomponent is also active. For example, if the KAFKA_TOPIC_CREATE_GEOJSONCONVERTER environment variable is set to true, the deduplicator will start performing deduplication for ProcessedMap, ProcessedMapWKT, and ProcessedSpat data. If KAFKA_TOPIC_CREATE_GEOJSONCONVERTER is set to false, the deduplicator will disable deduplication for those same topics. To manually configure deduplication for a topic, the following environment variables can also be used. If no value is passed for a given environment variable, the corresponding deduplication algorithm will default to enabled.
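One way to read the enable/disable rules above is as a three-step fallback, sketched here in Python. The precedence order (explicit per-topic flag, then the subcomponent flag, then enabled by default) is an assumption, since the text does not say how the two settings interact, and `ENABLE_PROCESSED_MAP_DEDUPLICATION` is a hypothetical variable name used only for illustration.

```python
from typing import Mapping, Optional


def parse_bool(value: Optional[str]) -> Optional[bool]:
    """Return True/False for a set value, or None when unset or empty."""
    if value is None or value.strip() == "":
        return None
    return value.strip().lower() == "true"


def deduplication_enabled(
    env: Mapping[str, str], topic_flag: str, subcomponent_flag: str
) -> bool:
    # 1. Assumed: an explicit per-topic setting takes precedence.
    explicit = parse_bool(env.get(topic_flag))
    if explicit is not None:
        return explicit
    # 2. Otherwise follow the corresponding subcomponent's flag.
    inherited = parse_bool(env.get(subcomponent_flag))
    if inherited is not None:
        return inherited
    # 3. With nothing set, the algorithm defaults to enabled.
    return True


# Hypothetical per-topic variable name, not taken from the repository.
TOPIC_FLAG = "ENABLE_PROCESSED_MAP_DEDUPLICATION"
SUBCOMPONENT_FLAG = "KAFKA_TOPIC_CREATE_GEOJSONCONVERTER"

print(deduplication_enabled({}, TOPIC_FLAG, SUBCOMPONENT_FLAG))
print(deduplication_enabled({SUBCOMPONENT_FLAG: "false"}, TOPIC_FLAG, SUBCOMPONENT_FLAG))
```

Passing a plain mapping instead of reading `os.environ` directly keeps the sketch easy to try with different combinations of settings.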
-A GitHub token is required to pull artifacts from GitHub repositories. It is needed to obtain the jpo-deduplicator jars and must be generated before attempting to build this repository.
-
-1. Log into GitHub.
-2. Navigate to Settings -> Developer settings -> Personal access tokens.
-3. Click "New personal access token (classic)".
-    1. As of now, GitHub does not support `Fine-grained tokens` for obtaining packages.
-4. Provide a name and expiration for the token.
-5. Select the `read:packages` scope.
-6. Click "Generate token" and copy the token.
-7. Copy the token name and token value into your `.env` file.
-
-For local development, the following steps are also required:
-8. Create a copy of [settings.xml](jpo-deduplicator/jpo-deduplicator/settings.xml) and save it to `~/.m2/settings.xml`
-9. Update the variables in your `~/.m2/settings.xml` with the token value and target jpo-ode organization.
-
-### Quick Run
-1. Create a copy of `sample.env` and rename it to `.env`.
-2. Update the variable `MAVEN_GITHUB_TOKEN` to a GitHub token used for downloading jar file dependencies. For full instructions on how to generate a token, see [Generate a Github Token](#generate-a-github-token).
-3. Set the `MONGO_ADMIN_DB_PASS` and `MONGO_READ_WRITE_PASS` environment variables to secure passwords.
-4. Set the `COMPOSE_PROFILES` variable to: `kafka,kafka_ui,kafka_setup,jpo-deduplicator`
-5. Navigate back to the root directory and run the following command: `docker compose up -d`
-6. Produce a sample message to one of the deduplicator's input topics using `kafka_ui`:
-    1. Go to `localhost:8001`
-    2. Click local -> Topics
-    3. Select `topic.OdeMapJson`
-    4. Select `Produce Message`
-    5. Copy in sample JSON for a Map Message
-    6. Click `Produce Message` multiple times
-7. View the deduplicated message in `kafka_ui`:
-    1. Go to `localhost:8001`
-    2. Click local -> Topics
-    3. Select `topic.DeduplicatedOdeMapJson`
-    4. You should now see only one copy of the map message sent.
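Steps 1 through 4 above amount to copying `sample.env` and overriding a handful of variables. The Python sketch below shows that transformation on an in-memory string. The `apply_overrides` helper is hypothetical, the sample text is an illustrative stand-in rather than the real contents of `sample.env`, and the angle-bracket values are placeholders to be replaced with a real token and passwords.

```python
def apply_overrides(sample_text: str, overrides: dict) -> str:
    """Return sample_text with KEY=value lines replaced from overrides."""
    remaining = dict(overrides)
    out = []
    for line in sample_text.splitlines():
        stripped = line.strip()
        key = None
        # Only plain KEY=value lines are candidates; comments pass through.
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
        if key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any variables the sample text did not define.
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


# Illustrative stand-in for sample.env; the real file has more entries.
sample = "\n".join([
    "# Kafka and MongoDB settings",
    "MAVEN_GITHUB_TOKEN=",
    "MONGO_ADMIN_DB_PASS=",
    "COMPOSE_PROFILES=kafka",
])

env_text = apply_overrides(sample, {
    "MAVEN_GITHUB_TOKEN": "<token-with-read:packages-scope>",
    "MONGO_ADMIN_DB_PASS": "<secure-password>",
    "MONGO_READ_WRITE_PASS": "<secure-password>",
    "COMPOSE_PROFILES": "kafka,kafka_ui,kafka_setup,jpo-deduplicator",
})
print(env_text)
```

Writing `env_text` to `.env` in the root directory and then running `docker compose up -d` corresponds to step 5. Note that the profile list contains no spaces, since `COMPOSE_PROFILES` is a comma-separated value.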