Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OSquery fix issue with document rejection by upgrading osquery_manager package and rolling over indices on upgrade #148991

Merged

Conversation

kevinlog
Copy link
Contributor

@kevinlog kevinlog commented Jan 16, 2023

Summary

This PR helps address this bug: elastic/beats#34250

The issue is that with the Agent v2, osquery beat sends different values in the data_stream field, which are type constant_keyword, to the logs-osquery_manager.result-* indices. This causes document rejection if users upgraded to Agent v8.6.0 and already had data OR if they start fresh with Agent v8.6.0 and upgrade to the 8.6.1 Agent/osquery beat with the fix for sending this data incorrectly.

To make the fix for this problem seamless for users on upgrade, we need to do the following:

  1. Install the latest 1.6.0 osquery_manager package which has a new pipeline which will correct the values that the 8.6.0 Agent/osquery beat send.
  2. Check to see if there are "bad" values in the relevant data streams and roll the over if needed

Checklist

Delete any items that are not applicable to this PR.

Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.

When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:

Risk Probability Severity Mitigation/Notes
Multiple Spaces—unexpected behavior in non-default Kibana Space. Low High Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces.
Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. High Low Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure.
Code should gracefully handle cases when feature X or plugin Y are disabled. Medium High Unit tests will verify that any feature flag or plugin combination still results in our service operational.
See more potential risk examples

For maintainers

@kevinlog kevinlog force-pushed the task/move-datastream-check-to-plugin-start branch from c63b8f6 to 6a6641d Compare January 16, 2023 21:38
@kevinlog kevinlog force-pushed the task/move-datastream-check-to-plugin-start branch from 1f9025e to 7915504 Compare January 17, 2023 16:50
@kevinlog kevinlog added Team:Defend Workflows “EDR Workflows” sub-team of Security Solution v8.7.0 v8.6.0 v8.6.1 release_note:fix labels Jan 17, 2023
@kevinlog kevinlog marked this pull request as ready for review January 17, 2023 19:02
@kevinlog kevinlog requested a review from a team as a code owner January 17, 2023 19:02
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-defend-workflows (Team:Defend Workflows)

@kevinlog
Copy link
Contributor Author

@elasticmachine merge upstream

@kevinlog kevinlog requested review from tomsonpl and removed request for patrykkopycinski January 17, 2023 19:02
@kevinlog kevinlog changed the title OSquery datastream rollover check moved to plugin start OSquery fix issue with document rejection by upgrading osquery_manager package and rolling over indices on upgrade Jan 17, 2023
@kevinlog
Copy link
Contributor Author

@elasticmachine merge upstream

@tomsonpl tomsonpl requested a review from paul-tavares January 19, 2023 19:47
@tomsonpl
Copy link
Contributor

@paul-tavares Hey, do you mind taking a look to check if the function above meets all the ''requirements'' of a good a async function within plugin as we discussed? It's quite late for me and my brain stopped working so would appreciate a second pair of eyes :P
Big thanks in advance!

@kevinlog
Copy link
Contributor Author

@elasticmachine merge upstream

@kevinlog
Copy link
Contributor Author

@tomsonpl I checked out and tried your change and it looks to be working. thanks for pushing up the change!

I can get the data after the fix in OSQuery:

image

and I get the log output as expected to show that we update the package and rollover the indices:

image

@tomsonpl
Copy link
Contributor

@elasticmachine merge upstream

@tomsonpl
Copy link
Contributor

@elasticmachine merge upstream

@tomsonpl tomsonpl self-requested a review January 23, 2023 07:50
Copy link
Contributor

@tomsonpl tomsonpl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I am not sure about the way (force) to upgrade for users' integration, but it seems to be the best idea we have for now :) Thanks @kevinlog !

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Contributor

@dasansol92 dasansol92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks great to me. I just wonder about the amount of async calls and awaits it will do during the plugin start process. Is this something could take too much time?

@@ -135,6 +136,9 @@ export class OsqueryPlugin implements Plugin<OsqueryPluginSetup, OsqueryPluginSt
await this.initialize(core, dataViewsService);
}

// Upgrade integration into 1.6.0 and rollover if found 'generic' dataset - we do not want to wait for it
upgradeIntegration({ packageInfo, client, esClient, logger: this.logger });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: as the upgrade is conditional, could this be renamed to something like upgradeIntegrationIfNeeded or checkAndUpgradeIntegrationIfNeeded

});

// Then for each of those datastreams, we need to see if they need to rollover.
await asyncForEach(dataStreams.data_streams, async (dataStream) => {
Copy link
Contributor

@dasansol92 dasansol92 Jan 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to be changed, but if you would like to avoid asyncForEach import, you could just use a for loop:

for (const dataStream of dataStreams.data_streams) {
[....]
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be done in parallel? Wondering if you could use pMap here and speed up the process.

@tomsonpl
Copy link
Contributor

We discussed this offline with @dasansol92 , and given the short time we have for the 8.6.1 FF, we agreed that it's gonna be merged in the current state and the performance improvements can be added in a next iteration.
Big thanks @dasansol92 :)

Copy link
Contributor

@dasansol92 dasansol92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes LGTM, suggestions can be done later.

@tomsonpl tomsonpl added the bug Fixes for quality problems that affect the customer experience label Jan 23, 2023
@tomsonpl tomsonpl merged commit 192c739 into elastic:main Jan 23, 2023
@tomsonpl
Copy link
Contributor

Merged, @kevinlog FYI.

@kibanamachine
Copy link
Contributor

💔 All backports failed

Status Branch Result
8.6 Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 148991

Questions ?

Please refer to the Backport tool documentation

@tomsonpl
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.6

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

tomsonpl pushed a commit to tomsonpl/kibana that referenced this pull request Jan 23, 2023
…r package and rolling over indices on upgrade (elastic#148991)

(cherry picked from commit 192c739)

# Conflicts:
#	x-pack/plugins/osquery/server/plugin.ts
#	x-pack/plugins/osquery/tsconfig.json
@kevinlog kevinlog deleted the task/move-datastream-check-to-plugin-start branch January 23, 2023 11:36
Copy link
Contributor

@paul-tavares paul-tavares left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. I left a comment regarding the pinning of the package version to 1.6.0

savedObjectsClient: client,
pkgkey: pkgToPkgKey({
name: packageInfo.name,
version: '1.6.0', // This package upgrade is specific to a bug fix, so keeping the upgrade focused on 1.6.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You seem to be pinning the version to 1.6.0 here, is that correct? What if the newest version is higher than 1.6.0?

The reason this looks strange to me is because this PR is being merged to main which means it applies to v8.7++.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank @paul-tavares! Thanks for raising the issue :) I haven’t really think about it, but the goal is to force upgrade older version than 1.6.0 to 1.6.0 which fixed the issue. The rest - update to any other versions are not crucial to us.
I think we can still discuss how to handle this 8.7 onwards, the most crucial part is to fix 8.6.1 and actually @kevinlog suggested another way to fix it: #149325

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paul-tavares - osquery doesn't normally upgrade it's package automatically. We wanted to make an exception in this case since 1.6.0 contains a critical fix. We'll only try to upgrade if the package version is below 1.6.0. We will make sure the package is at 1.6.0 and after that, users can check and upgrade the package as normal with newer changes pushed out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. As long as you realize that the forced upgrade here might be to a version (1.6.0) that is not the latest and thats ok condition, then 👍

Thanks for the responses @tomsonpl , @kevinlog

@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Jan 23, 2023
@KOTungseth KOTungseth added the Feature:Osquery Security Solution Osquery feature label Feb 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting bug Fixes for quality problems that affect the customer experience Feature:Osquery Security Solution Osquery feature release_note:fix Team:Defend Workflows “EDR Workflows” sub-team of Security Solution v8.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants