Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CMR-9569: Update the dataCenter to keyword for generating proper organization facets. #1700

Closed
wants to merge 1 commit into from

Conversation

Bccorb
Copy link
Contributor

@Bccorb Bccorb commented Dec 8, 2023

Overview

The the query metric we currently used failed to be processed by the CMR due dataCenter itself not resolving to attribute.
Changed the query to use portal config value as a keyword search instead. This works and generates the correct organizations. (To me 😬)

What is the feature?

Update the way we query ES for portal collection data.

What is the Solution?

Using the keyword value over the dataCenter value.

What areas of the application does this impact?

All portals that had their config changed.

Testing

Reproduction steps

  • Using a dev environment with changes applied and pointed at PROD and a second browser with PROD open
  • Compare the results of the portals
  • Ensure Organization has facet options available and selecting any one will filter to the correct collections

Checklist

  • I have added automated tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings

Copy link

codecov bot commented Dec 8, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (2b6b38b) 91.94% compared to head (e34e35d) 91.94%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1700   +/-   ##
=======================================
  Coverage   91.94%   91.94%           
=======================================
  Files         725      725           
  Lines       19349    19349           
  Branches     4562     4562           
=======================================
  Hits        17790    17790           
  Misses       1423     1423           
  Partials      136      136           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@macrouch
Copy link
Contributor

macrouch commented Dec 8, 2023

This feels like an unnecessary change. The ticket and even this PR acknowledge there is a bug in CMR. EDSC shouldn't be providing a different search experience than anyone else querying CMR, so if their dataCenter parameter isn't working correctly for us, it isn't working correctly for anyone else and that is where it needs to be fixed.

keyword is not a very precise method of searching for collections. Because it searches so many different fields it could potentially pick up collections the portal owners would not expect to be returned in their portal. This is why we don't list keyword as being a supported field in the portal documentation, we don't want to encourage it's use

@Bccorb
Copy link
Contributor Author

Bccorb commented Dec 12, 2023

This feels like an unnecessary change. The ticket and even this PR acknowledge there is a bug in CMR. EDSC shouldn't be providing a different search experience than anyone else querying CMR, so if their dataCenter parameter isn't working correctly for us, it isn't working correctly for anyone else and that is where it needs to be fixed.

keyword is not a very precise method of searching for collections. Because it searches so many different fields it could potentially pick up collections the portal owners would not expect to be returned in their portal. This is why we don't list keyword as being a supported field in the portal documentation, we don't want to encourage it's use

I hear your argument, but I don't think this is unnecessary at all. The issue at heart is that for some reason dataCenter and organizations are used what seems to be synonymously. This makes the problem of of having to reconfigure a lot of of CMR logic in reference to dataCenters and organizations in order to create the correct experience.

The change to keyword as it stands now, yields exactly what is expected in the facets. Couple this with the fact that the collections results also use a keyword search currently, this fix is more inline with what EDSC already does for retrieving the collections, it just matches the facets to the same approach.

@abbottry
Copy link
Member

The problem is that changes to CMR will result in different values over time and we wont know to expect it. keyword takes advantage of elastic searches full text search, and searches across many fields, users will not expect this. The dataCenter keyword, bugged or not, has much more functionality tied to -- the value you are seeing/providing is more of an.. obfuscation, a feature that CMR internally calls "humanizers" which allows us to normalize a number of searches and direct them towards all the possible combinations.

Additionally, using keyword isn't great because the purpose of a portal is scope the EDSC experience and allow users to then run searches. If we hijack the search bar as part of the portal, the users will not be able to filter any further unless they restrict their usage to facets, temporal and spatial.

Any time we are seeing issues related to the results coming back from CMR, we need to file tickets against CMR. Humanizers and facets have seen their fair share of bugs throughout the years, and we need to continue to address them rather than obscure them with (even valid or functional) work arounds.

@macrouch
Copy link
Contributor

This makes the problem of of having to reconfigure a lot of of CMR logic in reference to dataCenters and organizations in order to create the correct experience.

When CMR has a bug the bug gets fixed in CMR. EDSC strives to not put bandaids on CMR bugs because that means a different quality of product is delivered depending on which system you are using.

Couple this with the fact that the collections results also use a keyword search currently, this fix is more inline with what EDSC already does for retrieving the collections, it just matches the facets to the same approach.

Keyword searches on collections are not meant to be the same approach as facets, or portals. They are broad keyword searches meant for ease of use. We do not want to use broad searches for portals.

@Bccorb
Copy link
Contributor Author

Bccorb commented Dec 12, 2023

The problem is that changes to CMR will result in different values over time and we wont know to expect it. keyword takes advantage of elastic searches full text search, and searches across many fields, users will not expect this. The dataCenter keyword, bugged or not, has much more functionality tied to -- the value you are seeing/providing is more of an.. obfuscation, a feature that CMR internally calls "humanizers" which allows us to normalize a number of searches and direct them towards all the possible combinations.

Additionally, using keyword isn't great because the purpose of a portal is scope the EDSC experience and allow users to then run searches. If we hijack the search bar as part of the portal, the users will not be able to filter any further unless they restrict their usage to facets, temporal and spatial.

Any time we are seeing issues related to the results coming back from CMR, we need to file tickets against CMR. Humanizers and facets have seen their fair share of bugs throughout the years, and we need to continue to address them rather than obscure them with (even valid or functional) work arounds.

I think the part that seems to be missing here, is that the current strategy now does not do what you think it does. It does not appear to do some query a kin to get me all the collections where dataCenter = value That is what I think we think is happening, but dataCenter is not a queryable value in any form. There is only organization_h. Instead what appears to happen is that a keyword search is being performed to retrieve the collections. The facets v2 were designed expected there to be a conversion of dataCenter to organization, but that doesn't seem to be the case.

@Bccorb
Copy link
Contributor Author

Bccorb commented Dec 12, 2023

This makes the problem of of having to reconfigure a lot of of CMR logic in reference to dataCenters and organizations in order to create the correct experience.

When CMR has a bug the bug gets fixed in CMR. EDSC strives to not put bandaids on CMR bugs because that means a different quality of product is delivered depending on which system you are using.

Couple this with the fact that the collections results also use a keyword search currently, this fix is more inline with what EDSC already does for retrieving the collections, it just matches the facets to the same approach.

Keyword searches on collections are not meant to be the same approach as facets, or portals. They are broad keyword searches meant for ease of use. We do not want to use broad searches for portals.

What you are asking for is nearly a complete reshaping of how v2 facets handle organizations as a whole. Which is has far more downhill impact that I may not have the scope to know how it will roll downhill? I think there has to be a trade off for bugs that have persisted for years, and a fix.

For the latter, I am saying that a portal config right now, appears to just do a keyword search to return the collections. The facets are the thing that are not doing that.

@Bccorb Bccorb closed this Dec 12, 2023
@Bccorb Bccorb deleted the CMR-9569 branch December 12, 2023 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants