Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EB-PPS Sampling sketch #499

Merged
merged 9 commits into from
Feb 6, 2024
Merged

EB-PPS Sampling sketch #499

merged 9 commits into from
Feb 6, 2024

Conversation

jmalkin
Copy link
Contributor

@jmalkin jmalkin commented Jan 30, 2024

Adds EBPPS Sketch, providing parity with Java.

No cross-language tests yet. I did perform some characterization tests for entropy of a single sketch and 2-way merge, where the average of many runs should look like the weighted input distribution and things were in line with the C++ version. I haven't figured out how to create proper tests from that since it's basically "Are entropy and KL Divergence 'close enough'?"

@leerho leerho self-requested a review February 1, 2024 17:25
@@ -102,7 +102,7 @@
* </pre>
*
* <p><strong>VarOpt Union:</strong> VarOpt unions also store more information than a reservoir
* sketch. As before, we keep values with similar o hte same meaning in corresponding locations
* sketch. As before, we keep values with similar to the same meaning in corresponding locations
Copy link
Contributor

@leerho leerho Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have multiple sketches using the same PreambleUtil class. That is OK, but where you have the documentation for each of the specific sketches that use this class is confusing. You chose to place the preamble structure for Ebpps in its Sketch class, but for VarOpt and Reservoir is in this class. You could either move the specific preamble structures in to their respective sketch classes, or (I think I would prefer) have all of the structure docs in this class, separated by clear documentation boundaries. I.e., use of <h2> </h2> headers, which work great in javadocs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unsure of the best way to do this. I feel like stacking them all together is not actually helpful since the class starts to feel bloated. You seem less worried about that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the sketch specific docs, and related code is nicely separated out with headers, it won't feel bloated. It all has a reason for being there. A class begins to feel "bloated", I think, when too much totally unrelated stuff is put into the same class.

The intention was for the PreambleUtil class to be THE place to go to understand the various serialization formats and hold the static constants and static helper methods related to serialization. Having this stuff spread out in different classes makes it hard to find -- 'cause I, at least, will not be able to remember where to go to find it. We have this inconsistency in some other sketches as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The challenge here is that I didn't want to create a stand-alone family, so there are now 3 separate sketches squashed into the same PreambleUtil class. But that's fine, I certainly don't have an objection to moving it.

Copy link
Contributor

@leerho leerho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I have some questions about your fractional logic: somewhere we need to be sure that "c_" never is allowed to be negative.
  • Also, are you using Checkstyle?
  • A minor change to make sure k is not allowed to be negative.

@jmalkin
Copy link
Contributor Author

jmalkin commented Feb 4, 2024

In theory I (now) have checkstyle running but it's not complaining if I create a public method w/o a comment. Not sure that VS Code is using it properly.

Copy link
Contributor

@leerho leerho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work on this!

@jmalkin jmalkin merged commit a8231cb into master Feb 6, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants