-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EB-PPS Sampling sketch #499
Conversation
src/main/java/org/apache/datasketches/sampling/EbppsItemsSketch.java
Fixed
Show resolved
Hide resolved
src/main/java/org/apache/datasketches/sampling/EbppsItemsSample.java
Fixed
Show resolved
Hide resolved
src/test/java/org/apache/datasketches/sampling/EbppsSketchTest.java
Fixed
Show resolved
Hide resolved
src/test/java/org/apache/datasketches/sampling/EbppsSketchTest.java
Fixed
Show resolved
Hide resolved
src/main/java/org/apache/datasketches/sampling/EbppsItemsSample.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/datasketches/sampling/EbppsItemsSketch.java
Outdated
Show resolved
Hide resolved
@@ -102,7 +102,7 @@ | |||
* </pre> | |||
* | |||
* <p><strong>VarOpt Union:</strong> VarOpt unions also store more information than a reservoir | |||
* sketch. As before, we keep values with similar o hte same meaning in corresponding locations | |||
* sketch. As before, we keep values with similar to the same meaning in corresponding locations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have multiple sketches using the same PreambleUtil class. That is OK, but where you have the documentation for each of the specific sketches that use this class is confusing. You chose to place the preamble structure for Ebpps in its Sketch class, but for VarOpt and Reservoir is in this class. You could either move the specific preamble structures in to their respective sketch classes, or (I think I would prefer) have all of the structure docs in this class, separated by clear documentation boundaries. I.e., use of <h2> </h2> headers, which work great in javadocs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unsure of the best way to do this. I feel like stacking them all together is not actually helpful since the class starts to feel bloated. You seem less worried about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the sketch specific docs, and related code is nicely separated out with headers, it won't feel bloated. It all has a reason for being there. A class begins to feel "bloated", I think, when too much totally unrelated stuff is put into the same class.
The intention was for the PreambleUtil class to be THE place to go to understand the various serialization formats and hold the static constants and static helper methods related to serialization. Having this stuff spread out in different classes makes it hard to find -- 'cause I, at least, will not be able to remember where to go to find it. We have this inconsistency in some other sketches as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The challenge here is that I didn't want to create a stand-alone family, so there are now 3 separate sketches squashed into the same PreambleUtil
class. But that's fine, I certainly don't have an objection to moving it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I have some questions about your fractional logic: somewhere we need to be sure that "c_" never is allowed to be negative.
- Also, are you using Checkstyle?
- A minor change to make sure k is not allowed to be negative.
…ean up checks on c and theta to ensure no invalid inputs
In theory I (now) have checkstyle running but it's not complaining if I create a public method w/o a comment. Not sure that VS Code is using it properly. |
…ttle endian format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your work on this!
Adds EBPPS Sketch, providing parity with Java.
No cross-language tests yet. I did perform some characterization tests for entropy of a single sketch and 2-way merge, where the average of many runs should look like the weighted input distribution and things were in line with the C++ version. I haven't figured out how to create proper tests from that since it's basically "Are entropy and KL Divergence 'close enough'?"