Calculate entropy #199

Merged
merged 7 commits into from
Jan 27, 2022
Conversation

kissgyorgy
Contributor

@kissgyorgy kissgyorgy commented Jan 25, 2022

Calculate entropy for all UnknownChunks and draw an ASCII text plot on verbose mode.

Fixes #70

@kissgyorgy kissgyorgy requested review from qkaiser and vlaci January 25, 2022 18:26
@qkaiser
Contributor

qkaiser commented Jan 26, 2022

and draw an ASCII text plot on verbose mode

nerd alert ! 🤓

Contributor

@qkaiser qkaiser left a comment

I'll wait for @kukovecz or @vlaci to review this branch prior to approving it. I tend to miss things otherwise :)

@qkaiser
Contributor

qkaiser commented Jan 26, 2022

I've been playing with plotext on this branch and I honestly love this ❤️

Contributor

@kukovecz kukovecz left a comment

I really love the modifications here 👍

Validate that the specified value starts from 1
(0 makes no sense, because it would mean not doing anything).
We calculate entropy percentages in 1 MB chunks. We tested with different
chunk sizes, and the bigger the chunk size was, the faster the calculation.
1 MB seemed like a sweet spot: not so small that it is too slow, but
granular enough to provide useful information about most binary files.
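
As a rough illustration of the chunked calculation described above, the following sketch computes the Shannon entropy of each chunk and scales it to a percentage of the 8 bits-per-byte maximum. The function names and the `chunk_size` parameter are hypothetical and are not taken from this PR's code; treating "1 MB" as 1024 * 1024 bytes is also an assumption.

```python
import math
from collections import Counter
from typing import BinaryIO, Iterator

ONE_MB = 1024 * 1024  # assumption: "1 MB" read as 1024 * 1024 bytes


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    entropy = 0.0
    for count in Counter(data).values():
        probability = count / total
        entropy -= probability * math.log2(probability)
    return entropy


def entropy_percentages(
    file: BinaryIO, chunk_size: int = ONE_MB
) -> Iterator[float]:
    """Yield one entropy percentage (0 to 100) per chunk read from file."""
    while True:
        chunk = file.read(chunk_size)
        if not chunk:
            break
        yield shannon_entropy(chunk) / 8 * 100
```

A larger `chunk_size` means fewer reads and fewer values to report, at the cost of a coarser picture of where high-entropy regions start and end.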

There is a new CLI option: --entropy-depth, which determines how deep
we should calculate entropy for these files. The 0 value can turn this
feature completely off.
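
To make the "0 turns it off" behaviour concrete, here is a minimal sketch using the standard library's `argparse` rather than the project's actual CLI code. Only the `--entropy-depth` flag name comes from this PR; the default value, the helper names, and the reading of "depth" as a 0-based extraction depth are assumptions made for illustration.

```python
import argparse


def non_negative_int(value: str) -> int:
    """argparse type: accept integers >= 0, reject everything else."""
    number = int(value)
    if number < 0:
        raise argparse.ArgumentTypeError("must be 0 or greater")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--entropy-depth",
        type=non_negative_int,
        default=1,  # placeholder default, not taken from the PR
        help="How deep to calculate entropy; 0 turns the feature off.",
    )
    return parser


def should_calculate_entropy(current_depth: int, entropy_depth: int) -> bool:
    """Hypothetical gate: analyze only while below the configured depth."""
    return current_depth < entropy_depth
```

With this gate, `--entropy-depth 0` never matches any depth, so no entropy is calculated at all.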
For entropy representation, a graphical plot is shown in ASCII when
verbose mode is enabled.
We are using plotext for the drawing.
Explained by QKaiser:
This will provide more granularity for users
looking into small files (e.g. an encrypted config file),
but keep a "constant" time regardless of the file being analyzed.
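
One way to read the explanation above is that the read size is derived from the file size, so every file is split into roughly the same number of pieces. The sketch below illustrates that idea only; the function name, its parameters, and the example values are hypothetical and do not come from this PR.

```python
def calculate_buffer_size(
    file_size: int, *, chunk_count: int, min_limit: int
) -> int:
    """Pick a read size that splits the file into about chunk_count pieces.

    min_limit keeps very small files from being read a few bytes at a time.
    """
    return max(min_limit, file_size // chunk_count)


# Small files get small buffers (finer granularity), large files get
# proportionally larger ones (about the same number of data points).
small = calculate_buffer_size(8_000, chunk_count=80, min_limit=16)
large = calculate_buffer_size(8_000_000, chunk_count=80, min_limit=16)
```

The number of entropy values stays near `chunk_count` for any file larger than `chunk_count * min_limit` bytes, which is what keeps the reporting cost roughly constant.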
Iterators should have no side effects, because calling them would be
very surprising (nothing would happen if the iterator is not exhausted).
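
The point about iterators can be shown with a small, generic Python example (not code from this PR; all names are made up for illustration). A generator body does not run until it is iterated, so a side effect hidden inside it happens late, partially, or never.

```python
from typing import Iterable, Iterator, List

log: List[str] = []


def doubled_with_side_effect(values: Iterable[int]) -> Iterator[int]:
    """Generator that hides a side effect (appending to log) in its body."""
    for value in values:
        log.append(f"saw {value}")  # runs only when the caller iterates
        yield value * 2


def doubled(values: Iterable[int]) -> Iterator[int]:
    """Side-effect-free generator: it only produces values."""
    for value in values:
        yield value * 2


lazy = doubled_with_side_effect([1, 2, 3])
assert log == []  # creating the generator did nothing

first = next(lazy)
assert log == ["saw 1"]  # only as much as was consumed

# Keeping the side effect at the call site makes it explicit:
for result in doubled([1, 2, 3]):
    log.append(f"got {result}")
```

In the second form the reader can see at the call site that logging happens once per item, regardless of how the generator is implemented.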
@kukovecz kukovecz merged commit 5f20faa into main Jan 27, 2022
@kukovecz kukovecz deleted the calculate-entropy branch January 27, 2022 11:08
Development

Successfully merging this pull request may close these issues.

Calculate entropy for UnknownChunks
4 participants