-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving API: part 1: functionality for input pyfive.high_level.Dataset #241
Conversation
Welcome to Codecov 🎉Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. Thanks for integrating Codecov - We've got you covered ☂️ |
Before I review the code itself, for me the big issue is the high level API. In #241 we suggest that we could just do things like Given the library code itself has to explicitly implement That means, I'd prefer to see us explicitly allowing |
yes, that is indeed on the TODO list - I was planning on tackling that in part 2 PR ie review and merge this first, then part 2 explicit stats, and then part 3 clean up (get rid of all those versions etc) 🍺 |
activestorage/active.py
Outdated
def __load_nc_file(self): | ||
""" Get the netcdf file and it's b-tree""" | ||
""" Get the netcdf file and its b-tree""" | ||
ncvar = self.ncvar | ||
# in all cases we need an open netcdf file to get at attributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure it's an open netcdf file any more is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in 61cf5dd
activestorage/active.py
Outdated
@@ -310,10 +307,6 @@ def _get_selection(self, *args): | |||
# hopefully fix pyfive to get a dtype directly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure the docstring for this method, and this comment are quite right any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in 4cdeb1b
activestorage/active.py
Outdated
@@ -362,13 +355,6 @@ def _from_storage(self, ds, indexer, chunks, out_shape, out_dtype, compressor, f | |||
# Because we do this, we need to read the dataset b-tree now, not as we go, so | |||
# it is already in cache. If we remove the thread pool from here, we probably | |||
# wouldn't need to do it before the first one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is irrelevant now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is irrelevant now?
That's my understanding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in 9aca259
chunk = chunk.reshape(-1, order='A') | ||
chunk = chunk.reshape(shape, order=order) | ||
else: | ||
class storeinfo: pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we import the class so it's more obvious what it's for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this just now, there is unfortunately flakiness involved:
E AssertionError: assert 265.90347 == array(258.62814, dtype=float32)
E + where array(258.62814, dtype=float32) = <built-in function array>(258.62814, dtype='float32')
E + where <built-in function array> = np.array
tests/unit/test_active.py:104: AssertionError
----------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------
Treating input <HDF5 dataset "TREFHT": shape (12, 4, 8), type "float32"> as variable object.
Reducing chunk of object <class 'pyfive.high_level.Dataset'>
Reducing chunk of object <class 'pyfive.high_level.Dataset'>XXX StI byte offset 12394 StI size 128
XXX
StI byte offset 12522actual offset StI size12394 actual size 128
128
actual offset 12522 actual size 128
with implement:
from pyfive.h5d import StoreInfo as storeinfo
storeinfo.byte_offset = offset
storeinfo.size = size
print("XXX", "StI byte offset", storeinfo.byte_offset, "StI size", storeinfo.size)
print("actual offset", offset, "actual size", size)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StoreInfo
does not seem to be a safe class the way it is implemented at the moment, since it gets frozen in at times, and doesn't update on the go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of my comments are about documentation. This looks good to go as part of a sequence of events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thnaks V - Just some very minor suggestions
activestorage/active.py
Outdated
@@ -362,13 +355,6 @@ def _from_storage(self, ds, indexer, chunks, out_shape, out_dtype, compressor, f | |||
# Because we do this, we need to read the dataset b-tree now, not as we go, so | |||
# it is already in cache. If we remove the thread pool from here, we probably | |||
# wouldn't need to do it before the first one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is irrelevant now?
That's my understanding
Co-authored-by: David Hassell <davidhassell@users.noreply.github.com>
Co-authored-by: David Hassell <davidhassell@users.noreply.github.com>
Co-authored-by: David Hassell <davidhassell@users.noreply.github.com>
av._method = "min" | ||
assert av.method([3,444]) == 3 | ||
av_slice_min = av[3:5] | ||
assert av_slice_min == np.array(249.6583, dtype="float32") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bnlawrence @davidhassell this test works a treat, many thanks for reminding me about my own work that I forgot 🤣 I am still keeping the actual real-world test with Reductionist for now though, just so we are fully covered, until the end (ie when we done with work o Pyfive)
Description
Contribution towards #231
This allows for a type
pyfive.high_level.Dataset
to be Active-ed. eg (the test I included):Before you get started
Checklist
[ ] Any changed dependencies have been added or removed correctly (if need be)