-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate from *Indexer objects to .oindex
and .vindex
on explicityIndexed array classes
#8364
Changes from all commits
5ca0fe8
92e18a9
b54388d
9474e09
25eec55
9504ac5
c9aa893
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -1,13 +1,9 @@ | ||||||||||||||||||||||||||
from __future__ import annotations | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
import sys | ||||||||||||||||||||||||||
from collections.abc import Hashable | ||||||||||||||||||||||||||
from collections.abc import Hashable, Iterable, Mapping | ||||||||||||||||||||||||||
from enum import Enum | ||||||||||||||||||||||||||
from typing import ( | ||||||||||||||||||||||||||
TYPE_CHECKING, | ||||||||||||||||||||||||||
Any, | ||||||||||||||||||||||||||
Final, | ||||||||||||||||||||||||||
) | ||||||||||||||||||||||||||
from typing import TYPE_CHECKING, Any, Final, TypeVar | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
import numpy as np | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
|
@@ -19,9 +15,7 @@ | |||||||||||||||||||||||||
|
||||||||||||||||||||||||||
from numpy.typing import NDArray | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
from xarray.namedarray._typing import ( | ||||||||||||||||||||||||||
duckarray, | ||||||||||||||||||||||||||
) | ||||||||||||||||||||||||||
from xarray.namedarray._typing import T_DuckArray, duckarray | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
try: | ||||||||||||||||||||||||||
from dask.array.core import Array as DaskArray | ||||||||||||||||||||||||||
|
@@ -30,6 +24,10 @@ | |||||||||||||||||||||||||
DaskArray = NDArray # type: ignore | ||||||||||||||||||||||||||
DaskCollection: Any = NDArray # type: ignore | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
K = TypeVar("K") | ||||||||||||||||||||||||||
V = TypeVar("V") | ||||||||||||||||||||||||||
T = TypeVar("T") | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
# Singleton type, as per https://github.com/python/typing/pull/240 | ||||||||||||||||||||||||||
class Default(Enum): | ||||||||||||||||||||||||||
|
@@ -80,6 +78,54 @@ def to_0d_object_array( | |||||||||||||||||||||||||
return result | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
def is_duck_array(value: Any) -> TypeGuard[T_DuckArray]: | ||||||||||||||||||||||||||
if isinstance(value, np.ndarray): | ||||||||||||||||||||||||||
return True | ||||||||||||||||||||||||||
return ( | ||||||||||||||||||||||||||
hasattr(value, "ndim") | ||||||||||||||||||||||||||
and hasattr(value, "shape") | ||||||||||||||||||||||||||
and hasattr(value, "dtype") | ||||||||||||||||||||||||||
and ( | ||||||||||||||||||||||||||
(hasattr(value, "__array_function__") and hasattr(value, "__array_ufunc__")) | ||||||||||||||||||||||||||
or hasattr(value, "__array_namespace__") | ||||||||||||||||||||||||||
) | ||||||||||||||||||||||||||
) | ||||||||||||||||||||||||||
Comment on lines
+81
to
+92
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Not needed anymore. Use isinstance checks with _arrayfunction_or_api instead, for example: if isinstance(data, _arrayfunction_or_api):
return NamedArray(dims, data, attrs) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sweet! thank you for the tip There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Illviljan, i'm running in a strange error when working with the following works fine In [18]: isinstance(x, _arrayfunction_or_api)
Out[18]: True
In [19]: data = sparse.COO(np.array([1, 1, 0, 1, 1, 0]))
In [20]: isinstance(data, _arrayfunction_or_api)
Out[20]: True however, when this fails: In [21]: data = sparse.COO(np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']))
In [22]: isinstance(data, _arrayfunction_or_api)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[22], line 1
----> 1 isinstance(data, _arrayfunction_or_api)
File ~/mambaforge/envs/xarray-tests/lib/python3.11/typing.py:1991, in _ProtocolMeta.__instancecheck__(cls, instance)
1989 return True
1990 if cls._is_protocol:
-> 1991 if all(hasattr(instance, attr) and
1992 # All *methods* can be blocked by setting them to None.
1993 (not callable(getattr(cls, attr, None)) or
1994 getattr(instance, attr) is not None)
1995 for attr in _get_protocol_attrs(cls)):
1996 return True
1997 return super().__instancecheck__(instance)
File ~/mambaforge/envs/xarray-tests/lib/python3.11/typing.py:1991, in <genexpr>(.0)
1989 return True
1990 if cls._is_protocol:
-> 1991 if all(hasattr(instance, attr) and
1992 # All *methods* can be blocked by setting them to None.
1993 (not callable(getattr(cls, attr, None)) or
1994 getattr(instance, attr) is not None)
1995 for attr in _get_protocol_attrs(cls)):
1996 return True
1997 return super().__instancecheck__(instance)
File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_sparse_array.py:884, in SparseArray.real(self)
859 @property
860 def real(self):
861 """The real part of the array.
862
863 Examples
(...)
882 numpy.real : NumPy equivalent function.
883 """
--> 884 return self.__array_ufunc__(np.real, "__call__", self)
File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_sparse_array.py:305, in SparseArray.__array_ufunc__(self, ufunc, method, *inputs, **kwargs)
302 inputs = tuple(reversed(inputs_transformed))
304 if method == "__call__":
--> 305 result = elemwise(ufunc, *inputs, **kwargs)
306 elif method == "reduce":
307 result = SparseArray._reduce(ufunc, *inputs, **kwargs)
File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_umath.py:49, in elemwise(func, *args, **kwargs)
12 def elemwise(func, *args, **kwargs):
13 """
14 Apply a function to any number of arguments.
15
(...)
46 it is necessary to convert Numpy arrays to :obj:`COO` objects.
47 """
---> 49 return _Elemwise(func, *args, **kwargs).get_result()
File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_umath.py:466, in _Elemwise.__init__(self, func, *args, **kwargs)
463 self._dense_result = False
465 self._check_broadcast()
--> 466 self._get_fill_value()
File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_umath.py:535, in _Elemwise._get_fill_value(self)
525 """
526 A function that finds and returns the fill-value.
527
(...)
531 If the fill-value is inconsistent.
532 """
533 from ._coo import COO
--> 535 zero_args = tuple(
536 arg.fill_value[...] if isinstance(arg, COO) else arg for arg in self.args
537 )
539 # Some elemwise functions require a dtype argument, some abhorr it.
540 try:
File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_umath.py:536, in <genexpr>(.0)
525 """
526 A function that finds and returns the fill-value.
527
(...)
531 If the fill-value is inconsistent.
532 """
533 from ._coo import COO
535 zero_args = tuple(
--> 536 arg.fill_value[...] if isinstance(arg, COO) else arg for arg in self.args
537 )
539 # Some elemwise functions require a dtype argument, some abhorr it.
540 try:
TypeError: string indices must be integers, not 'ellipsis' There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like a bug in sparse. data = sparse.COO(np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']))
data.real
Traceback (most recent call last):
File "C:\Users\J.W\AppData\Local\Temp\ipykernel_17864\1521749299.py", line 1, in <cell line: 1>
data.real
File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_sparse_array.py", line 884, in real
return self.__array_ufunc__(np.real, "__call__", self)
File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_sparse_array.py", line 305, in __array_ufunc__
result = elemwise(ufunc, *inputs, **kwargs)
File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_umath.py", line 49, in elemwise
return _Elemwise(func, *args, **kwargs).get_result()
File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_umath.py", line 466, in __init__
self._get_fill_value()
File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_umath.py", line 535, in _get_fill_value
zero_args = tuple(
File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_umath.py", line 536, in <genexpr>
arg.fill_value[...] if isinstance(arg, COO) else arg for arg in self.args
TypeError: string indices must be integers There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i opened an issue here: what could be a practical temporary solution? should we consider reverting to the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I lean towards commenting out .imag/.real until it's solved. That won't be a complete refactor I think. I gave it a good try using a typeguarded |
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
# It's probably OK to give this as a TypeGuard; though it's not perfectly robust. | ||||||||||||||||||||||||||
def is_dict_like(value: Any) -> TypeGuard[Mapping]: | ||||||||||||||||||||||||||
return hasattr(value, "keys") and hasattr(value, "__getitem__") | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
def _is_scalar(value, include_0d): | ||||||||||||||||||||||||||
# from xarray.core.variable import NON_NUMPY_SUPPORTED_ARRAY_TYPES | ||||||||||||||||||||||||||
NON_NUMPY_SUPPORTED_ARRAY_TYPES = tuple() | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
if include_0d: | ||||||||||||||||||||||||||
include_0d = getattr(value, "ndim", None) == 0 | ||||||||||||||||||||||||||
return ( | ||||||||||||||||||||||||||
include_0d | ||||||||||||||||||||||||||
or isinstance(value, (str, bytes)) | ||||||||||||||||||||||||||
or not ( | ||||||||||||||||||||||||||
isinstance(value, (Iterable,) + NON_NUMPY_SUPPORTED_ARRAY_TYPES) | ||||||||||||||||||||||||||
or hasattr(value, "__array_function__") | ||||||||||||||||||||||||||
or hasattr(value, "__array_namespace__") | ||||||||||||||||||||||||||
) | ||||||||||||||||||||||||||
) | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
def is_scalar(value: Any, include_0d: bool = True) -> TypeGuard[Hashable]: | ||||||||||||||||||||||||||
"""Whether to treat a value as a scalar. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Any non-iterable, string, or 0-D array | ||||||||||||||||||||||||||
""" | ||||||||||||||||||||||||||
return _is_scalar(value, include_0d) | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
def is_0d_dask_array(x): | ||||||||||||||||||||||||||
return is_duck_dask_array(x) and is_scalar(x) | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
class ReprObject: | ||||||||||||||||||||||||||
"""Object that prints as the given value, for use with sentinel values.""" | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been changed in a different PR, but while looking at the changes I noticed that without involving a type checker
is_duck_dask_array
returns the same result asis_dask_collection
. Could we also do a runtime check thatx
is a duckarray? (isinstance(x, _arrayfunction_or_api)
if I understand correctly)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's necessary if we agree that these functions are private.
All testing in xarray and NamedArray will use a type checker and will fail if anything else than a duckarray is input.
Unnecessary isinstance checks adds up, I see them pop up a lot when profiling the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have a point: the only place we use it for at the moment is in the
__dask_*__
protocols where we can guarantee that whatever we pass is a duck array.However, the fact remains that without a type checker the function name is misleading: it will not check that whatever is passed is a duck array, and any future contribution will have to be aware of that quirk. So instead, I'd rather use
is_dask_collection
directly in the__dask_*__
protocols, because that makes the fact that we don't check for duck arrays again explicit.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of quirks hidden in the pytests as well, future contributors will have to deal with those (and mypy) in the CI if they want something merged.
Just like we are weary of removing tests without motivation we should be weary of using
# type: ignore
orAny
without motivation. It's another tool I think we should use more internally to speed things up.is_duck_dask_array
andis_dask_collection
are typeguarded differently, so they have a little different flavour. Maybe using is_dask_collection only is good enough.