-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate from *Indexer objects to .oindex
and .vindex
on explicityIndexed array classes
#8364
Conversation
def is_duck_array(value: Any) -> TypeGuard[T_DuckArray]: | ||
if isinstance(value, np.ndarray): | ||
return True | ||
return ( | ||
hasattr(value, "ndim") | ||
and hasattr(value, "shape") | ||
and hasattr(value, "dtype") | ||
and ( | ||
(hasattr(value, "__array_function__") and hasattr(value, "__array_ufunc__")) | ||
or hasattr(value, "__array_namespace__") | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def is_duck_array(value: Any) -> TypeGuard[T_DuckArray]: | |
if isinstance(value, np.ndarray): | |
return True | |
return ( | |
hasattr(value, "ndim") | |
and hasattr(value, "shape") | |
and hasattr(value, "dtype") | |
and ( | |
(hasattr(value, "__array_function__") and hasattr(value, "__array_ufunc__")) | |
or hasattr(value, "__array_namespace__") | |
) | |
) |
Not needed anymore. Use isinstance checks with _arrayfunction_or_api instead, for example:
if isinstance(data, _arrayfunction_or_api):
return NamedArray(dims, data, attrs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sweet! thank you for the tip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Illviljan, i'm running in a strange error when working with _arrayfunction_or_api
check plus sparse data.
the following works fine
In [18]: isinstance(x, _arrayfunction_or_api)
Out[18]: True
In [19]: data = sparse.COO(np.array([1, 1, 0, 1, 1, 0]))
In [20]: isinstance(data, _arrayfunction_or_api)
Out[20]: True
however, when this fails:
In [21]: data = sparse.COO(np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']))
In [22]: isinstance(data, _arrayfunction_or_api)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[22], line 1
----> 1 isinstance(data, _arrayfunction_or_api)
File ~/mambaforge/envs/xarray-tests/lib/python3.11/typing.py:1991, in _ProtocolMeta.__instancecheck__(cls, instance)
1989 return True
1990 if cls._is_protocol:
-> 1991 if all(hasattr(instance, attr) and
1992 # All *methods* can be blocked by setting them to None.
1993 (not callable(getattr(cls, attr, None)) or
1994 getattr(instance, attr) is not None)
1995 for attr in _get_protocol_attrs(cls)):
1996 return True
1997 return super().__instancecheck__(instance)
File ~/mambaforge/envs/xarray-tests/lib/python3.11/typing.py:1991, in <genexpr>(.0)
1989 return True
1990 if cls._is_protocol:
-> 1991 if all(hasattr(instance, attr) and
1992 # All *methods* can be blocked by setting them to None.
1993 (not callable(getattr(cls, attr, None)) or
1994 getattr(instance, attr) is not None)
1995 for attr in _get_protocol_attrs(cls)):
1996 return True
1997 return super().__instancecheck__(instance)
File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_sparse_array.py:884, in SparseArray.real(self)
859 @property
860 def real(self):
861 """The real part of the array.
862
863 Examples
(...)
882 numpy.real : NumPy equivalent function.
883 """
--> 884 return self.__array_ufunc__(np.real, "__call__", self)
File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_sparse_array.py:305, in SparseArray.__array_ufunc__(self, ufunc, method, *inputs, **kwargs)
302 inputs = tuple(reversed(inputs_transformed))
304 if method == "__call__":
--> 305 result = elemwise(ufunc, *inputs, **kwargs)
306 elif method == "reduce":
307 result = SparseArray._reduce(ufunc, *inputs, **kwargs)
File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_umath.py:49, in elemwise(func, *args, **kwargs)
12 def elemwise(func, *args, **kwargs):
13 """
14 Apply a function to any number of arguments.
15
(...)
46 it is necessary to convert Numpy arrays to :obj:`COO` objects.
47 """
---> 49 return _Elemwise(func, *args, **kwargs).get_result()
File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_umath.py:466, in _Elemwise.__init__(self, func, *args, **kwargs)
463 self._dense_result = False
465 self._check_broadcast()
--> 466 self._get_fill_value()
File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_umath.py:535, in _Elemwise._get_fill_value(self)
525 """
526 A function that finds and returns the fill-value.
527
(...)
531 If the fill-value is inconsistent.
532 """
533 from ._coo import COO
--> 535 zero_args = tuple(
536 arg.fill_value[...] if isinstance(arg, COO) else arg for arg in self.args
537 )
539 # Some elemwise functions require a dtype argument, some abhorr it.
540 try:
File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_umath.py:536, in <genexpr>(.0)
525 """
526 A function that finds and returns the fill-value.
527
(...)
531 If the fill-value is inconsistent.
532 """
533 from ._coo import COO
535 zero_args = tuple(
--> 536 arg.fill_value[...] if isinstance(arg, COO) else arg for arg in self.args
537 )
539 # Some elemwise functions require a dtype argument, some abhorr it.
540 try:
TypeError: string indices must be integers, not 'ellipsis'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a bug in sparse.
data = sparse.COO(np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']))
data.real
Traceback (most recent call last):
File "C:\Users\J.W\AppData\Local\Temp\ipykernel_17864\1521749299.py", line 1, in <cell line: 1>
data.real
File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_sparse_array.py", line 884, in real
return self.__array_ufunc__(np.real, "__call__", self)
File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_sparse_array.py", line 305, in __array_ufunc__
result = elemwise(ufunc, *inputs, **kwargs)
File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_umath.py", line 49, in elemwise
return _Elemwise(func, *args, **kwargs).get_result()
File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_umath.py", line 466, in __init__
self._get_fill_value()
File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_umath.py", line 535, in _get_fill_value
zero_args = tuple(
File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_umath.py", line 536, in <genexpr>
arg.fill_value[...] if isinstance(arg, COO) else arg for arg in self.args
TypeError: string indices must be integers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i opened an issue here:
what could be a practical temporary solution? should we consider reverting to the is_duck_array check
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I lean towards commenting out .imag/.real until it's solved. That won't be a complete refactor I think.
xp.real/xp.imag will still be there for the very few early birds of NamedArray.
I gave it a good try using a typeguarded is_duck_array
but it's just terrible version of isinstance
, because it can't handle the else
-cases.
@@ -80,6 +78,54 @@ def to_0d_object_array( | |||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been changed in a different PR, but while looking at the changes I noticed that without involving a type checker is_duck_dask_array
returns the same result as is_dask_collection
. Could we also do a runtime check that x
is a duckarray? (isinstance(x, _arrayfunction_or_api)
if I understand correctly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's necessary if we agree that these functions are private.
All testing in xarray and NamedArray will use a type checker and will fail if anything else than a duckarray is input.
Unnecessary isinstance checks adds up, I see them pop up a lot when profiling the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have a point: the only place we use it for at the moment is in the __dask_*__
protocols where we can guarantee that whatever we pass is a duck array.
However, the fact remains that without a type checker the function name is misleading: it will not check that whatever is passed is a duck array, and any future contribution will have to be aware of that quirk. So instead, I'd rather use is_dask_collection
directly in the __dask_*__
protocols, because that makes the fact that we don't check for duck arrays again explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of quirks hidden in the pytests as well, future contributors will have to deal with those (and mypy) in the CI if they want something merged.
Just like we are weary of removing tests without motivation we should be weary of using # type: ignore
or Any
without motivation. It's another tool I think we should use more internally to speed things up.
is_duck_dask_array
and is_dask_collection
are typeguarded differently, so they have a little different flavour. Maybe using is_dask_collection only is good enough.
whats-new.rst
api.rst