Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate from *Indexer objects to .oindex and .vindex on explicityIndexed array classes #8364

Closed
wants to merge 7 commits into from

Conversation

andersy005
Copy link
Member

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

@github-actions github-actions bot added the topic-NamedArray Lightweight version of Variable label Oct 23, 2023
Comment on lines +81 to +92
def is_duck_array(value: Any) -> TypeGuard[T_DuckArray]:
if isinstance(value, np.ndarray):
return True
return (
hasattr(value, "ndim")
and hasattr(value, "shape")
and hasattr(value, "dtype")
and (
(hasattr(value, "__array_function__") and hasattr(value, "__array_ufunc__"))
or hasattr(value, "__array_namespace__")
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def is_duck_array(value: Any) -> TypeGuard[T_DuckArray]:
if isinstance(value, np.ndarray):
return True
return (
hasattr(value, "ndim")
and hasattr(value, "shape")
and hasattr(value, "dtype")
and (
(hasattr(value, "__array_function__") and hasattr(value, "__array_ufunc__"))
or hasattr(value, "__array_namespace__")
)
)

Not needed anymore. Use isinstance checks with _arrayfunction_or_api instead, for example:

    if isinstance(data, _arrayfunction_or_api):
        return NamedArray(dims, data, attrs)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sweet! thank you for the tip

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Illviljan, i'm running in a strange error when working with _arrayfunction_or_api check plus sparse data.

the following works fine

In [18]: isinstance(x, _arrayfunction_or_api)
Out[18]: True

In [19]: data = sparse.COO(np.array([1, 1, 0, 1, 1, 0]))

In [20]: isinstance(data, _arrayfunction_or_api)
Out[20]: True

however, when this fails:

In [21]: data = sparse.COO(np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']))

In [22]: isinstance(data, _arrayfunction_or_api)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[22], line 1
----> 1 isinstance(data, _arrayfunction_or_api)

File ~/mambaforge/envs/xarray-tests/lib/python3.11/typing.py:1991, in _ProtocolMeta.__instancecheck__(cls, instance)
   1989     return True
   1990 if cls._is_protocol:
-> 1991     if all(hasattr(instance, attr) and
   1992             # All *methods* can be blocked by setting them to None.
   1993             (not callable(getattr(cls, attr, None)) or
   1994              getattr(instance, attr) is not None)
   1995             for attr in _get_protocol_attrs(cls)):
   1996         return True
   1997 return super().__instancecheck__(instance)

File ~/mambaforge/envs/xarray-tests/lib/python3.11/typing.py:1991, in <genexpr>(.0)
   1989     return True
   1990 if cls._is_protocol:
-> 1991     if all(hasattr(instance, attr) and
   1992             # All *methods* can be blocked by setting them to None.
   1993             (not callable(getattr(cls, attr, None)) or
   1994              getattr(instance, attr) is not None)
   1995             for attr in _get_protocol_attrs(cls)):
   1996         return True
   1997 return super().__instancecheck__(instance)

File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_sparse_array.py:884, in SparseArray.real(self)
    859 @property
    860 def real(self):
    861     """The real part of the array.
    862 
    863     Examples
   (...)
    882     numpy.real : NumPy equivalent function.
    883     """
--> 884     return self.__array_ufunc__(np.real, "__call__", self)

File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_sparse_array.py:305, in SparseArray.__array_ufunc__(self, ufunc, method, *inputs, **kwargs)
    302     inputs = tuple(reversed(inputs_transformed))
    304 if method == "__call__":
--> 305     result = elemwise(ufunc, *inputs, **kwargs)
    306 elif method == "reduce":
    307     result = SparseArray._reduce(ufunc, *inputs, **kwargs)

File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_umath.py:49, in elemwise(func, *args, **kwargs)
     12 def elemwise(func, *args, **kwargs):
     13     """
     14     Apply a function to any number of arguments.
     15 
   (...)
     46     it is necessary to convert Numpy arrays to :obj:`COO` objects.
     47     """
---> 49     return _Elemwise(func, *args, **kwargs).get_result()

File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_umath.py:466, in _Elemwise.__init__(self, func, *args, **kwargs)
    463 self._dense_result = False
    465 self._check_broadcast()
--> 466 self._get_fill_value()

File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_umath.py:535, in _Elemwise._get_fill_value(self)
    525 """
    526 A function that finds and returns the fill-value.
    527 
   (...)
    531     If the fill-value is inconsistent.
    532 """
    533 from ._coo import COO
--> 535 zero_args = tuple(
    536     arg.fill_value[...] if isinstance(arg, COO) else arg for arg in self.args
    537 )
    539 # Some elemwise functions require a dtype argument, some abhorr it.
    540 try:

File ~/mambaforge/envs/xarray-tests/lib/python3.11/site-packages/sparse/_umath.py:536, in <genexpr>(.0)
    525 """
    526 A function that finds and returns the fill-value.
    527 
   (...)
    531     If the fill-value is inconsistent.
    532 """
    533 from ._coo import COO
    535 zero_args = tuple(
--> 536     arg.fill_value[...] if isinstance(arg, COO) else arg for arg in self.args
    537 )
    539 # Some elemwise functions require a dtype argument, some abhorr it.
    540 try:

TypeError: string indices must be integers, not 'ellipsis'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a bug in sparse.

data = sparse.COO(np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']))
data.real
Traceback (most recent call last):

  File "C:\Users\J.W\AppData\Local\Temp\ipykernel_17864\1521749299.py", line 1, in <cell line: 1>
    data.real

  File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_sparse_array.py", line 884, in real
    return self.__array_ufunc__(np.real, "__call__", self)

  File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_sparse_array.py", line 305, in __array_ufunc__
    result = elemwise(ufunc, *inputs, **kwargs)

  File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_umath.py", line 49, in elemwise
    return _Elemwise(func, *args, **kwargs).get_result()

  File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_umath.py", line 466, in __init__
    self._get_fill_value()

  File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_umath.py", line 535, in _get_fill_value
    zero_args = tuple(

  File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\sparse\_umath.py", line 536, in <genexpr>
    arg.fill_value[...] if isinstance(arg, COO) else arg for arg in self.args

TypeError: string indices must be integers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i opened an issue here:

what could be a practical temporary solution? should we consider reverting to the is_duck_array check?

Copy link
Contributor

@Illviljan Illviljan Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lean towards commenting out .imag/.real until it's solved. That won't be a complete refactor I think.
xp.real/xp.imag will still be there for the very few early birds of NamedArray.

I gave it a good try using a typeguarded is_duck_array but it's just terrible version of isinstance, because it can't handle the else-cases.

@@ -80,6 +78,54 @@ def to_0d_object_array(
return result
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been changed in a different PR, but while looking at the changes I noticed that without involving a type checker is_duck_dask_array returns the same result as is_dask_collection. Could we also do a runtime check that x is a duckarray? (isinstance(x, _arrayfunction_or_api) if I understand correctly)

Copy link
Contributor

@Illviljan Illviljan Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary if we agree that these functions are private.
All testing in xarray and NamedArray will use a type checker and will fail if anything else than a duckarray is input.

Unnecessary isinstance checks adds up, I see them pop up a lot when profiling the code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a point: the only place we use it for at the moment is in the __dask_*__ protocols where we can guarantee that whatever we pass is a duck array.

However, the fact remains that without a type checker the function name is misleading: it will not check that whatever is passed is a duck array, and any future contribution will have to be aware of that quirk. So instead, I'd rather use is_dask_collection directly in the __dask_*__ protocols, because that makes the fact that we don't check for duck arrays again explicit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of quirks hidden in the pytests as well, future contributors will have to deal with those (and mypy) in the CI if they want something merged.
Just like we are weary of removing tests without motivation we should be weary of using # type: ignore or Any without motivation. It's another tool I think we should use more internally to speed things up.

is_duck_dask_array and is_dask_collection are typeguarded differently, so they have a little different flavour. Maybe using is_dask_collection only is good enough.

@andersy005
Copy link
Member Author

@andersy005 andersy005 closed this Feb 14, 2024
@andersy005 andersy005 deleted the namedarray-indexing branch March 15, 2024 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-NamedArray Lightweight version of Variable
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants