MetadataError from ValueError: Could not convert object to NumPy datetime #201
Comments
@jsignell summoning you in case you have any thoughts / ideas here
@thodson-usgs got a similar-looking error in #203 (comment), but only on more recent versions of virtualizarr. There must be some kind of regression, which we should narrow down using
I am taking a look. Are you sure you got the same error when you dropped the time component? I am seeing an S3 access issue when I do that (which I take to mean I made it past the original error).

```python
import xarray as xr
from virtualizarr import open_virtual_dataset

vds = open_virtual_dataset(
    's3://wrf-se-ak-ar5/ccsm/rcp85/daily/2060/WRFDS_2060-01-01.nc',
    indexes={},
    drop_variables=["Time"],
)
vds.virtualize.to_kerchunk("combined_no_t.json", format="json")
ds = xr.open_dataset('combined_no_t.json', engine="kerchunk")
```
---------------------------------------------------------------------------
NoCredentialsError Traceback (most recent call last)
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/fsspec/asyn.py:245, in _run_coros_in_chunks.<locals>._run_coro(coro, i)
244 try:
--> 245 return await asyncio.wait_for(coro, timeout=timeout), i
246 except Exception as e:
File ~/micromamba/envs/virtualizarr/lib/python3.12/asyncio/tasks.py:520, in wait_for(fut, timeout)
519 async with timeouts.timeout(timeout):
--> 520 return await fut
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/s3fs/core.py:1125, in S3FileSystem._cat_file(self, path, version_id, start, end)
1123 resp["Body"].close()
-> 1125 return await _error_wrapper(_call_and_read, retries=self.retries)
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/s3fs/core.py:142, in _error_wrapper(func, args, kwargs, retries)
141 err = translate_boto_error(err)
--> 142 raise err
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/s3fs/core.py:113, in _error_wrapper(func, args, kwargs, retries)
112 try:
--> 113 return await func(*args, **kwargs)
114 except S3_RETRYABLE_ERRORS as e:
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/s3fs/core.py:1112, in S3FileSystem._cat_file.<locals>._call_and_read()
1111 async def _call_and_read():
-> 1112 resp = await self._call_s3(
1113 "get_object",
1114 Bucket=bucket,
1115 Key=key,
1116 **version_id_kw(version_id or vers),
1117 **head,
1118 **self.req_kw,
1119 )
1120 try:
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/s3fs/core.py:362, in S3FileSystem._call_s3(self, method, *akwarglist, **kwargs)
361 additional_kwargs = self._get_s3_method_kwargs(method, *akwarglist, **kwargs)
--> 362 return await _error_wrapper(
363 method, kwargs=additional_kwargs, retries=self.retries
364 )
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/s3fs/core.py:142, in _error_wrapper(func, args, kwargs, retries)
141 err = translate_boto_error(err)
--> 142 raise err
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/s3fs/core.py:113, in _error_wrapper(func, args, kwargs, retries)
112 try:
--> 113 return await func(*args, **kwargs)
114 except S3_RETRYABLE_ERRORS as e:
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/aiobotocore/client.py:388, in AioBaseClient._make_api_call(self, operation_name, api_params)
387 apply_request_checksum(request_dict)
--> 388 http, parsed_response = await self._make_request(
389 operation_model, request_dict, request_context
390 )
392 await self.meta.events.emit(
393 'after-call.{service_id}.{operation_name}'.format(
394 service_id=service_id, operation_name=operation_name
(...)
399 context=request_context,
400 )
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/aiobotocore/client.py:416, in AioBaseClient._make_request(self, operation_model, request_dict, request_context)
415 try:
--> 416 return await self._endpoint.make_request(
417 operation_model, request_dict
418 )
419 except Exception as e:
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/aiobotocore/endpoint.py:98, in AioEndpoint._send_request(self, request_dict, operation_model)
97 self._update_retries_context(context, attempts)
---> 98 request = await self.create_request(request_dict, operation_model)
99 success_response, exception = await self._get_response(
100 request, operation_model, context
101 )
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/aiobotocore/endpoint.py:86, in AioEndpoint.create_request(self, params, operation_model)
83 event_name = 'request-created.{service_id}.{op_name}'.format(
84 service_id=service_id, op_name=operation_model.name
85 )
---> 86 await self._event_emitter.emit(
87 event_name,
88 request=request,
89 operation_name=operation_model.name,
90 )
91 prepared_request = self.prepare_request(request)
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/aiobotocore/hooks.py:66, in AioHierarchicalEmitter._emit(self, event_name, kwargs, stop_on_response)
65 # Await the handler if its a coroutine.
---> 66 response = await resolve_awaitable(handler(**kwargs))
67 responses.append((handler, response))
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/aiobotocore/_helpers.py:15, in resolve_awaitable(obj)
14 if inspect.isawaitable(obj):
---> 15 return await obj
17 return obj
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/aiobotocore/signers.py:24, in AioRequestSigner.handler(self, operation_name, request, **kwargs)
19 async def handler(self, operation_name=None, request=None, **kwargs):
20 # This is typically hooked up to the "request-created" event
21 # from a client's event emitter. When a new request is created
22 # this method is invoked to sign the request.
23 # Don't call this method directly.
---> 24 return await self.sign(operation_name, request)
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/aiobotocore/signers.py:88, in AioRequestSigner.sign(self, operation_name, request, region_name, signing_type, expires_in, signing_name)
86 raise e
---> 88 auth.add_auth(request)
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/botocore/auth.py:418, in SigV4Auth.add_auth(self, request)
417 if self.credentials is None:
--> 418 raise NoCredentialsError()
419 datetime_now = datetime.datetime.utcnow()
NoCredentialsError: Unable to locate credentials
The above exception was the direct cause of the following exception:
ReferenceNotReachable Traceback (most recent call last)
Cell In[7], line 1
----> 1 ds = xr.open_dataset('combined_no_t.json', engine="kerchunk")
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/backends/api.py:571, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
559 decoders = _resolve_decoders_kwargs(
560 decode_cf,
561 open_backend_dataset_parameters=backend.open_dataset_parameters,
(...)
567 decode_coords=decode_coords,
568 )
570 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 571 backend_ds = backend.open_dataset(
572 filename_or_obj,
573 drop_variables=drop_variables,
574 **decoders,
575 **kwargs,
576 )
577 ds = _dataset_from_backend_dataset(
578 backend_ds,
579 filename_or_obj,
(...)
589 **kwargs,
590 )
591 return ds
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/kerchunk/xarray_backend.py:12, in KerchunkBackend.open_dataset(self, filename_or_obj, storage_options, open_dataset_options, **kw)
8 def open_dataset(
9 self, filename_or_obj, *, storage_options=None, open_dataset_options=None, **kw
10 ):
11 open_dataset_options = (open_dataset_options or {}) | kw
---> 12 ref_ds = open_reference_dataset(
13 filename_or_obj,
14 storage_options=storage_options,
15 open_dataset_options=open_dataset_options,
16 )
17 return ref_ds
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/kerchunk/xarray_backend.py:46, in open_reference_dataset(filename_or_obj, storage_options, open_dataset_options)
42 open_dataset_options = {}
44 m = fsspec.get_mapper("reference://", fo=filename_or_obj, **storage_options)
---> 46 return xr.open_dataset(m, engine="zarr", consolidated=False, **open_dataset_options)
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/backends/api.py:571, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
559 decoders = _resolve_decoders_kwargs(
560 decode_cf,
561 open_backend_dataset_parameters=backend.open_dataset_parameters,
(...)
567 decode_coords=decode_coords,
568 )
570 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 571 backend_ds = backend.open_dataset(
572 filename_or_obj,
573 drop_variables=drop_variables,
574 **decoders,
575 **kwargs,
576 )
577 ds = _dataset_from_backend_dataset(
578 backend_ds,
579 filename_or_obj,
(...)
589 **kwargs,
590 )
591 return ds
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/backends/zarr.py:1182, in ZarrBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel, zarr_version, store, engine)
1180 store_entrypoint = StoreBackendEntrypoint()
1181 with close_on_error(store):
-> 1182 ds = store_entrypoint.open_dataset(
1183 store,
1184 mask_and_scale=mask_and_scale,
1185 decode_times=decode_times,
1186 concat_characters=concat_characters,
1187 decode_coords=decode_coords,
1188 drop_variables=drop_variables,
1189 use_cftime=use_cftime,
1190 decode_timedelta=decode_timedelta,
1191 )
1192 return ds
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/backends/store.py:58, in StoreBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta)
44 encoding = filename_or_obj.get_encoding()
46 vars, attrs, coord_names = conventions.decode_cf_variables(
47 vars,
48 attrs,
(...)
55 decode_timedelta=decode_timedelta,
56 )
---> 58 ds = Dataset(vars, attrs=attrs)
59 ds = ds.set_coords(coord_names.intersection(vars))
60 ds.set_close(filename_or_obj.close)
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/dataset.py:711, in Dataset.__init__(self, data_vars, coords, attrs)
708 if isinstance(coords, Dataset):
709 coords = coords._variables
--> 711 variables, coord_names, dims, indexes, _ = merge_data_and_coords(
712 data_vars, coords
713 )
715 self._attrs = dict(attrs) if attrs else None
716 self._close = None
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/dataset.py:425, in merge_data_and_coords(data_vars, coords)
421 coords = create_coords_with_default_indexes(coords, data_vars)
423 # exclude coords from alignment (all variables in a Coordinates object should
424 # already be aligned together) and use coordinates' indexes to align data_vars
--> 425 return merge_core(
426 [data_vars, coords],
427 compat="broadcast_equals",
428 join="outer",
429 explicit_coords=tuple(coords),
430 indexes=coords.xindexes,
431 priority_arg=1,
432 skip_align_args=[1],
433 )
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/merge.py:699, in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value, skip_align_args)
696 for pos, obj in skip_align_objs:
697 aligned.insert(pos, obj)
--> 699 collected = collect_variables_and_indexes(aligned, indexes=indexes)
700 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
701 variables, out_indexes = merge_collected(
702 collected, prioritized, compat=compat, combine_attrs=combine_attrs
703 )
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/merge.py:362, in collect_variables_and_indexes(list_of_mappings, indexes)
360 append(name, variable, indexes[name])
361 elif variable.dims == (name,):
--> 362 idx, idx_vars = create_default_index_implicit(variable)
363 append_all(idx_vars, {k: idx for k in idx_vars})
364 else:
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/indexes.py:1404, in create_default_index_implicit(dim_variable, all_variables)
1402 else:
1403 dim_var = {name: dim_variable}
-> 1404 index = PandasIndex.from_variables(dim_var, options={})
1405 index_vars = index.create_variables(dim_var)
1407 return index, index_vars
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/indexes.py:654, in PandasIndex.from_variables(cls, variables, options)
651 if level is not None:
652 data = var._data.array.get_level_values(level)
--> 654 obj = cls(data, dim, coord_dtype=var.dtype)
655 assert not isinstance(obj.index, pd.MultiIndex)
656 # Rename safely
657 # make a shallow copy: cheap and because the index name may be updated
658 # here or in other constructors (cannot use pd.Index.rename as this
659 # constructor is also called from PandasMultiIndex)
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/indexes.py:589, in PandasIndex.__init__(self, array, dim, coord_dtype, fastpath)
587 index = array
588 else:
--> 589 index = safe_cast_to_index(array)
591 if index.name is None:
592 # make a shallow copy: cheap and because the index name may be updated
593 # here or in other constructors (cannot use pd.Index.rename as this
594 # constructor is also called from PandasMultiIndex)
595 index = index.copy()
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/indexes.py:469, in safe_cast_to_index(array)
459 emit_user_level_warning(
460 (
461 "`pandas.Index` does not support the `float16` dtype."
(...)
465 category=DeprecationWarning,
466 )
467 kwargs["dtype"] = "float64"
--> 469 index = pd.Index(np.asarray(array), **kwargs)
471 return _maybe_cast_to_cftimeindex(index)
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/indexing.py:509, in ExplicitlyIndexed.__array__(self, dtype)
507 def __array__(self, dtype: np.typing.DTypeLike = None) -> np.ndarray:
508 # Leave casting to an array up to the underlying array type.
--> 509 return np.asarray(self.get_duck_array(), dtype=dtype)
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/backends/common.py:181, in BackendArray.get_duck_array(self, dtype)
179 def get_duck_array(self, dtype: np.typing.DTypeLike = None):
180 key = indexing.BasicIndexer((slice(None),) * self.ndim)
--> 181 return self[key]
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/backends/zarr.py:104, in ZarrArrayWrapper.__getitem__(self, key)
102 elif isinstance(key, indexing.OuterIndexer):
103 method = self._oindex
--> 104 return indexing.explicit_indexing_adapter(
105 key, array.shape, indexing.IndexingSupport.VECTORIZED, method
106 )
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/indexing.py:1014, in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
992 """Support explicit indexing by delegating to a raw indexing method.
993
994 Outer and/or vectorized indexers are supported by indexing a second time
(...)
1011 Indexing result, in the form of a duck numpy-array.
1012 """
1013 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
-> 1014 result = raw_indexing_method(raw_key.tuple)
1015 if numpy_indices.tuple:
1016 # index the loaded np.ndarray
1017 indexable = NumpyIndexingAdapter(result)
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/backends/zarr.py:94, in ZarrArrayWrapper._getitem(self, key)
93 def _getitem(self, key):
---> 94 return self._array[key]
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/zarr/core.py:800, in Array.__getitem__(self, selection)
798 result = self.get_orthogonal_selection(pure_selection, fields=fields)
799 else:
--> 800 result = self.get_basic_selection(pure_selection, fields=fields)
801 return result
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/zarr/core.py:926, in Array.get_basic_selection(self, selection, out, fields)
924 return self._get_basic_selection_zd(selection=selection, out=out, fields=fields)
925 else:
--> 926 return self._get_basic_selection_nd(selection=selection, out=out, fields=fields)
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/zarr/core.py:968, in Array._get_basic_selection_nd(self, selection, out, fields)
962 def _get_basic_selection_nd(self, selection, out=None, fields=None):
963 # implementation of basic selection for array with at least one dimension
964
965 # setup indexer
966 indexer = BasicIndexer(selection, self)
--> 968 return self._get_selection(indexer=indexer, out=out, fields=fields)
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/zarr/core.py:1343, in Array._get_selection(self, indexer, out, fields)
1340 if math.prod(out_shape) > 0:
1341 # allow storage to get multiple items at once
1342 lchunk_coords, lchunk_selection, lout_selection = zip(*indexer)
-> 1343 self._chunk_getitems(
1344 lchunk_coords,
1345 lchunk_selection,
1346 out,
1347 lout_selection,
1348 drop_axes=indexer.drop_axes,
1349 fields=fields,
1350 )
1351 if out.shape:
1352 return out
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/zarr/core.py:2177, in Array._chunk_getitems(self, lchunk_coords, lchunk_selection, out, lout_selection, drop_axes, fields)
2175 if not isinstance(self._meta_array, np.ndarray):
2176 contexts = ConstantMap(ckeys, constant=Context(meta_array=self._meta_array))
-> 2177 cdatas = self.chunk_store.getitems(ckeys, contexts=contexts)
2179 for ckey, chunk_select, out_select in zip(ckeys, lchunk_selection, lout_selection):
2180 if ckey in cdatas:
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/zarr/storage.py:1435, in FSStore.getitems(self, keys, contexts)
1432 continue
1433 elif isinstance(v, Exception):
1434 # Raise any other exception
-> 1435 raise v
1436 else:
1437 # The function calling this method may not recognize the transformed
1438 # keys, so we send the values returned by self.map.getitems back into
1439 # the original key space.
1440 results[keys_transformed[k]] = v
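The `NoCredentialsError` in the traceback above is raised while s3fs signs the request for a chunk, so the reference JSON itself was parsed fine. The `kerchunk/xarray_backend.py` frames show `storage_options` being forwarded to `fsspec.get_mapper("reference://", ...)`, so options for the remote S3 filesystem can be passed through `remote_protocol`/`remote_options`. A minimal sketch, assuming the `wrf-se-ak-ar5` bucket permits anonymous reads (not verified here); the helper name is hypothetical:

```python
def anon_s3_reference_options(anon: bool = True) -> dict:
    """Build storage_options for kerchunk's xarray backend (hypothetical helper).

    The backend passes these straight to fsspec.get_mapper("reference://", ...).
    remote_protocol/remote_options configure the filesystem that fetches the
    referenced byte ranges; s3fs's anon=True makes unsigned requests instead
    of looking up AWS credentials.
    """
    return {"remote_protocol": "s3", "remote_options": {"anon": anon}}


# Usage (requires xarray, kerchunk, s3fs and network access; assumes the
# bucket allows anonymous reads):
#
#   import xarray as xr
#   ds = xr.open_dataset(
#       "combined_no_t.json",
#       engine="kerchunk",
#       storage_options=anon_s3_reference_options(),
#   )
```

If the bucket does require credentials, the same `remote_options` dict is where s3fs keyword arguments such as a profile would go instead of `anon`.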
btw,
Here's the bug: `VirtualiZarr/virtualizarr/zarr.py` line 70 in 179bb2a.
Reverting this line back to `VirtualiZarr/virtualizarr/zarr.py` line 47 in 0ad4de5 causes my test to pass. I propose changing it to `fill_value: FillValueT = Field(default=np.nan, validate_default=True)`, which also passes.
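For context on the proposed fix: in pydantic v2, a field's default is returned as-is and skips the field's validators unless `validate_default=True` is set, whereas explicitly passed values are always validated. The two models below are hypothetical stand-ins (not VirtualiZarr's actual model, whose validators aren't shown in this thread), with a made-up NaN-to-`None` validator only to make the difference visible:

```python
import math
from typing import Optional

from pydantic import BaseModel, Field, field_validator


def _nan_to_none(v):
    # Hypothetical normalisation, purely for illustration:
    # treat a NaN fill value as "no fill value".
    if v is not None and math.isnan(v):
        return None
    return v


class DefaultNotValidated(BaseModel):
    # Without validate_default, the default NaN never reaches the validator.
    fill_value: Optional[float] = Field(default=math.nan)

    @field_validator("fill_value")
    @classmethod
    def normalise(cls, v):
        return _nan_to_none(v)


class DefaultValidated(BaseModel):
    # validate_default=True sends the default through the same validators.
    fill_value: Optional[float] = Field(default=math.nan, validate_default=True)

    @field_validator("fill_value")
    @classmethod
    def normalise(cls, v):
        return _nan_to_none(v)
```

`DefaultNotValidated().fill_value` stays NaN while `DefaultValidated().fill_value` becomes `None`; passing `fill_value=math.nan` explicitly is normalised in both models.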
AFAICT,

```
In [28]: np.array([0.0], dtype=np.dtype("datetime64[ns]"))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[28], line 1
----> 1 np.array([0.0], dtype=np.dtype("datetime64[ns]"))
```

This is called via `zarr.v2.meta.Metadata2.decode_fill_value(np.nan, np.dtype("datetime64[ns]"))`. But that line fails with a fill value of
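A quick way to confirm this without zarr: NumPy refuses to build a `datetime64` array from a Python float (NaN included), while the string `"NaT"` is accepted. `fill_value_for_dtype` is a hypothetical sketch of one possible guard, not what zarr or VirtualiZarr actually do:

```python
import numpy as np


def reproduce(fill_value=np.nan, dtype="datetime64[ns]") -> str:
    """Return the error NumPy raises when casting a float fill value to a datetime dtype."""
    try:
        np.array(fill_value, dtype=np.dtype(dtype))
    except ValueError as err:
        return str(err)
    return "no error"


def fill_value_for_dtype(v, dtype):
    """Hypothetical guard: map a NaN fill value to NaT for datetime64 dtypes.

    Other values are cast with np.array(v, dtype=dtype), so a non-NaN float
    such as 0.0 will still raise for a datetime64 dtype.
    """
    dtype = np.dtype(dtype)
    if dtype.kind == "M" and isinstance(v, float) and np.isnan(v):
        return np.array("NaT", dtype=dtype)[()]
    return np.array(v, dtype=dtype)[()]


print(reproduce())       # the ValueError message from the issue title
print(reproduce(0.0))    # a non-NaN float fails the same way
print(fill_value_for_dtype(np.nan, "datetime64[ns]"))
```

Mapping NaN to NaT keeps the "missing value" meaning while giving NumPy a value it can actually store in a datetime array.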
Thanks @TomAugspurger, I put an example back on #206. These might indeed be the same issue, but I want to be careful about crossing streams here.
I'm trying to debug @thodson-usgs's example from cubed-dev/cubed#520 (and originally #197).
He is doing a whole serverless reduction of virtual references to multiple files (!!! - relevant to #123), but there seem to be some more basic errors to be fixed first.
Specifically, if I try to use virtualizarr on just one of his files this happens:
At first I assumed there was something wrong with our handling of the loaded `cftime_variables`, but actually even if I drop the `'Time'` variable I still get exactly the same error:

I don't know why it's even trying to convert anything to a datetime; none of the other variables have units of time.

What's also weird is that this is raised from within `meta.py:260, in Metadata2.decode_fill_value(cls, v, dtype, object_codec)`, which suggests a problem with the `fill_value`. But I checked, and all of the variables in this virtual dataset have a `fill_value` of either a float or `nan` in their `.encoding`; again, nothing about a datetime.