Releases · rapidsai/cudf

23 Sep 16:41

rapids-bot

v24.08.00a

f5d1c24

[NIGHTLY] v24.08.00 Pre-release

🚨 Breaking Changes

Align Index __init__ APIs with pandas 2.x (#16362) @mroeschke
Align Series APIs with pandas 2.x (#16333) @mroeschke
Add missing stream param to dictionary factory APIs (#16319) @JayjeetAtGithub
Deprecate dtype= parameter in reduction methods (#16313) @mroeschke
Remove squeeze argument from groupby (#16312) @mroeschke
Align more DataFrame APIs with pandas (#16310) @mroeschke
Remove mr param from write_csv and write_json (#16231) @JayjeetAtGithub
Report number of rows per file read by PQ reader when no row selection and fix segfault in chunked PQ reader when skip_rows > 0 (#16195) @mhaseeb123
Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
Deprecate Arrow support in I/O (#16132) @lithomas1
Return FrozenList for Index.names (#16047) @galipremsagar
Add compile option to enable large strings support (#16037) @davidwendt
Hide visibility of non public symbols (#15982) @robertmaynard
Rename strings multiple target replace API (#15898) @davidwendt
Pinned vector factory that uses the global pool (#15895) @vuule
Apply clang-tidy autofixes (#15894) @vyasr
Support arrow:schema in Parquet writer to faithfully roundtrip duration types with Arrow (#15875) @mhaseeb123
Expose stream parameter to public rolling APIs (#15865) @srinivasyadav18
Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
Remove legacy JSON reader and concurrent_unordered_map.cuh. (#15813) @bdice
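
Every complete entry in these notes has the shape `title (#PR) @author`. The sketch below is one way to turn such lines into structured records, for example to tally changes per author. It is not part of cuDF; the names `ENTRY_RE` and `parse_entry` are hypothetical, and the pattern is an assumption drawn only from the entries shown here.

```python
import re
from collections import Counter

# Assumed entry shape: "<title> (#<PR number>) @<GitHub handle>".
ENTRY_RE = re.compile(r"^(?P<title>.+) \(#(?P<pr>\d+)\) @(?P<author>[\w-]+)$")


def parse_entry(line: str):
    """Return (title, pr_number, author), or None if the line is not a complete entry."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        # Section headings, blank lines, and truncated entries end up here.
        return None
    return match["title"], int(match["pr"]), match["author"]


# Three entries copied from the list above.
sample = [
    "Remove squeeze argument from groupby (#16312) @mroeschke",
    "Return FrozenList for Index.names (#16047) @galipremsagar",
    "Remove legacy JSON reader and concurrent_unordered_map.cuh. (#15813) @bdice",
]

entries = [entry for entry in map(parse_entry, sample) if entry is not None]
per_author = Counter(author for _, _, author in entries)
```

Because the title group is greedy, a title that itself quotes an earlier PR number still resolves to the last `(#PR) @author` pair on the line. Entries cut off by the page truncation do not match and are skipped rather than guessed at.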

🐛 Bug Fixes

Ensure managed memory is supported in cudf.pandas. (#16552) @bdice
Add flatbuffers to libcudf build (#16446) @galipremsagar
Fix parquet_field_list read_func lambda capture invalid this pointer (#16440) @davidwendt
Enable prefetching in cudf.pandas.install() (#16439) @bdice
Enable prefetching before runpy (#16427) @galipremsagar
Make prefetch_config::get and prefetch_config::set thread-safe (#16425) @ttnghia
Fix a pandas-2.0 missing attribute error (#16416) @galipremsagar
[Bug] Remove loud NativeFile deprecation noise for read_parquet from S3 (#16415) @rjzamora
Fix nightly memcheck error for empty STREAM_INTEROP_TEST (#16406) @davidwendt
Gate ArrowStringArrayNumpySemantics cudf.pandas proxy behind version check (#16401) @mroeschke
Don't export bs_thread_pool (#16398) @KyleFromNVIDIA
Require fixed width types for casting in cudf-polars (#16381) @brandon-b-miller
Fix docstring of DataFrame.apply (#16351) @galipremsagar
Make __bool__ raise for more cudf objects (#16311) @mroeschke
Rename .devcontainers for CUDA 12.5 (#16293) @jakirkham
Fix split_record for all empty strings column (#16291) @davidwendt
Fix logic in to_arrow for empty list column (#16279) @wence-
[BUG] Make name attr of Index fast slow attrs (#16270) @Matt711
Add custom name setter and getter for proxy objects in cudf.pandas (#16234) @Matt711
Fall back when casting a timestamp to numeric in cudf-polars (#16232) @brandon-b-miller
Disable large string support for Java build (#16216) @jlowe
Remove CCCL patch for PR 211. (#16207) @bdice
Add single offset to an empty ListArray in cudf::to_arrow (#16201) @davidwendt
Fix memory_usage when calculating nested list column (#16193) @mroeschke
Support at/iat indexers in cudf.pandas (#16177) @mroeschke
Fix unused-return-value debug build error in from_arrow_stream_test.cpp (#16168) @davidwendt
Fix cudf::strings::replace_multiple hang on empty target (#16167) @davidwendt
Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
interpolate returns new column if no values are interpolated (#16158) @mroeschke
Use provided memory resource for allocating mixed join results. (#16153) @bdice
Run DFG after verify-alpha-spec (#16151) @KyleFromNVIDIA
Use size_t to allow large conditional joins (#16127) @bdice
Allow only scale=0 fixed-point values in fixed_width_column_wrapper (#16120) @davidwendt
Fix pylibcudf Table.num_rows for 0 columns case and add interop to docs (#16108) @lithomas1
Add support for proxy np.flatiter objects (#16107) @Matt711
Ensure cudf objects can astype to any type when empty (#16106) @mroeschke
Support pd.read_pickle and pd.to_pickle in cudf.pandas (#16105) @Matt711
Fix unnecessarily strict check in parquet chunked reader for choosing split locations. (#16099) @nvdbaranec
Fix is_monotonic_* APIs to include NaNs (#16085) @galipremsagar
More safely parse CUDA versions when subprocess output is contaminated (#16067) @brandon-b-miller
fast_slow_proxy: Don't import assert_eq at top-level (#16063) @wence-
Prevent bad ColumnAccessor state after .sort_index(axis=1, ignore_index=True) (#16061) @mroeschke
Fix ArrowDeviceArray interface to pass address of event (#16058) @zeroshade
Fix a size overflow bug in hash groupby (#16053) @PointKernel
Fix atomic_ref scope when multiple blocks are updating the same output (#16051) @vuule
Fix initialization error in to_arrow for empty string views (#16033) @wence-
Fix the int32 overflow when computing page fragment sizes for large string columns (#16028) @mhaseeb123
Fix the pool size alignment issue (#16024) @PointKernel
Improve multibyte-split byte-range performance (#16019) @davidwendt
Fix target counting in strings char-parallel replace (#16017) @davidwendt
Support IntervalDtype in cudf.from_pandas (#16014) @mroeschke
Fix memory size in create_byte_range_infos_consecutive (#16012) @davidwendt
Hide visibility of non public symbols (#15982) @robertmaynard
Fix Cython typo preventing proper inheritance (#15978) @vyasr
Fix convert_dtypes with convert_integer=False/convert_floating=True (#15964) @mroeschke
Fix nunique for MultiIndex, DataFrame, and all NA case with dropna=False (#15962) @mroeschke
Explicitly build for all GPU architectures (#15959) @vyasr
Preserve column type and class information in more DataFrame operations (#15949) @mroeschke
Add __array_interface__ to cudf.pandas numpy.ndarray proxy (#15936) @mroeschke
Allow tests to be built when stream util is disabled (#15933) @robertmaynard
Fix JSON multi-source reading when total source size exceeds INT_MAX bytes (#15930) @shrshi
Fix dask_cudf.read_parquet regression for legacy timestamp data (#15929) @rjzamora
Fix offsetalator when accessing over 268 million rows (#15921) @davidwendt
Fix debug assert in rowgroup_char_counts_kernel (#15902) @davidwendt
Fix categorical conversion from chunked arrow arrays (#15886) @vyasr
Handling for NaN and inf when converting floating point to fixed point types (#15885) @ttnghia
Manual merge of Branch 24.08 from 24.06 (#15869) @galipremsagar
Avoid unnecessary Index cast in IndexedFrame.index setter (#15843) @charlesbluca
Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
Fix multi-replace target count logic for large strings (#15807) @davidwendt
Fix JSON parsing memory corruption - Fix Mixed types nested children removal (#15798) @karthikeyann
Allow anonymous user in devcontainer name. (#15784) @bdice
Add support for additional metaclasses of proxies and use for ExcelWriter (#15399) @vyasr
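
Two of the fixes above name the failure mode directly: an int32 overflow when computing page fragment sizes (#16028) and a total source size exceeding INT_MAX bytes (#15930). The sketch below is not libcudf code. It only illustrates, with hypothetical sizes, what happens when a byte count larger than 2,147,483,647 is stored in a signed 32-bit integer, which is why such fixes typically move to a wider type.

```python
import ctypes

INT32_MAX = 2**31 - 1  # 2,147,483,647; the value of INT_MAX where int is 32 bits wide


def as_int32(value: int) -> int:
    """Reinterpret a Python int as a signed 32-bit integer (wraps, no overflow check)."""
    return ctypes.c_int32(value).value


# Hypothetical sizes for illustration only: two sources of 1.5 GiB each.
source_sizes = [1_610_612_736, 1_610_612_736]

total = sum(source_sizes)  # Python ints are arbitrary precision: 3,221,225,472
wrapped = as_int32(total)  # what a 32-bit accumulator would hold instead

print(total > INT32_MAX)  # True
print(wrapped)            # -1073741824
```

A negative or truncated size like this can then feed into allocation or indexing logic, so the symptom often appears far from the arithmetic that caused it.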

📖 Documentation

Improve Polars docs (#16820) @bdice
Add docstring for from_dataframe (#16260) @mroeschke
Update libcudf compiler requirements in contributing doc (#16103) @davidwendt
Add libcudf public/detail API pattern to developer guide (#16086) @davidwendt
Explain line profiler and how to know which functions are GPU-accelerated. (#16079) @bdice
cudf.pandas documentation improvement (#15948) @Matt711
Reland "Fix docs for IO readers and strings_convert" (#15872) (#15941) @lithomas1
Document how to use cudf.pandas in tandem with multiprocessing (#15940) @wence-
DOC: Add documentation for cudf.pandas in the Developer Guide (#15889) @Matt711
Improve options docs (#15888) @bdice
DOC: add linkcode to docs (#15860) @raybellwaves
DOC: use intersphinx mapping in pandas-compat ext (#15846) @raybellwaves
Fix inconsistent usage of 'results' and 'records' in read-json.md (#15766) @dagardner-nv
Update PandasCompat.py to resolve references (#15704) @raybellwaves

🚀 New Features

Creation of CI artifacts for cudf-polars wheels (#16680) @wence-
Warn on cuDF failure when POLARS_VERBOSE is true (#16308) @brandon-b-miller
Add drop_nulls in cudf-polars (#16290) @brandon-b-miller
[JNI] Add setKernelPinnedCopyThreshold and setPinnedAllocationThreshold (#16288) @abellina
Implement support for scan_ndjson in cudf-polars (#16263) @lithomas1
Publish cudf-polars nightlies (#16213) @lithomas1
Modify make_host_vector and make_device_uvector factories to optionally use pinned memory and kernel copy (#16206) @vuule
Migrate lists/set_operations to pylibcudf (#16190) @Matt711
Migrate lists/filling to pylibcudf (#16189) @Matt711
Fall back to CPU for unsupported libcudf binaryops in cudf-polars (#16188) @brandon-b-miller
Use resource_ref for upstream in stream_checking_resource_adaptor (#16187) @harrism
Migrate lists/modifying to pylibcudf (#16185) @Matt711
Migrate lists/filtering to pylibcudf (#16184) @Matt711
Migrate lists/sorting to pylibcudf (#16179) @Matt711
Add missing methods to lists/list_column_view.pxd in pylibcudf (#16175) @Matt711
Migrate pylibcudf lists gathering (#16170) @Matt711
Move kernel vis over to CUDF_HIDDEN (#16165) @robertmaynard
Add groupby_max multi-threaded benchmark (#16154) @srinivasyadav18
Promote has_nested_columns to cudf public API (#16131) @robertmaynard
Promote IO support queries to cudf API (#16125) @robertmaynard
cudf::merge public API now supports passing a user stream (#16124) @robertmaynard
Add TPC-H inspired examples for Libcudf (#16088) @ja...

Contributors

seberg, trxcllnt, and 38 other contributors

01 May 12:29

raydouglass

v24.04.01

af9fd84

v24.04.01

🚨 Breaking Changes

Restructure pylibcudf/arrow interop facilities (#15325) @vyasr
Change exceptions thrown by copying APIs (#15319) @vyasr
Change strings_column_view::char_size to return int64 (#15197) @davidwendt
Upgrade to arrow-14.0.2 (#15108) @galipremsagar
Add support for pandas-2.2 in cudf (#15100) @galipremsagar
Deprecate cudf::hashing::spark_murmurhash3_x86_32 (#15074) @davidwendt
Align MultiIndex.get_indexer with pandas 2.2 change (#15059) @mroeschke
Raise an error on import for unsupported GPUs. (#15053) @bdice
Deprecate datelike isin casting strings to dates to match pandas 2.2 (#15046) @mroeschke
Align concat Series name behavior in pandas 2.2 (#15032) @mroeschke
Add future_stack to DataFrame.stack (#15015) @galipremsagar
Deprecate groupby fillna (#15000) @mroeschke
Deprecate replace with categorical columns (#14988) @mroeschke
Deprecate delim_whitespace in read_csv for pandas 2.2 (#14986) @mroeschke
Deprecate parameters similar to pandas 2.2 (#14984) @mroeschke
Add missing atomic operators, refactor atomic operators, move atomic operators to detail namespace. (#14962) @bdice
Add pandas-2.x support in cudf (#14916) @galipremsagar
Use cuco::static_set in the hash-based groupby (#14813) @PointKernel

🐛 Bug Fixes

Fix an issue with creating a series from scalar when dtype='category' (#15476) @galipremsagar
Update pre-commit-hooks to v0.0.3 (#15355) @KyleFromNVIDIA
[BUG][JNI] Trigger MemoryBuffer.onClosed after memory is freed (#15351) @abellina
Fix an issue with multiple short list rowgroups using the Parquet chunked reader. (#15342) @nvdbaranec
Avoid importing dask-expr if "query-planning" config is False (#15340) @rjzamora
Fix gtests/ERROR_TEST errors when run in Debug (#15317) @davidwendt
Fix OOB read in inflate_kernel (#15309) @vuule
Work around a cuFile error when running CSV tests with memcheck (#15293) @vuule
Fix Doxygen upload directory (#15291) @KyleFromNVIDIA
Fix Doxygen check (#15289) @KyleFromNVIDIA
Reintroduce PANDAS_GE_220 import (#15287) @wence-
Fix mean computation for the geometric distribution in the data generator (#15282) @vuule
Fix Parquet decimal64 stats (#15281) @etseidl
Make linking of nvtx3-cpp BUILD_LOCAL_INTERFACE (#15271) @KyleFromNVIDIA
Workaround compute-sanitizer memcheck bug (#15259) @davidwendt
Cleanup hostdevice_vector and add more APIs (#15252) @ttnghia
Fix number of rows in randomly generated lists columns (#15248) @vuule
Fix wrong output for collect_list/collect_set of lists column (#15243) @ttnghia
Fix testChunkedPackTwoPasses to copy from the bounce buffer (#15220) @abellina
Fix accessing .columns by an external API (#15212) @galipremsagar
[JNI] Disable testChunkedPackTwoPasses for now (#15210) @abellina
Update labeler and codeowner configs for CMake files (#15208) @PointKernel
Avoid dict normalization in __dask_tokenize__ (#15187) @rjzamora
Fix memcheck error in distinct inner join (#15164) @PointKernel
Remove unneeded script parameters in test_cpp_memcheck.sh (#15158) @davidwendt
Fix ListColumn.to_pandas() to retain list type (#15155) @galipremsagar
Avoid factorization in MultiIndex.to_pandas (#15150) @mroeschke
Fix GroupBy.get_group and GroupBy.indices (#15143) @wence-
Remove const from range_window_bounds::_extent. (#15138) @mythrocks
DataFrame.columns = ... retains RangeIndex & set dtype (#15129) @mroeschke
Correctly handle output for GroupBy.apply when chunk results are reindexed series (#15109) @brandon-b-miller
Fix Series.groupby.shift with a MultiIndex (#15098) @mroeschke
Fix reductions when DataFrame has MultiIndex columns (#15097) @mroeschke
Fix deprecation warnings for deprecated hash() calls (#15095) @davidwendt
Add support for arrow large_string in cudf (#15093) @galipremsagar
Fix sort_values pytest failure with pandas-2.x regression (#15092) @galipremsagar
Resolve path parsing issues in get_json_object (#15082) @SurajAralihalli
Fix bugs in handling of delta encodings (#15075) @etseidl
Fix is_device_write_preferred in void_sink and user_sink_wrapper (#15064) @vuule
Eliminate duplicate allocation of nested string columns (#15061) @vuule
Raise an error on import for unsupported GPUs. (#15053) @bdice
Align concat Series name behavior in pandas 2.2 (#15032) @mroeschke
Fix Index.difference to handle duplicate values when one of the inputs is empty (#15016) @galipremsagar
Add future_stack to DataFrame.stack (#15015) @galipremsagar
Fix handling of values=None in pylibcudf GroupBy.get_groups (#14998) @shwina
Fix DataFrame.sort_index to respect ignore_index on all axes (#14995) @galipremsagar
Raise for pyarrow array that is tz-aware (#14980) @mroeschke
Direct SeriesGroupBy.aggregate to SeriesGroupBy.agg (#14971) @rjzamora
Respect IntervalDtype and CategoricalDtype objects passed by users (#14961) @mroeschke
Unset CUDF_SPILL after a pytest (#14958) @galipremsagar
Fix Null literals to be not parsed as string when mixed types as string is enabled in JSON reader (#14939) @karthikeyann
Fix chunked reads of Parquet delta encoded pages (#14921) @etseidl
Fix reading offset for data stream in ORC reader (#14911) @ttnghia
Enable sanitizer check for a test case testORCReadAndWriteForDecimal128 (#14897) @res-life
Fix dask token normalization (#14829) @rjzamora
Fix 24.04 versions (#14825) @raydouglass
Ensure slow private attrs are maybe proxies (#14380) @mroeschke

📖 Documentation

Ignore DLManagedTensor in the docs build (#15392) @davidwendt
Revert "Temporarily disable docs errors. (#15265)" (#15269) @bdice
Temporarily disable docs errors. (#15265) @bdice
Update developer_guide.md with new guidance on quoted internal includes (#15238) @harrism
Fix broken link for developer guide (#15025) @sanjana098
[DOC] Update typo in docs example of structs_column_wrapper (#14949) @karthikeyann
Update cudf.pandas FAQ. (#14940) @bdice
Optimize doc builds (#14856) @vyasr
Add developer guideline to use east const. (#14836) @bdice
Document how cuDF is pronounced (#14753) @pentschev
Notes convert to Pandas-compat (#12641) @Touutae-lab

🚀 New Features

Address inconsistency in single quote normalization in JSON reader (#15324) @shrshi
Use JNI pinned pool resource with cuIO (#15255) @abellina
Add DELTA_BYTE_ARRAY encoder for Parquet (#15239) @etseidl
Migrate filling operations to pylibcudf (#15225) @brandon-b-miller
[JNI] rmm based pinned pool (#15219) @abellina
Implement zero-copy host buffer source instead of using an arrow implementation (#15189) @vuule
Enable creation of columns from scalar (#15181) @vyasr
Use NVTX from GitHub. (#15178) @bdice
Implement segmented_row_bit_count for computing row sizes by segments of rows (#15169) @ttnghia
Implement search using pylibcudf (#15166) @vyasr
Add distinct left join (#15149) @PointKernel
Add cardinality control for groupby benchs with flat types (#15134) @PointKernel
Add ability to request Parquet encodings on a per-column basis (#15081) @etseidl
Automate include grouping order in .clang-format (#15063) @harrism
Requesting a clean build directory also clears Jitify cache (#15052) @robertmaynard
API for JSON unquoted whitespace normalization (#15033) @shrshi
Implement concatenate, lists.explode, merge, sorting, and stream compaction in pylibcudf (#15011) @vyasr
Implement replace in pylibcudf (#15005) @vyasr
Add distinct key inner join (#14990) @PointKernel
Implement rolling in pylibcudf (#14982) @vyasr
Implement joins in pylibcudf (#14972) @vyasr
Implement scans and reductions in pylibcudf (#14970) @vyasr
Rewrite cudf internals using pylibcudf groupby (#14946) @vyasr
Implement groupby in pylibcudf (#14945) @vyasr
Support casting of Map type to string in JSON reader (#14936) @karthikeyann
POC for whitespace removal in input JSON data using FST (#14931) @shrshi
Support for LZ4 compression in ORC and Parquet (#14906) @vuule
Remove supports_streams from cuDF custom memory resources. (#14857) @harrism
Migrate unary operations to pylibcudf (#14850) @vyasr
Migrate binary operations to pylibcudf (#14821) @vyasr
Add row index and stripe size options to Python ORC chunked writer (#14785) @vuule
Support CUDA 12.2 (#14712) @jameslamb

🛠️ Improvements

Backport: Relax protobuf lower bound to 3.20. (#15506) (#15610) @bdice
Use conda env create --yes instead of --force (#15403) @bdice
Restructure pylibcudf/arrow interop facilities (#15325) @vyasr
Change exceptions thrown by copying APIs (#15319) @vyasr
Enable branch testing for cudf.pandas (#15316) @galipremsagar
Replace black with ruff-format (#15312) @mroeschke
Fix an NPE when reading empty JSON data by adding a new API for missing information (#15307) @revans2
Address poor performance of Parquet string decoding (#15304) @etseidl
Update script input name (#15301) @AyodeAwe
Make test_read_parquet_partitioned_filtered data deterministic (#15296) @mroeschke
Add timeout for cudf.pandas pandas tests (#15284) @galipremsagar
Add upper bound to prevent usage of NumPy 2 (#15283) @bdice
Fix cudf::test::to_host return of host_vector (#15263) @davidwendt
Implement grouped product scan (#15254) @wence-
Add CUDA 12.4 to supported PTX versions (#15247) @brandon-b-miller
Implement DataFrame|Series.squeeze (#15244) @mroeschke
Roll back ipow changes due to register pressure. (#15242) @pmattione-nvidia
Remove create_chars_child_column utility (#15241) @davidwendt
Update dlpack to version 0.8 (#15237) @dantegd
Improve performance in JSON reader when mixed_types_as_string option is enabled (#15236) @shrshi
Remove row conversion code from libcudf (#15234) @ttnghia
Use variable substitution for RAPIDS version in Doxyfile (#15231) @KyleFromNVIDIA
Add ListColumns.to_pandas(arrow_type=) (#15228) @mroeSC...

Contributors

trxcllnt, robertmaynard, and 36 other contributors

10 Apr 15:05

raydouglass

v24.04.00

578c240

v24.04.00

🚨 Breaking Changes

Restructure pylibcudf/arrow interop facilities (#15325) @vyasr
Change exceptions thrown by copying APIs (#15319) @vyasr
Change strings_column_view::char_size to return int64 (#15197) @davidwendt
Upgrade to arrow-14.0.2 (#15108) @galipremsagar
Add support for pandas-2.2 in cudf (#15100) @galipremsagar
Deprecate cudf::hashing::spark_murmurhash3_x86_32 (#15074) @davidwendt
Align MultiIndex.get_indexer with pandas 2.2 change (#15059) @mroeschke
Raise an error on import for unsupported GPUs. (#15053) @bdice
Deprecate datelike isin casting strings to dates to match pandas 2.2 (#15046) @mroeschke
Align concat Series name behavior in pandas 2.2 (#15032) @mroeschke
Add future_stack to DataFrame.stack (#15015) @galipremsagar
Deprecate groupby fillna (#15000) @mroeschke
Deprecate replace with categorical columns (#14988) @mroeschke
Deprecate delim_whitespace in read_csv for pandas 2.2 (#14986) @mroeschke
Deprecate parameters similar to pandas 2.2 (#14984) @mroeschke
Add missing atomic operators, refactor atomic operators, move atomic operators to detail namespace. (#14962) @bdice
Add pandas-2.x support in cudf (#14916) @galipremsagar
Use cuco::static_set in the hash-based groupby (#14813) @PointKernel

🐛 Bug Fixes

Fix an issue with creating a series from scalar when dtype='category' (#15476) @galipremsagar
Update pre-commit-hooks to v0.0.3 (#15355) @KyleFromNVIDIA
[BUG][JNI] Trigger MemoryBuffer.onClosed after memory is freed (#15351) @abellina
Fix an issue with multiple short list rowgroups using the Parquet chunked reader. (#15342) @nvdbaranec
Avoid importing dask-expr if "query-planning" config is False (#15340) @rjzamora
Fix gtests/ERROR_TEST errors when run in Debug (#15317) @davidwendt
Fix OOB read in inflate_kernel (#15309) @vuule
Work around a cuFile error when running CSV tests with memcheck (#15293) @vuule
Fix Doxygen upload directory (#15291) @KyleFromNVIDIA
Fix Doxygen check (#15289) @KyleFromNVIDIA
Reintroduce PANDAS_GE_220 import (#15287) @wence-
Fix mean computation for the geometric distribution in the data generator (#15282) @vuule
Fix Parquet decimal64 stats (#15281) @etseidl
Make linking of nvtx3-cpp BUILD_LOCAL_INTERFACE (#15271) @KyleFromNVIDIA
Workaround compute-sanitizer memcheck bug (#15259) @davidwendt
Cleanup hostdevice_vector and add more APIs (#15252) @ttnghia
Fix number of rows in randomly generated lists columns (#15248) @vuule
Fix wrong output for collect_list/collect_set of lists column (#15243) @ttnghia
Fix testChunkedPackTwoPasses to copy from the bounce buffer (#15220) @abellina
Fix accessing .columns by an external API (#15212) @galipremsagar
[JNI] Disable testChunkedPackTwoPasses for now (#15210) @abellina
Update labeler and codeowner configs for CMake files (#15208) @PointKernel
Avoid dict normalization in __dask_tokenize__ (#15187) @rjzamora
Fix memcheck error in distinct inner join (#15164) @PointKernel
Remove unneeded script parameters in test_cpp_memcheck.sh (#15158) @davidwendt
Fix ListColumn.to_pandas() to retain list type (#15155) @galipremsagar
Avoid factorization in MultiIndex.to_pandas (#15150) @mroeschke
Fix GroupBy.get_group and GroupBy.indices (#15143) @wence-
Remove const from range_window_bounds::_extent. (#15138) @mythrocks
DataFrame.columns = ... retains RangeIndex & set dtype (#15129) @mroeschke
Correctly handle output for GroupBy.apply when chunk results are reindexed series (#15109) @brandon-b-miller
Fix Series.groupby.shift with a MultiIndex (#15098) @mroeschke
Fix reductions when DataFrame has MultiIndex columns (#15097) @mroeschke
Fix deprecation warnings for deprecated hash() calls (#15095) @davidwendt
Add support for arrow large_string in cudf (#15093) @galipremsagar
Fix sort_values pytest failure with pandas-2.x regression (#15092) @galipremsagar
Resolve path parsing issues in get_json_object (#15082) @SurajAralihalli
Fix bugs in handling of delta encodings (#15075) @etseidl
Fix is_device_write_preferred in void_sink and user_sink_wrapper (#15064) @vuule
Eliminate duplicate allocation of nested string columns (#15061) @vuule
Raise an error on import for unsupported GPUs. (#15053) @bdice
Align concat Series name behavior in pandas 2.2 (#15032) @mroeschke
Fix Index.difference to handle duplicate values when one of the inputs is empty (#15016) @galipremsagar
Add future_stack to DataFrame.stack (#15015) @galipremsagar
Fix handling of values=None in pylibcudf GroupBy.get_groups (#14998) @shwina
Fix DataFrame.sort_index to respect ignore_index on all axes (#14995) @galipremsagar
Raise for pyarrow array that is tz-aware (#14980) @mroeschke
Direct SeriesGroupBy.aggregate to SeriesGroupBy.agg (#14971) @rjzamora
Respect IntervalDtype and CategoricalDtype objects passed by users (#14961) @mroeschke
Unset CUDF_SPILL after a pytest (#14958) @galipremsagar
Fix Null literals to be not parsed as string when mixed types as string is enabled in JSON reader (#14939) @karthikeyann
Fix chunked reads of Parquet delta encoded pages (#14921) @etseidl
Fix reading offset for data stream in ORC reader (#14911) @ttnghia
Enable sanitizer check for a test case testORCReadAndWriteForDecimal128 (#14897) @res-life
Fix dask token normalization (#14829) @rjzamora
Fix 24.04 versions (#14825) @raydouglass
Ensure slow private attrs are maybe proxies (#14380) @mroeschke

📖 Documentation

Ignore DLManagedTensor in the docs build (#15392) @davidwendt
Revert "Temporarily disable docs errors. (#15265)" (#15269) @bdice
Temporarily disable docs errors. (#15265) @bdice
Update developer_guide.md with new guidance on quoted internal includes (#15238) @harrism
Fix broken link for developer guide (#15025) @sanjana098
[DOC] Update typo in docs example of structs_column_wrapper (#14949) @karthikeyann
Update cudf.pandas FAQ. (#14940) @bdice
Optimize doc builds (#14856) @vyasr
Add developer guideline to use east const. (#14836) @bdice
Document how cuDF is pronounced (#14753) @pentschev
Notes convert to Pandas-compat (#12641) @Touutae-lab

🚀 New Features

Address inconsistency in single quote normalization in JSON reader (#15324) @shrshi
Use JNI pinned pool resource with cuIO (#15255) @abellina
Add DELTA_BYTE_ARRAY encoder for Parquet (#15239) @etseidl
Migrate filling operations to pylibcudf (#15225) @brandon-b-miller
[JNI] rmm based pinned pool (#15219) @abellina
Implement zero-copy host buffer source instead of using an arrow implementation (#15189) @vuule
Enable creation of columns from scalar (#15181) @vyasr
Use NVTX from GitHub. (#15178) @bdice
Implement segmented_row_bit_count for computing row sizes by segments of rows (#15169) @ttnghia
Implement search using pylibcudf (#15166) @vyasr
Add distinct left join (#15149) @PointKernel
Add cardinality control for groupby benchs with flat types (#15134) @PointKernel
Add ability to request Parquet encodings on a per-column basis (#15081) @etseidl
Automate include grouping order in .clang-format (#15063) @harrism
Requesting a clean build directory also clears Jitify cache (#15052) @robertmaynard
API for JSON unquoted whitespace normalization (#15033) @shrshi
Implement concatenate, lists.explode, merge, sorting, and stream compaction in pylibcudf (#15011) @vyasr
Implement replace in pylibcudf (#15005) @vyasr
Add distinct key inner join (#14990) @PointKernel
Implement rolling in pylibcudf (#14982) @vyasr
Implement joins in pylibcudf (#14972) @vyasr
Implement scans and reductions in pylibcudf (#14970) @vyasr
Rewrite cudf internals using pylibcudf groupby (#14946) @vyasr
Implement groupby in pylibcudf (#14945) @vyasr
Support casting of Map type to string in JSON reader (#14936) @karthikeyann
POC for whitespace removal in input JSON data using FST (#14931) @shrshi
Support for LZ4 compression in ORC and Parquet (#14906) @vuule
Remove supports_streams from cuDF custom memory resources. (#14857) @harrism
Migrate unary operations to pylibcudf (#14850) @vyasr
Migrate binary operations to pylibcudf (#14821) @vyasr
Add row index and stripe size options to Python ORC chunked writer (#14785) @vuule
Support CUDA 12.2 (#14712) @jameslamb

🛠️ Improvements

Use conda env create --yes instead of --force (#15403) @bdice
Restructure pylibcudf/arrow interop facilities (#15325) @vyasr
Change exceptions thrown by copying APIs (#15319) @vyasr
Enable branch testing for cudf.pandas (#15316) @galipremsagar
Replace black with ruff-format (#15312) @mroeschke
Fix an NPE when reading empty JSON data by adding a new API for missing information (#15307) @revans2
Address poor performance of Parquet string decoding (#15304) @etseidl
Update script input name (#15301) @AyodeAwe
Make test_read_parquet_partitioned_filtered data deterministic (#15296) @mroeschke
Add timeout for cudf.pandas pandas tests (#15284) @galipremsagar
Add upper bound to prevent usage of NumPy 2 (#15283) @bdice
Fix cudf::test::to_host return of host_vector (#15263) @davidwendt
Implement grouped product scan (#15254) @wence-
Add CUDA 12.4 to supported PTX versions (#15247) @brandon-b-miller
Implement DataFrame|Series.squeeze (#15244) @mroeschke
Roll back ipow changes due to register pressure. (#15242) @pmattione-nvidia
Remove create_chars_child_column utility (#15241) @davidwendt
Update dlpack to version 0.8 (#15237) @dantegd
Improve performance in JSON reader when mixed_types_as_string option is enabled (#15236) @shrshi
Remove row conversion code from libcudf (#15234) @ttnghia
Use variable substitution for RAPIDS version in Doxyfile (#15231) @KyleFromNVIDIA
Add ListColumns.to_pandas(arrow_type=) (#15228) @mroeschke
Treat dask-cudf CI artifacts as pure wheels (#15223) @bdice
Clean...

Contributors

trxcllnt, robertmaynard, and 36 other contributors

27 Feb 15:12

raydouglass

v24.02.02

dd34fdb

v24.02.02

🚨 Breaking Changes

Remove **kwargs from astype (#14765) @mroeschke
Remove mimesis as a testing dependency (#14723) @mroeschke
Update to Dask's shuffle_method kwarg (#14708) @pentschev
Drop Pascal GPU support. (#14630) @bdice
Update to CCCL 2.2.0. (#14576) @bdice
Expunge as_frame conversions in Column algorithms (#14491) @wence-
Deprecate cudf::make_strings_column accepting typed offsets (#14461) @davidwendt
Remove deprecated nvtext::load_merge_pairs_file (#14460) @davidwendt
Include writer code and writerVersion in ORC files (#14458) @vuule
Remove null mask for zero nulls in json readers (#14451) @karthikeyann
REF: Remove **kwargs from to_pandas, raise if nullable is not implemented (#14438) @mroeschke
Consolidate 1D pandas object handling in as_column (#14394) @mroeschke
Move chars column to parent data buffer in strings column (#14202) @karthikeyann
Switch to scikit-build-core (#13531) @vyasr

🐛 Bug Fixes

Bump to nvcomp 3.0.6. (#15128) @bdice
[HOTFIX] Unpin numba<0.58 (#15031) @raydouglass
Exclude tests from builds (#14981) @vyasr
Fix the bounce buffer size in ORC writer (#14947) @vuule
Revert sum/product aggregation to always produce int64_t type (#14907) @SurajAralihalli
Fixed an issue with output chunking computation stemming from input chunking. (#14889) @nvdbaranec
Fix total_byte_size in Parquet row group metadata (#14802) @etseidl
Fix index difference to follow the pandas format (#14789) @amiralimi
Fix shared-workflows repo name (#14784) @raydouglass
Remove unparseable attributes from all nodes (#14780) @vyasr
Refactor and add validation to IntervalIndex.__init__ (#14778) @mroeschke
Work around incompatibilities between V2 page header handling and zStandard compression in Parquet writer (#14772) @etseidl
Fix calls to deprecated strings factory API (#14771) @davidwendt
Fix ptx file discovery in editable installs (#14767) @vyasr
Revise shuffle deprecation to align with dask/dask (#14762) @rjzamora
Enable intermediate proxies to be picklable (#14752) @shwina
Add CUDF_TEST_PROGRAM_MAIN macro to tests lacking it (#14751) @etseidl
Fix CMake args (#14746) @vyasr
Fix logic bug introduced in #14730 (#14742) @wence-
[Java] Choose The Correct RoundingMode For Checking Decimal OutOfBounds (#14731) @razajafri
Fix Groupby.get_group (#14728) @rjzamora
Ensure that all CUDA kernels in cudf have hidden visibility. (#14726) @robertmaynard
Split cuda versions for notebook testing (#14722) @raydouglass
Fix to_numeric not preserving Series index and name (#14718) @mroeschke
Update dask-cudf wheel name (#14713) @raydouglass
Fix strings::contains matching end of string target (#14711) @davidwendt
Update to Dask's shuffle_method kwarg (#14708) @pentschev
Write file-level statistics when writing ORC files with zero rows (#14707) @vuule
Potential fix for performance regression in #14415 (#14706) @etseidl
Ensure DataFrame column types are preserved during serialization (#14705) @mroeschke
Skip numba test that fails on ARM (#14702) @brandon-b-miller
Allow Z in datetime string parsing in non pandas compat mode (#14701) @mroeschke
Fix nan_as_null not being respected when passing arrow object (#14688) @mroeschke
Fix constructing Series/Index from arrow array and dtype (#14686) @mroeschke
Fix Aggregation Type Promotion: Ensure Unsigned Input Types Result in Unsigned Output for Sum and Multiply (#14679) @SurajAralihalli
Add BaseOffset as a final proxy type to pass instancechecks for offsets against BaseOffset (#14678) @shwina
Add row conversion code from spark-rapids-jni (#14664) @ttnghia
Unconditionally export the CCCL path (#14656) @vyasr
Ensure libcudf searches for our patched version of CCCL first (#14655) @robertmaynard
Constrain CUDA in notebook testing to prevent CUDA 12.1 usage until we have pynvjitlink (#14648) @vyasr
Fix invalid memory access in Parquet reader (#14637) @etseidl
Use column_empty over as_column([]) (#14632) @mroeschke
Add (implicit) handling for torch tensors in is_scalar (#14623) @wence-
Fix astype/fillna not maintaining column subclass and types (#14615) @mroeschke
Remove non-empty nulls in cudf::get_json_object (#14609) @davidwendt
Remove cuda::proclaim_return_type from nested lambda (#14607) @ttnghia
Fix DataFrame.reindex when column reindexing to MultiIndex/RangeIndex (#14605) @mroeschke
Address potential race conditions in Parquet reader (#14602) @etseidl
Fix DataFrame.reindex removing column name (#14601) @mroeschke
Remove unsanitized input test data from copy gtests (#14600) @davidwendt
Fix race detected in Parquet writer (#14598) @etseidl
Correct invalid or missing return types (#14587) @robertmaynard
Fix unsanitized nulls from strings segmented-reduce (#14586) @davidwendt
Upgrade to nvCOMP 3.0.5 (#14581) @davidwendt
Fix unsanitized nulls produced by cudf::clamp APIs (#14580) @davidwendt
Fix unsanitized nulls produced by libcudf dictionary decode (#14578) @davidwendt
Fixes a symbol group lookup table issue (#14561) @elstehle
Drop llvm16 from cuda118-conda devcontainer image (#14526) @charlesbluca
REF: Make DataFrame.from_pandas process by column (#14483) @mroeschke
Improve memory footprint of isin by using contains (#14478) @wence-
Move creation of env.yaml outside the current directory (#14476) @davidwendt
Enable pd.Timestamp objects to be picklable when cudf.pandas is active (#14474) @shwina
Correct dtype of count aggregations on empty dataframes (#14473) @wence-
Avoid DataFrame conversion in MultiIndex.from_pandas (#14470) @mroeschke
JSON writer: avoid default stream use in string_scalar constructors (#14444) @vuule
Fix default stream use in the CSV reader (#14443) @vuule
Preserve DataFrame(columns=).columns dtype during empty-like construction (#14381) @mroeschke
Defer PTX file load to runtime (#13690) @brandon-b-miller

📖 Documentation

Disable parallel build (#14796) @vyasr
Add pylibcudf to the docs (#14791) @vyasr
Describe unpickling expectations when cudf.pandas is enabled (#14693) @shwina
Update CONTRIBUTING for pyproject-only builds (#14653) @vyasr
More doxygen fixes (#14639) @vyasr
Enable doxygen XML generation and fix issues (#14477) @vyasr
Some doxygen improvements (#14469) @vyasr
Remove warning in dask-cudf docs (#14454) @wence-
Update README links with redirects. (#14378) @bdice
Add pip install instructions to README (#13677) @shwina

🚀 New Features

Add ci check for external kernels (#14768) @robertmaynard
JSON single quote normalization API (#14729) @shrshi
Write cuDF version in Parquet "created_by" metadata field (#14721) @etseidl
Implement remaining copying APIs in pylibcudf along with required helper functions (#14640) @vyasr
Don't constrain numba<0.58 (#14616) @brandon-b-miller
Add DELTA_LENGTH_BYTE_ARRAY encoder and decoder for Parquet (#14590) @etseidl
JSON - Parse mixed types as string in JSON reader (#14572) @karthikeyann
JSON quote normalization (#14545) @shrshi
Make DefaultHostMemoryAllocator settable (#14523) @gerashegalov
Implement more copying APIs in pylibcudf (#14508) @vyasr
Include writer code and writerVersion in ORC files (#14458) @vuule
Parquet sub-rowgroup reading. (#14360) @nvdbaranec
Move chars column to parent data buffer in strings column (#14202) @karthikeyann
PARQUET-2261 Size Statistics (#14000) @etseidl
Improve GroupBy JIT error handling (#13854) @brandon-b-miller
Generate unified Python/C++ docs (#13846) @vyasr
Expand JIT groupby test suite (#13813) @brandon-b-miller
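
For readers unfamiliar with the DELTA_LENGTH_BYTE_ARRAY entry above: in the Parquet format, this encoding stores the lengths of all values first, followed by the values' bytes concatenated together. The following is a minimal Python sketch of that layout only, not cuDF's implementation. The function names are hypothetical, and the real encoding additionally packs the lengths with DELTA_BINARY_PACKED, which is omitted here for brevity.

```python
from typing import List, Tuple


def split_lengths_and_bytes(values: List[bytes]) -> Tuple[List[int], bytes]:
    """Separate a list of byte strings into (lengths, concatenated data).

    Hypothetical helper: the lengths are kept as a plain Python list here,
    whereas the Parquet encoding would delta-pack them.
    """
    lengths = [len(v) for v in values]
    data = b"".join(values)
    return lengths, data


def join_from_lengths(lengths: List[int], data: bytes) -> List[bytes]:
    """Rebuild the original byte strings from (lengths, concatenated data)."""
    if sum(lengths) != len(data):
        raise ValueError("lengths do not add up to the size of the data buffer")
    values = []
    pos = 0
    for n in lengths:
        values.append(data[pos:pos + n])
        pos += n
    return values


if __name__ == "__main__":
    lengths, data = split_lengths_and_bytes([b"Hello", b"", b"World!"])
    print(lengths, data)
    print(join_from_lengths(lengths, data))
```

Keeping the lengths apart from the bytes means a reader can locate any value from the lengths alone, and the byte stream contains no interleaved length prefixes.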

🛠️ Improvements

Pin pytest<8 (#14920) @galipremsagar
Move cudf::char_utf8 definition from detail to public header (#14779) @davidwendt
Clean up TimedeltaIndex.__init__ constructor (#14775) @mroeschke
Clean up DatetimeIndex.__init__ constructor (#14774) @mroeschke
Some frame.py typing, move seldom used methods in frame.py (#14766) @mroeschke
Remove **kwargs from astype (#14765) @mroeschke
fix benchmarks compatibility with newer pytest-cases (#14764) @jameslamb
Add pynvjitlink as a dependency (#14763) @brandon-b-miller
Resolve degenerate performance in create_structs_data (#14761) @SurajAralihalli
Simplify ColumnAccessor methods; avoid unnecessary validations (#14758) @mroeschke
Pin pytest-cases<3.8.2 (#14756) @mroeschke
Use _from_data instead of _from_columns for initializing Frame (#14755) @mroeschke
Consolidate cudf object handling in as_column (#14754) @mroeschke
Reduce execution time of Parquet C++ tests (#14750) @vuule
Implement to_datetime(..., utc=True) (#14749) @mroeschke
Remove usages of rapids-env-update (#14748) @KyleFromNVIDIA
Provide explicit pool size and avoid RMM detail APIs (#14741) @harrism
Implement cudf.MultiIndex.from_arrays (#14740) @mroeschke
Remove unused/single use methods (#14739) @mroeschke
refactor CUDA versions in dependencies.yaml (#14733) @jameslamb
Remove unneeded methods in Column (#14730) @mroeschke
Clean up base column methods (#14725) @mroeschke
Ensure column.fillna signatures are consistent (#14724) @mroeschke
Remove mimesis as a testing dependency (#14723) @mroeschke
Replace as_numerical with as_numerical_column/codes (#14719) @mroeschke
Use offsetalator in gather_chars (#14700) @davidwendt
Use make_strings_children for fill() specialization logic (#14697) @davidwendt
Change io::detail::orc namespace into io::orc::detail (#14696) @ttnghia
Fix call to deprecated factory function (#14695) @davidwendt
Use as_column instead of arange for range like inputs (#14689) @mroeschke
Reorganize ORC reader into multiple files and perform some small fixes to cuIO code (#14665) @ttnghia
Split parquet test into multiple files (#14663) @etseidl
Custom error messages for IO with nonexistent files (#14662) @vuule
Explicitly pass .dtype into is_foo_dtype functions (#14657) @mroeschke
Basic val...

Contributors

trxcllnt, robertmaynard, and 32 other contributors

13 Feb 14:24

raydouglass

v24.02.01

33ffdf5

v24.02.01

🚨 Breaking Changes

Remove **kwargs from astype (#14765) @mroeschke
Remove mimesis as a testing dependency (#14723) @mroeschke
Update to Dask's shuffle_method kwarg (#14708) @pentschev
Drop Pascal GPU support. (#14630) @bdice
Update to CCCL 2.2.0. (#14576) @bdice
Expunge as_frame conversions in Column algorithms (#14491) @wence-
Deprecate cudf::make_strings_column accepting typed offsets (#14461) @davidwendt
Remove deprecated nvtext::load_merge_pairs_file (#14460) @davidwendt
Include writer code and writerVersion in ORC files (#14458) @vuule
Remove null mask for zero nulls in json readers (#14451) @karthikeyann
REF: Remove **kwargs from to_pandas, raise if nullable is not implemented (#14438) @mroeschke
Consolidate 1D pandas object handling in as_column (#14394) @mroeschke
Move chars column to parent data buffer in strings column (#14202) @karthikeyann
Switch to scikit-build-core (#13531) @vyasr

🐛 Bug Fixes

[HOTFIX] Unpin numba<0.58 (#15031) @raydouglass
Exclude tests from builds (#14981) @vyasr
Fix the bounce buffer size in ORC writer (#14947) @vuule
Revert sum/product aggregation to always produce int64_t type (#14907) @SurajAralihalli
Fixed an issue with output chunking computation stemming from input chunking. (#14889) @nvdbaranec
Fix total_byte_size in Parquet row group metadata (#14802) @etseidl
Fix index difference to follow the pandas format (#14789) @amiralimi
Fix shared-workflows repo name (#14784) @raydouglass
Remove unparseable attributes from all nodes (#14780) @vyasr
Refactor and add validation to IntervalIndex.__init__ (#14778) @mroeschke
Work around incompatibilities between V2 page header handling and zStandard compression in Parquet writer (#14772) @etseidl
Fix calls to deprecated strings factory API (#14771) @davidwendt
Fix ptx file discovery in editable installs (#14767) @vyasr
Revise shuffle deprecation to align with dask/dask (#14762) @rjzamora
Enable intermediate proxies to be picklable (#14752) @shwina
Add CUDF_TEST_PROGRAM_MAIN macro to tests lacking it (#14751) @etseidl
Fix CMake args (#14746) @vyasr
Fix logic bug introduced in #14730 (#14742) @wence-
[Java] Choose The Correct RoundingMode For Checking Decimal OutOfBounds (#14731) @razajafri
Fix Groupby.get_group (#14728) @rjzamora
Ensure that all CUDA kernels in cudf have hidden visibility. (#14726) @robertmaynard
Split cuda versions for notebook testing (#14722) @raydouglass
Fix to_numeric not preserving Series index and name (#14718) @mroeschke
Update dask-cudf wheel name (#14713) @raydouglass
Fix strings::contains matching end of string target (#14711) @davidwendt
Update to Dask's shuffle_method kwarg (#14708) @pentschev
Write file-level statistics when writing ORC files with zero rows (#14707) @vuule
Potential fix for performance regression in #14415 (#14706) @etseidl
Ensure DataFrame column types are preserved during serialization (#14705) @mroeschke
Skip numba test that fails on ARM (#14702) @brandon-b-miller
Allow Z in datetime string parsing in non pandas compat mode (#14701) @mroeschke
Fix nan_as_null not being respected when passing arrow object (#14688) @mroeschke
Fix constructing Series/Index from arrow array and dtype (#14686) @mroeschke
Fix Aggregation Type Promotion: Ensure Unsigned Input Types Result in Unsigned Output for Sum and Multiply (#14679) @SurajAralihalli
Add BaseOffset as a final proxy type to pass instancechecks for offsets against BaseOffset (#14678) @shwina
Add row conversion code from spark-rapids-jni (#14664) @ttnghia
Unconditionally export the CCCL path (#14656) @vyasr
Ensure libcudf searches for our patched version of CCCL first (#14655) @robertmaynard
Constrain CUDA in notebook testing to prevent CUDA 12.1 usage until we have pynvjitlink (#14648) @vyasr
Fix invalid memory access in Parquet reader (#14637) @etseidl
Use column_empty over as_column([]) (#14632) @mroeschke
Add (implicit) handling for torch tensors in is_scalar (#14623) @wence-
Fix astype/fillna not maintaining column subclass and types (#14615) @mroeschke
Remove non-empty nulls in cudf::get_json_object (#14609) @davidwendt
Remove cuda::proclaim_return_type from nested lambda (#14607) @ttnghia
Fix DataFrame.reindex when column reindexing to MultiIndex/RangeIndex (#14605) @mroeschke
Address potential race conditions in Parquet reader (#14602) @etseidl
Fix DataFrame.reindex removing column name (#14601) @mroeschke
Remove unsanitized input test data from copy gtests (#14600) @davidwendt
Fix race detected in Parquet writer (#14598) @etseidl
Correct invalid or missing return types (#14587) @robertmaynard
Fix unsanitized nulls from strings segmented-reduce (#14586) @davidwendt
Upgrade to nvCOMP 3.0.5 (#14581) @davidwendt
Fix unsanitized nulls produced by cudf::clamp APIs (#14580) @davidwendt
Fix unsanitized nulls produced by libcudf dictionary decode (#14578) @davidwendt
Fixes a symbol group lookup table issue (#14561) @elstehle
Drop llvm16 from cuda118-conda devcontainer image (#14526) @charlesbluca
REF: Make DataFrame.from_pandas process by column (#14483) @mroeschke
Improve memory footprint of isin by using contains (#14478) @wence-
Move creation of env.yaml outside the current directory (#14476) @davidwendt
Enable pd.Timestamp objects to be picklable when cudf.pandas is active (#14474) @shwina
Correct dtype of count aggregations on empty dataframes (#14473) @wence-
Avoid DataFrame conversion in MultiIndex.from_pandas (#14470) @mroeschke
JSON writer: avoid default stream use in string_scalar constructors (#14444) @vuule
Fix default stream use in the CSV reader (#14443) @vuule
Preserve DataFrame(columns=).columns dtype during empty-like construction (#14381) @mroeschke
Defer PTX file load to runtime (#13690) @brandon-b-miller

📖 Documentation

Disable parallel build (#14796) @vyasr
Add pylibcudf to the docs (#14791) @vyasr
Describe unpickling expectations when cudf.pandas is enabled (#14693) @shwina
Update CONTRIBUTING for pyproject-only builds (#14653) @vyasr
More doxygen fixes (#14639) @vyasr
Enable doxygen XML generation and fix issues (#14477) @vyasr
Some doxygen improvements (#14469) @vyasr
Remove warning in dask-cudf docs (#14454) @wence-
Update README links with redirects. (#14378) @bdice
Add pip install instructions to README (#13677) @shwina

🚀 New Features

Add ci check for external kernels (#14768) @robertmaynard
JSON single quote normalization API (#14729) @shrshi
Write cuDF version in Parquet "created_by" metadata field (#14721) @etseidl
Implement remaining copying APIs in pylibcudf along with required helper functions (#14640) @vyasr
Don't constrain numba<0.58 (#14616) @brandon-b-miller
Add DELTA_LENGTH_BYTE_ARRAY encoder and decoder for Parquet (#14590) @etseidl
JSON - Parse mixed types as string in JSON reader (#14572) @karthikeyann
JSON quote normalization (#14545) @shrshi
Make DefaultHostMemoryAllocator settable (#14523) @gerashegalov
Implement more copying APIs in pylibcudf (#14508) @vyasr
Include writer code and writerVersion in ORC files (#14458) @vuule
Parquet sub-rowgroup reading. (#14360) @nvdbaranec
Move chars column to parent data buffer in strings column (#14202) @karthikeyann
PARQUET-2261 Size Statistics (#14000) @etseidl
Improve GroupBy JIT error handling (#13854) @brandon-b-miller
Generate unified Python/C++ docs (#13846) @vyasr
Expand JIT groupby test suite (#13813) @brandon-b-miller

🛠️ Improvements

Pin pytest<8 (#14920) @galipremsagar
Move cudf::char_utf8 definition from detail to public header (#14779) @davidwendt
Clean up TimedeltaIndex.__init__ constructor (#14775) @mroeschke
Clean up DatetimeIndex.__init__ constructor (#14774) @mroeschke
Some frame.py typing, move seldom used methods in frame.py (#14766) @mroeschke
Remove **kwargs from astype (#14765) @mroeschke
fix benchmarks compatibility with newer pytest-cases (#14764) @jameslamb
Add pynvjitlink as a dependency (#14763) @brandon-b-miller
Resolve degenerate performance in create_structs_data (#14761) @SurajAralihalli
Simplify ColumnAccessor methods; avoid unnecessary validations (#14758) @mroeschke
Pin pytest-cases<3.8.2 (#14756) @mroeschke
Use _from_data instead of _from_columns for initializing Frame (#14755) @mroeschke
Consolidate cudf object handling in as_column (#14754) @mroeschke
Reduce execution time of Parquet C++ tests (#14750) @vuule
Implement to_datetime(..., utc=True) (#14749) @mroeschke
Remove usages of rapids-env-update (#14748) @KyleFromNVIDIA
Provide explicit pool size and avoid RMM detail APIs (#14741) @harrism
Implement cudf.MultiIndex.from_arrays (#14740) @mroeschke
Remove unused/single use methods (#14739) @mroeschke
refactor CUDA versions in dependencies.yaml (#14733) @jameslamb
Remove unneeded methods in Column (#14730) @mroeschke
Clean up base column methods (#14725) @mroeschke
Ensure column.fillna signatures are consistent (#14724) @mroeschke
Remove mimesis as a testing dependency (#14723) @mroeschke
Replace as_numerical with as_numerical_column/codes (#14719) @mroeschke
Use offsetalator in gather_chars (#14700) @davidwendt
Use make_strings_children for fill() specialization logic (#14697) @davidwendt
Change io::detail::orc namespace into io::orc::detail (#14696) @ttnghia
Fix call to deprecated factory function (#14695) @davidwendt
Use as_column instead of arange for range like inputs (#14689) @mroeschke
Reorganize ORC reader into multiple files and perform some small fixes to cuIO code (#14665) @ttnghia
Split parquet test into multiple files (#14663) @etseidl
Custom error messages for IO with nonexistent files (#14662) @vuule
Explicitly pass .dtype into is_foo_dtype functions (#14657) @mroeschke
Basic validation in reader benchmarks (#14647) @v...

Contributors

trxcllnt, robertmaynard, and 32 other contributors

12 Feb 21:28

raydouglass

v24.02.00

a48e9fc

v24.02.00

🚨 Breaking Changes

Remove **kwargs from astype (#14765) @mroeschke
Remove mimesis as a testing dependency (#14723) @mroeschke
Update to Dask's shuffle_method kwarg (#14708) @pentschev
Drop Pascal GPU support. (#14630) @bdice
Update to CCCL 2.2.0. (#14576) @bdice
Expunge as_frame conversions in Column algorithms (#14491) @wence-
Deprecate cudf::make_strings_column accepting typed offsets (#14461) @davidwendt
Remove deprecated nvtext::load_merge_pairs_file (#14460) @davidwendt
Include writer code and writerVersion in ORC files (#14458) @vuule
Remove null mask for zero nulls in json readers (#14451) @karthikeyann
REF: Remove **kwargs from to_pandas, raise if nullable is not implemented (#14438) @mroeschke
Consolidate 1D pandas object handling in as_column (#14394) @mroeschke
Move chars column to parent data buffer in strings column (#14202) @karthikeyann
Switch to scikit-build-core (#13531) @vyasr

🐛 Bug Fixes

Exclude tests from builds (#14981) @vyasr
Fix the bounce buffer size in ORC writer (#14947) @vuule
Revert sum/product aggregation to always produce int64_t type (#14907) @SurajAralihalli
Fixed an issue with output chunking computation stemming from input chunking. (#14889) @nvdbaranec
Fix total_byte_size in Parquet row group metadata (#14802) @etseidl
Fix index difference to follow the pandas format (#14789) @amiralimi
Fix shared-workflows repo name (#14784) @raydouglass
Remove unparseable attributes from all nodes (#14780) @vyasr
Refactor and add validation to IntervalIndex.__init__ (#14778) @mroeschke
Work around incompatibilities between V2 page header handling and zStandard compression in Parquet writer (#14772) @etseidl
Fix calls to deprecated strings factory API (#14771) @davidwendt
Fix ptx file discovery in editable installs (#14767) @vyasr
Revise shuffle deprecation to align with dask/dask (#14762) @rjzamora
Enable intermediate proxies to be picklable (#14752) @shwina
Add CUDF_TEST_PROGRAM_MAIN macro to tests lacking it (#14751) @etseidl
Fix CMake args (#14746) @vyasr
Fix logic bug introduced in #14730 (#14742) @wence-
[Java] Choose The Correct RoundingMode For Checking Decimal OutOfBounds (#14731) @razajafri
Fix Groupby.get_group (#14728) @rjzamora
Ensure that all CUDA kernels in cudf have hidden visibility. (#14726) @robertmaynard
Split cuda versions for notebook testing (#14722) @raydouglass
Fix to_numeric not preserving Series index and name (#14718) @mroeschke
Update dask-cudf wheel name (#14713) @raydouglass
Fix strings::contains matching end of string target (#14711) @davidwendt
Update to Dask's shuffle_method kwarg (#14708) @pentschev
Write file-level statistics when writing ORC files with zero rows (#14707) @vuule
Potential fix for performance regression in #14415 (#14706) @etseidl
Ensure DataFrame column types are preserved during serialization (#14705) @mroeschke
Skip numba test that fails on ARM (#14702) @brandon-b-miller
Allow Z in datetime string parsing in non pandas compat mode (#14701) @mroeschke
Fix nan_as_null not being respected when passing arrow object (#14688) @mroeschke
Fix constructing Series/Index from arrow array and dtype (#14686) @mroeschke
Fix Aggregation Type Promotion: Ensure Unsigned Input Types Result in Unsigned Output for Sum and Multiply (#14679) @SurajAralihalli
Add BaseOffset as a final proxy type to pass instancechecks for offsets against BaseOffset (#14678) @shwina
Add row conversion code from spark-rapids-jni (#14664) @ttnghia
Unconditionally export the CCCL path (#14656) @vyasr
Ensure libcudf searches for our patched version of CCCL first (#14655) @robertmaynard
Constrain CUDA in notebook testing to prevent CUDA 12.1 usage until we have pynvjitlink (#14648) @vyasr
Fix invalid memory access in Parquet reader (#14637) @etseidl
Use column_empty over as_column([]) (#14632) @mroeschke
Add (implicit) handling for torch tensors in is_scalar (#14623) @wence-
Fix astype/fillna not maintaining column subclass and types (#14615) @mroeschke
Remove non-empty nulls in cudf::get_json_object (#14609) @davidwendt
Remove cuda::proclaim_return_type from nested lambda (#14607) @ttnghia
Fix DataFrame.reindex when column reindexing to MultiIndex/RangeIndex (#14605) @mroeschke
Address potential race conditions in Parquet reader (#14602) @etseidl
Fix DataFrame.reindex removing column name (#14601) @mroeschke
Remove unsanitized input test data from copy gtests (#14600) @davidwendt
Fix race detected in Parquet writer (#14598) @etseidl
Correct invalid or missing return types (#14587) @robertmaynard
Fix unsanitized nulls from strings segmented-reduce (#14586) @davidwendt
Upgrade to nvCOMP 3.0.5 (#14581) @davidwendt
Fix unsanitized nulls produced by cudf::clamp APIs (#14580) @davidwendt
Fix unsanitized nulls produced by libcudf dictionary decode (#14578) @davidwendt
Fixes a symbol group lookup table issue (#14561) @elstehle
Drop llvm16 from cuda118-conda devcontainer image (#14526) @charlesbluca
REF: Make DataFrame.from_pandas process by column (#14483) @mroeschke
Improve memory footprint of isin by using contains (#14478) @wence-
Move creation of env.yaml outside the current directory (#14476) @davidwendt
Enable pd.Timestamp objects to be picklable when cudf.pandas is active (#14474) @shwina
Correct dtype of count aggregations on empty dataframes (#14473) @wence-
Avoid DataFrame conversion in MultiIndex.from_pandas (#14470) @mroeschke
JSON writer: avoid default stream use in string_scalar constructors (#14444) @vuule
Fix default stream use in the CSV reader (#14443) @vuule
Preserve DataFrame(columns=).columns dtype during empty-like construction (#14381) @mroeschke
Defer PTX file load to runtime (#13690) @brandon-b-miller

📖 Documentation

Disable parallel build (#14796) @vyasr
Add pylibcudf to the docs (#14791) @vyasr
Describe unpickling expectations when cudf.pandas is enabled (#14693) @shwina
Update CONTRIBUTING for pyproject-only builds (#14653) @vyasr
More doxygen fixes (#14639) @vyasr
Enable doxygen XML generation and fix issues (#14477) @vyasr
Some doxygen improvements (#14469) @vyasr
Remove warning in dask-cudf docs (#14454) @wence-
Update README links with redirects. (#14378) @bdice
Add pip install instructions to README (#13677) @shwina

🚀 New Features

Add ci check for external kernels (#14768) @robertmaynard
JSON single quote normalization API (#14729) @shrshi
Write cuDF version in Parquet "created_by" metadata field (#14721) @etseidl
Implement remaining copying APIs in pylibcudf along with required helper functions (#14640) @vyasr
Don't constrain numba<0.58 (#14616) @brandon-b-miller
Add DELTA_LENGTH_BYTE_ARRAY encoder and decoder for Parquet (#14590) @etseidl
JSON - Parse mixed types as string in JSON reader (#14572) @karthikeyann
JSON quote normalization (#14545) @shrshi
Make DefaultHostMemoryAllocator settable (#14523) @gerashegalov
Implement more copying APIs in pylibcudf (#14508) @vyasr
Include writer code and writerVersion in ORC files (#14458) @vuule
Parquet sub-rowgroup reading. (#14360) @nvdbaranec
Move chars column to parent data buffer in strings column (#14202) @karthikeyann
PARQUET-2261 Size Statistics (#14000) @etseidl
Improve GroupBy JIT error handling (#13854) @brandon-b-miller
Generate unified Python/C++ docs (#13846) @vyasr
Expand JIT groupby test suite (#13813) @brandon-b-miller

🛠️ Improvements

Pin pytest<8 (#14920) @galipremsagar
Move cudf::char_utf8 definition from detail to public header (#14779) @davidwendt
Clean up TimedeltaIndex.__init__ constructor (#14775) @mroeschke
Clean up DatetimeIndex.__init__ constructor (#14774) @mroeschke
Some frame.py typing, move seldom used methods in frame.py (#14766) @mroeschke
Remove **kwargs from astype (#14765) @mroeschke
fix benchmarks compatibility with newer pytest-cases (#14764) @jameslamb
Add pynvjitlink as a dependency (#14763) @brandon-b-miller
Resolve degenerate performance in create_structs_data (#14761) @SurajAralihalli
Simplify ColumnAccessor methods; avoid unnecessary validations (#14758) @mroeschke
Pin pytest-cases<3.8.2 (#14756) @mroeschke
Use _from_data instead of _from_columns for initializing Frame (#14755) @mroeschke
Consolidate cudf object handling in as_column (#14754) @mroeschke
Reduce execution time of Parquet C++ tests (#14750) @vuule
Implement to_datetime(..., utc=True) (#14749) @mroeschke
Remove usages of rapids-env-update (#14748) @KyleFromNVIDIA
Provide explicit pool size and avoid RMM detail APIs (#14741) @harrism
Implement cudf.MultiIndex.from_arrays (#14740) @mroeschke
Remove unused/single use methods (#14739) @mroeschke
refactor CUDA versions in dependencies.yaml (#14733) @jameslamb
Remove unneeded methods in Column (#14730) @mroeschke
Clean up base column methods (#14725) @mroeschke
Ensure column.fillna signatures are consistent (#14724) @mroeschke
Remove mimesis as a testing dependency (#14723) @mroeschke
Replace as_numerical with as_numerical_column/codes (#14719) @mroeschke
Use offsetalator in gather_chars (#14700) @davidwendt
Use make_strings_children for fill() specialization logic (#14697) @davidwendt
Change io::detail::orc namespace into io::orc::detail (#14696) @ttnghia
Fix call to deprecated factory function (#14695) @davidwendt
Use as_column instead of arange for range like inputs (#14689) @mroeschke
Reorganize ORC reader into multiple files and perform some small fixes to cuIO code (#14665) @ttnghia
Split parquet test into multiple files (#14663) @etseidl
Custom error messages for IO with nonexistent files (#14662) @vuule
Explicitly pass .dtype into is_foo_dtype functions (#14657) @mroeschke
Basic validation in reader benchmarks (#14647) @vuule
Update dependencies.yaml to support CUDA 12.*....

Contributors

trxcllnt, robertmaynard, and 32 other contributors

08 Dec 21:31

raydouglass

v23.12.01

2ce4621

v23.12.01

🚨 Breaking Changes

Raise error in reindex when index is not unique (#14400) @galipremsagar
Expose stream parameter to get_json_object API (#14297) @davidwendt
Refactor cudf_kafka to use skbuild (#14292) @jdye64
Expose stream parameter in public strings convert APIs (#14255) @davidwendt
Upgrade to nvCOMP 3.0.4 (#13815) @vuule

🐛 Bug Fixes

Fix synchronization issue when writing string columns with dictionary to ORC (#14595) @vuule
Update actions/labeler to v4 (#14562) @raydouglass
Fix data corruption when skipping rows (#14557) @etseidl
Fix function name typo in cudf.pandas profiler (#14514) @galipremsagar
Fix intermediate type checking in expression parsing (#14445) @vyasr
Forward merge branch-23.10 into branch-23.12 (#14435) @raydouglass
Remove needs: wheel-build-cudf. (#14427) @bdice
Fix dask dependency in custreamz (#14420) @vyasr
Ensure nvbench initializes nvml context when built statically (#14411) @robertmaynard
Support java AST String literal with desired encoding (#14402) @winningsix
Raise error in reindex when index is not unique (#14400) @galipremsagar
Always build nvbench statically so we don't need to package it (#14399) @robertmaynard
Fix token-count logic in nvtext::tokenize_with_vocabulary (#14393) @davidwendt
Fix as_column(pd.Timestamp/Timedelta, length=) not respecting length (#14390) @mroeschke
cudf.pandas: cuDF subpath checking in module __getattr__ (#14388) @shwina
Fix and disable encoding for nanosecond statistics in ORC writer (#14367) @vuule
Add the new manylinux builds to the build job (#14351) @vyasr
cudf jit parser now supports .pragma instructions with quotes (#14348) @robertmaynard
Fix overflow check in cudf::merge (#14345) @divyegala
Add cramjam (#14344) @vyasr
Enable dask_cudf/io pytests in CI (#14338) @galipremsagar
Temporarily avoid the current build of pydata-sphinx-theme (#14332) @vyasr
Fix host buffer access from device function in the Parquet reader (#14328) @vuule
Run IO tests for Dask-cuDF (#14327) @rjzamora
Fix logical type issues in the Parquet writer (#14322) @vuule
Remove aws-sdk-pinning and revert to arrow 12.0.1 (#14319) @vyasr
test is_valid before reading column data (#14318) @etseidl
Fix gtest validity setting for TextTokenizeTest.Vocabulary (#14312) @davidwendt
Fixes stack context for json lines format that recovers from invalid JSON lines (#14309) @elstehle
Downgrade to Arrow 12.0.0 for aws-sdk-cpp and fix cudf_kafka builds for new CI containers (#14296) @vyasr
fixing thread index overflow issue (#14290) @hyperbolic2346
Fix memset error in nvtext::edit_distance_matrix (#14283) @davidwendt
Changes JSON reader's recovery option's behaviour to ignore all characters after a valid JSON record (#14279) @elstehle
Handle empty string correctly in Parquet statistics (#14257) @etseidl
Fixes behaviour for incomplete lines when recover_with_nulls is enabled (#14252) @elstehle
cudf::detail::pinned_allocator doesn't throw from deallocate (#14251) @robertmaynard
Fix strings replace for adjacent, identical multi-byte UTF-8 character targets (#14235) @davidwendt
Fix the precision when converting a decimal128 column to an arrow array (#14230) @jihoonson
Fixing parquet list of struct interpretation (#13715) @hyperbolic2346

📖 Documentation

Fix io reference in docs. (#14452) @bdice
Update README (#14374) @shwina
Example code for blog on new row comparators (#13795) @divyegala

🚀 New Features

Expose streams in public unary APIs (#14342) @vyasr
Add python tests for Parquet DELTA_BINARY_PACKED encoder (#14316) @etseidl
Update rapids-cmake functions to non-deprecated signatures (#14265) @robertmaynard
Expose streams in public null mask APIs (#14263) @vyasr
Expose streams in binaryop APIs (#14187) @vyasr
Add pylibcudf.Scalar that interoperates with Arrow scalars (#14133) @vyasr
Add decoder for DELTA_BYTE_ARRAY to Parquet reader (#14101) @etseidl
Add DELTA_BINARY_PACKED encoder for Parquet writer (#14100) @etseidl
Add BytePairEncoder class to cuDF (#13891) @davidwendt
Upgrade to nvCOMP 3.0.4 (#13815) @vuule
Use pynvjitlink for CUDA 12+ MVC (#13650) @brandon-b-miller

🛠️ Improvements

Build concurrency for nightly and merge triggers (#14441) @bdice
Cleanup remaining usages of dask dependencies (#14407) @galipremsagar
Update to Arrow 14.0.1. (#14387) @bdice
Remove Cython libcpp wrappers (#14382) @vyasr
Forward-merge branch-23.10 to branch-23.12 (#14372) @bdice
Upgrade to arrow 14 (#14371) @galipremsagar
Fix a pytest typo in test_kurt_skew_error (#14368) @galipremsagar
Use new rapids-dask-dependency metapackage for managing dask versions (#14364) @vyasr
Change nullable() to has_nulls() in cudf::detail::gather (#14363) @divyegala
Split up scan_inclusive.cu to improve its compile time (#14358) @davidwendt
Implement user_datasource_wrapper is_empty() and is_device_read_preferred(). (#14357) @tpn
Added streams to CSV reader and writer api (#14340) @shrshi
Upgrade wheels to use arrow 13 (#14339) @vyasr
Rework nvtext::byte_pair_encoding API (#14337) @davidwendt
Improve performance of nvtext::tokenize_with_vocabulary for long strings (#14336) @davidwendt
Upgrade arrow to 13 (#14330) @galipremsagar
Expose stream parameter in public nvtext replace APIs (#14329) @davidwendt
Drop pyorc dependency and use pandas/pyarrow instead (#14323) @galipremsagar
Avoid pyarrow.fs import for local storage (#14321) @rjzamora
Unpin dask and distributed for 23.12 development (#14320) @galipremsagar
Expose stream parameter in public nvtext tokenize APIs (#14317) @davidwendt
Added streams to JSON reader and writer api (#14313) @shrshi
Minor improvements in source_info (#14308) @vuule
Forward-merge branch-23.10 to branch-23.12 (#14307) @bdice
Add stream parameter to Set Operations (Public List APIs) (#14305) @SurajAralihalli
Expose stream parameter to get_json_object API (#14297) @davidwendt
Sort dictionary data alphabetically in the ORC writer (#14295) @vuule
Expose stream parameter in public strings filter APIs (#14293) @davidwendt
Refactor cudf_kafka to use skbuild (#14292) @jdye64
Update shared-action-workflows references (#14289) @AyodeAwe
Register partd encode dispatch in dask_cudf (#14287) @rjzamora
Update versioning strategy (#14285) @vyasr
Move and rename byte-pair-encoding source files (#14284) @davidwendt
Expose stream parameter in public strings combine APIs (#14281) @davidwendt
Expose stream parameter in public strings contains APIs (#14280) @davidwendt
Add stream parameter to List Sort and Filter APIs (#14272) @SurajAralihalli
Use branch-23.12 workflows. (#14271) @bdice
Refactor LogicalType for Parquet (#14264) @etseidl
Centralize chunked reading code in the parquet reader to reader_impl_chunking.cu (#14262) @nvdbaranec
Expose stream parameter in public strings replace APIs (#14261) @davidwendt
Expose stream parameter in public strings APIs (#14260) @davidwendt
Cleanup of namespaces in parquet code. (#14259) @nvdbaranec
Make parquet schema index type consistent (#14256) @hyperbolic2346
Expose stream parameter in public strings convert APIs (#14255) @davidwendt
Add in java bindings for DataSource (#14254) @revans2
Reimplement cudf::merge for nested types without using comparators (#14250) @divyegala
Add stream parameter to List Manipulation and Operations APIs (#14248) @SurajAralihalli
Expose stream parameter in public strings split/partition APIs (#14247) @davidwendt
Improve contains_column by invoking contains_table (#14238) @PointKernel
Detect and report errors in Parquet header parsing (#14237) @etseidl
Normalizing offsets iterator (#14234) @davidwendt
Forward merge 23.10 into 23.12 (#14231) @galipremsagar
Return error if BOOL8 column-type is used with integers-to-hex (#14208) @davidwendt
Enable indexalator for device code (#14206) @davidwendt
Marginally reduce memory footprint of joins (#14197) @wence-
Add nvtx annotations to spilling-based data movement (#14196) @wence-
Optimize ORC writer for decimal columns (#14190) @vuule
Remove the use of volatile in ORC (#14175) @vuule
Add bytes_per_second to distinct_count of stream_compaction nvbench. (#14172) @Blonck
Add bytes_per_second to transpose benchmark (#14170) @Blonck
cuDF: Build CUDA 12.0 ARM conda packages. (#14112) @bdice
Add bytes_per_second to shift benchmark (#13950) @Blonck
Extract debug_utilities.hpp/cu from column_utilities.hpp/cu (#13720) @ttnghia

Contributors

robertmaynard, tpn, and 26 other contributors

06 Dec 16:26

raydouglass

v23.12.00

c1d3073

v23.12.00

🚨 Breaking Changes

Raise error in reindex when index is not unique (#14400) @galipremsagar
Expose stream parameter to get_json_object API (#14297) @davidwendt
Refactor cudf_kafka to use skbuild (#14292) @jdye64
Expose stream parameter in public strings convert APIs (#14255) @davidwendt
Upgrade to nvCOMP 3.0.4 (#13815) @vuule

🐛 Bug Fixes

Update actions/labeler to v4 (#14562) @raydouglass
Fix data corruption when skipping rows (#14557) @etseidl
Fix function name typo in cudf.pandas profiler (#14514) @galipremsagar
Fix intermediate type checking in expression parsing (#14445) @vyasr
Forward merge branch-23.10 into branch-23.12 (#14435) @raydouglass
Remove needs: wheel-build-cudf. (#14427) @bdice
Fix dask dependency in custreamz (#14420) @vyasr
Ensure nvbench initializes nvml context when built statically (#14411) @robertmaynard
Support java AST String literal with desired encoding (#14402) @winningsix
Raise error in reindex when index is not unique (#14400) @galipremsagar
Always build nvbench statically so we don't need to package it (#14399) @robertmaynard
Fix token-count logic in nvtext::tokenize_with_vocabulary (#14393) @davidwendt
Fix as_column(pd.Timestamp/Timedelta, length=) not respecting length (#14390) @mroeschke
cudf.pandas: cuDF subpath checking in module __getattr__ (#14388) @shwina
Fix and disable encoding for nanosecond statistics in ORC writer (#14367) @vuule
Add the new manylinux builds to the build job (#14351) @vyasr
cudf jit parser now supports .pragma instructions with quotes (#14348) @robertmaynard
Fix overflow check in cudf::merge (#14345) @divyegala
Add cramjam (#14344) @vyasr
Enable dask_cudf/io pytests in CI (#14338) @galipremsagar
Temporarily avoid the current build of pydata-sphinx-theme (#14332) @vyasr
Fix host buffer access from device function in the Parquet reader (#14328) @vuule
Run IO tests for Dask-cuDF (#14327) @rjzamora
Fix logical type issues in the Parquet writer (#14322) @vuule
Remove aws-sdk-pinning and revert to arrow 12.0.1 (#14319) @vyasr
Test is_valid before reading column data (#14318) @etseidl
Fix gtest validity setting for TextTokenizeTest.Vocabulary (#14312) @davidwendt
Fixes stack context for json lines format that recovers from invalid JSON lines (#14309) @elstehle
Downgrade to Arrow 12.0.0 for aws-sdk-cpp and fix cudf_kafka builds for new CI containers (#14296) @vyasr
Fix thread index overflow issue (#14290) @hyperbolic2346
Fix memset error in nvtext::edit_distance_matrix (#14283) @davidwendt
Changes JSON reader's recovery option's behaviour to ignore all characters after a valid JSON record (#14279) @elstehle
Handle empty string correctly in Parquet statistics (#14257) @etseidl
Fixes behaviour for incomplete lines when recover_with_nulls is enabled (#14252) @elstehle
cudf::detail::pinned_allocator doesn't throw from deallocate (#14251) @robertmaynard
Fix strings replace for adjacent, identical multi-byte UTF-8 character targets (#14235) @davidwendt
Fix the precision when converting a decimal128 column to an arrow array (#14230) @jihoonson
Fixing parquet list of struct interpretation (#13715) @hyperbolic2346

📖 Documentation

Fix io reference in docs. (#14452) @bdice
Update README (#14374) @shwina
Example code for blog on new row comparators (#13795) @divyegala

🚀 New Features

Expose streams in public unary APIs (#14342) @vyasr
Add python tests for Parquet DELTA_BINARY_PACKED encoder (#14316) @etseidl
Update rapids-cmake functions to non-deprecated signatures (#14265) @robertmaynard
Expose streams in public null mask APIs (#14263) @vyasr
Expose streams in binaryop APIs (#14187) @vyasr
Add pylibcudf.Scalar that interoperates with Arrow scalars (#14133) @vyasr
Add decoder for DELTA_BYTE_ARRAY to Parquet reader (#14101) @etseidl
Add DELTA_BINARY_PACKED encoder for Parquet writer (#14100) @etseidl
Add BytePairEncoder class to cuDF (#13891) @davidwendt
Upgrade to nvCOMP 3.0.4 (#13815) @vuule
Use pynvjitlink for CUDA 12+ MVC (#13650) @brandon-b-miller

🛠️ Improvements

Build concurrency for nightly and merge triggers (#14441) @bdice
Cleanup remaining usages of dask dependencies (#14407) @galipremsagar
Update to Arrow 14.0.1. (#14387) @bdice
Remove Cython libcpp wrappers (#14382) @vyasr
Forward-merge branch-23.10 to branch-23.12 (#14372) @bdice
Upgrade to arrow 14 (#14371) @galipremsagar
Fix a pytest typo in test_kurt_skew_error (#14368) @galipremsagar
Use new rapids-dask-dependency metapackage for managing dask versions (#14364) @vyasr
Change nullable() to has_nulls() in cudf::detail::gather (#14363) @divyegala
Split up scan_inclusive.cu to improve its compile time (#14358) @davidwendt
Implement user_datasource_wrapper is_empty() and is_device_read_preferred(). (#14357) @tpn
Added streams to CSV reader and writer api (#14340) @shrshi
Upgrade wheels to use arrow 13 (#14339) @vyasr
Rework nvtext::byte_pair_encoding API (#14337) @davidwendt
Improve performance of nvtext::tokenize_with_vocabulary for long strings (#14336) @davidwendt
Upgrade arrow to 13 (#14330) @galipremsagar
Expose stream parameter in public nvtext replace APIs (#14329) @davidwendt
Drop pyorc dependency and use pandas/pyarrow instead (#14323) @galipremsagar
Avoid pyarrow.fs import for local storage (#14321) @rjzamora
Unpin dask and distributed for 23.12 development (#14320) @galipremsagar
Expose stream parameter in public nvtext tokenize APIs (#14317) @davidwendt
Added streams to JSON reader and writer api (#14313) @shrshi
Minor improvements in source_info (#14308) @vuule
Forward-merge branch-23.10 to branch-23.12 (#14307) @bdice
Add stream parameter to Set Operations (Public List APIs) (#14305) @SurajAralihalli
Expose stream parameter to get_json_object API (#14297) @davidwendt
Sort dictionary data alphabetically in the ORC writer (#14295) @vuule
Expose stream parameter in public strings filter APIs (#14293) @davidwendt
Refactor cudf_kafka to use skbuild (#14292) @jdye64
Update shared-action-workflows references (#14289) @AyodeAwe
Register partd encode dispatch in dask_cudf (#14287) @rjzamora
Update versioning strategy (#14285) @vyasr
Move and rename byte-pair-encoding source files (#14284) @davidwendt
Expose stream parameter in public strings combine APIs (#14281) @davidwendt
Expose stream parameter in public strings contains APIs (#14280) @davidwendt
Add stream parameter to List Sort and Filter APIs (#14272) @SurajAralihalli
Use branch-23.12 workflows. (#14271) @bdice
Refactor LogicalType for Parquet (#14264) @etseidl
Centralize chunked reading code in the parquet reader to reader_impl_chunking.cu (#14262) @nvdbaranec
Expose stream parameter in public strings replace APIs (#14261) @davidwendt
Expose stream parameter in public strings APIs (#14260) @davidwendt
Cleanup of namespaces in parquet code. (#14259) @nvdbaranec
Make parquet schema index type consistent (#14256) @hyperbolic2346
Expose stream parameter in public strings convert APIs (#14255) @davidwendt
Add in java bindings for DataSource (#14254) @revans2
Reimplement cudf::merge for nested types without using comparators (#14250) @divyegala
Add stream parameter to List Manipulation and Operations APIs (#14248) @SurajAralihalli
Expose stream parameter in public strings split/partition APIs (#14247) @davidwendt
Improve contains_column by invoking contains_table (#14238) @PointKernel
Detect and report errors in Parquet header parsing (#14237) @etseidl
Normalizing offsets iterator (#14234) @davidwendt
Forward merge 23.10 into 23.12 (#14231) @galipremsagar
Return error if BOOL8 column-type is used with integers-to-hex (#14208) @davidwendt
Enable indexalator for device code (#14206) @davidwendt
Marginally reduce memory footprint of joins (#14197) @wence-
Add nvtx annotations to spilling-based data movement (#14196) @wence-
Optimize ORC writer for decimal columns (#14190) @vuule
Remove the use of volatile in ORC (#14175) @vuule
Add bytes_per_second to distinct_count of stream_compaction nvbench. (#14172) @Blonck
Add bytes_per_second to transpose benchmark (#14170) @Blonck
cuDF: Build CUDA 12.0 ARM conda packages. (#14112) @bdice
Add bytes_per_second to shift benchmark (#13950) @Blonck
Extract debug_utilities.hpp/cu from column_utilities.hpp/cu (#13720) @ttnghia

Contributors

robertmaynard, tpn, and 26 other contributors

16 Nov 22:26

raydouglass

v23.10.02

ece2b2c

v23.10.02

🚨 Breaking Changes

Raise error in reindex when index is not unique (#14429) @galipremsagar
Expose stream parameter in public nvtext ngram APIs (#14061) @davidwendt
Raise MixedTypeError when a column of mixed-dtype is being constructed (#14050) @galipremsagar
Raise NotImplementedError for MultiIndex.to_series (#14049) @galipremsagar
Create table_input_metadata from a table_metadata (#13920) @etseidl
Enable RLE boolean encoding for v2 Parquet files (#13886) @etseidl
Change NA to NaT for datetime and timedelta types (#13868) @galipremsagar
Fix any, all reduction behavior for axis=None and warn for other reductions (#13831) @galipremsagar
Add minhash support for MurmurHash3_x64_128 (#13796) @davidwendt
Remove the libcudf cudf::offset_type type (#13788) @davidwendt
Raise error when trying to join datetime and timedelta types with other types (#13786) @galipremsagar
Update to Cython 3.0.0 (#13777) @vyasr
Raise error on constructing an array from mixed type inputs (#13768) @galipremsagar
Enforce deprecations in 23.10 (#13732) @galipremsagar
Upgrade to arrow 12 (#13728) @galipremsagar
Remove Arrow dependency from the datasource.hpp public header (#13698) @vuule

🐛 Bug Fixes

Raise error in reindex when index is not unique (#14429) @galipremsagar
Fix inaccurate ceil/floor and inaccurate rescaling casts of fixed-point values. (#14242) @bdice
Fix inaccuracy in decimal128 rounding. (#14233) @bdice
Workaround for illegal instruction error in sm90 for warp intrinsics with mask (#14201) @karthikeyann
Fix pytorch related pytest (#14198) @galipremsagar
Pin to aws-sdk-cpp<1.11 (#14173) @pentschev
Fix assert failure for range window functions (#14168) @mythrocks
Fix Memcheck error found in JSON_TEST JsonReaderTest.ErrorStrings (#14164) @karthikeyann
Fix calls to copy_bitmask to pass stream parameter (#14158) @davidwendt
Fix DataFrame from Series with different CategoricalIndexes (#14157) @mroeschke
Pin to numpy<1.25 and numba<0.58 to avoid errors and deprecation warnings-as-errors. (#14156) @bdice
Fix kernel launch error for cudf::io::orc::gpu::rowgroup_char_counts_kernel (#14139) @davidwendt
Don't sort columns for DataFrame init from list of Series (#14136) @mroeschke
Fix DataFrame.values with no columns but index (#14134) @mroeschke
Avoid circular cimports in _lib/cpp/reduce.pxd (#14125) @vyasr
Add support for nested dict in DataFrame constructor (#14119) @galipremsagar
Restrict iterables of DataFrame's as input to DataFrame constructor (#14118) @galipremsagar
Allow numeric_only=True for reduction operations on numeric types (#14111) @galipremsagar
Preserve name of the column while initializing a DataFrame (#14110) @galipremsagar
Correct numerous 20054-D: dynamic initialization errors found on arm+12.2 (#14108) @robertmaynard
Drop kwargs from Series.count (#14106) @galipremsagar
Fix naming issues with Index.to_frame and MultiIndex.to_frame APIs (#14105) @galipremsagar
Only use memory resources that haven't been freed (#14103) @robertmaynard
Add support for __round__ in Series and DataFrame (#14099) @galipremsagar
Validate ignore_index type in drop_duplicates (#14098) @mroeschke
Fix renaming Series and Index (#14080) @galipremsagar
Raise NotImplementedError in to_datetime if Z (or tz component) in string (#14074) @mroeschke
Raise NotImplementedError for datetime strings with UTC offset (#14070) @mroeschke
Update pyarrow-related dispatch logic in dask_cudf (#14069) @rjzamora
Use conda mambabuild rather than mamba mambabuild (#14067) @wence-
Raise NotImplementedError in to_datetime with dayfirst without infer_format (#14058) @mroeschke
Fix various issues in Index.intersection (#14054) @galipremsagar
Fix Index.difference to match with pandas (#14053) @galipremsagar
Fix empty string column construction (#14052) @galipremsagar
Fix IntervalIndex.union to preserve type-metadata (#14051) @galipremsagar
Raise MixedTypeError when a column of mixed-dtype is being constructed (#14050) @galipremsagar
Raise NotImplementedError for MultiIndex.to_series (#14049) @galipremsagar
Ignore compile_commands.json (#14048) @harrism
Raise TypeError for any non-parseable argument in to_datetime (#14044) @mroeschke
Raise NotImplementedError for to_datetime with z format (#14037) @mroeschke
Implement sort_remaining for sort_index (#14033) @wence-
Raise NotImplementedError for Categoricals with timezones (#14032) @mroeschke
Temporary fix Parquet metadata with empty value string being ignored from writing (#14026) @ttnghia
Preserve types of scalar being returned when possible in quantile (#14014) @galipremsagar
Fix return type of MultiIndex.difference (#14009) @galipremsagar
Raise an error when timezone subtypes are encountered in pd.IntervalDtype (#14006) @galipremsagar
Fix map column can not be non-nullable for java (#14003) @res-life
Fix name selection in Index.difference and Index.intersection (#13986) @galipremsagar
Restore column type metadata with dropna to fix factorize API (#13980) @galipremsagar
Use thread_index_type to avoid out of bounds accesses in conditional joins (#13971) @vyasr
Fix MultiIndex.to_numpy to return numpy array with tuples (#13966) @galipremsagar
Use cudf::thread_index_type in get_json_object and tdigest kernels (#13962) @nvdbaranec
Fix an issue with IntervalIndex.repr when null values are present (#13958) @galipremsagar
Fix type metadata issue preservation with Column.unique (#13957) @galipremsagar
Handle Interval scalars when passed in list-like inputs to cudf.Index (#13956) @galipremsagar
Fix setting of categories order when dtype is passed to a CategoricalColumn (#13955) @galipremsagar
Handle as_index in GroupBy.apply (#13951) @brandon-b-miller
Raise error for string types in nsmallest and nlargest (#13946) @galipremsagar
Fix index of Groupby.apply results when it is performed on empty objects (#13944) @galipremsagar
Fix integer overflow in shim device_sum functions (#13943) @brandon-b-miller
Fix type mismatch in groupby reduction for empty objects (#13942) @galipremsagar
Fixed processed bytes calculation in APPLY_BOOLEAN_MASK benchmark. (#13937) @Blonck
Fix construction of Grouping objects (#13932) @galipremsagar
Fix an issue with loc when column names is MultiIndex (#13929) @galipremsagar
Fix handling of typecasting in searchsorted (#13925) @galipremsagar
Preserve index name in reindex (#13917) @galipremsagar
Use cudf::thread_index_type in cuIO to prevent overflow in row indexing (#13910) @vuule
Fix for encodings listed in the Parquet column chunk metadata (#13907) @etseidl
Use cudf::thread_index_type in concatenate.cu. (#13906) @bdice
Use cudf::thread_index_type in replace.cu. (#13905) @bdice
Add noSanitizer tag to Java reduction tests failing with sanitizer in CUDA 12 (#13904) @jlowe
Remove the internal use of the cudf's default stream in cuIO (#13903) @vuule
Use cuda-nvtx-dev CUDA 12 package. (#13901) @bdice
Use thread_index_type to avoid index overflow in grid-stride loops (#13895) @PointKernel
Fix memory access error in cudf::shift for sliced strings (#13894) @davidwendt
Raise error when trying to construct a DataFrame with mixed types (#13889) @galipremsagar
Return nan when one variable to be correlated has zero variance in JIT GroupBy Apply (#13884) @brandon-b-miller
Correctly detect the BOM mark in read_csv with compressed input (#13881) @vuule
Check for the presence of all values in MultiIndex.isin (#13879) @galipremsagar
Fix nvtext::generate_character_ngrams performance regression for longer strings (#13874) @davidwendt
Fix return type of MultiIndex.levels (#13870) @galipremsagar
Fix List's missing children metadata in JSON writer (#13869) @karthikeyann
Disable construction of Index when freq is set in pandas-compatibility mode (#13857) @galipremsagar
Fix an issue with fetching NA from a TimedeltaColumn (#13853) @galipremsagar
Simplify implementation of interval_range() and fix behaviour for floating freq (#13844) @shwina
Fix binary operations between Series and Index (#13842) @galipremsagar
Update make_lists_column_from_scalar to use make_offsets_child_column utility (#13841) @davidwendt
Fix read out of bounds in string concatenate (#13838) @pentschev
Raise error for more cases when timezone-aware data is passed to as_column (#13835) @galipremsagar
Fix any, all reduction behavior for axis=None and warn for other reductions (#13831) @galipremsagar
Raise error when trying to construct time-zone aware timestamps (#13830) @galipremsagar
Fix cuFile I/O factories (#13829) @vuule
DataFrame with namedtuples uses ._field as column names (#13824) @mroeschke
Branch 23.10 merge 23.08 (#13822) @vyasr
Return a Series from JIT GroupBy apply, rather than a DataFrame (#13820) @brandon-b-miller
No need to dlsym EnsureS3Finalized we can call it directly (#13819) @robertmaynard
Raise error when mixed types are being constructed (#13816) @galipremsagar
Fix unbounded sequence issue in DataFrame constructor (#13811) @galipremsagar
Fix Byte-Pair-Encoding usage of cuco static-map for storing merge-pairs (#13807) @davidwendt
Fix for Parquet writer when requested pages per row is smaller than fragment size (#13806) @etseidl
Remove hangs from trying to construct un-bounded sequences (#13799) @galipremsagar
Bug/update libcudf to handle arrow12 changes (#13794) @robertmaynard
Update get_arrow to arrows 12 CMake target name of arrow::xsimd (#13790) @robertmaynard
Raise error when trying to join datetime and timedelta types with other types (#13786) @galipremsagar
Fix negative unary operation for boolean type (#13780) @galipremsagar
Fix contains(in) method for Series (#13779) @gal...

Contributors

robertmaynard, harrism, and 37 other contributors

11 Oct 15:25

raydouglass

v23.10.00

9f0c2f4

v23.10.00

🚨 Breaking Changes

Expose stream parameter in public nvtext ngram APIs (#14061) @davidwendt
Raise MixedTypeError when a column of mixed-dtype is being constructed (#14050) @galipremsagar
Raise NotImplementedError for MultiIndex.to_series (#14049) @galipremsagar
Create table_input_metadata from a table_metadata (#13920) @etseidl
Enable RLE boolean encoding for v2 Parquet files (#13886) @etseidl
Change NA to NaT for datetime and timedelta types (#13868) @galipremsagar
Fix any, all reduction behavior for axis=None and warn for other reductions (#13831) @galipremsagar
Add minhash support for MurmurHash3_x64_128 (#13796) @davidwendt
Remove the libcudf cudf::offset_type type (#13788) @davidwendt
Raise error when trying to join datetime and timedelta types with other types (#13786) @galipremsagar
Update to Cython 3.0.0 (#13777) @vyasr
Raise error on constructing an array from mixed type inputs (#13768) @galipremsagar
Enforce deprecations in 23.10 (#13732) @galipremsagar
Upgrade to arrow 12 (#13728) @galipremsagar
Remove Arrow dependency from the datasource.hpp public header (#13698) @vuule

🐛 Bug Fixes

Fix inaccurate ceil/floor and inaccurate rescaling casts of fixed-point values. (#14242) @bdice
Fix inaccuracy in decimal128 rounding. (#14233) @bdice
Workaround for illegal instruction error in sm90 for warp intrinsics with mask (#14201) @karthikeyann
Fix pytorch related pytest (#14198) @galipremsagar
Pin to aws-sdk-cpp<1.11 (#14173) @pentschev
Fix assert failure for range window functions (#14168) @mythrocks
Fix Memcheck error found in JSON_TEST JsonReaderTest.ErrorStrings (#14164) @karthikeyann
Fix calls to copy_bitmask to pass stream parameter (#14158) @davidwendt
Fix DataFrame from Series with different CategoricalIndexes (#14157) @mroeschke
Pin to numpy<1.25 and numba<0.58 to avoid errors and deprecation warnings-as-errors. (#14156) @bdice
Fix kernel launch error for cudf::io::orc::gpu::rowgroup_char_counts_kernel (#14139) @davidwendt
Don't sort columns for DataFrame init from list of Series (#14136) @mroeschke
Fix DataFrame.values with no columns but index (#14134) @mroeschke
Avoid circular cimports in _lib/cpp/reduce.pxd (#14125) @vyasr
Add support for nested dict in DataFrame constructor (#14119) @galipremsagar
Restrict iterables of DataFrame's as input to DataFrame constructor (#14118) @galipremsagar
Allow numeric_only=True for reduction operations on numeric types (#14111) @galipremsagar
Preserve name of the column while initializing a DataFrame (#14110) @galipremsagar
Correct numerous 20054-D: dynamic initialization errors found on arm+12.2 (#14108) @robertmaynard
Drop kwargs from Series.count (#14106) @galipremsagar
Fix naming issues with Index.to_frame and MultiIndex.to_frame APIs (#14105) @galipremsagar
Only use memory resources that haven't been freed (#14103) @robertmaynard
Add support for __round__ in Series and DataFrame (#14099) @galipremsagar
Validate ignore_index type in drop_duplicates (#14098) @mroeschke
Fix renaming Series and Index (#14080) @galipremsagar
Raise NotImplementedError in to_datetime if Z (or tz component) in string (#14074) @mroeschke
Raise NotImplementedError for datetime strings with UTC offset (#14070) @mroeschke
Update pyarrow-related dispatch logic in dask_cudf (#14069) @rjzamora
Use conda mambabuild rather than mamba mambabuild (#14067) @wence-
Raise NotImplementedError in to_datetime with dayfirst without infer_format (#14058) @mroeschke
Fix various issues in Index.intersection (#14054) @galipremsagar
Fix Index.difference to match with pandas (#14053) @galipremsagar
Fix empty string column construction (#14052) @galipremsagar
Fix IntervalIndex.union to preserve type-metadata (#14051) @galipremsagar
Raise MixedTypeError when a column of mixed-dtype is being constructed (#14050) @galipremsagar
Raise NotImplementedError for MultiIndex.to_series (#14049) @galipremsagar
Ignore compile_commands.json (#14048) @harrism
Raise TypeError for any non-parseable argument in to_datetime (#14044) @mroeschke
Raise NotImplementedError for to_datetime with z format (#14037) @mroeschke
Implement sort_remaining for sort_index (#14033) @wence-
Raise NotImplementedError for Categoricals with timezones (#14032) @mroeschke
Temporary fix Parquet metadata with empty value string being ignored from writing (#14026) @ttnghia
Preserve types of scalar being returned when possible in quantile (#14014) @galipremsagar
Fix return type of MultiIndex.difference (#14009) @galipremsagar
Raise an error when timezone subtypes are encountered in pd.IntervalDtype (#14006) @galipremsagar
Fix map column can not be non-nullable for java (#14003) @res-life
Fix name selection in Index.difference and Index.intersection (#13986) @galipremsagar
Restore column type metadata with dropna to fix factorize API (#13980) @galipremsagar
Use thread_index_type to avoid out of bounds accesses in conditional joins (#13971) @vyasr
Fix MultiIndex.to_numpy to return numpy array with tuples (#13966) @galipremsagar
Use cudf::thread_index_type in get_json_object and tdigest kernels (#13962) @nvdbaranec
Fix an issue with IntervalIndex.repr when null values are present (#13958) @galipremsagar
Fix type metadata issue preservation with Column.unique (#13957) @galipremsagar
Handle Interval scalars when passed in list-like inputs to cudf.Index (#13956) @galipremsagar
Fix setting of categories order when dtype is passed to a CategoricalColumn (#13955) @galipremsagar
Handle as_index in GroupBy.apply (#13951) @brandon-b-miller
Raise error for string types in nsmallest and nlargest (#13946) @galipremsagar
Fix index of Groupby.apply results when it is performed on empty objects (#13944) @galipremsagar
Fix integer overflow in shim device_sum functions (#13943) @brandon-b-miller
Fix type mismatch in groupby reduction for empty objects (#13942) @galipremsagar
Fixed processed bytes calculation in APPLY_BOOLEAN_MASK benchmark. (#13937) @Blonck
Fix construction of Grouping objects (#13932) @galipremsagar
Fix an issue with loc when column names is MultiIndex (#13929) @galipremsagar
Fix handling of typecasting in searchsorted (#13925) @galipremsagar
Preserve index name in reindex (#13917) @galipremsagar
Use cudf::thread_index_type in cuIO to prevent overflow in row indexing (#13910) @vuule
Fix for encodings listed in the Parquet column chunk metadata (#13907) @etseidl
Use cudf::thread_index_type in concatenate.cu. (#13906) @bdice
Use cudf::thread_index_type in replace.cu. (#13905) @bdice
Add noSanitizer tag to Java reduction tests failing with sanitizer in CUDA 12 (#13904) @jlowe
Remove the internal use of the cudf's default stream in cuIO (#13903) @vuule
Use cuda-nvtx-dev CUDA 12 package. (#13901) @bdice
Use thread_index_type to avoid index overflow in grid-stride loops (#13895) @PointKernel
Fix memory access error in cudf::shift for sliced strings (#13894) @davidwendt
Raise error when trying to construct a DataFrame with mixed types (#13889) @galipremsagar
Return nan when one variable to be correlated has zero variance in JIT GroupBy Apply (#13884) @brandon-b-miller
Correctly detect the BOM mark in read_csv with compressed input (#13881) @vuule
Check for the presence of all values in MultiIndex.isin (#13879) @galipremsagar
Fix nvtext::generate_character_ngrams performance regression for longer strings (#13874) @davidwendt
Fix return type of MultiIndex.levels (#13870) @galipremsagar
Fix List's missing children metadata in JSON writer (#13869) @karthikeyann
Disable construction of Index when freq is set in pandas-compatibility mode (#13857) @galipremsagar
Fix an issue with fetching NA from a TimedeltaColumn (#13853) @galipremsagar
Simplify implementation of interval_range() and fix behaviour for floating freq (#13844) @shwina
Fix binary operations between Series and Index (#13842) @galipremsagar
Update make_lists_column_from_scalar to use make_offsets_child_column utility (#13841) @davidwendt
Fix read out of bounds in string concatenate (#13838) @pentschev
Raise error for more cases when timezone-aware data is passed to as_column (#13835) @galipremsagar
Fix any, all reduction behavior for axis=None and warn for other reductions (#13831) @galipremsagar
Raise error when trying to construct time-zone aware timestamps (#13830) @galipremsagar
Fix cuFile I/O factories (#13829) @vuule
DataFrame with namedtuples uses ._field as column names (#13824) @mroeschke
Branch 23.10 merge 23.08 (#13822) @vyasr
Return a Series from JIT GroupBy apply, rather than a DataFrame (#13820) @brandon-b-miller
No need to dlsym EnsureS3Finalized we can call it directly (#13819) @robertmaynard
Raise error when mixed types are being constructed (#13816) @galipremsagar
Fix unbounded sequence issue in DataFrame constructor (#13811) @galipremsagar
Fix Byte-Pair-Encoding usage of cuco static-map for storing merge-pairs (#13807) @davidwendt
Fix for Parquet writer when requested pages per row is smaller than fragment size (#13806) @etseidl
Remove hangs from trying to construct un-bounded sequences (#13799) @galipremsagar
Bug/update libcudf to handle arrow12 changes (#13794) @robertmaynard
Update get_arrow to arrows 12 CMake target name of arrow::xsimd (#13790) @robertmaynard
Raise error when trying to join datetime and timedelta types with other types (#13786) @galipremsagar
Fix negative unary operation for boolean type (#13780) @galipremsagar
Fix contains(in) method for Series (#13779) @galipremsagar
Fix binary operation column ordering and missing column issues (#13778) @galipremsagar
Cast only time of day to nanos to avoid an overflow in...

Contributors

robertmaynard, harrism, and 37 other contributors
