
Test narwhals in CI #17884

Open · wants to merge 9 commits into base: branch-25.04

Conversation

@bdice (Contributor) commented Jan 31, 2025

Description

Contributes to #17662.

@MarcoGorelli provided very helpful instructions for running the narwhals test suite. We should examine and correct the failing tests. We are down to 39 failures, shown in this Kaggle notebook.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

copy-pr-bot (bot) commented Jan 31, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions (bot) added labels libcudf, Python, CMake, Java, cudf.pandas, cudf.polars, pylibcudf on Jan 31, 2025
@bdice changed the base branch from branch-25.02 to branch-25.04 on January 31, 2025 14:24
@bdice added labels feature request, non-breaking and removed labels libcudf, CMake, Java, cudf.pandas, cudf.polars, pylibcudf on Jan 31, 2025
@bdice (Contributor, Author) commented Jan 31, 2025

/ok to test

@bdice (Contributor, Author) commented Jan 31, 2025

/ok to test

@github-actions (bot) removed label Python on Jan 31, 2025
@MarcoGorelli (Contributor) commented
thanks @bdice !

import polars as pl

E ModuleNotFoundError: No module named 'polars'

There's a PR in progress to allow the test suite to be run without Polars installed (narwhals-dev/narwhals#1896); once that's in, this issue should resolve itself.

Alternatively, you may want to install Polars so you can also run:

NARWHALS_POLARS_GPU=1 pytest --constructors=polars[lazy]

@vyasr (Contributor) commented Feb 4, 2025

@bdice Depending on how you want to scope this PR, you can either address all of #17662 or just one of the different runs, since in principle we can run the narwhals test suite directly with cudf, with cudf.pandas, or with cudf-polars. I expect that we'll find different issues from all three.

@bdice (Contributor, Author) commented Feb 4, 2025

Sounds good! I have spent very little time on this so far - just a skeleton at the moment. I will try to come up with some reasonable way to present these tests. Perhaps as three separate jobs? How do we want to handle known failures, with some kind of skip list?

@bdice (Contributor, Author) commented Feb 5, 2025

/ok to test

@vyasr (Contributor) commented Feb 5, 2025

I think three separate tests makes sense, yes.

For how to handle failures, I would follow the model that we use for running the polars test suite of explicitly listing out tests that we deselect and why. If the list gets too long we can revisit, but I'm hopeful that the narwhals test suite is still small enough that we won't have problems at the scale that we do with the pandas test suite (and getting the narwhals test suite passing should help us with the long tail of pandas test failures).
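As a rough illustration of that deselect-list model, something like the following could drive the run. This is only a sketch: the test ids, the tests directory, and the script itself are placeholders, not the actual deselected tests or the script used for the polars suite.

```python
# Hypothetical runner illustrating the explicit deselect-list model;
# the node ids and the "tests" path below are placeholders.
import sys

import pytest

DESELECTED_TESTS = {
    # node id -> reason (kept next to the id so reviewers can see why)
    "tests/frame/iter_rows_test.py::test_iter_rows": "row iteration not supported by cudf",
}


def main() -> int:
    args = ["tests"]
    for node_id in DESELECTED_TESTS:
        args += ["--deselect", node_id]
    # Forward any extra CLI arguments (e.g. -x, -k) to pytest.
    return pytest.main(args + sys.argv[1:])


if __name__ == "__main__":
    raise SystemExit(main())
```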

@Matt711 (Contributor) commented Feb 5, 2025

> I think three separate tests makes sense, yes.
>
> For how to handle failures, I would follow the model that we use for running the polars test suite of explicitly listing out tests that we deselect and why. If the list gets too long we can revisit, but I'm hopeful that the narwhals test suite is still small enough that we won't have problems at the scale that we do with the pandas test suite (and getting the narwhals test suite passing should help us with the long tail of pandas test failures).

Are these failures flaky? If so, skipping them makes sense. If they fail deterministically, we should xfail them, right?

@vyasr (Contributor) commented Feb 6, 2025

> Are these failures flaky? If so, skipping them makes sense. If they fail deterministically, we should xfail them, right?

Ideally yes. I don't think that is so easy to do, though. There is no way to inject per-test xfails via the pytest CLI, is there? I think you would have to implement a minimal pytest plugin to programmatically add an xfail marker to the relevant tests. It's not impossible, but it seems like more work than it's worth. I could be wrong though, maybe there's an easier way.
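For concreteness, here is a rough sketch of the kind of minimal conftest-based plugin being described, using pytest's collection hook. The node id and reason below are placeholders, not actual narwhals tests.

```python
# conftest.py -- minimal sketch of a plugin that adds xfail markers at
# collection time; the node id and reason below are placeholders.
import pytest

XFAIL_TESTS = {
    "tests/frame/iter_rows_test.py::test_iter_rows": "not supported by cudf",
}


def pytest_collection_modifyitems(config, items):
    for item in items:
        reason = XFAIL_TESTS.get(item.nodeid)
        if reason is not None:
            # strict=False so an unexpected pass is reported as XPASS
            # rather than failing the run.
            item.add_marker(pytest.mark.xfail(reason=reason, strict=False))
```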

@gforsyth (Contributor) commented Feb 6, 2025

> Ideally yes. I don't think that is so easy to do, though. There is no way to inject per-test xfails via the pytest CLI, is there? I think you would have to implement a minimal pytest plugin to programmatically add an xfail marker to the relevant tests. It's not impossible, but it seems like more work than it's worth. I could be wrong though, maybe there's an easier way.

There is no easier way to do this if the tests are defined upstream. I have a skeleton of what is required in a (very non-production-ready) repo here: https://github.com/gforsyth/pytest-salmon that I'll work into a full-fledged pytest plugin eventually, but it always ends up feeling like a bit of a hack.

@MarcoGorelli (Contributor) commented
Hey all

The tests you're currently running shouldn't be flaky; we run them regularly (but not in CI, due to GPU requirements) to check that they're green. We did have an error in the test suite on the main branch when CI for this PR ran, but it's since been fixed; see https://www.kaggle.com/code/marcogorelli/testing-cudf-in-narwhals

It looks like the polars-gpu CI failures are due to the missing polars-gpu dependency?

>           raise ModuleNotFoundError(err_message) from None
E           ModuleNotFoundError: GPU engine requested, but required package 'cudf_polars' not found.
E           Please install using the command `pip install cudf-polars-cu12` (or `pip install --extra-index-url=https://pypi.nvidia.com/ cudf-polars-cu11` if your system has a CUDA 11 driver).

If you include that and re-run, my expectation is that the CI from this PR should be green 🥦

FWIW, my suggestion would be:

  • start running the tests for cudf and polars[gpu] in CI
  • for cudf.pandas, there are still a lot of bugs related to null value handling (e.g. AssertionError: Mismatch at index 4: 6.0 != None); once the number of failures is down to, say, single digits as opposed to 40, I'd say we can add some xfails for the remaining ones in Narwhals and then consider running those in CI too?

@bdice (Contributor, Author) commented Feb 7, 2025

/ok to test

@bdice (Contributor, Author) commented Feb 7, 2025

@MarcoGorelli Thanks! I've been slow on this PR, just making gradual progress when I circle around to it. I think I fixed the cudf-polars installation issues and CI is passing now.

How do you envision this going forward? Obviously we want to know if there are failures that are specific to cuDF / cudf.pandas, but it may be hard to track those down if they are xfailed in the upstream. One possible solution would be to have only a minimal number of xfails for cudf in the upstream narwhals repo, for things we don't intend to fix (like iter_rows). We could manually skip any other failing tests in our narwhals CI, so that we have a short and hardcoded list of tests that we need to fix.

A few xfails I see that are specific to cudf -- just trying to sample through what I see, to get a sense of what we might want to do:

@bdice marked this pull request as ready for review on February 7, 2025 17:46
@bdice requested review from a team as code owners on February 7, 2025 17:46
@bdice requested a review from gforsyth on February 7, 2025 17:46
@MarcoGorelli (Contributor) commented
Great!

We've got an issue about unifying test failures (narwhals-dev/narwhals#1893), and we would like to only use xfail for "it currently fails but in theory should really be passing".
In addition to that, for every xfail, there should be an associated open issue in the relevant repo.

So, as we move forwards, we can open more issues, report them here, and gradually remove the xfails as they get addressed (or turn them into pytest.raises contexts for ones that aren't intended to be supported).

@bdice (Contributor, Author) commented Feb 7, 2025

> In addition to that, for every xfail, there should be an associated open issue in the relevant repo.

This is great! Filing issues to cudf is a good strategy for tracking problems going forward -- and we appreciate your willingness to do that.

> So, as we move forwards, we can open more issues, report them here, and gradually remove the xfails as they get addressed (or turn them into pytest.raises contexts for ones that aren't intended to be supported).

I agree that pytest.raises is the right answer for some of these, like iter_rows. Thanks for your help!
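For illustration of that pattern, a minimal sketch is below. The frame class and exception type are stand-ins, not the actual narwhals or cudf objects; the point is only the shape of converting a deterministic failure into an asserted error.

```python
import pytest


class FakeGpuFrame:
    """Stand-in object; not a real narwhals or cudf frame."""

    def iter_rows(self):
        raise NotImplementedError("row iteration is not supported on this backend")


def test_iter_rows_raises():
    # Instead of xfailing a deterministic failure, assert the documented error.
    frame = FakeGpuFrame()
    with pytest.raises(NotImplementedError):
        list(frame.iter_rows())
```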

@Matt711 (Contributor) left a comment

Thanks @bdice! I'll focus on running the test with cudf.pandas in a follow-up PR, but this is a great starting point.

Labels: feature request, non-breaking
Projects: In Progress

5 participants