
Test narwhals in CI #17884

Open · wants to merge 9 commits into base: branch-25.04

Conversation

@bdice (Contributor) commented Jan 31, 2025

Description

Contributes to #17662.

@MarcoGorelli provided very helpful instructions for running the narwhals test suite. We should examine and correct the failing tests. We are down to 39 failures, shown in this Kaggle notebook.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

copy-pr-bot (bot) commented Jan 31, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions (bot) added labels libcudf, Python, CMake, Java, cudf.pandas, cudf.polars, pylibcudf on Jan 31, 2025
@bdice changed the base branch from branch-25.02 to branch-25.04 on January 31, 2025 14:24
@bdice added labels feature request, non-breaking and removed labels libcudf, CMake, Java, cudf.pandas, cudf.polars, pylibcudf on Jan 31, 2025
@bdice (Contributor, Author) commented Jan 31, 2025

/ok to test

@bdice (Contributor, Author) commented Jan 31, 2025

/ok to test

@github-actions (bot) removed label Python on Jan 31, 2025
@MarcoGorelli (Contributor) commented
thanks @bdice !

import polars as pl

E ModuleNotFoundError: No module named 'polars'

There's a PR in progress to allow the test suite to be run without Polars installed (narwhals-dev/narwhals#1896); once that's in, this issue should resolve itself.

Alternatively, you may want to install Polars so you can also run:

NARWHALS_POLARS_GPU=1 pytest --constructors=polars[lazy]

@vyasr (Contributor) commented Feb 4, 2025

@bdice Depending on how you want to scope this PR, you can either address all of #17662 or just one of the different runs, since in principle we can run the narwhals test suite directly with cudf, with cudf.pandas, or with cudf-polars. I expect that we'll find different issues from all three.

@bdice (Contributor, Author) commented Feb 4, 2025

Sounds good! I have spent very little time on this so far - just a skeleton at the moment. I will try to come up with some reasonable way to present these tests. Perhaps as three separate jobs? How do we want to handle known failures, with some kind of skip list?

@bdice (Contributor, Author) commented Feb 5, 2025

/ok to test

@vyasr (Contributor) commented Feb 5, 2025

I think three separate tests makes sense, yes.

For how to handle failures, I would follow the model that we use for running the polars test suite of explicitly listing out tests that we deselect and why. If the list gets too long we can revisit, but I'm hopeful that the narwhals test suite is still small enough that we won't have problems at the scale that we do with the pandas test suite (and getting the narwhals test suite passing should help us with the long tail of pandas test failures).
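As a rough illustration of that deselect-list model, something like the following could drive the run. This is only a sketch: the test ids, the tests directory, and the script itself are placeholders, not the actual deselected tests or the script used for the polars suite.

```python
# Hypothetical runner illustrating the explicit deselect-list model;
# the node ids and the "tests" path below are placeholders.
import sys

import pytest

DESELECTED_TESTS = {
    # node id -> reason (kept next to the id so reviewers can see why)
    "tests/frame/iter_rows_test.py::test_iter_rows": "row iteration not supported by cudf",
}


def main() -> int:
    args = ["tests"]
    for node_id in DESELECTED_TESTS:
        args += ["--deselect", node_id]
    # Forward any extra CLI arguments (e.g. -x, -k) to pytest.
    return pytest.main(args + sys.argv[1:])


if __name__ == "__main__":
    raise SystemExit(main())
```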

@Matt711 (Contributor) commented Feb 5, 2025

> I think three separate tests makes sense, yes.
>
> For how to handle failures, I would follow the model that we use for running the polars test suite of explicitly listing out tests that we deselect and why. If the list gets too long we can revisit, but I'm hopeful that the narwhals test suite is still small enough that we won't have problems at the scale that we do with the pandas test suite (and getting the narwhals test suite passing should help us with the long tail of pandas test failures).

Are these failures flaky? If so, skipping them makes sense. If they fail deterministically, we should xfail them, right?

@vyasr (Contributor) commented Feb 6, 2025

> Are these failures flaky? If so, skipping them makes sense. If they fail deterministically, we should xfail them, right?

Ideally yes. I don't think that is so easy to do, though. There is no way to inject per-test xfails via the pytest CLI, is there? I think you would have to implement a minimal pytest plugin to programmatically add an xfail marker to the relevant tests. It's not impossible, but it seems like more work than it's worth. I could be wrong though, maybe there's an easier way.
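For concreteness, here is a rough sketch of the kind of minimal conftest-based plugin being described, using pytest's collection hook. The node id and reason below are placeholders, not actual narwhals tests.

```python
# conftest.py -- minimal sketch of a plugin that adds xfail markers at
# collection time; the node id and reason below are placeholders.
import pytest

XFAIL_TESTS = {
    "tests/frame/iter_rows_test.py::test_iter_rows": "not supported by cudf",
}


def pytest_collection_modifyitems(config, items):
    for item in items:
        reason = XFAIL_TESTS.get(item.nodeid)
        if reason is not None:
            # strict=False so an unexpected pass is reported as XPASS
            # rather than failing the run.
            item.add_marker(pytest.mark.xfail(reason=reason, strict=False))
```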

@gforsyth (Contributor) commented Feb 6, 2025

> Ideally yes. I don't think that is so easy to do, though. There is no way to inject per-test xfails via the pytest CLI, is there? I think you would have to implement a minimal pytest plugin to programmatically add an xfail marker to the relevant tests. It's not impossible, but it seems like more work than it's worth. I could be wrong though, maybe there's an easier way.

There is no easier way to do this if the tests are defined upstream. I have a skeleton of what is required in a (very non-production-ready) repo here: https://github.com/gforsyth/pytest-salmon that I'll work into a full-fledged pytest plugin eventually, but it always ends up feeling like a bit of a hack.

@MarcoGorelli (Contributor) commented
Hey all

The tests you're currently running shouldn't be flaky; we run them regularly (but not in CI, due to GPU requirements) to check that they're green. We did have an error in the test suite on the main branch when CI for this PR ran, but it's since been fixed; see https://www.kaggle.com/code/marcogorelli/testing-cudf-in-narwhals

It looks like the polars-gpu CI failures are due to the missing polars-gpu dependency?

>           raise ModuleNotFoundError(err_message) from None
E           ModuleNotFoundError: GPU engine requested, but required package 'cudf_polars' not found.
E           Please install using the command `pip install cudf-polars-cu12` (or `pip install --extra-index-url=https://pypi.nvidia.com/ cudf-polars-cu11` if your system has a CUDA 11 driver).

If you include that and re-run, my expectation is that the CI from this PR should be green 🥦

FWIW, my suggestion would be:

  • start running the tests for cudf and polars[gpu] in CI
  • for cudf.pandas, there are still a lot of bugs related to null value handling (e.g. AssertionError: Mismatch at index 4: 6.0 != None); once the number of failures is down to, say, single digits as opposed to 40, I'd say we can add some xfails for the remaining ones in Narwhals and then consider running those in CI too?

@bdice (Contributor, Author) commented Feb 7, 2025

/ok to test

@bdice (Contributor, Author) commented Feb 7, 2025

@MarcoGorelli Thanks! I've been slow on this PR, just making gradual progress when I circle around to it. I think I fixed the cudf-polars installation issues and CI is passing now.

How do you envision this going forward? Obviously we want to know if there are failures that are specific to cuDF / cudf.pandas, but it may be hard to track those down if they are xfailed in the upstream. One possible solution would be to have only a minimal number of xfails for cudf in the upstream narwhals repo, for things we don't intend to fix (like iter_rows). We could manually skip any other failing tests in our narwhals CI, so that we have a short and hardcoded list of tests that we need to fix.

A few xfails I see that are specific to cudf -- just trying to sample through what I see, to get a sense of what we might want to do:

@bdice marked this pull request as ready for review on February 7, 2025 17:46
@bdice requested review from a team as code owners on February 7, 2025 17:46
@bdice requested a review from gforsyth on February 7, 2025 17:46
@MarcoGorelli (Contributor) commented
Great!

We've got an issue about unifying test failures (narwhals-dev/narwhals#1893), and we would like to only use xfail for "it currently fails but in theory should really be passing".
In addition to that, for every xfail, there should be an associated open issue in the relevant repo.

So, as we move forwards, we can open more issues, report them here, and gradually remove the xfails as they get addressed (or turn them into pytest.raises contexts for ones that aren't intended to be supported).

@bdice (Contributor, Author) commented Feb 7, 2025

> In addition to that, for every xfail, there should be an associated open issue in the relevant repo.

This is great! Filing issues to cudf is a good strategy for tracking problems going forward -- and we appreciate your willingness to do that.

> So, as we move forwards, we can open more issues, report them here, and gradually remove the xfails as they get addressed (or turn them into pytest.raises contexts for ones that aren't intended to be supported).

I agree that pytest.raises is the right answer for some of these, like iter_rows. Thanks for your help!
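For illustration of that pattern, a minimal sketch is below. The frame class and exception type are stand-ins, not the actual narwhals or cudf objects; the point is only the shape of converting a deterministic failure into an asserted error.

```python
import pytest


class FakeGpuFrame:
    """Stand-in object; not a real narwhals or cudf frame."""

    def iter_rows(self):
        raise NotImplementedError("row iteration is not supported on this backend")


def test_iter_rows_raises():
    # Instead of xfailing a deterministic failure, assert the documented error.
    frame = FakeGpuFrame()
    with pytest.raises(NotImplementedError):
        list(frame.iter_rows())
```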

@Matt711 (Contributor) left a comment

Thanks @bdice! I'll focus on running the test with cudf.pandas in a follow-up PR, but this is a great starting point.

Labels: feature request, non-breaking
Projects: In Progress

5 participants