README: use another dataset #4

Open
BradKML opened this issue Oct 3, 2024 · 1 comment


BradKML commented Oct 3, 2024

I got this error when running the README example with the newest version of scikit-learn:

ImportError: 
`load_boston` has been removed from scikit-learn since version 1.2.

The Boston housing prices dataset has an ethical problem: as
...
[2] Harrison Jr, David, and Daniel L. Rubinfeld.
"Hedonic housing prices and the demand for clean air."
Journal of environmental economics and management 5.1 (1978): 81-102.
<https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air>
BradKML commented Oct 3, 2024

Testing with the diabetes dataset via datasets.load_diabetes() completes, but LightGBM emits repeated warnings:

100%|██████████| 42/42 [00:04<00:00,  8.45it/s]
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000276 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 640
[LightGBM] [Info] Number of data points in the train set: 397, number of used features: 10
[LightGBM] [Info] Start training from score 151.722922
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
...
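
For reference, a minimal sketch of swapping the removed load_boston for load_diabetes. The 90/10 split and the random_state are assumptions, not taken from the README; the 397 training rows in the log above are consistent with test_size=0.1 on the 442-sample diabetes dataset.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# load_diabetes ships with scikit-learn, so no download is needed.
X, y = load_diabetes(return_X_y=True)

# Assumed split: test_size=0.1 leaves 397 of the 442 rows for training,
# which matches the "data points in the train set" line in the log.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42
)

print(X_train.shape, X_test.shape)  # (397, 10) (45, 10)
```

The resulting X_train, X_test, y_train, y_test can then be passed to whatever fitting call the README uses in place of the Boston arrays.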

If I use datasets.fetch_california_housing() instead, the run takes about five minutes and LightGBM auto-chooses col-wise multi-threading, suggesting force_col_wise=true to remove the overhead:

100%|██████████| 42/42 [04:59<00:00,  7.14s/it]
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001478 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1838
[LightGBM] [Info] Number of data points in the train set: 18576, number of used features: 8
[LightGBM] [Info] Start training from score 2.063611
