Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to specifying position using position_column parameter #6825

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

NProkoptsev
Copy link

@NProkoptsev NProkoptsev commented Feb 15, 2025

#5929 introduces a method for treatment of position bias, but it had limited options of specifying the position column (.position file or numpy array), uou can't have position column as a part of your train file.
In contrast, you can specify queries and weights with arguments "query_column" and "weight_column", this will make LightGBM load these columns from the train file directly.

This PR adds support for loading position column from file.

  • Adding support for loading positions as a column in train file.
  • Removed std::vectorstd::string position_ids_. It was used before to hold a list of unique positions. Now, number of unique positions is "max(positions) + 1".
  • Refactored loading positions in Metadata::SetPosition

@jameslamb
Copy link
Collaborator

Thanks for your interest in LightGBM.

To set some expectations for how things work here:

  • until you take this PR out of draft or explicitly ask for a review, we won't review it
  • changes this large should be accompanied by new tests thoroughly covering the new behavior (not just modifying existing tests just enough to make them pass)
  • you will have to make a strong case for why this is valuable, given that what you're proposing is an ABI-breaking change

@NProkoptsev
Copy link
Author

@microsoft-github-policy-service agree

@NProkoptsev
Copy link
Author

@jameslamb can you take a look at this PR?

I added test, but it currently breaks for some gpu builds (perhaps because gpu is not deterministic?)
The goal of this PR is to reach parity in how positions can be loaded. Other columns(query and weights) can be specified by column id or column name.

@jameslamb jameslamb marked this pull request as ready for review February 27, 2025 20:23
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I took it of draft since you said it's ready for review.

Before I help with the build errors, I'd like @metpavel or @shiyu1994 to look and give you some feedback on the design. This is a significant ABI-breaking change, I'd like them to comment on:

  • whether this functionality is worth it
  • whether this could be done without breaking LightGBM's ABI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants