Increase the number of projects we can identify for good first issues #78

Open
Ly0n opened this issue Nov 29, 2023 · 9 comments

Comments

@Ly0n

Ly0n commented Nov 29, 2023

Here are some thoughts on how to increase the number of projects / issues we discover:

  1. Include other projects from the same namespace. Since many namespaces also host general projects outside the area of sustainability, this is more complex. We would first have to filter for the namespaces that have a clear reference to sustainability.

  2. Increase the time frame to last_updated in the last two years. I think that's an easy and valid way to go. Open source software development is often a slow, steady process, and older issues often remain relevant for years.

  3. Add popular dependencies of projects. We could add the X most popular / highly used first-level dependencies to the list of projects we investigate. We could still use the sustainability category of the main projects, so that a dependency gets various use case labels.

  4. Increase the total number of projects with automatic discovery based on NLP.
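
Idea 3 could be sketched roughly as follows. This is only an illustration: the `projects` mapping, its category names, and the `top_dependencies` helper are hypothetical and not taken from the OpenSustain.tech data model.

```python
from collections import Counter

# Hypothetical input: project name -> (sustainability category, first-level dependencies).
projects = {
    "solar-model": ("Renewable Energy", ["numpy", "pandas"]),
    "grid-sim": ("Energy Systems", ["numpy", "networkx"]),
    "forest-watch": ("Biosphere", ["numpy", "pandas", "rasterio"]),
}


def top_dependencies(projects, x):
    """Return the x most used first-level dependencies.

    Each entry is (dependency, number of dependent projects, sorted categories
    inherited from those dependent projects).
    """
    usage = Counter()
    categories = {}
    for category, deps in projects.values():
        for dep in set(deps):  # count each dependency once per project
            usage[dep] += 1
            categories.setdefault(dep, set()).add(category)
    return [(dep, n, sorted(categories[dep])) for dep, n in usage.most_common(x)]


for dep, n, cats in top_dependencies(projects, 2):
    print(dep, n, cats)
```

A dependency used by projects from several categories ends up with several use case labels, which is the effect described in point 3.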

@andrew @Codeshark-NET

@andrew
Member

andrew commented Nov 29, 2023

I've increased the time frame to two years now
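
For readers following along, a time-window filter of this kind might look like the sketch below. The `last_updated` field name comes from the suggestion above; the project records, the `recently_updated` helper, and the 730-day approximation of two years are assumptions, not the deployed code.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "two years" is approximated as 730 days.
TWO_YEARS = timedelta(days=2 * 365)


def recently_updated(projects, now, window=TWO_YEARS):
    """Return names of projects whose last_updated falls within the window."""
    cutoff = now - window
    return [p["name"] for p in projects if p["last_updated"] >= cutoff]


# Hypothetical sample records.
sample = [
    {"name": "org/a", "last_updated": datetime(2022, 6, 1, tzinfo=timezone.utc)},
    {"name": "org/b", "last_updated": datetime(2021, 1, 15, tzinfo=timezone.utc)},
]
now = datetime(2023, 11, 29, tzinfo=timezone.utc)
print(recently_updated(sample, now))
```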

@Ly0n
Author

Ly0n commented Dec 6, 2023

Another way to identify projects that welcome external developers is by looking for projects with a contribution guide. There are over 400 projects on OpenSustain.tech with a contribution guide. However, such a filter would be rather crude. Many projects have a contribution guide more for their own community and thus express less interest in welcoming external developers into the project.

@Ly0n
Author

Ly0n commented Dec 25, 2023

Here are some repository topics that could increase the number of projects that welcome developers:

  • Hacktoberfest
  • citizen science and citizenscience

Some projects use beginner* issue labels instead of Good First Issue: https://github.com/Growstuff/growstuff/issues

@andrew
Member

andrew commented Dec 25, 2023

We don't currently use repository topics for filtering repositories; we only check whether they have at least one open issue with one of the following labels: "open climate action", 'help wanted', 'good first issue', 'Good First Issue','hacktoberfest', 'Hacktoberfest', 'good-first-issue'
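
As a rough illustration of that check (not the actual implementation), the sketch below keeps a repository when at least one of its open issues carries one of the listed labels. The label strings are the ones quoted above; the issue records and the `repos_with_matching_issue` helper are hypothetical.

```python
# Labels quoted in the comment above; matching is exact and case-sensitive.
LABELS = {
    "open climate action", "help wanted", "good first issue",
    "Good First Issue", "hacktoberfest", "Hacktoberfest", "good-first-issue",
}


def repos_with_matching_issue(issues, labels=LABELS):
    """Return repositories with at least one open issue carrying a listed label."""
    return sorted({
        issue["repo"]
        for issue in issues
        if issue["state"] == "open" and labels.intersection(issue["labels"])
    })


# Hypothetical sample data.
issues = [
    {"repo": "org/a", "state": "open", "labels": ["bug", "help wanted"]},
    {"repo": "org/b", "state": "closed", "labels": ["good first issue"]},
    {"repo": "org/c", "state": "open", "labels": ["beginner"]},
    {"repo": "org/d", "state": "open", "labels": ["Good first issue"]},
]
print(repos_with_matching_issue(issues))
```

In this sample, `org/d` is skipped because "Good first issue" differs in capitalization from every listed string, which shows how exact matching can miss label variants.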

@Ly0n here's a list of all the labels used across all the OST projects' issues, both open and closed (lowercased to remove duplicates, count > 1): https://gist.github.com/andrew/89cabfe6adcbe136ae8cddc54f928e67

Here's a list of the labels from currently open issues: https://gist.github.com/andrew/d0369c3e1264bca51fc0c03b897a2ce0

If you want to give me a list of strings to add I can deploy that quickly.

Or if you want to start adding in any open issue from repositories with certain topics we can investigate that after xmas.

@andrew
Member

andrew commented Dec 25, 2023

And here's a list of all project keywords (again lowercased and count > 1): https://gist.github.com/andrew/14f700b3a6e42086b778cacd60690b77

Project keywords are made up of repository topics and package manager keywords for each project.
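
A list like this could be produced along the following lines. This is a sketch under assumptions: the `topics` and `package_keywords` field names and the `keyword_counts` helper are hypothetical, and counting each keyword once per project is a guess at how the gist was generated.

```python
from collections import Counter


def keyword_counts(projects):
    """Merge topics and package keywords, lowercase them, keep counts > 1."""
    counts = Counter()
    for project in projects:
        keywords = project["topics"] + project["package_keywords"]
        # Lowercasing inside a set removes case duplicates within one project.
        counts.update({k.lower() for k in keywords})
    return {k: n for k, n in counts.items() if n > 1}


# Hypothetical sample data.
sample_projects = [
    {"topics": ["Climate", "energy"], "package_keywords": ["climate"]},
    {"topics": ["climate"], "package_keywords": ["Energy", "solar"]},
]
print(keyword_counts(sample_projects))
```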

@Ly0n
Author

Ly0n commented Jan 25, 2024

I found another relevant namespace that uses a non-standard Good First Issue label:
good first issue ❤️

Here is a better regex that should be able to match all the issues:
https://regex101.com/r/unD3xH/3

@andrew
Member

andrew commented Jan 25, 2024

For performance, the query is done as an exact match in SQL rather than a regex (over 200k issues in the database!), so I'll pull a list of all labels that contain "good" and use that:

[":beginner: good first issue",
 "Good First Issue",
 "Good as first PR",
 "Good first issue",
 "Misc: good first issue",
 "contrib-good-first-issue",
 "first-good-issue",
 "good first contribution",
 "good first issue",
 "good first issue :heart:",
 "good first issue :star:",
 "good first issue 🐤",
 "good first issue 🐾",
 "good first review",
 "good for beginners",
 "good for new contributors",
 "good-first-issue",
 "good_for_beginners",
 "i-good-first-issue",
 "issue type: good first issue",
 "status --- good first issue",
 "status: good first issue",
 "🏁 Good first issue",
 "🏄‍♂️ good first issue"]

plus a similar list for /help/:

[":open_hands: help wanted",
 "Help Text",
 "Help Wanted",
 "Help needed",
 "Help wanted",
 "contrib-help-wanted",
 "help",
 "help needed",
 "help wanted",
 "help wanted 🆘",
 "help wanted 🖐",
 "help wanted 🦮",
 "help wanted!",
 "help-wanted",
 "question & help wanted",
 "status --- help wanted :heart:",
 "status: help wanted",
 "status: needs help",
 "tag:help-wanted",
 "🛟 help wanted"]
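
The two-step approach (a one-off substring scan to collect label variants, then exact matching against that fixed list) might be sketched like this. It uses an in-memory SQLite database purely for illustration; the `issue_labels` table, its columns, and both helper functions are hypothetical and do not reflect the real schema.

```python
import sqlite3


def labels_containing(conn, word):
    """One-off scan: every distinct label containing `word` (case-insensitive for ASCII)."""
    rows = conn.execute(
        "SELECT DISTINCT label FROM issue_labels WHERE lower(label) LIKE ?",
        (f"%{word.lower()}%",),
    )
    return sorted(row[0] for row in rows)


def repos_with_labels(conn, labels):
    """Regular query: exact-match lookup against the fixed label list."""
    placeholders = ", ".join("?" for _ in labels)
    rows = conn.execute(
        f"SELECT DISTINCT repo FROM issue_labels WHERE label IN ({placeholders})",
        list(labels),
    )
    return sorted(row[0] for row in rows)


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issue_labels (repo TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO issue_labels VALUES (?, ?)",
    [
        ("org/a", "good first issue"),
        ("org/b", "status: good first issue"),
        ("org/c", "Good as first PR"),
        ("org/d", "bug"),
    ],
)
good_labels = labels_containing(conn, "good")
print(good_labels)
print(repos_with_labels(conn, good_labels))
```

A plain substring scan can also pull in unrelated labels ("Help Text" appears in the help list above), so the generated list is worth a manual review before it is used for matching.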

@andrew
Member

andrew commented Jan 25, 2024

Added in a0fb5c5, the result is 18 more repos showing up in climatetriage (from 246 to 264)

@Ly0n
Author

Ly0n commented Jan 25, 2024

Beautiful! Thanks @andrew
