Import entire EGSnrc commit history #1216

Open
14 tasks
ftessier opened this issue Nov 12, 2024 · 1 comment

Comments

@ftessier
Member

Feature Request: salvage full project history

EGSnrc’s current git history (git log) dates back to 2015, when the project was ported to git and licensed under AGPL v3. However, EGSnrc’s version control spans back to the early 1990s across various version control systems (SCCS, RCS, CVS). Recovering this full history would help attribute code to original authors and preserve the project’s legacy.

Proposed solution

Tools exist to convert legacy version control systems to git, for example sccs2rcs (SCCS to RCS), and cvs2git or cvs-fast-export (RCS and CVS to git).

After conversion, we can rebase the current repo on the legacy branch to incorporate the full history. If this significantly increases repo size, the legacy branch could be maintained separately or in a linked repository.

I’ve tested this with CVS history from July 2003 using cvs2git, which worked well.

Handling renames

Frequent file renames (e.g., in 2015, when everything was moved under an EGSnrc directory) may require git log --follow to trace the entire file history.
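As a rough illustration of the rename issue, the sketch below builds a throwaway repository, moves a file under an EGSnrc directory, and compares git log with and without --follow. The file name, author identity, and directory layout are made up for the demo and are not taken from the actual EGSnrc tree.

```shell
set -eu
# Throwaway repository; every name below is illustrative only.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo Author"
git config user.email "demo@example.org"

# Commit a file at its original location.
echo "first version" > egs.f
git add egs.f
git commit -q -m "Add egs.f"

# Move it under a subdirectory, mimicking the 2015 reorganization.
mkdir EGSnrc
git mv egs.f EGSnrc/egs.f
git commit -q -m "Move egs.f under EGSnrc"

# Count the commits reported for the new path, without and with --follow.
plain=$(git log --oneline -- EGSnrc/egs.f | wc -l | tr -d ' ')
followed=$(git log --oneline --follow -- EGSnrc/egs.f | wc -l | tr -d ' ')
echo "without --follow: $plain commit(s); with --follow: $followed commit(s)"
```

Note that --follow works on a single file at a time, so tracing history across a bulk move still has to be done file by file.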

Author information

Legacy commit emails may no longer be valid. For accuracy, I propose preserving the original emails unless we can match authors to current GitHub handles for code attribution.
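One non-destructive option, offered here only as a sketch and not necessarily what is intended for EGSnrc, is a .mailmap file: the commits keep their original author emails, while git log can display a current identity. The names and addresses below are invented for the demo.

```shell
set -eu
# Throwaway repository; the identities below are made up.
demo=$(mktemp -d)
cd "$demo"
git init -q .

# A commit carrying a hypothetical legacy author identity.
echo "data" > file.txt
git add file.txt
git -c user.name="jdoe" -c user.email="jdoe@old-host.example" \
    commit -q -m "Legacy commit"

# .mailmap line format: Proper Name <proper@email> <commit@email>
printf '%s\n' 'Jane Doe <jane.doe@example.org> <jdoe@old-host.example>' > .mailmap

stored=$(git log -1 --format='%an <%ae>')   # identity as recorded in the commit
mapped=$(git log -1 --format='%aN <%aE>')   # identity after applying .mailmap
echo "stored: $stored"
echo "mapped: $mapped"
```

Because the mapping lives in a file rather than in the commits, it can be corrected later without rewriting history.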

Author consent

As the pre-2015 log was private, we should seek consent from past authors before public release.

Alternatives considered

  • A private git repo with CVS history from 2003 exists but has not been made public. Full history would benefit all users and developers.

  • A separate (public) repo remains an option, for example if the combined repo becomes too large. The current legacy repo (2003–2015) is heavy at about 270 MB, whereas the current EGSnrc project (2015–present) is 72 MB. Perhaps we can identify a few large binaries that can be cleaned from the legacy repo with BFG?
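To find candidates for cleaning, one common approach is to list the largest blobs across all of history. The sketch below does this in a throwaway repository with made-up files; in practice the same pipeline would be run inside the legacy repo. The reported sizes are uncompressed object sizes, not on-disk pack sizes.

```shell
set -eu
# Throwaway repository; file names are illustrative only.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo Author"
git config user.email "demo@example.org"

# One small text file and one larger stand-in for a binary.
echo "small" > notes.txt
head -c 100000 /dev/zero > big.bin
git add notes.txt big.bin
git commit -q -m "Add files"

# List every object reachable from any ref with its type, size, and path,
# keep only blobs, and sort by size, largest first.
largest=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
  | awk '$1 == "blob" { print $2, $3 }' \
  | sort -rn \
  | head -n 5)
echo "$largest"
```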

Implementation steps

  • Convert SCCS to RCS (using sccs2rcs)
  • Convert all RCS and CVS to git (using cvs-fast-export), update authors as needed
  • Seek consent from past authors for public release
  • Resolve a couple of capitalization conflicts (i6702.png and i6702a.png)
  • Move all files under a HEN_HOUSE sub-directory, resolve a couple of capitalization conflicts
  • In EGSnrc repo, add a remote to the legacy repo, e.g., legacy
  • Create new branch for rebase from current master: git checkout -b main origin/master
  • Rebase main onto legacy/main, resolving conflicts with newer commits: git rebase -X theirs legacy/main
  • A couple of file permission changes get in the way during the rebase: simply git stage -u and git rebase --continue
  • Find a way to add 50-char commit titles for legacy commits!
  • Review options to reduce the repo size (BFG clean large binaries and images, etc.)
  • Preserve master branch for commit references made since 2015
  • Preserve original release tags on master branch
  • Set main as the default branch for EGSnrc going forward
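The remote, branch, and rebase steps above can be tried out on two small stand-in repositories before touching the real ones. This is only a sketch under assumptions: the repositories, file names, and commit messages are invented, and the new branch is created from the local master because the demo has no origin remote (the step above uses origin/master).

```shell
set -eu
work=$(mktemp -d)

# Stand-in for the converted legacy repository, on branch "main".
git init -q "$work/legacy"
cd "$work/legacy"
git config user.name "Legacy Author"
git config user.email "legacy@example.org"
git symbolic-ref HEAD refs/heads/main
echo "legacy code" > egsnrc.mortran
git add egsnrc.mortran
git commit -q -m "Legacy commit from the CVS era"

# Stand-in for the current repository, on branch "master".
git init -q "$work/current"
cd "$work/current"
git config user.name "Current Author"
git config user.email "current@example.org"
git symbolic-ref HEAD refs/heads/master
echo "legacy code, updated" > egsnrc.mortran
git add egsnrc.mortran
git commit -q -m "Initial public commit"

# Add the legacy repository as a remote named "legacy" and fetch it.
git remote add legacy "$work/legacy"
git fetch -q legacy

# Create the new branch from master, leaving master itself untouched.
git checkout -q -b main master

# Replay the current history on top of the legacy history. During a rebase,
# -X theirs favors the commits being replayed, i.e. the newer ones.
git rebase -q -X theirs legacy/main

git log --format='%s' main
```

Since master is never moved, commit references and release tags made since 2015 keep pointing at the original commits, while main carries the combined history.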

Comments welcome!

@crcrewso
Contributor

This is a significant amount of work. Did you read up on gcc's storied past with this type of work? Partial Source
