# Branching Strategies
OK, so we've covered quite a lot of ground here, and this should be enough to get you started with Git.
But there's one more helpful feature of Git I'd like to talk about: **branching**.
As mentioned [earlier](how-git-works.qmd#branches), **branches** are a way to keep track of different versions of your code. They're also a way to keep your code organized and to keep your work separate from other people's work.
For the purposes of this workshop, we're just going to look at how we can use **branching** to avoid making breaking changes to the code that others see and are basing their work on (that includes ourselves, too!).
## When To Branch
There are a couple of different philosophies on when to **branch**, but the one that's now encouraged in software development is to make small, discrete **branches** for each feature, and regularly **merge** them back into the **main** branch.
This is commonly referred to as the [GitHub Flow](https://guides.github.com/introduction/flow/) - see [this video](https://www.youtube.com/watch?v=GgjIvUrOpmg) for an example using the command line.
What you generally want to avoid are long-lived **branches**, because they can get out of sync with the **main** branch and become difficult to **merge** back in.
Long-lived **branches** were traditionally part of the workflow advocated by the ["git flow" philosophy](https://nvie.com/posts/a-successful-git-branching-model/).
Fortunately for us, the short-lived **branch** workflow is well-suited to the pace and development style of scientific software.
We could just code directly on the **main** branch, but if we make a mistake and want to roll back changes, this could have a big impact on the work of others.
Instead, we can create a new **branch** for this feature, and then **merge** it back into the **main** branch when we're done.
This allows others to keep working off the code on **main** while we work on our new feature in isolation.
When our code is working as we want it, we can **merge** it back into the **main** branch, and they will then be able to access it.
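A minimal command-line sketch of this cycle, run in a throwaway repository so that nothing real is touched. The branch name `my-feature`, the file name, the identity, and the commit messages are all placeholders, and `git init -b` assumes Git 2.28 or newer.

```bash
# Create a throwaway repository to experiment in
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Workshop User"        # placeholder identity for this sketch
git config user.email "user@example.com"

# A first commit on main that others could be working from
echo "shared code" > analysis.txt
git add analysis.txt
git commit -q -m "Initial commit"

# Branch off, do the work in isolation, then merge back
git checkout -q -b my-feature
echo "new feature" >> analysis.txt
git commit -q -am "Add new feature"
git checkout -q main
git merge -q my-feature

# The feature commit is now part of main
git log --oneline
```

While `my-feature` is checked out, `main` is untouched, so anyone working from `main` is unaffected until the merge.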
Let's go back to our SIR model example and see how we can use **branching**.
## Branching Example
### Creating a Branch
Let's say that we want to add some code that calculates the final size of the epidemic for a range of transmission parameters (i.e., varying $R_0$ to adjust $\beta$).
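The link between $R_0$ and $\beta$ can be written out explicitly. This is a sketch that assumes the standard SIR model with transmission rate $\beta$, recovery rate $\gamma$, and birth/death rate $\mu$, which matches the parameter names used in the code later in this section. The second relation is the classic final-size result for a closed epidemic ($\mu = 0$, almost everyone initially susceptible), included only as a sanity check on the numerical output.

$$
R_0 = \frac{\beta}{\gamma + \mu}
\quad\Longrightarrow\quad
\beta = R_0 (\gamma + \mu) = \frac{R_0}{2}
\quad \text{when } \gamma = \tfrac{1}{2},\ \mu = 0
$$

$$
R_\infty = 1 - e^{-R_0 R_\infty}
$$

For $R_0 \le 1$ the only solution is $R_\infty = 0$, so we would expect the final size to stay near zero until $R_0$ exceeds 1.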
Because our desired feature is a nice discrete piece of work that isn't dependent on other work, we can make a **branch** for it.
Importantly, it's a relatively small and self-contained feature, so it shouldn't proliferate into a long-lived **branch** as we work on it, i.e., we shouldn't end up in a situation where there is a never-ending list of tasks to perform before we can consider the **feature** complete.
To create a new **branch**, we navigate to GitKraken and make sure our **local** machine is up to date with the **remote** repository, i.e., perform a **pull** and resolve any conflicts.
Making sure we're on the **main** branch (we should be as we haven't created any branches yet), we can create a new **branch** by clicking on the *"Branch"* button in the middle of the GitKraken toolbar, and then typing in a branch name.

After entering a name, a new **branch** will be created, and we can see that we're now on the new **branch**, **final-size-calc**.
Hovering our mouse over the branch icon, we can see that both **main** and **final-size-calc** are listed, and we can switch between them by double-clicking on the branch we want to switch to.
This also shows that both **branches** are currently at the same **commit**, as indicated by the horizontal marker pointing to the same **commit**.

Now we're ready to start adding code.
<details>
<summary>Terminal Commands</summary>
<p>
```bash
# checkout changes the branch
# -b creates a new branch with the given name
git checkout -b final-size-calc
```
</p>
</details>
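To check from the terminal that both **branches** point to the same **commit**, something like the following should work. It is a sketch in a throwaway repository (the identity and the empty initial commit are placeholders), and it assumes Git 2.28 or newer for `git init -b` and `git branch --show-current`.

```bash
# Throwaway repository standing in for the workshop repository
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

# Create the branch and switch to it in one step
git checkout -q -b final-size-calc

# Both names resolve to the same commit hash until we commit on the branch
git rev-parse main final-size-calc

# Confirm which branch we are on
git branch --show-current
```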
### Adding Code
Add the following R or Python code to the end of the ***sir_model.R*** / ***sir_model.py*** file:
<details>
<summary>
R Code
</summary>
<p>
```r
# Candidate values for R0 and beta
# Assumes sirmod is defined, and {purrr}, {tibble}, {rootSolve}, and {ggplot2} are loaded, earlier in the file
R0 <- seq(0.1, 5, length.out = 50)
betas <- R0 * 1/2

# Calculate proportion infected for each value of R0
# map2_dfr is a {purrr} function that applies a function to each pair of elements from two vectors (like a for loop) and row-binds the results into a data frame
final_size_df <- map2_dfr(
  .x = betas,
  .y = R0,
  .f = function(.x, .y) {
    equil <- runsteady(
      y = c(S = 1 - 1E-5, I = 1E-5, R = 0),
      times = c(0, 1E5),
      func = sirmod,
      parms = c(mu = 0, N = 1, beta = .x, gamma = 1/2)
    )

    tibble(
      R0 = .y,
      final_size = equil$y["R"]
    )
  }
)

ggplot(final_size_df, aes(x = R0, y = final_size)) +
  geom_line(linewidth = 2) +
  labs(x = "R0", y = "Final size")
```
</p>
</details>
<details>
<summary>
Python Code
</summary>
<p>
```python
# Candidate values for R0 and beta
R0 = np.linspace(0.1, 5, 50)
betas = R0 * 1 / 2

# Calculate the final size (R at the end of a long run) for each value of beta
# Assumes sirmod, tmin, start, mu, gamma, and N are defined earlier in the file
final_sizes = []
for beta in betas:
    p = (beta, mu, gamma, N)
    sol = solve_ivp(sirmod, [tmin, 1e5], start, args=p)
    final_sizes.append(sol.y[2, -1])

# Build the data frame in one go rather than assigning into it row by row
final_size_df = pd.DataFrame({"R0": R0, "final_size": final_sizes})

(
    ggplot(final_size_df, aes(x="R0", y="final_size"))
    + geom_line(size=2)
    + labs(x="R0", y="Final size")
)
```
</p>
</details>
After **staging** and **committing** the changes, we'll see that our **final-size-calc** branch is now ahead of the **main** branch by one **commit**.
We'll also notice that **final-size-calc** only exists locally, and not on the **remote** repository.

This is expected as we haven't **pushed** the **final-size-calc** branch to the **remote** repository yet, so GitHub doesn't know about it.
So let's **push** the **commit**.
GitKraken will immediately ask us to name the **remote** branch.
Click *"Submit"* with the default to avoid any confusion that may be caused by using a different name **locally** and **remotely** for the same **branch**.

We'll now see that both our **local** machine and GitHub are pointing to the same **commit** for the **final-size-calc** branch.
<details>
<summary>Terminal Commands</summary>
<p>
```bash
# Stage and commit the changes
git add .
git commit -m "Add final size calculation code"
# Show git tree
git log --graph --decorate --oneline --all
# Push the commit to the remote repository
# -u sets the upstream branch to the remote branch of the same name
# origin is the name of your remote (the default name)
# final-size-calc is the name of the local branch you want to push
git push -u origin final-size-calc
```
</p>
</details>
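The "ahead by one **commit**" and "only exists **locally**" observations can also be checked from the terminal. In this sketch a bare repository in a temporary directory plays the role of GitHub, and empty commits stand in for real changes; both are assumptions made so the example runs on its own.

```bash
# Throwaway setup: a bare repository plays the role of GitHub
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q -b main "$work/local"
cd "$work/local"
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
git remote add origin "$work/remote.git"
git commit -q --allow-empty -m "Initial commit"
git push -q -u origin main

# Commit on the feature branch
git checkout -q -b final-size-calc
git commit -q --allow-empty -m "Add final size calculation code"

# Number of commits on final-size-calc that main does not have
git rev-list --count main..final-size-calc

# Remote-tracking branches: only origin/main is listed before the push
git branch -r

# After pushing, origin/final-size-calc appears as well
git push -q -u origin final-size-calc
git branch -r
```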
### Merging the Branch
Now that we've added code and **pushed** it to GitHub, we could keep going and add more code to the **final-size-calc** branch.
But in this case, we're done, so we can **merge** the **final-size-calc** branch back into the **main** branch.
You can do this on your **local** machine, but let's do it on GitHub with a **pull request**.
This is more appropriate for collaborative work and doesn't require any additional steps when we're working on our own, so it is a better default workflow.
Navigating to GitHub, we see that **final-size-calc** has recent **pushes** and GitHub prompts us to start a **pull request**.
Go ahead and click *"Compare & pull request"*.

Doing so, we are presented with a lot of information.
The key point is that the two buttons at the top of the page allow you to select which **branches** are being compared in the **pull request**, i.e., which **branches** you want to **merge** together.
The **base** branch is the **branch** that you want to **merge** into, and the **compare** branch is the **branch** that you want to **merge** from.
So in this example, our changes happened on the **final-size-calc** (compare) branch, and we want to **merge** them into the **main** (base) branch.
Because there are no conflicting changes, e.g., changes to the same lines of code, GitHub can automatically **merge** the **branches** for us, hence the "Able to merge" message.
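A rough local analogue of this check is to attempt the **merge** without committing it and then undo the attempt. This is only a sketch of the idea in a throwaway repository, not a description of how GitHub implements its "Able to merge" message; the file name and contents are placeholders.

```bash
# Throwaway repository with one file on main
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
echo "line one" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# The feature branch edits the file; main stays untouched
git checkout -q -b final-size-calc
echo "line two" >> notes.txt
git commit -q -am "Add a line"
git checkout -q main

# Attempt the merge without committing; exit status 0 means no conflicts
if git merge -q --no-commit --no-ff final-size-calc; then
    echo "Able to merge"
else
    echo "Conflicts found"
fi

# Undo the trial merge so main is left exactly as it was
git merge --abort
```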

As we only made one **commit**, the **pull request** message defaults to the **commit** message.
In instances where you've made multiple **commits**, you can edit the **pull request** message to provide a more detailed description of the changes you've made.
On the right-hand side of the **pull request** page, you will see the option to add reviewers, assignees, etc.
This is useful for collaborative work where you want collaborators to check your work for correctness, and is a good habit to get into.
You can also use [GitHub Projects](https://docs.github.com/en/issues/planning-and-tracking-with-projects/learning-about-projects/about-projects) to keep track of your work and upcoming tasks, so linking to a project can help you visualize the work you're doing and how it fits into the bigger picture.
After clicking *"Create pull request"*, GitHub will check that the **branches** can be **merged**, and notify any reviewers that you've added.
If the repository has any [GitHub Actions](https://docs.github.com/en/actions) workflows set up, these automated checks will run here too; Actions is a powerful tool for automating tasks, and worth exploring in more detail when you're comfortable with Git and GitHub.
Assuming everything checks out, you can click *"Merge pull request"* to **merge** the **branches**.

::: {.callout-note}
Clicking on the drop-down menu of the *"Merge pull request"* button will allow you to choose how to **merge** the **branches**.
The default, *"Create a merge commit"*, keeps all of the **commits** from the comparison **branch** separate and adds a **merge commit** to the **base** branch, which is useful if you want to keep a record of all the changes you made.
*"Squash and merge"* will combine all of the **commits** on the comparison **branch** into a single **commit**, which is useful if you don't need to keep a record of exactly how you ended up with the final version of the new code, and just want to show that you added the feature.
*"Rebase and merge"* will add all the **commits** from the comparison **branch** individually to the end of the **base** branch without a **merge commit**, which is useful if you want to keep a linear history. GitHub can't do this automatically if the files modified in the **pull request** are in conflict with changes in the **base** branch.
You can find more information about the different **pull request merge** methods [here](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/about-pull-request-merges#squash-and-merge-your-pull-request-commits).
:::
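To get a feel for how the three options differ, we can imitate them locally and count the **commits** each one leaves behind. This is a sketch using plain Git commands in a throwaway repository, so it approximates rather than reproduces what GitHub does; the branch names `feature`, `with-merge-commit`, `with-squash`, and `with-rebase` are placeholders.

```bash
# Throwaway repository: one commit on main, two on a feature branch
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b feature
echo "step 1" > feature.txt
git add feature.txt
git commit -q -m "Feature step 1"
echo "step 2" >> feature.txt
git commit -q -am "Feature step 2"

# Merge commit: both feature commits are kept, plus one merge commit
git checkout -q -b with-merge-commit main
git merge -q --no-ff -m "Merge feature" feature
git rev-list --count with-merge-commit   # 4

# Squash: the feature arrives as a single new commit
git checkout -q -b with-squash main
git merge -q --squash feature
git commit -q -m "Add feature (squashed)"
git rev-list --count with-squash         # 2

# Rebase: feature commits sit directly on top of main, no merge commit
# (here the branch already starts from main's tip, so nothing has to move)
git checkout -q -b with-rebase feature
git rebase -q main
git rev-list --count with-rebase         # 3
```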
### Cleaning Up Our Repository
After successfully **merging** the **branches**, we can delete the **final-size-calc** branch as it is no longer needed.
Don't worry, you still have your full Git history, and the branch can always be restored if required, but this is a good way to keep your repository tidy and avoid accidentally making changes to an old **branch**.
To delete the **remote** branch, just click on the *"Delete branch"* button that appears after merging the **pull request**.
At this point, we can head back to GitKraken.
One really nice feature of GitKraken is that it's easy to see what we've done as it shows us the Git tree.
This isn't something you get in the GitHub Desktop application, and the command line can only show a text-based version (e.g., with `git log --graph`), so it is a really useful way to keep track of your work, particularly if you have a lot of, or long-lived, **branches** (which you know you shouldn't!).
Here, we can see that there is now only one **branch remotely**, but two **locally**.

If we double-click on the **main local branch**, we will **checkout** that branch, i.e., we will switch to it.
We then **pull** the changes from the **remote** and **main** will now be up to date on our **local** machine.
If we now right-click on the **final-size-calc local branch**, we can delete it by selecting the option *"Delete final-size-calc"*.
We will now see the updated Git history.

<details>
<summary>Terminal Commands</summary>
<p>
```bash
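# Delete the remote branch (skip this if you already deleted it on GitHub)
git push -d origin final-size-calc
# Switch to main and pull the merged changes
git checkout main
git pull
# Delete the local branch
git branch -d final-size-calc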
```
</p>
</details>
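If a deleted **branch** is needed again, it can be recreated at the **commit** it used to point to. The sketch below does this in a throwaway repository and saves the hash in a shell variable beforehand; in a real repository you would more likely look the hash up with `git log` or `git reflog`. The file name and contents are placeholders.

```bash
# Throwaway repository with a merged feature branch
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b final-size-calc
echo "final size code" > final_size.txt
git add final_size.txt
git commit -q -m "Add final size calculation code"
tip=$(git rev-parse final-size-calc)        # remember where the branch pointed

# Merge into main, then delete the branch
git checkout -q main
git merge -q --no-ff -m "Merge final-size-calc" final-size-calc
git branch -q -d final-size-calc

# The branch's commit is still part of main's history
git log --oneline --graph

# Recreate the branch at its old commit
git branch final-size-calc "$tip"
```

Deleting a **branch** only removes the label, not the **commits**, which is why the history survives.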