Suggestion for OSS PR workflow: Rebasing instead of Merging?

wookayin · November 10, 2021, 6:13am

Hello TensorFlow team,

I just got to know this forum after the TF contributor summit which was held today. I would like to appreciate the organizers and all the contributors for their such a great effort to keep the TF community growing.

I have a request or suggestion for the entire TensorFlow team’s workflow, or more broadly, for the majority of all the Google OSS projects. I wasn’t sure whom or how should I reach out to about this, but it looks like this is the perfect place to make the suggestion regarding general development workflow.

TL;DR: Can we have pull requests from OSS contributors checked in by “rebasing” or kind of “squashing” rather than “merging” (i.e. creating a merge commit), so that the git graph doesn’t look intimidating?

For example, if you clone the tensorflow repo and look at the git commit tree, you can see a large number of Kraken-like branches and parallel lines:

This happened due to merge commits for PRs: the main branch changes so fast, so the merge-base commit can be hundreds of commits ago when the PR gets checked in, unless rebased. In my opinion, this makes reading and following the commit history very difficult.

I can see why this might happen: creating a merge commit is the default behavior in merging PRs in Github, and in Google’s internal VCS (the primary repository to host the code) everything gets “synchronized” and linearized but the workflow hasn’t cared as much how the synchronized git repository will look like.

However, it would be really great if the git history could be maintained as linear and clean as possible, which I think will be helpful for OSS contributors. Most of PRs would consist of one or few commits, so I think there is no much reason of retaining their base commit unless really needed (for example, some branch had been long-lived when consisting of many big commits). Google’s internal VCS (Piper) maintains also all the changesets in a linear fashion, so it would be a reasonable choice for the OSS repositories as well. That being said, I also understand that “merging” (rather than rebasing) often might be a better choice when merging a PR, but in most of the case rebasing or squashing would work better as a default choice.

Any opinions, second-thoughts, discussion about the OSS workflow will be greatly appreciated.

Thank you!

Best,
Jongwook

mihaimaruseac · November 10, 2021, 2:24pm

Hi.

In general, I agree that rebasing is better. However, this does not really work here. Internally we use a different version control software that can hold code from all of Google. It is similar to perforce.

This entails that every internal change must be atomic, just on single version number change. That is, each PR must correspond to just one single item in history. So, merge commits are the only solution here.

wookayin · November 10, 2021, 3:34pm

Hello, thanks for your message!

Yes I can see how the internal VCS works. To ensure every change to be atomic (i.e., single CL or single reivision), the PR can be still “squashed” (if it consists of multiple commits) or rebased (if it has only one commit). So I don’t think merge commits is the only option, but think the “squash and merge” option is what exactly corresponds to how changesets would get merged internally. Maybe I should’ve said ‘squash merging’ instead of rebasing in this thread.

Thanks,

Bhack · November 10, 2021, 3:43pm

Do you mean something like this?

https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/configuring-commit-squashing-for-pull-requests

wookayin · November 10, 2021, 3:45pm

Yes exactly – the squash merging.

Bhack · November 10, 2021, 11:51pm

@mihaimaruseac What do you think?

mihaimaruseac · November 11, 2021, 12:08am

That could be possible and will actually help with the cherrypicks. Need to see if copybara supports this though

wookayin · December 3, 2021, 11:02pm

Any updates since then? Thank you!