Version control using Git: What if MS Word track changes was 1,000 times better?
A Track Changes Example
If you have done any work at all in Microsoft Word, you've probably run across track changes. Track changes is a feature that marks up a Word document as you edit it. As the name suggests, if you turn on track changes, MS Word "tracks" the changes you make to the document. Delete something? It gets crossed out with a red line. Add something? It shows up in a blue font. Track changes is useful if you are working on a document with one or more other people. Say your co-worker, Jane, does the first draft and emails it to you. You open it up in Word, turn on track changes, edit it, and send the document back to her with your markup included. Using track changes, she can scroll through your changes to the draft and, with a click of the mouse, decide which changes to keep and which to reject.
Sounds great, right? In theory it is. In practice, track changes is slightly buggy, and it leaves a whole ton of metadata (hidden stuff) in your Word document that you might not want someone else to see later, especially if, like me, you were a lawyer sending drafts to opposing counsel. It is also inefficient, because only one person can work on the draft at a time, and it does not let you keep track of prior versions of the draft.
What if, instead of you and Jane passing drafts back and forth, the two of you could work on the letter at the same time once Jane is done with the initial draft? Even though Jane is in charge of the master copy, once you have sent your revisions off to her you don’t have to wait for her to respond if you think of another change you want to make. And once Jane has sent a draft off to you, she doesn’t have to wait around for your markup before she can make revisions to her master draft. And what if the two of you decided, on draft number 35, that you were going down completely the wrong path and wanted to go back to draft number 22?
As far as I know, the above is impossible in track changes. If two or more people try to work on the same document at once, their changes will conflict, and it is a major headache to reconcile the different versions of the document. You would have to sit there with both versions open in Word and compare the changes, noting any conflicts. It would be tedious and slow. If you used Git instead of track changes, you and Jane would be able to edit the same document at the same time and effortlessly (almost!) merge each of your changes into a single master document, resolving conflicts and going back to prior drafts if you needed to.
So what is Git?
Git is a version control system, or VCS for short. It is software created by Linus Torvalds (more on him here) for use in the continuing development of Linux, an open source operating system. There is more than one VCS out there, but Git is extremely popular because it is fast, reliable, free, open source (anyone can see and use the code that runs Git), and pretty easy to use. You can run Git locally on your own computer, and you can also use Git in conjunction with GitHub, a website that is also very popular, especially with the open source community. More on GitHub below.
To use Git locally on your computer, you create a folder. Let’s call it “drafts”. In your drafts folder, you save the first version of the letter Jane sent you (and let’s assume that Jane also has a folder called drafts on her computer, with the letter saved). Assuming you have Git installed on your computer, you would initialize the drafts folder as a Git repository (think of a repository as just a fancy name for a folder). Initializing the folder means Git can now keep track of changes to the files inside it. You would then make any changes you wanted to the letter and save them. When you were done making changes, you would add your draft of the letter to the Git “staging” area. Think of it this way: your computer is a local airport, the letter is an airplane, and you’ve just closed the doors, so no one else is coming on board. Then you would “commit” the changes and include a brief message explaining what changes you made. Committing is sort of the Git way of saying you are saving your changes. To continue our airplane metaphor, your version of the airplane/letter has pulled out of the gate and you have radioed the control tower to let them know you are ready for takeoff.
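Here is a rough sketch of those steps, in case seeing them helps. It uses a short Python script to run the real Git commands (`git init`, `git add`, `git commit`) in a scratch folder. The file name `letter.txt`, the wording of the letter, and the author name and email are all made up for the example, and it assumes both Git and Python are installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(folder, *args):
    """Run one git command inside `folder` and return what it printed."""
    # The two -c options supply a made-up author so the sketch does not
    # depend on how Git is configured on your computer.
    command = ["git", "-c", "user.name=You", "-c", "user.email=you@example.com", *args]
    result = subprocess.run(command, cwd=folder, check=True, capture_output=True, text=True)
    return result.stdout


def edit_and_commit(parent):
    """Initialize a drafts folder, edit the letter, then stage and commit it."""
    drafts = Path(parent) / "drafts"
    drafts.mkdir()
    letter = drafts / "letter.txt"  # hypothetical file name
    letter.write_text("Dear Sir:\nWe got your letter.\n")  # Jane's first draft

    git(drafts, "init")  # the folder is now a Git repository

    # Make your changes to the letter and save them.
    letter.write_text("Dear Sir:\nThank you for your letter of May 1.\n")

    git(drafts, "add", "letter.txt")  # staging: the airplane doors are closed
    git(drafts, "commit", "-m", "Make the opening line more formal")  # commit
    return git(drafts, "log", "--oneline")  # one line per commit so far


if __name__ == "__main__":
    if shutil.which("git") is None:
        print("Git does not seem to be installed.")
    else:
        with tempfile.TemporaryDirectory() as scratch:
            print(edit_and_commit(scratch))
```

Running it should print a single line: a short identifier that Git generates for the commit, followed by the commit message. If you would rather type the commands yourself, the three `git(...)` calls in the middle are the same words you would type in a terminal from inside the drafts folder.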
GitHub: pulling (and pushing) it all together
If you and Jane were particularly computer savvy, or worked at a place that was, you might have a server in another room somewhere in your office building that stores both your and Jane’s Git repositories so you can each access them from other computers. Let’s assume you don’t. This is where GitHub comes in. GitHub is free (assuming you don’t mind your repository being public). GitHub is easy to use. GitHub is your destination airport. Once you have committed your changes and are taxiing down the runway, you can use Git to push your changes up to a remote repository at GitHub (a “remote” is a repository that isn’t the repository/folder on your computer; in this case we are assuming the remote is located on a server at GitHub). Your changes then land at the remote repository in GitHub-land.
Now both you and Jane have access to your changes. Assuming Jane has initialized the drafts folder on her computer as a Git repository, she can use Git to fetch your changes from the GitHub remote repository. She can, if you will, send her own private jet off to the remote repository in GitHub-land to bring your changes back to the local repository/folder on her computer. Jane can then review your changes, including your commit message, which explains why you made the changes you made, accept or reject them, and make her own changes. She can then add her draft to the staging area, commit it, and push it to the remote repository on GitHub, which is where the master draft of the letter now resides.
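The round trip above can be tried without a GitHub account. In this sketch a "bare" repository in a local folder stands in for the remote on GitHub, which is my own substitution so the example runs offline. The folder names, the file name `letter.txt`, and the author details are also invented. The Git commands themselves (`git push`, `git clone`, `git fetch`, `git log`, `git merge`) are the real ones, run from a short Python script.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(folder, *args):
    """Run one git command inside `folder` and return what it printed."""
    # Made-up author details, so nothing depends on your own Git settings.
    command = ["git", "-c", "user.name=Editor", "-c", "user.email=editor@example.com", *args]
    result = subprocess.run(command, cwd=folder, check=True, capture_output=True, text=True)
    return result.stdout


def push_and_fetch(parent):
    parent = Path(parent)
    # Stand-in for GitHub: a "bare" repository in a local folder.
    git(parent, "init", "--bare", "remote.git")
    remote = str(parent / "remote.git")

    # Your side: commit the first draft and push it up to the remote.
    yours = parent / "your-drafts"
    yours.mkdir()
    git(yours, "init")
    git(yours, "remote", "add", "origin", remote)
    (yours / "letter.txt").write_text("Dear Sir:\nWe got your letter.\n")
    git(yours, "add", "letter.txt")
    git(yours, "commit", "-m", "First draft")
    git(yours, "push", "origin", "HEAD")

    # Jane's side: her first download of the repository is a clone.
    git(parent, "clone", remote, "janes-drafts")
    janes = parent / "janes-drafts"

    # You edit, commit, and push again.
    (yours / "letter.txt").write_text("Dear Sir:\nThank you for your letter.\n")
    git(yours, "add", "letter.txt")
    git(yours, "commit", "-m", "Make the opening line more formal")
    git(yours, "push", "origin", "HEAD")

    # Jane sends her private jet: fetch, read your commit message, accept.
    git(janes, "fetch", "origin")
    incoming = git(janes, "log", "--oneline", "HEAD..@{u}")  # what is new
    git(janes, "merge", "--ff-only", "@{u}")  # bring her copy up to date
    return incoming, (janes / "letter.txt").read_text()


if __name__ == "__main__":
    if shutil.which("git") is None:
        print("Git does not seem to be installed.")
    else:
        with tempfile.TemporaryDirectory() as scratch:
            incoming, letter = push_and_fetch(scratch)
            print(incoming)
            print(letter)
```

The `@{u}` shorthand means "the remote copy of the branch I am on". With a real GitHub repository, the only difference would be that `remote` is a web address GitHub gives you rather than a folder on your own disk.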
But wait, there’s more! Backups, Merging and Teams
A side effect of all this pushing and pulling to a remote repository in GitHub-land is that if your office building burns down and melts your computer and Jane’s computer, you have a backup of your draft safely stored on GitHub. This is one benefit of GitHub, but it isn’t the primary benefit (and you are backing up the entire contents of your computer to a remote server anyway, right? RIGHT?).
What Git and GitHub do best is manage versions and changes to files. Let’s say instead of you and Jane playing ping-pong with the draft (you edit it and push your draft up to GitHub, she pulls it down and edits it and pushes it back up, repeat), you both are working on the draft at the same time. You push your changes up, and Jane pushes her changes up, and uh-oh, you both edited the same sentence! Git will let you know. It will compare your changes and Jane’s changes and you can decide (either mutually, or if Jane is in control of the master draft, Jane can decide) which set to keep, again with a couple of keystrokes. You could also look at the changes, decide that no, neither set looks right, and with a few more keystrokes go back to an earlier draft, making that the master draft.
Now imagine that instead of just you and Jane, it is a team of 500 people working on a 500-page book at once. Imagine the nightmare if those 500 people were working in track changes on MS Word. Git would make a project like that manageable and efficient. You don’t have to sit around waiting for the other 499 people to finish; everyone can be working on different parts (or even the same part) of the same document at once, and you and your organization save a ton of time this way.
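Going back to the moment where you and Jane both edited the same sentence, here is a sketch of what "Git will let you know" looks like in practice. As before, a local bare repository stands in for GitHub, and the folder names, file name, letter text, and author details are invented for the example. In this version Jane, as keeper of the master draft, decides to keep her own wording.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(folder, *args, check=True):
    """Run one git command inside `folder` and return the finished process."""
    command = ["git", "-c", "user.name=Editor", "-c", "user.email=editor@example.com", *args]
    return subprocess.run(command, cwd=folder, check=check, capture_output=True, text=True)


def save_and_commit(folder, text, message):
    """Overwrite the letter, then stage and commit it."""
    (folder / "letter.txt").write_text(text)
    git(folder, "add", "letter.txt")
    git(folder, "commit", "-m", message)


def same_sentence_conflict(parent):
    parent = Path(parent)
    git(parent, "init", "--bare", "remote.git")  # local stand-in for GitHub
    remote = str(parent / "remote.git")

    yours = parent / "your-drafts"
    yours.mkdir()
    git(yours, "init")
    git(yours, "remote", "add", "origin", remote)
    save_and_commit(yours, "Dear Sir:\nWe are considering your offer.\n", "First draft")
    git(yours, "push", "origin", "HEAD")

    git(parent, "clone", remote, "janes-drafts")
    janes = parent / "janes-drafts"

    # You and Jane both rewrite the second sentence at the same time.
    save_and_commit(yours, "Dear Sir:\nWe accept your offer.\n", "Accept the offer")
    save_and_commit(janes, "Dear Sir:\nWe must decline your offer.\n", "Decline the offer")

    git(yours, "push", "origin", "HEAD")  # your push lands first
    janes_push = git(janes, "push", "origin", "HEAD", check=False)  # turned away

    # Jane has to bring your changes down before she can push hers.
    git(janes, "fetch", "origin")
    merge = git(janes, "merge", "@{u}", check=False)  # stops at the shared line
    marked_up = (janes / "letter.txt").read_text()  # both versions, flagged

    # Jane decides which set to keep, commits the result, and pushes it.
    save_and_commit(janes, "Dear Sir:\nWe must decline your offer.\n", "Keep Jane's wording")
    git(janes, "push", "origin", "HEAD")
    return janes_push.returncode, merge.returncode, marked_up


if __name__ == "__main__":
    if shutil.which("git") is None:
        print("Git does not seem to be installed.")
    else:
        with tempfile.TemporaryDirectory() as scratch:
            push_code, merge_code, marked_up = same_sentence_conflict(scratch)
            print("Jane's first push was refused:", push_code != 0)
            print("The merge stopped on a conflict:", merge_code != 0)
            print(marked_up)
```

The last thing printed is Jane's copy of the letter as Git left it when the merge stopped: both versions of the sentence, set apart by rows of `<<<<<<<`, `=======`, and `>>>>>>>` markers so she can see exactly where the two of you collided.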
Git’s On (groan…)
In the real world people do use Git and GitHub as their VCS for drafting documents, poetry, even books (here is a whole novel on GitHub). But where Git sees the most action is in the world of code: software developers use Git many times a day, and there are both public repositories (for open source projects where you are actively seeking public input, or for projects where you just don’t care if someone sees your code) and private repositories (including those of companies that produce software for profit) containing millions of lines of code. If the repository is public, or you have access to a private repository, you can even “fork” it (that means make your own copy of it on GitHub), download your copy to your own computer, make changes to the code, push those changes up to GitHub, and open a pull request asking the owner of the original repository to look at your changes. If they like your changes they can accept them, and you’ve made a contribution to the master repository.
More on Git and GitHub
Want to know more about VCS, Git and GitHub? Check out the links below (I particularly like the last video).
- Git Basics: What is Version Control? A video introduction to VCS
- What Exactly Is GitHub Anyway? From TechCrunch
- How the Heck Do I Use GitHub? From Lifehacker
- GitHub Tutorial for Beginners A really helpful tutorial video with screenshots