Friday, July 10, 2009

The Great Git In The Sky

Every developer should know that a good version control system for your source files is crucial. Being able to go back in time and see what your class, module or project looked like is indispensable. And if more than one developer works on a project, having a common place (a repository) to store all files is even more indispensable.

Throughout the years I’ve tried a couple of different source control systems. Being a .NET developer on the Microsoft platform, I’ve tried both Visual SourceSafe (VSS) and Team Foundation Server (TFS), and I’ve also used the open source alternative Subversion (SVN). Lately a new source control system has drawn my attention, namely Git.

Git is a fairly new source control system, originally developed by Linus Torvalds for use in developing the Linux kernel. The first version of Git came in 2005, but it wasn’t available on the Windows platform until late 2007 through the open source project msysgit (unless you were running the Unix emulation layer Cygwin, that is).

Git takes a somewhat different approach to source control than the tools I’ve used before. TFS, VSS and SVN let you set up a centralized repository where you store all your files, keep track of version history, do branching, etc. With Git the repository lives locally on your machine, and when several people work on the same project, the repositories on all development machines are essentially synchronized with each other, which makes it a so-called distributed source control system. This means that you have access to the full history of your source files locally. You can also have a remote Git repository that local repositories push changes to and pull changes from, but every local repository still has the full version history.

Creating a Git repository using msysgit

For each project you want to put under source control, you just add a Git repository to the root folder of your project. Say you have some code lying in ‘C:\Code\Work\MyProject’. If you want to place this under Git’s source control and you’ve installed msysgit with the integration to Windows Explorer, then you just right-click on the folder and choose ‘Git GUI Here’ (or ‘Git Bash Here’ if you’d rather use the command prompt). You then have to choose ‘Create New Repository’ and enter the directory ‘C:\Code\Work\MyProject’ in the textbox that follows (a little glitch in the UX design there; it would have been friendlier if it had remembered where I opened the GUI from and filled in that directory by default).

In the list of ‘Unstaged Changes’ in the upper left corner you can now choose the files you want to include in the repository. Select the appropriate files and press Ctrl+T (or Commit > Stage To Commit). Then you can commit them to the repository with Ctrl+Return (Commit > Commit).

Alternatively you can do the same from the command line, and in fact there are a couple of things you should do on the command line before you start committing to the repository. First of all you should enter your name and email, as these will be used on all commits:

Configure name and email
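
As a minimal sketch, the two settings could be entered in the bash like this. The name and email address are placeholders, not values from my own setup:

```shell
# Placeholder identity; replace with your own name and email.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Read one of the values back to check that it was stored.
git config --global user.name
```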

The ‘--global’ parameter means that this configuration setting applies to all Git repositories for your user account on this machine. You could also have done this through the Git GUI (Edit > Options…), but for the next thing I’ll set up, adding ignore patterns, I couldn’t find a way to do it in the current version (v 0.12.0.23). In a typical .NET project you wouldn’t want to add, for instance, the bin and obj folders to the repository, and the way to ignore them is to add a ‘.gitignore’ file to the root folder of your project. You can try to create this file in Windows Explorer, but I think you’ll quickly find that this is actually not possible. Through the bash, however, it’s a walk in the park.

To make my point I’ve created a console app called GitExample and opened the bash in its root directory. I can now issue a git init command that will initialize a Git repository here, and by calling git status I can list all files and folders that are currently not under source control:

Initialize git repository
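
Roughly, the two commands look like this. The first two lines are only there so the sketch can run anywhere; GitExample stands in for the project folder you already have, and Program.cs is a hypothetical source file:

```shell
# Stand-in for the existing project root (skip this if you are already there).
mkdir -p GitExample && cd GitExample
touch Program.cs

# Create an empty repository in the .git subfolder.
git init

# List what Git sees; files not yet under source control show up as untracked.
git status
```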

And as we can see, there are a lot of things here that I really don’t want to add to the repository. Let’s be ignorant:

Create ignore file

The touch command creates the file, and you can now edit it in your favorite text editor. The following list shows my ignorance:

Ignore pattern
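
A rough sketch of what such a file could look like. Only the bin and obj folders are mentioned above; the *.suo and *.user patterns are my assumption about typical Visual Studio clutter, and the file names are made up for the demo:

```shell
# Stand-in for the project root, with a few hypothetical files to ignore.
mkdir -p GitExample/bin GitExample/obj && cd GitExample
git init
touch Program.cs bin/GitExample.exe obj/GitExample.pdb GitExample.suo

# Create the ignore file and fill it with one pattern per line.
touch .gitignore
cat > .gitignore <<'EOF'
bin/
obj/
*.suo
*.user
EOF

# Only .gitignore and Program.cs should still be listed as untracked.
git status --short
```

Patterns ending in a slash match folders, while the *.suo style matches files by extension anywhere in the project.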

With this in place we can run the status command again and we’ll see that there’s a lot less to care about in our Git commit process:

List of files without the ignored ones

Now it’s time to add the files to the repository, which you of course can do either through the GUI or the command line, but since we’re already in Unix mode let’s do it the hardcore way.

Add to staging area and commit to repository

Committing files to the repository is a two-phase procedure. We do an “add .” to add all new and modified files in our working directory to the staging area of the repository. The staging area is the set of changes that are ready to be committed, and you can do a lot of adds (and removes) before you finally decide to commit all changes into the actual repository. To commit the files you call the commit command with the “-m” parameter, which attaches a message to the commit. And as you can see from the screenshot above, after we’ve moved the files from the working directory to the staging area and then from the staging area to the repository, our working directory is clean.
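
Put together, the two phases can be sketched like this. The file, the commit message and the identity are placeholders; the local user settings are only there so the commit works on a machine without the global ones:

```shell
# Stand-in for the project root with one hypothetical source file.
mkdir -p GitExample && cd GitExample
git init
git config user.name "Your Name"          # placeholder, stored in this repository only
git config user.email "you@example.com"   # placeholder
echo 'class Program { }' > Program.cs

# Phase 1: move everything in the working directory into the staging area.
git add .

# Phase 2: commit what is staged, with a message describing the commit.
git commit -m "Initial commit"

# Nothing left to stage or commit.
git status
```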

Mesh It Up!

I mentioned a bit earlier that if you want to share the repository with someone you can set up a remote repository. A popular repository host is GitHub, which lets you store up to 300 MB free of charge in private repositories (unlimited storage for public repositories). You can then push and pull changes to and from this location (take a look at this excellent blog post by Jason Meridth to see how you can do this).

Another alternative is to use your Live Mesh account as the remote repository. Pål Fossmo wrote a great blog post on how you can set up Git together with Mesh, which shows you how to configure your Mesh folders. To initialize the Mesh repository you can use the clone command with the bare parameter.

The clone command does what you think it does: it makes a copy of your repository. The bare parameter tells Git to strip the copy down to only what is necessary for change tracking. That means no working copy of the source files, only the binaries, diffs, etc. that the Git database needs. You can then push and pull between local repositories and the Mesh repository, which in turn is synced with the cloud.
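
As a sketch of how this could look on the command line: the MeshRepos folder is a hypothetical stand-in for a folder synced by Live Mesh, and mesh is just a remote name I picked. The setup lines at the top only exist so the example runs on its own:

```shell
# Setup: a small repository with one commit (placeholder identity and file).
mkdir -p GitExample MeshRepos && cd GitExample
git init
git config user.name "Your Name"
git config user.email "you@example.com"
echo '// first version' > Program.cs
git add .
git commit -m "Initial commit"

# Make a bare copy (history only, no working files) in the synced folder.
git clone --bare . ../MeshRepos/GitExample.git

# Register the bare copy as a remote and push a new commit to it.
git remote add mesh ../MeshRepos/GitExample.git
echo '// second version' >> Program.cs
git commit -a -m "Second commit"
git push mesh HEAD

# Fetch and merge whatever others have pushed (nothing new in this demo).
git pull mesh "$(git rev-parse --abbrev-ref HEAD)"
```

Pushing HEAD sends the current branch to a branch of the same name in the bare repository, so the sketch does not depend on what the default branch is called.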


Why use Git?

“Git – the fast version control system”. The slogan makes a solid statement by itself, and if you’ve worked with source control systems before you’ll definitely appreciate the speed of Git. Visual SourceSafe is notoriously slow on just about every operation you perform (especially over http(s)). Subversion is pretty fast on check-ins, but not that fast on check-outs. TFS is pretty fast overall and also gives you the possibility to set up local proxies if you have distributed teams. But for a more quantified view of Git’s performance you can check out Scott Chacon, who has compared the speed of Git to Mercurial and Bazaar.

TFS might compete to a certain extent on speed, but when it comes to the install footprint, and not least the effort it takes to actually install TFS 2008, Git will outperform TFS any day of the week. That said, TFS is a lot more than just a version control system. But if you plan on using TFS solely for the purpose of tracking your precious source files, my advice is pretty clear: don’t! It’s not worth it, neither in time nor money.

Compared to Subversion, it strikes me that the merging capabilities of Git are a bit better. Git tracks the content of files, not the files themselves, and so merge operations seem more likely to be correct in Git. In my opinion merging is probably one of SVN’s weakest points; doing large merge operations in SVN is just painful and you just know you’re about to get burned. TFS on the other hand seems a bit better at merging than SVN, but then again: time and money…

I guess it’s time for a little disclaimer here: I haven’t really used Git much yet and haven’t done any large merge operations, so I might be wrong. But from what I’ve read, and from how Git is built as a distributed source control system, I have a strong feeling that merging is really one of Git’s sweet spots.

Anyway, if you have any other opinions on the subject (or on anything else in this post), please feel free to speak your mind in the comments below :)

Resources

“Git Manual Page” is the official documentation on Git and it’s actually quite good. Lots of good examples and pretty well written. RTFM, right?

“Everyday GIT With 20 Commands Or So” from the official tutorial will give you a head start on the most used commands.

“Git Ready” has put some of the commands into 3 categories; beginner, intermediate, and advanced.

“Git For Windows Developers” – the title says it all I guess.

“Git – SVN Crash Course” will give you a head start on using Git if you’re already familiar with Subversion.

“Why Git is better than X” has done some (slightly biased?) comparisons against other source control systems.

msysgit is the tool to download and install if you need Git to run on a Windows box.

TortoiseGit is another client for Git repositories. If you’re familiar with TortoiseSVN for Subversion, the learning curve will be close to zero.

GitHub lets you store up to 300 MB in private repositories (unlimited storage for public repositories).