Monday, December 19, 2011

What I want from github.com

I like github, it's probably the primary reason I use git. Here's what I'd like about it to change:


Add a forum/mailing-list. It should be like a simple web forum, but also work as a mailing list via email



When you fork or branch a github repository, you lose both the wiki and issue tracker. For this reason, I think the wiki and issue tracker should be stored in a _directory_ in github. For bonus points, they should be very readable in a plain-text format.

The wiki is lame, especially when you're using git. Instead of a wiki, it should html pages. You'd still keep the WYSIWYG interface (like blogger), but it should make a commit to the actual repository (ala google code) and the underlying stuff should be plain-old html (probably only allow a subset of it). Html has a bad rep, but it's completely undeserved (maybe a separate blog post on that) and has the huge advantage of I can actually view it locally. It's quite important (imho) to be on a per-branch basis.

Also the current website concept is very lame. And almost entirely because its in it's own (shared-nothing) branch (gh-pages), which makes it very difficult to locally look at the website (when I'm coding), as well to make it track my dev branches. This could be unified with the html-wiki somehow.

Monday, December 5, 2011

Git is Ghetto

My biggest complaints with git, is it's full of fanatics, fanboys and otherwise clueless people who pretend: 
   * They know how to use it. 
   * They're getting productivity boosts
   * It's awesome

I think most of these people actually believe what they say, and justify pretending to make themselves look smart/trendy.   I've known people who've started a conversation with how fantastic git is, and end up asking how branching works.

But for this blog, I'm going to focus on a more technical issue, merges. For those that don't know, merges are the bread and butter of a distributed version control system like git. Yet it's unjustifiably atrocious, even compared to Bazaar.

Edit: I was under the wrong impression that Mercurial was any better, but it appears not.

Here's my example to illustrate the ghettoness of git. Imagine I'm managing a simple software project. In the morning I wake up, and there's been two new (unrelated) fixes. Perhaps they need some some mild formatting changes, but otherwise they're both pretty good.


Here's what our situation looks like:
Ok. So, obviously I want to add Daniela's and Bob's changes to my official line. The traditional git workflow would be to merge them in:

 

 Ewwwwwwww! Our history is a mess. Any one who looks at this, has little idea what's going on. If you look at this, you have no ability to go "back" in the *official* history. Like, you have no way of knowing if 12cca34 or 12ce39 was ever at the HEAD of the repository at any point in time. Not very useful for HISTORY. This might not look so bad now, but let's see how terribly it scales. Here's a (very pretty) graphical representation of what this looks like in a more complex project, managed by none-other than the original author of git):



No one sane can say this is acceptable. Look at the time stamps to get even further confused. The history almost useless for everything other than forensic investigation.  Clearly merging with a way that loses a linearized branch state is unacceptable. Fortunately git is quite flexible, and we can work around the issue. Using rebase (or something like it) we can do this:

Ok. This is hot. Super nice history, and how I recommend people try keep their repositories, but it's not without drawbacks:

We've changed the sha's. The first changed, (maybe because we fixed some white space). And the second one changed because it's now based on a new parent commit.

This is (slightly) problematic for anyone with the "old" sha's, and needs to be manually resolved (normally via a hard reset). Furthermore, if I need to make changes to the commit (to either resolve conflicts, or to fix trivial issues) it will look as if the initial commiter did that as part of their commit. If during this process I introduce a bug, this is obviously very not nice. And worst, if it's not noticed for a while -- there's nothing that says "Here is the initial commit" and "Here is how I included it in". Knowing those two parts is very important for fixing a bad merge.


Also, this whole approach works rather poorly with commits that have been cryptographically signed (because when we rebase, we're actually changing the commit in one shape or another). Also, while I can deal with it -- I really don't like the idea of someone making changes to my commits (without me having reviewed it first) and attributing it to me  [Which reminds me of a ghetto idea called "Google Wave"]


So both have pretty steep draw backs, but here's what makes it the worst. Conceptually there's a very simple solution. Merges that preserve a linear history of the state of each branch!

e.g.






And this is how bazaar does it. Graphically this can be awesomely represented (a linear list of changes to a branch, and for any change you can drill-down and see how it was developed in a completely lossless fashion).


This leads me to my conclusion: git is ghetto, and anyone who tells you otherwise is either lying or linus (or both).