Change We Need
Change is the only thing that is constant, yet it is the thing for which we are least prepared. In some ways we measure progress by how many changes we make in completing something.
Human error is unavoidable, and in today’s computing world we have become used to undoing and redoing our changes. We now rely on such measures to the extent that otherwise risky steps become trivial. Yet the scope of such systems is often limited to the application we are using. There is no easy cure for a file deleted weeks ago, and correcting a mistake you have been building upon for a long time is no easy task.
Such problems are common in the programming world, where a large number of people modify a single code base towards a single common goal. How can changes be tracked? How can we ensure two people don’t change the same lines of code simultaneously? There are numerous methods to track and maintain changes made to the source code. Let us see how you can use these technologies for purposes other than coding.
The need for revision control
If you’ve ever worked on a complicated project, you will surely understand the benefits of saving multiple copies of your work in progress. Imagine working on your history project for weeks, only to realize that the original introduction would fit in better. Without backups from multiple stages, you’re helpless.
Often we accomplish this by repeatedly saving the file under a different name, hopefully following some kind of pattern such as “HistoryProject-v1.doc”, “HistoryProject-v2.doc” or “HistoryProject-20090615.doc”. However, over time this will surely create a mess.
The problem becomes even more complicated when multiple people work on the same project. How do you reconcile the barrage of changes that a team can inflict?
It may seem simple to have the file hosted on a shared server where everyone can access and modify it. However, if everyone were to save files affixed with their name and timestamps, you would soon end up with an unmanageable mess of files of which only a few will ever be of any use, and many changes will be lost as people will use outdated revisions of the file and build on those.
Wouldn’t it be great if your file system itself could store each change separately, keeping the history invisible until needed? Many such file systems do exist (ext3cow, wayback, tux3, etc.). However, a simpler solution, which doesn’t involve switching to *NIX and messing around with kernel modules, is to use revision control software.
We can categorize revision control systems based on how the system manages its repository of data, and how it manages to avoid conflicts.
There are two common approaches to resolving the conflicts that arise when multiple people try to modify the same file at the same time: the lock-modify-unlock model and the copy-modify-merge model.
Lock-Modify-Unlock
This system is remarkably simple. Whenever one member of a team needs to modify a particular file, it is locked, so that no one else can make changes to it until it is unlocked again.
Think of it as setting a file as read only for everyone else while you modify it to your pleasure; once you are done, you can “unlock” the file again and other people can begin their work.
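To make the idea concrete, here is a minimal Python sketch of a lock-modify-unlock server. The `LockManager` class and its method names are hypothetical and exist only for illustration; the point is simply that a second user is refused until the first one unlocks.

```python
class LockManager:
    """Toy lock-modify-unlock server: one editor per file at a time (illustrative only)."""

    def __init__(self):
        self.locks = {}  # file name -> user currently holding the lock

    def lock(self, filename, user):
        holder = self.locks.get(filename)
        if holder is not None and holder != user:
            return False  # someone else is editing; this user has to wait
        self.locks[filename] = user
        return True

    def unlock(self, filename, user):
        if self.locks.get(filename) == user:
            del self.locks[filename]
            return True
        return False  # only the lock holder may release the lock


manager = LockManager()
print(manager.lock("HistoryProject.doc", "AlphaGeek"))  # True
print(manager.lock("HistoryProject.doc", "BetaDood"))   # False: read only for BetaDood
manager.unlock("HistoryProject.doc", "AlphaGeek")
print(manager.lock("HistoryProject.doc", "BetaDood"))   # True
```

Nothing in this sketch ever releases a lock on its own, which is why a forgotten lock leaves everyone else waiting.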
The disadvantages here are quite evident. Firstly, you are at the mercy of the first person who locked the file, even though in most cases the changes you might make would not conflict with what they are doing. This is clearly a loss of productivity, especially if that file is essential to your work.
Then again, if a person locks a file and forgets to unlock it before going on vacation, quitting their job or even just going to sleep, you are stuck in a deadlock until an administrator frees the shackles.
Finally, it doesn’t even really solve the problem completely since changes in one file may still conflict with the changes being made in another.
Copy-Modify-Merge
A more elegant solution to simultaneous access is simply to allow it. Each user creates their own copy of the file, which they then modify as if no one else is watching.
Finally, they can “merge” back their changes. By this, we mean that the revision system itself determines if there is any conflict in the changes made in the file contents. In most cases, different people will modify different parts of the file and the system will work quite well. However, in the rare case of a conflict that cannot be solved by the software itself, users can be informed of the problem and can make the decision themselves.
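The merge step described above can be sketched as a three-way comparison: each line of your copy and of the other person’s copy is checked against the original they both started from. This is a minimal Python sketch under a loud simplifying assumption: lines are only edited in place, never inserted or deleted, so all three versions have the same length. Real tools first align the versions with a diff algorithm.

```python
def merge_lines(base, mine, theirs):
    """Line-by-line three-way merge (assumes in-place edits only).

    Returns (merged_lines, conflicting_line_numbers).
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for n, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:      # same edit on both sides, or neither side touched the line
            merged.append(m)
        elif m == b:    # only the other person changed this line
            merged.append(t)
        elif t == b:    # only I changed this line
            merged.append(m)
        else:           # both changed it differently: the software cannot decide
            merged.append(b)        # keep the original and flag it for a human
            conflicts.append(n)
    return merged, conflicts


base = ["Intro", "Body", "Conclusion"]
mine = ["Better intro", "Body", "Conclusion"]
theirs = ["Intro", "Body", "Stronger conclusion"]
print(merge_lines(base, mine, theirs))
# (['Better intro', 'Body', 'Stronger conclusion'], [])
```

Only when both people change the same line differently does the sketch report a conflict, which matches the “rare case” where users must decide themselves.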
To see how this would work, let us take revision 1 of a document, which two people AlphaGeek and BetaDood copy simultaneously. AlphaGeek, alpha as he is, manages to complete his changes first and merges the changes into the main document bringing it up to par with his copy.
When BetaDood tries to merge his own changes, he will be told that his copy is out of date, since his changes were made to revision 1 while the current revision is 2.
BetaDood will now first have to merge the changes in revision 2 into his local copy. This is usually a simple process, as changes tend to be made in different parts of the document and will not conflict. His updated copy, now based on revision 2, can then be merged into the main file, giving us revision 3.
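The AlphaGeek and BetaDood walk-through can be replayed with a small Python sketch. As a hypothetical simplification, a document is modeled as a dictionary of named sections with a fixed set of names, and merging happens section by section. The `CentralRepository` class, `OutOfDateError` and `update` are illustrative names, not any real tool’s API.

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on an older revision than the latest one."""


class CentralRepository:
    """Toy central repository storing numbered revisions of one document."""

    def __init__(self, document):
        self.revisions = [dict(document)]  # revisions[0] is revision 1

    @property
    def head(self):
        return len(self.revisions)  # number of the latest revision

    def checkout(self, revision=None):
        """Return a private copy of a revision (the latest by default)."""
        return dict(self.revisions[(revision or self.head) - 1])

    def commit(self, based_on, document):
        if based_on != self.head:
            raise OutOfDateError(f"copy is based on r{based_on}, latest is r{self.head}")
        self.revisions.append(dict(document))
        return self.head


def update(repo, based_on, local):
    """Merge newer revisions into a local copy, one section at a time."""
    old, latest = repo.checkout(based_on), repo.checkout()
    merged = {}
    for section, new_text in latest.items():
        mine_changed = local[section] != old[section]
        theirs_changed = new_text != old[section]
        if mine_changed and theirs_changed and local[section] != new_text:
            raise ValueError(f"conflict in {section!r}: a human has to decide")
        merged[section] = local[section] if mine_changed else new_text
    return repo.head, merged  # the local copy is now based on the latest revision


repo = CentralRepository({"intro": "Old intro", "body": "Draft body"})  # revision 1
alpha, beta = repo.checkout(1), repo.checkout(1)
alpha["intro"] = "AlphaGeek's intro"
beta["body"] = "BetaDood's body"

print(repo.commit(1, alpha))  # 2: AlphaGeek finishes first
try:
    repo.commit(1, beta)
except OutOfDateError as err:
    print(err)  # copy is based on r1, latest is r2
based_on, beta = update(repo, 1, beta)
print(repo.commit(based_on, beta))  # 3
```

Because the two edits touch different sections, the update needs no human input, and revision 3 contains both contributions.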
This may seem like a tedious process, but there is rather little effort involved in most common scenarios, and it ends up being a much more efficient and elegant solution.
Once again, change is the central factor here, and tracking it is what drives any revision control software.
Quantifying change
A variety of systems exist for maintaining revisions. However, there is one thing they all have in common. They include a mechanism to detect the changes made to files.
Using revision control you have access to every change ever made that was submitted or committed to the repository. Like a time machine that can stop at any and every revision ever made in any file!
It may seem that storing each and every version of each and every file would take up a rather large amount of space. However, revision control programs are usually heavily optimized for changes in files, since that is their primary metric anyway.
Revision control software uses clever algorithms to store as little data as possible for each revision, typically by recording only the differences (deltas) between a revision and the one before it.
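Here is a minimal Python sketch of delta storage, using the standard library’s `difflib.SequenceMatcher` to compute the differences. The delta format and the `DeltaStore` class are hypothetical illustrations; real systems use their own, more elaborate formats. Revision 1 is kept in full, and every later revision is stored only as instructions to copy unchanged lines from its predecessor plus the lines that are actually new.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe `new` relative to `old`: copy ranges of old lines, add literal new lines."""
    delta = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, old, new).get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))     # reuse lines old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))  # store only the new lines
        # "delete": nothing is stored; those old lines are simply not copied
    return delta


def apply_delta(old, delta):
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


class DeltaStore:
    """Keeps revision 1 in full and every later revision as a delta."""

    def __init__(self, first):
        self.base = list(first)
        self.deltas = []
        self._latest = list(first)

    def commit(self, lines):
        self.deltas.append(make_delta(self._latest, lines))
        self._latest = list(lines)
        return len(self.deltas) + 1  # the new revision number

    def checkout(self, revision):
        lines = self.base
        for delta in self.deltas[:revision - 1]:  # replay deltas up to `revision`
            lines = apply_delta(lines, delta)
        return list(lines)


v1 = ["Title", "Intro", "Body", "End"]
v2 = ["Title", "New intro", "Body", "End"]
v3 = ["Title", "New intro", "Body", "Appendix", "End"]
store = DeltaStore(v1)
store.commit(v2)
store.commit(v3)
print(store.deltas[0])  # [('copy', 0, 1), ('add', ['New intro']), ('copy', 2, 4)]
```

Every old revision can still be rebuilt, which is the “time machine” behavior, yet only one changed line is stored for revision 2.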
If a copy of the repository contents is made either for release (tagging) or for starting an offshoot / spin-off of the project (branching) or experimenting with a new feature (branching), the system usually creates only an internal reference to the revision it is associated with, instead of an actual physical copy.
When working with files in a revision controlled system, you will often come across trunks and branches, tags and tips – it’s time for a biology lesson.
Of trunks and branches
In the programming world, branching and tagging are concepts that pop up frequently. For any software project, the “trunk” of the code is that which contains the latest revision of the code made to date. It is the one shooting upwards towards the next version. Branches in a code occur when a developer decides to take a different approach to the code thereby “branching” out in a different direction; maybe a different UI, or a different feature, or an experimental approach to a problem. These changes are often merged back into the original trunk when they are considered stable enough.
Take, for example, an evolving work of art made in Paint; a silly concept, but humor us. As you make each significant change, you save and “commit” it to the trunk. At one point, though, you decide that the background is too dull and want to try a different color, so you make a new branch. When you have a piece that you think stands on its own, or is otherwise special, you can tag it.
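The Paint example can be modeled in a short Python sketch that also shows why branching and tagging are cheap: a branch or a tag is just a name pointing at a revision, and no content is copied. The `Repository` class is hypothetical, and deriving revision ids from a short hash of the parent and content is a simplification made for this sketch.

```python
import hashlib


class Repository:
    """Toy history where branches and tags are just names pointing at revisions."""

    def __init__(self):
        self.revisions = {}              # revision id -> (parent id, snapshot)
        self.branches = {"trunk": None}  # branch name -> latest revision id (tip)
        self.tags = {}                   # tag name -> fixed revision id

    def commit(self, branch, snapshot):
        parent = self.branches[branch]
        rev_id = hashlib.sha1(f"{parent}:{snapshot}".encode()).hexdigest()[:8]
        self.revisions[rev_id] = (parent, snapshot)
        self.branches[branch] = rev_id   # only the branch pointer moves
        return rev_id

    def branch(self, name, from_branch="trunk"):
        self.branches[name] = self.branches[from_branch]  # no files are copied

    def tag(self, name, branch="trunk"):
        self.tags[name] = self.branches[branch]           # never moves afterwards

    def history(self, rev_id):
        """Walk back from a revision to the very first commit."""
        while rev_id is not None:
            parent, snapshot = self.revisions[rev_id]
            yield snapshot
            rev_id = parent


repo = Repository()
repo.commit("trunk", "sketch outline")
repo.commit("trunk", "add trees, grey background")
repo.branch("blue-background")              # try a different background color
repo.commit("blue-background", "repaint background blue")
repo.tag("gallery-version")                 # mark the current trunk tip as special
repo.commit("trunk", "add birds")
print(list(repo.history(repo.branches["blue-background"])))
# ['repaint background blue', 'add trees, grey background', 'sketch outline']
```

The tag keeps pointing at the grey-background picture even after the trunk moves on, while the branch shares all the history it grew out of.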
For storing your files in a revision control system, branches and trunks may not seem as useful, but still it is nice to know they are always there to lend support.
How and where a revision control system manages its content is also an important factor, and based on this concept we have centralized and distributed revision control systems.
Models of Revision
While the centralized system is based on the old client-server model, decentralized revision control systems are built on the increasingly popular P2P model. Decentralized version control systems have only recently started to become popular, and some of the best known decentralized solutions are just a couple of years old.
Centralized Revisioning
In the client-server model, the code is stored and maintained in a central repository. People send changes directly to this server, and the server alone maintains the histories and change logs of the files. This concept is quite easy to grasp, as it is one we are quite familiar with already. If you take revisioning out of the equation it is basically like having an FTP server.
In a centralized system, the repository is hosted on a server, which all team members connect to in order to fetch the latest revision of the code or to submit changes.
It is easy to see where such a system would be advantageous. As all the files are stored in one convenient location, they are easier to control and moderate. It is easier to track changes and establish a timeline. Access restrictions are also easier to establish, as the server is controlled by a single entity.
However, this is also akin to putting all your eggs into one basket, as all the content and its revision history are kept in one single place; if the server fails, everyone is affected.
One of the most popular examples of this is Subversion (SVN).
Decentralized revisioning
In the spirit of P2P, distributed revision control systems give everyone equal power; they are in essence a true democracy. These systems are much more recent, but they are seeing rapid adoption in the development community.
In the distributed system, as the name suggests, there is no need for a central repository, although one may be present. Instead, individuals maintain their own copy of the repository where they can make changes without the need for a network connection.
Each person is in charge of the repositories they maintain. You may wonder, however, how work gets done in a group if each person works independently.
With decentralization, changes are not only committed to and loaded from a repository, but can also be exchanged between repositories. Using the “push” operation, the changes made in one repository (which could consist of multiple commits) are transmitted to another repository; using the “pull” operation, changes from another repository are brought into your own, where they can be merged with your work.
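Here is a minimal Python sketch of push and pull between two repositories. The `Repo` class is hypothetical, and so is computing each changeset id from a short hash of its parent and description. The sketch only transfers changesets; merging the fetched work with local work is a separate step that it deliberately leaves out.

```python
import hashlib


class Repo:
    """Toy distributed repository: a collection of changesets that know their parents."""

    def __init__(self):
        self.changesets = {}  # changeset id -> (parent id, description)
        self.tip = None       # id of the latest local commit

    def commit(self, description):
        # Committing touches only the local repository, so no network is needed.
        cid = hashlib.sha1(f"{self.tip}:{description}".encode()).hexdigest()[:8]
        self.changesets[cid] = (self.tip, description)
        self.tip = cid
        return cid

    def pull(self, other):
        """Fetch every changeset from `other` that this repository lacks."""
        missing = {cid: cs for cid, cs in other.changesets.items()
                   if cid not in self.changesets}
        self.changesets.update(missing)
        return len(missing)  # merging with local work is not modeled here

    def push(self, other):
        """Send our changesets to `other`: the mirror image of a pull."""
        return other.pull(self)


alpha, beta = Repo(), Repo()
alpha.commit("write introduction")
alpha.commit("fix typos")
beta.commit("draft chapter 2")
print(alpha.push(beta))  # 2: both of alpha's commits travel in one push
print(beta.pull(alpha))  # 0: beta already has everything alpha has
print(alpha.pull(beta))  # 1: alpha now receives "draft chapter 2"
```

After these three operations, both repositories hold the same three changesets, even though neither one acted as a central server.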
This may seem like a nightmare if you imagine even a small group of 10 people pushing and pulling each other and committing, syncing, merging and branching. With such a cacophony of actions it seems more like a soap opera than a team programming effort!
To simplify this you can of course have a central repository for storing the final work of the team, pretty much the same as a central system.
Git is a distributed revision control system designed by Linus Torvalds, the creator of Linux, and is used to maintain the Linux kernel. Another popular distributed system is Mercurial, which is now used for Mozilla projects.
Distributed systems are also well suited to single-user scenarios, since they require no server. You can easily set one up and use it to maintain, for example, your documents folder.
Subversion
After reading about the power of revision control systems, you might be excited to set up your own. For some systems this is as easy as installing a single software tool!
Subversion (SVN) uses a client-server model for revision control, so you will need to set up both a client and a server if you intend to use it. If you only need it for documents you will probably work on alone, this can be overkill.
To get Subversion up and running on your computer, you need to install SVN and then set up the Subversion server. Subversion can work with the Apache web server using a module (mod_dav_svn) provided with the SVN setup. Unless you already have Apache installed and running, however, it is much easier to set up and configure Subversion’s own standalone server, svnserve. Apache does provide some extra facilities, such as WebDAV support, autoversioning and a web interface for browsing your repository contents.
Once the server is set up, you can access it using the computer’s loopback address (127.0.0.1) or “localhost” if you are using it from the same computer.
With SVN you are always shuffling files back and forth between your working copy and a server, which can be quite slow, even when the server is running on your own computer. The server also needs to be running whenever you want to access the repository, which basically means a server process is always running on your computer, hogging memory.
If you intend to use a revisioning system alone on your computer, Mercurial might turn out to be a better option.
Collaboration in Google Wave
Google Wave is a powerful service indeed, and being open source and free, it has the potential to change the way the internet works. It could replace current email and IM systems, and perhaps change a lot more. Google Wave is built for collaboration, with teamwork and versioning forming an integral part of the service. Every step of a conversation and every change to the structure of the posts is recorded, so that it can be played back from any given point. Teamwork is the very concept Google Wave is based on: being primarily a communication tool, it naturally requires the participation of many people, whether they are holding a conversation, editing an online document or even playing a game of chess.
Conclusion
As you begin work on your next masterpiece, keep in mind how revision control software can play a vital role. These systems are usually associated only with large software projects, but you can clearly see how they can be useful even to the most casual computer users.
Even if you can’t employ it to the fullest, and even with little need for branching and tagging in a casual or individual work environment, the advantages are enough to justify the learning curve. Don’t dismiss these systems as something that only benefits developers; find your own usage scenarios. Think of it as a supercharged backup solution (although it should not be taken as a replacement for one!). May your future come in multiple revisions.
Mercurial
Mercurial is a decentralized system, and despite the apparent complexity, it is much simpler and faster to set up and use.
The simplest way to use Mercurial, and one that will appease GUI buffs as well, is the TortoiseHG tool. This software not only installs the Mercurial system but also adds extensions to Windows Explorer, which makes the most common tasks much simpler. Using TortoiseHG, you can manage your repositories easily from Explorer.
After installing TortoiseHG, all you need to do to enable revision control is right-click in a folder and, under the TortoiseHG menu, click “Create Repository Here”. That’s it! Once a repository has been created in a folder, you can add your files to it using the Commit command.
Mercurial comes across as a much simpler solution for individuals maintaining a project. However, as you bring in multiple people, each with their own version of the repository, SVN may seem worth the effort of setting up a server.