Bazaar-NG: 7 years of hacking on a distributed version control system

For the last 7 years I've been involved in the Bazaar project. As I'm slowly stepping down from the project, I thought I would write up a retrospective of both my involvement in the project and thoughts on what went well and what did not.

There are a lot of details in here, and I'm sure I got some of them wrong. Please let me know if you spot any mistakes.

The Early Days

< ddaa> There are two kinds of dscms... the obsolete ones and the experimental ones

Back in 2004 I was an occasional reader of a blog by Martin Pool, who I vaguely knew as a fellow Samba contributor. At the time, Martin was investigating different version control systems and documenting the process on his blog. I myself had recently migrated from CVS to Subversion, and I was interested in some of the other new source control management systems that had sprung up, like darcs and tla. I had tried tla for a while, and though I liked its model, I found it to be too cumbersome to use in practice.

Sometime late in 2004 or early 2005 Martin announced that he had joined Canonical Ltd., and was going to be working on a prototype of a new version of their version control system, Bazaar. Bazaar - baz on the command-line - was a fork of GNU arch.

In the spring of that year I attended Linux.Conf.Au in Canberra. Martin gave a talk about "Bazaar-NG", as it was then called, and a very basic demo. The storage format at the time still stored full texts of all versioned files, and was thus not particularly efficient. I liked the simplicity of the UI and the promised features that come with a distributed version control system, so I put Bazaar-NG on my list of interesting things to keep an eye on and started lurking on the mailing list.

Another notable event at LCA 2005 was tridge's infamous presentation about the BitKeeper protocol. This talk eventually resulted in Larry McVoy withdrawing the free version of BitKeeper, which prompted Linus to start hacking on Git and Matt Mackall to start on Mercurial. Of course, none of us could have predicted any of this at the time.

Samba's use of bzr

Later that year, Samba was participating in the first Google Summer of Code. We didn't want to bother with the logistics of giving all our students commit access to the main repository in Subversion, so we decided to experiment with Bazaar. We published a Bazaar import of our source code, and asked students to publish a clone of it with their changes.

Bazaar-NG was the most obvious choice since we already knew Martin, and it was as good a choice as any DVCS at the time. It was a huge pain to actually get all the source code converted from Subversion to Bazaar. There was no fastexport/fastimport or any specific support for Subversion in Bazaar, just a tool called Tailor that tried to support conversions from any VCS to any other VCS.

No matter how I tried to use Tailor, I hit bugs. I later ended up contributing some bug fixes but never got it to work reliably. It became clear that one of its major disadvantages was that it did conversions by replaying all changes on disk, which was very slow.

Using Bazaar for the Samba Summer of Code was not an overwhelming success by any measure. It was dead slow, and we hit a number of bugs. Clearly bzr wasn't yet mature enough to be used for a project the size of Samba. Still, I was hooked: the model was promising, the bzr developers had been responsive to our bug reports, and I really liked the UI, which was simple and familiar. For the small projects I was using it on myself it already worked great.

I did play with other distributed version control systems as well, but interoperability with Subversion was problematic for all of them and they lacked in other areas, too. I remember Git's UI in particular being really basic at this point, and requiring wrapper scripts ("cogito") to be somewhat usable.

First Contributions

After the summer I contributed my first bzr patch. Apart from some dabbling in Tailor's sources this was my first Python code and it shows. Getting the patch landed required some perseverance.

Later that year, Gustavo published the first version of svn2bzr, which I started testing and contributing to. My reply to the svn2bzr announcement was indicative of the problems we kept encountering for the next couple of years:

I tried converting the Samba repository, but it took me 6 hours to convert the first 10 commits (out of ~11000 commits), whereas it takes less than 10 hours total with tailor (remotely, but converting just the main branch with 6000 commits).

(For reference, importing the first 6k mainline revisions of the old Samba Subversion repository in 2012 takes less than 5 minutes)

Looking back, it seems like Bazaar-NG was much more of a community project at the time than it is now. Sure, Canonical funded its main developer and thus had copyright on most of the files, but there were plenty of other people contributing and the roadmap seemed to be defined by the needs of free software projects, not Canonical. Bazaar-NG had its own domain at "bazaar-ng.org". Nobody asked me to sign a contributor license agreement for the patches I submitted in those first few years.

One of the concepts that intrigued me was that of Foreign Branches. Aaron had suggested on the wiki that Bazaar-NG's existing abstraction for multiple file formats could be used to support formats from other version control systems, such as Subversion. Inspired by this, I started hacking on a plugin that provided Subversion integration in Bazaar, unoriginally named bzr-svn. Since Samba had no plans to migrate away from Subversion any time soon, I hoped this would allow me to hack on Samba with bzr anyway.

First Sprint

LarstiQ and I were invited to attend a Bazaar sprint in London, sometime in May 2006 - my first. This sprint was back-to-back with the first and only joint Bazaar-Mercurial sprint at the Canonical offices.

There were 6 of us at the sprint - Martin Pool, Robert Collins, John Meinel, Aaron Bentley, Wouter van Heijst and myself (Jelmer Vernooij). Only Martin and Rob were working for Canonical at this point, but other Canonicalers occasionally popped in. Rob was in charge of baz, Canonical's user-friendly fork of GNU arch, and John and Aaron had both worked on tla or on forks of their own.

Sprints are very intense, and a lot usually happens in that one week when everybody is in the same room. I didn't write many lines of code myself, but I don't think I have ever learned as much as I did in those few days. I was introduced to extreme programming, unit testing, test-driven development, advanced Python programming and all kinds of new version control concepts with odd names that sounded like they had been made up by somebody who had watched too many episodes of the X-Files ("ghost revision", "patience diff", "history horizon", "nuclear launch codes", ...).

A few mental images from that week have stuck with me. I remember sitting in our small hotel conference room, pair-programming on the first implementation of the CommitBuilder API with Rob. I remember John and Wouter hacking on John's big branch to make bzr unicode compatible, getting the last remaining tests to pass. I remember Aaron introducing me to revision id aliases and ghosts over lunch. I remember us sitting at a certain somebody's kitchen table after we found out that we didn't fit into the small Canonical office. I remember David Allouche telling me about Bazaar-NG's predecessor and tla over beer in the hotel bar. I also remember talking about nested trees that week; six years later we are still talking about them. But mostly I remember the fun, the enthusiasm and the optimism.

This particular sprint made a big impression on me. It mattered that I had excellent mentoring at such a key moment. I really appreciate Martin arranging for us to come over, Canonical sponsoring the trip, and those four key hackers for their endless patience.

baz/arch influences

Bazaar-NG's model was in a lot of ways meant to cope with the shortcomings of the arch model that the previous Bazaar incarnation ('baz') had inherited. That said, it was still supposed to be able to import repositories from tla/baz/arch - to allow users to upgrade their existing repositories. Originally Bazaar-NG was just meant as a prototype in Python for the next version of baz; once the model was right, it would be redone in C for baz 2.0. At some point - I'm not sure when - it was decided that bzr itself (in Python) would become the successor of baz, and that the existing baz codebase would be abandoned.

I never used Arch or Baz in anger, but over the years I learned quite a lot about the model - and its shortcomings. Arch imposed quite a few restrictions on the user about how revisions, directories and tags could be named. There was a steep learning curve; users had to know about the complex underlying model and the numerous subcommands that came with operating on it. It was fairly easy to shoot yourself in the foot if you weren't quite sure what you were doing, and some operations were impossible to undo. Rather than copying existing data when deriving from somebody else's branch, it would make your derived branch refer to the parent branch for the preceding history. If their repository (and their revisions along with it) disappeared from the net, you would become unable to traverse the history beyond your own revisions.

bzr (and baz before it, as far as the existing Arch data format allowed) tried to be more user-friendly and robust. Their user interfaces were more akin to those of the familiar CVS or Subversion command lines. bzr provided the user with revision numbers in addition to the long pseudorandom strings that were used to identify revisions internally. Almost all operations were reversible. Repositories always contained all data, so access to historical data was fast, and you didn't have to rely on the rest of the world to keep their HTTP servers up and running. Windows and Mac were no longer second-class citizens; bzr supported backslash as a path separator in addition to the UNIX-centric forward slash, and it defined a canonical encoding for filenames.

The data format was in many ways a result of the data model, and not the other way around. If a change was made to the data model, that was cause for matching changes to the file format. This is certainly the most sensible thing to do for a traditional database, but for a distributed database like a DVCS repository less so.

bzr inherited a couple of key features from arch. Since branches were prone to disappear from the net, revisions would often be lost; the version control system knew about them (since they were referenced elsewhere) but was unable to access their contents. These were so-called "ghost revisions". baz allowed sharing branches over plain HTTP. Revisions and files in baz had their own unique identifiers to track their identity even if their name changed; bzr kept the concept of revision ids and file ids, but changed their constraints - it made them just plain pseudorandom strings. The focus was very much on identity tracking rather than the "stupid" content tracking that Git does.

While some of these features are very nice, in hindsight I wonder if they were worth the trouble. There seemed to be a notion that bzr should be able to support all the features that people had come up with for arch and its various forks throughout the years - the one DVCS to rule them all. It was questionable how useful some of these features actually were.

I was one of the biggest supporters of ghost revisions, as they were very useful for foreign format support, but they had a significant impact on the performance and complexity of the algorithms to find missing revisions during fetch operations. We were also, erhm, haunted by a steady flow of bugs in code that didn't cope well with ghosts. If it wasn't for the baz imports, we could and would have left support for ghosts until much later.
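To make the cost of ghosts concrete, here is a much-simplified sketch of deciding which revisions to copy during a fetch; it is not bzrlib's actual algorithm. It assumes a hypothetical in-memory graph: `source_parents` maps each revision the source really holds to its parent ids, so any parent missing from that map is a ghost. Every graph walk needs an equivalent check, and forgetting it somewhere is the classic ghost bug.

```python
from collections import deque


def find_revisions_to_fetch(source_parents, target_revisions, tip):
    """Return (to_fetch, ghosts) for copying `tip` from a source to a target.

    source_parents: dict of revision id -> tuple of parent ids, containing
        only revisions whose contents the source actually has.
    target_revisions: set of revision ids the target already has.
    """
    to_fetch = []
    ghosts = set()
    seen = set()
    queue = deque([tip])
    while queue:
        rev = queue.popleft()
        if rev in seen or rev in target_revisions:
            continue  # already handled, or nothing to copy
        seen.add(rev)
        if rev not in source_parents:
            # Referenced by a parent list but absent from the source: a ghost.
            ghosts.add(rev)
            continue
        to_fetch.append(rev)
        queue.extend(source_parents[rev])
    return to_fetch, ghosts
```

For example, if revision `b` lists an old arch import `arch-rev-1` as a parent that nobody can retrieve any more, the walk still succeeds and simply reports `arch-rev-1` as a ghost instead of failing the fetch.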

Supporting dumb transports - being able to access branches over HTTP or FTP without bzr-specific code on the other end - was a great move. The advantage of this is that you can use any ordinary HTTP or FTP server for hosting your development branches; especially in the pre-cloud era this was a huge advantage over having to run custom code on the server, as CVS or Subversion required.

Because of the Transport abstraction in Bazaar, the same code could be used against local and remote repositories. It is a great feat to be able to run bzr log against a remote repository without having to do a full clone. There was a price to pay for supporting dumb transports this way, though: operations that access the repository couldn't involve many seeks or read large amounts of data, as that would hurt performance over HTTP.
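The Transport idea can be sketched in Python as follows. The classes and the `branch/last-revision` path are hypothetical simplifications, not bzrlib's real interfaces: branch-reading code is written once against an abstract `get_bytes`, and an in-memory transport stands in for a dumb remote server while counting requests, since round trips are what made seek-heavy access expensive over HTTP.

```python
import os
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical minimal transport: read a file relative to some base location."""

    @abstractmethod
    def get_bytes(self, relpath: str) -> bytes:
        ...


class LocalTransport(Transport):
    """Reads from a directory on the local filesystem."""

    def __init__(self, base: str):
        self.base = base

    def get_bytes(self, relpath: str) -> bytes:
        with open(os.path.join(self.base, relpath), "rb") as f:
            return f.read()


class MemoryTransport(Transport):
    """Stands in for a dumb remote server; counts requests (round trips)."""

    def __init__(self, files: dict):
        self.files = files
        self.requests = 0

    def get_bytes(self, relpath: str) -> bytes:
        self.requests += 1
        return self.files[relpath]


def read_last_revision(transport: Transport) -> str:
    """Branch-reading code written once, against the abstract interface."""
    # "branch/last-revision" is a hypothetical layout for this sketch.
    return transport.get_bytes("branch/last-revision").decode("utf-8").strip()
```

Because `read_last_revision` only ever sees the abstract interface, supporting another protocol is a matter of adding a `Transport` subclass; the catch is that every extra `get_bytes` call becomes a network round trip.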

But perhaps more importantly, it set the expectation that you could use all operations against remote repositories. This later bit us when the bzr smart server had to support all operations that could be done against repositories and branches, rather than taking the easy way out and forcing users to fetch data and then work on it locally (the approach git took).

One of the notable features from baz that was left out for performance reasons was support for cherry-pick merge tracking.

(Some of Martin's early notes on Arch).

In the early days, Bazaar releases followed each other in rapid succession. For a while there were monthly releases: three weeks of development followed by one week of stabilization and fixing up plugins. Development was done in small bites, which would land rapidly. Every change that introduced even the smallest new feature in the file format (or configuration) that older versions of bzr might not understand was cause for incrementing the file format version. We made these changes to the file format as they came along, rather than accumulating them, which meant that at one point there was a new format every couple of months. Later on we slowed down on format changes, and no new format has been introduced since 2009. Unfortunately we have been unable to shake the image that we introduce a new file format every fortnight.
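A format bump works as a safety mechanism because the client dispatches on a marker string stored on disk and refuses markers it does not recognise. A minimal sketch follows; bzr records similar markers in its control directory, but the marker strings, descriptions and function names here are all hypothetical.

```python
# Hypothetical marker strings; each names one on-disk format this client understands.
KNOWN_FORMATS = {
    "example-format-1": "full texts",
    "example-format-2": "full texts + tags",
}


class UnknownFormatError(Exception):
    """Raised instead of guessing at a format this client predates."""


def open_repository(marker: str) -> str:
    """Pick the implementation for an on-disk format marker."""
    marker = marker.strip()
    if marker not in KNOWN_FORMATS:
        # An older client must not read (or worse, write) data it may misinterpret.
        raise UnknownFormatError(f"unsupported repository format: {marker!r}")
    return KNOWN_FORMATS[marker]
```

In this sketch, adding even one small feature (hypothetically, tags) means a new marker, so a client that only knows `example-format-1` stops with a clear error rather than silently ignoring the new data. That safety is exactly why every small change produced another format.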

Ogres

Another project I worked on had something called the "Ogre Model", because it, like ogres - and apparently unlike anything else in software - had layers.

One of the reasons why Bazaar has been able to introduce new versions of the file format so frequently is that it has fairly clean abstractions between its UI, its APIs and its file format implementations. This is quite a stark contrast with e.g. Git, which exposes various things in its UI that are a direct result of the disk serialization (e.g. the SHA-1 of the on-disk representation is used to address revisions). It would be very hard to change the existing Git tools to support a wildly different file format; for Bazaar, it is quite easy.
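Git's content addressing fits in a few lines of Python. In Git's default SHA-1 object format, a file's object id is the SHA-1 of a short header (the word "blob", the size in bytes, a NUL byte) followed by the raw contents, so the identifier users see is tied directly to the serialization. This is standard Git behaviour; the function name is ours.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the id Git gives a file's contents (a "blob") in the SHA-1 format."""
    # The header is part of the hashed data, so the id depends on the on-disk encoding.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()
```

The result matches what `git hash-object` prints for the same bytes; changing how objects are serialized would change every id, which is why such a change is so invasive for Git.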

I think this is a significant difference between Bazaar and Git, both of which have fairly similar models. It is the reason for Bazaar's reasonably easy transition from the old inefficient weave format to its pack format and for its ability to support foreign formats as first-class citizens, but also for a lot of extra glue code and complexity. In many ways, Bazaar is built top-down rather than bottom-up like Git.

Note that I'm not saying that this makes Bazaar better than Git. To quote (I think) David Wheeler:

All problems in computer science can be solved by another level of indirection... Except for the problem of too many layers of indirection.

Hard to land patches

Bazaar has always had fairly strict requirements for landing changes. This has been a mixed blessing. All bug fixes or behaviour changes should be accompanied by matching changes or extension of the testsuite. All changes have to be reviewed by two committers (though committers can self-review, so they only need one extra vote). Canonical owns the copyright to almost all Bazaar sourcecode (there are some historical exceptions), and since 2009 or so contributors have to sign the Canonical contributor license agreement.

It took me a while to get my first patch landed - the fix was easy enough, but writing a matching test was more of a challenge. Other occasional contributors wrestled with this issue too, and I know it deterred some of them from contributing the occasional change. Eventually the Canonical Bazaar hackers started "piloting" patches from contributors, which made it much easier to help get bugs fixed or changes landed.

Of course, being so strict about review and testing also has its perks. Most changes that landed were of high quality. Because of the high test coverage, refactoring or making other sweeping changes can be done with a lot of confidence. I don't recall ever seeing significant data loss bugs in Bazaar. Regressions happen but are rare (with the one exception of API changes breaking plugins) - I happily ran bzr's trunk for quite a number of years.

But I don't think the quality requirements were really what set us apart from our competitors in this regard - Git and Mercurial have requirements that are perhaps not quite as strict, but similar. I'm sure they have also had contributions fall through the cracks.

I wonder if the relative complexity of Bazaar was a large contributing factor. Most Git users know the basics of its file format or can learn them in a day; it is much easier for them to find a bug in one of Git's relatively few source files than it is to understand the various layers involved in Bazaar. The newer Bazaar file formats are also notoriously underdocumented; even after so many years in the bzr world, I personally would find it easier to fix a bug in Git's pack implementation than in Bazaar's.

Perhaps more tellingly, there was a LWN article on the internals of Git a mere 5 days after its initial release.

This matters not because it scares away the few geeks who are interested in file formats for the sake of it, but because some of the users who hit issues in the lower layers of the code are not empowered to fix bugs they run into. I can very much sympathise with people leaving (and speaking ill of) bzr when they encounter a bug that prevents them from accessing their repository, no matter how trivial that bug actually is.

Plugins

Another mixed blessing for Bazaar has been the support for plugins. Almost all of the infrastructure in bzrlib can be extended with plugins. By far the most popular way for plugins to modify Bazaar is registering new subcommands. Once a plugin is installed it is always loaded when bzrlib itself gets initialized.

Anybody with some minimal Python knowledge can write a plugin. There is pretty good documentation available and writing a basic plugin doesn't require a lot of knowledge of the ins and outs of bzr itself. Almost anything can be done from a plugin, including providing support for repository file formats, overriding output formats for bzr log or adding support for protocols over which repositories can be accessed.

To some degree, plugins are used as a way to cope with the fact that it is hard to get new code landed in bzr core. There is no need to convince a reviewer you're doing the right thing. You can skip writing tests. In other words, plugins are a great way to test-drive new functionality. There is no mandatory copyright assignment to Canonical.

Plugins tend to fall out of step with Bazaar core. Any change to an API is prone to break plugins that use it. There are mechanisms for marking older APIs as deprecated and encouraging users to switch to their successors. However, bzrlib has a large number of APIs that are (in name) public. In practice only a few of them are used by plugins, and deprecating with the official mechanism is quite time-consuming, so it is sometimes skipped.
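A generic Python deprecation decorator shows what such a mechanism involves. This is a sketch using the standard `warnings` module, not bzrlib's `symbol_versioning` helpers; `deprecated_since` and `get_revision_count` are hypothetical names. The decorator itself is cheap; the effort goes into applying it to each public name and keeping the old code path working until it can be removed.

```python
import functools
import warnings


def deprecated_since(version):
    """Decorator: keep an old API working but warn callers that it is deprecated."""
    version_str = ".".join(str(part) for part in version)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__qualname__} was deprecated in {version_str}",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, e.g. a plugin
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated_since((2, 1, 0))
def get_revision_count(history):
    """An old public API that plugins may still be calling."""
    return len(history)
```

Plugin authors see the warning only when the old call actually runs, which is one reason breakage still tends to surface late.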

Since it is so easy to stick new functionality in plugins, there also isn't much incentive to get changes merged upstream. In practice, this means that keeping a Bazaar installation up-to-date is a constant juggling of different Bazaar versions and plugin versions, which have to be compatible with each other.

We should have tried to bundle more plugins with Bazaar itself - batteries included, like Python. Mercurial and Git both did this.

Rapid development

After 2005 the Bazaar team at Canonical slowly grew to five people. For most of 2006 and 2007 there was a steady stream of new and exciting changes. There were more sprints and I got to meet more fellow bzr contributors in person - Andrew Bennetts, Vincent Ladeuil, Ian Clatworthy, Martin Albisetti.

Bazaar moved to a significantly improved file format - knits - and then changed again to put them into pack files. The high performance smart server was put together, though admittedly the "high performance" bit was still hypothetical at that point. We started sending each other serialized revisions ("bundles") by email. Several people put together graphical frontends for Bazaar, and various web frontends.

Most of the Bazaar development discussion happened on IRC or on the mailing list. To propose a change, one would submit a patch or a merge request to the mailing list, and a committer would review it and submit it to the bot that manages the main Bazaar branch. Eventually, Aaron Bentley set up a web site - BundleBuggy - which tracked changes submitted to the list and their status.

The big migration

Sometime near the end of 2007 the first larger free software projects started looking at distributed version control. Throughout 2008 and 2009 a significant number would migrate away from centralised version control.

The previous VCS migration - from CVS to Subversion - had worked out so well for Samba that one of the other developers proposed a migration to Git in October 2007. I had been using bzr-svn for a while to contribute to Samba, but as much as I wanted to propose Bazaar as an alternative to Git, I realized that it wasn't quite there yet in terms of performance. After Samba migrated, I started hacking on bzr-git so I could keep using Bazaar.

In February of 2008, Bazaar became a GNU project. In practice there wasn't much that changed.

In 2008 more and more projects looked at migrating to one of the big three distributed version control systems - Mercurial, Git and Bazaar. I was still mostly involved with the foreign branch plugins and by now was working on the bzr-git and bzr-hg plugins that Rob had started, in addition to bzr-svn and bzr-gtk.

Now that Launchpad had proper support for merge proposals, Bazaar itself migrated to Launchpad and started using Launchpad to discuss merge proposals, replacing the bundles sent to the mailing list.

In the meantime, as the DVCS tools themselves became more mature, various people focused more on integrating them into Visual Studio, Eclipse, and Nautilus. Ian Clatworthy got the documentation to a reasonable state and started bzr-explorer. The Canonical Bazaar team tried to accommodate requests from various projects that were interested in migrating to Bazaar.

I worried that we didn't have good enough integration with the various external tools and that that was scaring users away. In hindsight, that was probably mixing up cause and effect.

Immutable and full history

Bazaar, like Arch and Subversion before it, has always had a strong focus on accurately recording history. History can be referenced but missing (ghost revisions) because it is simply unavailable or because somebody has actively made it disappear from the repository (e.g. if they had accidentally committed their password).

The Bazaar developers have always considered rewriting history a bad idea. The main argument against mutable history - apart from the history no longer being accurate - is that allowing changes to the history is a way for users to shoot themselves in the foot. Changing history results in two slightly different versions of the same data. Propagating these changes is another can of worms that we didn't want to touch.

There were originally plans to allow annotations in Bazaar, much like Git lets you attach notes to its objects, so that e.g. typos in commit messages could be fixed.

Another major design principle in Bazaar is that it's always good to hold on to information where possible, in case it is needed later. Bazaar explicitly records renames, and it encourages inclusion of history when e.g. submitting a changeset.

Rather than discarding the users' history for a patch, it is referenced in the new "mainline" commit that lands the change.

The "mainline" is the left-hand ancestry - in other words, the revisions you would see if you traversed the revision DAG from the tip back to the root, following only the first (left-hand) parent of each revision.

Bazaar's UI is also oriented around the concept of the mainline; for example, revision numbers without dots are revisions on the mainline, "bzr log" and "bzr qlog" only show the mainline by default. The right hand side history with the changes in between that a contributor made before submitting their patch are present if you really need them but usually hidden.
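The mainline walk and the numbering of mainline revisions are simple enough to sketch. The graph below is a hypothetical dict mapping revision ids to parent tuples (left-hand parent first) and the sketch assumes there are no ghosts. `merged_revisions` lists, for each mainline revision, the revisions it brought in through its other parents: the ones `bzr log` hides by default.

```python
# Hypothetical graph: revision id -> parent ids, left-hand (first) parent first.
# A is the root, B the next mainline commit, X and Y a feature branch off A,
# and C merges Y into the mainline.
GRAPH = {"A": (), "B": ("A",), "X": ("A",), "Y": ("X",), "C": ("B", "Y")}


def mainline(parents, tip):
    """Follow only first parents from tip back to the root; return oldest first."""
    history = []
    rev = tip
    while rev is not None:
        history.append(rev)
        rev = parents[rev][0] if parents[rev] else None
    history.reverse()
    return history


def mainline_revnos(parents, tip):
    """Undotted revision numbers: 1 for the root, counting up along the mainline."""
    return {rev: n for n, rev in enumerate(mainline(parents, tip), start=1)}


def ancestry(parents, rev):
    """Every revision reachable from rev, including rev itself (assumes no ghosts)."""
    seen, stack = set(), [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents[r])
    return seen


def merged_revisions(parents, tip):
    """For each mainline revision, the non-mainline revisions it merged in."""
    result, previous = {}, set()
    for rev in mainline(parents, tip):
        current = ancestry(parents, rev)
        result[rev] = current - previous - {rev}
        previous = current
    return result
```

Here `mainline_revnos(GRAPH, "C")` numbers A, B and C as 1, 2 and 3, and `merged_revisions(GRAPH, "C")["C"]` is `{"X", "Y"}`. In bzr those two merged revisions would be shown with dotted revnos (1.1.1 and 1.1.2, since they branch off revision 1) once log output is expanded.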

At this point a lot of projects were trying to import their existing history into the various next-generation DVCSes to test-drive them. The Git, Mercurial and Bazaar mailing lists and IRC channels were filled with discussions and flamewars over which system lacked the essential features that the others had and vice versa. As more and more projects picked their shiny new next-generation source code management tool, three issues with bzr slowly became clear.

Performance

While there had been gradual improvement to the performance over the years, Bazaar still did not perform with acceptable speed for medium to large projects. A project would also require more space on disk with Bazaar than it would if it was versioned with Git or Mercurial.

Workflows

The number of workflows available for Bazaar was overwhelming and confused people. bzr can be used in a truly distributed fashion with independent branches that pull from and push to each other, or with a local branch that is "bound" to a central branch - simulating the way in which one would work in a centralized environment with Subversion or CVS. Since none of the core Bazaar developers used the centralized workflow, it had some rough edges.
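A bound branch's commit can be sketched under simplifying assumptions: branches are plain lists of mainline revision ids, and `Branch`, `commit_bound` and `OutOfDateError` are hypothetical names. The point is the ordering: the change goes to the central branch first, and the commit is refused if the local copy is behind, so the user has to update first, much as with a centralized VCS.

```python
class Branch:
    """Hypothetical branch: a list of mainline revision ids, oldest first."""

    def __init__(self, history=()):
        self.history = list(history)

    @property
    def tip(self):
        return self.history[-1] if self.history else None


class OutOfDateError(Exception):
    """The local branch is behind its master; the user must update first."""


def commit_bound(local: Branch, master: Branch, new_rev: str) -> None:
    """Commit new_rev the way a bound branch does: master first, then local."""
    if local.tip != master.tip:
        raise OutOfDateError("bound branch is out of date with its master; update first")
    master.history.append(new_rev)  # the shared, central copy gets the change first
    local.history.append(new_rev)   # then the local mirror catches up
```

If Alice and Bob are both bound to the same master, whoever commits second gets the error and has to update before trying again.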

The one-branch-per-directory philosophy that worked well for small projects broke down when the repositories grew in size - users had to create repositories that were shared by multiple branches for disk efficiency. For C projects it is often useful to reuse the same working tree for multiple branches. This has always been possible with Bazaar, but never very convenient.
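The benefit of a shared repository can be shown with a hypothetical in-memory model: revisions live once in a `SharedRepository`, and a branch is little more than a named pointer to its tip. Creating a branch then copies nothing, whereas one branch per directory duplicates the whole history for each branch. The class and method names are illustrative only.

```python
class SharedRepository:
    """Hypothetical: one revision store used by many branches."""

    def __init__(self):
        self.revisions = {}  # revision id -> parent ids, each stored exactly once
        self.branches = {}   # branch name -> tip revision id

    def commit(self, branch: str, rev: str) -> None:
        parent = self.branches.get(branch)
        self.revisions[rev] = (parent,) if parent is not None else ()
        self.branches[branch] = rev

    def branch(self, new_name: str, from_name: str) -> None:
        # Branching copies no revisions; it only records a new tip pointer.
        self.branches[new_name] = self.branches[from_name]
```

With 100 trunk revisions and one extra commit on a feature branch, this repository holds 101 revisions; two standalone branches would hold 100 + 101 = 201.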

History tracking versus content tracking

History is a set of lies agreed upon.

The third problem was Bazaar's stance on completely immutable history. Software developers aren't historians or lawyers. A lot of users were keen to keep their history simple. They cared about tracking what changes had happened to their trunk and having that sequence of changes be legible, not really about the individual commits and mistakes that led up to the patch that landed on trunk. Git offered "rebase", which basically reapplies local changes on top of the upstream tree. Git is often referred to as a content tracker, not a version control tool. That makes a lot more sense with this in mind.
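What rebase does, and why it produces a second copy of the same changes, can be sketched with commit ids derived from their content, as in Git. The id scheme and the `commit_id` and `rebase` functions below are hypothetical simplifications: patches are opaque strings rather than diffs applied to a tree, so there are no conflicts to handle.

```python
import hashlib


def commit_id(parent_id, message, patch):
    """Hypothetical content-derived id, covering the parent as well as the change."""
    data = f"{parent_id}\n{message}\n{patch}".encode("utf-8")
    return hashlib.sha1(data).hexdigest()


def rebase(local_commits, new_base_id):
    """Replay (message, patch) pairs on top of new_base_id; return the new commit ids.

    Patches are opaque strings here; a real rebase applies each diff to the
    new tree and may stop on conflicts.
    """
    new_ids = []
    parent = new_base_id
    for message, patch in local_commits:
        parent = commit_id(parent, message, patch)
        new_ids.append(parent)
    return new_ids
```

Rebasing the same two local commits onto a different base keeps their messages and patches but changes every id, because each commit's parent changed. Anyone who had already fetched the old commits now holds a divergent copy of the same work, which is the situation Bazaar's design tried to avoid.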

Relatedly, Git also gave users the ability to muck about with their repository at a low level using simple command-line tools. For example, you could use these to exclude a big ISO from a repository that was freshly imported from SVN. Bazaar for a long time didn't have a rebase plugin, and when it finally grew one it was slow and not very powerful. Bazaar doesn't really have many other tools for rewriting odd bits of history when necessary.

Somewhat surprising to me, at least at the time, was that it didn't seem to matter to most people that rename inference in Git isn't 100% reliable. Apparently the few corner cases that you can run into with incorrectly detected renames are uncommon enough (and reasonably easy to work around) that people just live with them. I live with it now too, and I have only encountered issues a handful of times.

Bazaar's solution for dealing with renames has its problems as well; it uses unique identifiers, assigned to files when they are first added to the working tree, to track files in spite of changes to their filename. This is problematic when the same file is versioned independently in two branches and thus gets two different unique identifiers.

It is interesting to note that technically the data models of Bazaar and Git aren't wildly different. Bazaar explicitly tracks directories and assigns file ids to track renames, while Git uses heuristics. The difference is more in the philosophy, the tools and the layout on disk.
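The two approaches to renames can be contrasted in a short sketch. `detect_renames` pairs deleted and added paths by content similarity, in the spirit of Git's inference. It uses `difflib.SequenceMatcher.ratio()` as a stand-in score (Git computes similarity differently) with a 0.5 threshold, mirroring Git's default 50% for `-M`. `renames_by_file_id` follows Bazaar's model, where each tree maps paths to file ids. Both functions are hypothetical illustrations.

```python
import difflib


def detect_renames(old_tree, new_tree, threshold=0.5):
    """Infer renames Git-style: match deleted paths to added paths by similarity.

    Trees map path -> file contents. SequenceMatcher.ratio() stands in for
    Git's own similarity score; 0.5 mirrors Git's default 50% for -M.
    """
    deleted = [p for p in old_tree if p not in new_tree]
    added = [p for p in new_tree if p not in old_tree]
    renames = {}
    for old in deleted:
        best, best_score = None, threshold
        for new in added:
            if new in renames.values():
                continue  # each added path can be the target of one rename only
            score = difflib.SequenceMatcher(None, old_tree[old], new_tree[new]).ratio()
            if score >= best_score:
                best, best_score = new, score
        if best is not None:
            renames[old] = best
    return renames


def renames_by_file_id(old_ids, new_ids):
    """Bazaar-style: trees map path -> file id; a rename is one id at a new path."""
    old_paths = {file_id: path for path, file_id in old_ids.items()}
    return {
        old_paths[file_id]: path
        for path, file_id in new_ids.items()
        if file_id in old_paths and old_paths[file_id] != path
    }
```

The file-id version needs no heuristics, but it fails in exactly the case described above: if two branches each add a `README` independently, `renames_by_file_id({"README": "id-alice"}, {"README": "id-bob"})` finds no relationship, because the same path carries two unrelated ids.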

The Canonical Bazaar folks set about designing a new format that would address the performance issues once and for all, and reduce the size of a repository on disk. brisbane-core (2a) arrived somewhere in the summer of 2009 and delivered, although there were still a fair number of bugs that people stumbled over, including performance regressions for specific use cases. Unfortunately Bazaar already had a reputation for being slow at this point, and it was hard to shake that off despite the fast-as-lightning new format.

Some people claimed Bazaar did not have many community contributions, and was entirely developed inside Canonical's walled garden. The irony was that while a large part of Bazaar was indeed written by Canonical employees, that was mostly because Canonical had been hiring people who were contributing to Bazaar - most of whom then ended up working on other code inside Canonical.

I myself joined Canonical in 2009 to work on the packaging side (Soyuz) of Launchpad, a few months after it had been released under the AGPL. I actively kept working on Bazaar in my spare time.

Decline and focus on Ubuntu and UDD

Near the end of 2009 it was slowly becoming clear that Git was winning the big DVCS war. We saw community contributions to Bazaar gradually decline.

In late 2009 Canonical gradually became more and more visible in the Bazaar world. It started offering commercial support, and around the same time contributors were asked for the first time to sign a contributor agreement when they submitted patches online. Sometime in 2010, the domain name changed from bazaar-vcs.org to bazaar.canonical.com.

The focus for the Bazaar team in Canonical changed from making sure Bazaar was a decent general purpose version control system to ensuring that Bazaar worked well for package management in the Ubuntu project. One of the early goals for Bazaar in Canonical had been that it could be used for all source packages, and bring the advantages of version control to the Ubuntu developers, who were still mucking about with tarballs and diffs.

Despite my disappointment with its lack of popularity, I was still passionate about Bazaar, and the changed role it could play - as source control integration tool in Ubuntu. The foreign branch support seemed to be key in a world with a large number of VCS tools for which we wanted to have a single interface. I moved to the Bazaar team in late 2010 to replace Rob, who joined Launchpad as technical architect.

James Westby had written a set of scripts to automatically import all Ubuntu source packages into Bazaar, which was based around his bzr-builddeb plugin. The Bazaar team took over after his proof of concept and tried to get all 17k-something Ubuntu source packages imported into Bazaar. The goal was that once this worked for all packages, explicit tarball source uploads would be disabled in Launchpad and new uploads would happen by pushing a branch up to Launchpad.

Importing all packages without problems proved to be a very hard problem. There are packages with odd filenames (non-utf8 characters, backslashes, newlines), weird ancestry (merging changes from debian, changing from native to regular and back, etc), very large files, multiple debian tarballs, and all other kinds of weird.

After a long struggle with cancer, Ian passed away. He had been the driving force behind the documentation and bzr-explorer, but he was most of all a really nice person, with contagious enthusiasm.

The fact that package branches were occasionally out of date and took longer to fetch than the old-style tarballs (which are widely mirrored) didn't earn us a lot of goodwill with the Ubuntu developers. In hindsight, trying to get all packages to import correctly before moving over to managing all packages in Bazaar was probably the wrong move. Migrating packages over to Bazaar one at a time would have been better - only a limited number of packages matter for Ubuntu anyway, and it would have eliminated the hard intermediate step of making the package importer work properly.

Bazaar on the slow track

Contributions from people outside of the Canonical Bazaar team had become rare by mid-2011. In early 2012 the members of the Canonical Bazaar team were assigned to other projects, though we would still fix the occasional bug in Bazaar. Martin left Canonical in April 2012.

During my spare time in the first 6 months of 2012 I tried to finish my remaining in-progress branches. After that, I thought I would see how it would go.

I think it's time to move on. There are still some things I don't like about it, but Git is a decent source code management system. Bazaar isn't going anywhere; no doubt there will be users for a few years to come, and people contributing fixes, but it hasn't been adopted to the level I was hoping.

Could bzr have a second life as another UI for the Git file format, becoming part of the Git world rather than competing with it? Sure, I guess. Several people, including myself, have suggested this in the past. It would however still require a fair amount of work - bzr-git is unfinished. If it's just the UI you're after then it is probably easier to simply build a bzr-like Git UI from scratch, directly on top of something like libgit2 or Dulwich.

Conclusion

Bazaar had a lot going for it. It is written in a popular and accessible programming language. The code is clean, with lots of comments and good test coverage. It has a mostly simple command-line UI, some nice graphical frontends, decent cross-platform support and is reasonably well documented. For a number of years there were several engineers dedicated to it full time, and it had a vibrant community.

We lost sight of what mattered for our users, focusing on features that were nice but perhaps not as necessary as we thought. We overengineered. We didn't get rid of the crufty unnecessary features. It's harder to comprehend, contribute to or fix performance issues in a large layered codebase. And the larger a codebase becomes, the larger the surface for bugs, the harder it is to refactor.

This applies to the foreign branch plugins - my area - as well. It is certainly neat that bzr log, bzr diff -c 1200 or Bazaar's standard API work inside a Git repository without having to convert the full repository from Git to Bazaar first. But is that really worth the extra code, the extra complexity, and the extra (performance) bugs that come with the architecture that enables those features?

It was a lot of fun working on Bazaar, with so many nice and talented people. I learned an awful lot. Time for something new.
