This section will cover the parameters that make software high quality. However, it will not cover the means to make it so. These include such non-essential things as having good, modular code [having_good_code], good marketing, or good automated tests. These things are definitely important but will be covered only later; a lot of popular, high-quality software lacks some of them, while its competitors may do better in this respect.
One note is in order: these are parameters of a generic “weight function”, not a list of requirements that must all be satisfied. [Licence]
A high-quality program must be available. That may seem like a silly thing to say, but you’ll be surprised how often people get it wrong. How many times have you seen a software web-site claiming that the new version of the program (or even the first one) is in the works and will change the world, but is not available yet? How many times have you heard of web-sites that are not live yet and refuse to tell people exactly what they are about?
Another case is the “Stanford checker”, a sophisticated tool for static code analysis, which is not available for download but is instead offered as a service by its parent company.
A program should be available in the wild somehow (for downloading or at least buying) so people can use it, play with it, become impressed or unimpressed, report bugs, and ask for new features. Otherwise it’s just in-house software or, at most, a service, which is not adequate for most needs.
In “The Cathedral and the Bazaar”, Eric Raymond recommends that you “release early and release often”: make frequent, incremental releases, so your project won’t stagnate. If you keep working on the project by yourself for too long, people will give up on it.
If you have a new idea for a program, implement some basic but adequate functionality, and then release it so people can play with it and learn about it. Most successful projects that have been open-source since their inception started this way: the Linux kernel, gcc, vim, perl, CPython. If you look at the earliest versions of any of them, you’ll find that they were very limited, if not downright hideous, but now they are often among the best of breed.
Which version of your software are you using? How can you tell? It’s not unusual to come to a page where the link to the archive does not contain a version number, nor is the version clearly indicated anywhere. And if the version number is bumped to indicate a bug fix, how would anyone know?
Good software always indicates its version number in the archive file name and in the name of the directory the archive unpacks to, has a --version command-line flag, and mentions the version in the About dialogue if there is one.
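One way to keep all of these in agreement is to define the version once and derive everything else from it. The following is a minimal Python sketch of that idea, assuming a hypothetical program called myprogram with version 1.2.3; the helper names are illustrative only. It uses the standard argparse module, whose action="version" implements the --version flag (and argparse adds --help automatically).

```python
import argparse

# Hypothetical program name and version. Defining them in one place means the
# release archive name, the unpacked directory and --version cannot drift apart.
PROG_NAME = "myprogram"
__version__ = "1.2.3"


def archive_name(ext: str = "tar.gz") -> str:
    """Name of the release archive, with the version number embedded."""
    return f"{PROG_NAME}-{__version__}.{ext}"


def top_level_dir() -> str:
    """Directory the archive unpacks into, also carrying the version."""
    return f"{PROG_NAME}-{__version__}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog=PROG_NAME)
    # action="version" prints the version string to stdout and exits with 0.
    parser.add_argument(
        "--version", action="version", version=f"%(prog)s {__version__}"
    )
    return parser
```

A script whose entry point calls build_parser().parse_args() would then print myprogram 1.2.3 when invoked with --version, and the release would be shipped as myprogram-1.2.3.tar.gz unpacking into myprogram-1.2.3/.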
Public availability of the source is a great advantage for a program to have. Many people will not even look at a program without it, for reasons that are partly ideological but also practical. [Ideology] Without the source, and without it being under a proper licence, the software will not become part of most Linux distributions, the BSD operating systems, etc.
If you just say that your software “is copyright by John Doe. All Rights Reserved”, then it may arguably mean that people cannot study its internals (including to fix bugs or add features) without being bound by a non-compete clause, or even that its use or re-distribution is restricted. Some software ships with extremely long, complicated (and often not entirely enforceable) End-User Licence Agreements (EULAs) that no-one reads or cares to understand.
As a result, many people will find any program whose licence is not 100% Free and Open Source unacceptable. To be truly useful, the application also needs to be GPL-compatible, and permissive, public-domain-like licences such as the modified BSD licence or the MIT X11 licence, or even pure public-domain source code [public-domain], are even better, because they allow the code to be sub-licensed and re-used. Licences that allow incorporation but not sub-licensing, like the GNU Lesser General Public License (LGPL), are somewhere in between.
While some programs on Linux have become popular despite being under non-optimal licences, many Linux distributions pride themselves on a core system that consists of “100% free software”. Most software that became non-open-source was eventually forked or fell into disuse, or else suffered from a lot of bad publicity.
As a result, high-quality software has the licence that is most usable in the context of its common use cases. This is doubly important for freely-distributed UNIX software.
“It just works” is a meta-parameter for quality. When people say that something “just works”, they mean that you don’t have to be concerned about getting it up and running, spend too much time learning it, worry about it destroying data, or wonder how to troubleshoot problems with it.
A program that just works is the holy grail of high-quality software. In practice this means several things:
Software that “just works” doesn’t have any show-stopping bugs. While it may still have some bugs, it mostly functions correctly.
It has most of the features people want and does not lack essential ones. For example, GNU Arch, an old and now mostly unused version control program, did not work on 32-bit Windows, while Subversion, a different and popular alternative, has a native port. Moreover, Mercurial, yet another alternative, cannot keep empty directories (or trees of directories not containing files) in the repository. This may make both Mercurial and GNU Arch non-starters for many uses.
TenDRA is the most prominent alternative C and C++ compiler to GCC, but it is not nearly as advanced as GCC, does not have all of GCC’s features and extensions, and is not usable as a replacement for GCC for most needs. As such, it is hardly ever used.
Software that “just works” also has good usability, meaning that it behaves the way people expect it to. The Emacs-based editors, which are an alternative to Vim, do not open their menus when you press “Alt+F”, “Alt+E”, etc., which is the Windows convention for them.
Furthermore, when Emacs displays a single-line prompt, the prompt cannot be dismissed with either Ctrl+C or ESC, while in Vim, both keys dismiss it. The key combination to dismiss it is not written anywhere on the screen, and I won’t tell you what it is. According to Joel Spolsky’s User Interface Design for Programmers, “A user-interface is well-designed when the program behaves exactly how the user thought it would”.
Some people may believe that this does not apply to terminal applications, TTY applications, command-line applications, or even Application Programming Interfaces (APIs), but it holds there as well. One thing that made me like gvim (the graphical front-end to vim) was that it could be configured to behave much like a Windows editor. I gradually learnt more and more vim paradigms, but found the intuitive usability a great advantage. I could never quite get used to Emacs, though.
Most high-quality software applications and libraries have a homepage that introduces them, has download information, gives links, and provides a starting point for finding more information. And no, a /project/myprogram/ page on SourceForge or a different software hub is a much poorer substitute, and it leaves a bad impression.
A high-quality program is easy to compile, deploy and install. It builds out of the box with minimal hassle. There are several common, standard build procedures for such software:
The standard build procedure with the GNU Autotools is ./configure --prefix=$PREFIX ; make ; make install
There are now some more modern alternatives to the GNU Autotools, which may also prove useful.
CPAN Perl distributions have a similar perl Makefile.PL procedure or, more recently, one using perl Build.PL, which tends to be less quirky (see Module-Build).
One usually installs them using the CPAN.pm or CPANPLUS.pm interfaces to CPAN or, preferably, using a wrapper that converts every CPAN distribution into a native (or otherwise easy-to-remove) system package.
Python packages have the standard setup.py procedure, which can also generate Linux RPMs and other native packages.
There are similar building procedures for most other technologies out there.
However, it’s not uncommon to find a program that fails to build even on GNU/Linux on an x86 computer, which is the most common platform for development. Then there is the case of the qmail email server, which has a long and quirky build process. It reportedly fails to compile on modern Linux systems, and someone I know who tried to build it said that it did not work even after following all the steps.
One thing that detracts from a piece of software being high-quality is a large number of dependencies.
Take Plagger, a web-feed mix-and-match framework in Perl (not unlike Yahoo Pipes, which it predates): its CPAN distribution bundles all of its plug-ins, and as a result requires “half of CPAN”, including such obscure modules as those for handling Chinese and Japanese dates.
Popular programs like GCC, perl 5, Vim, Subversion and Emacs have very few dependencies, and those that are needed to build them are normally included in the package. They are all written in very portable ANSI C and POSIX, and have been successfully deployed on all modern UNIX flavours, on Microsoft Windows and on many other, more obscure systems.
While reducing the number of dependencies often means re-inventing wheels, it still increases the quality of your software. I’m not saying a program cannot be high-quality if it has a large number of dependencies, but it’s still a good idea to keep them to a minimum.
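Plagger’s problem was that every plug-in’s dependency became a dependency of the whole distribution. One common mitigation, sketched here in Python as an illustration rather than as how Plagger or any real framework works, is to import a plug-in’s module only when that plug-in is actually used, so its extra dependency is needed only by the people who enable it. PLUGINS, load_plugin() and MissingPluginDependency are hypothetical names; "json" and "csv" are standard-library modules, while "yaml" stands for the third-party PyYAML package, which may or may not be installed.

```python
import importlib
from types import ModuleType

# Hypothetical plug-in table: plug-in name -> module that implements it.
# "json" and "csv" ship with Python; "yaml" (PyYAML) is third-party and is
# only needed by users who actually enable the YAML plug-in.
PLUGINS = {
    "json": "json",
    "csv": "csv",
    "yaml": "yaml",
}


class MissingPluginDependency(RuntimeError):
    """Raised when a plug-in's optional dependency is not installed."""


def load_plugin(name: str) -> ModuleType:
    """Import a plug-in's module only at the moment the plug-in is used."""
    module_name = PLUGINS[name]
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Fail with a clear message instead of refusing to install at all.
        raise MissingPluginDependency(
            f"plug-in {name!r} needs the optional module {module_name!r}; "
            f"install it to enable this plug-in"
        ) from exc
```

With this layout, the core package can be installed with only the standard library, and a missing optional module produces a targeted error message at the point of use.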
A good program has packages for most common distributions, or such packages can be easily prepared.
Without such packages, users have to install the program from source, use generic binary packages, or resort to other workarounds that are harder than a single package-manager command, and that may prevent the installation from being kept up to date in the future.
A good example of how this can go wrong is the qmail SMTP server, before it became public-domain. The qmail copyright terms prevented distributing modified sources or binary packages. As a result, the distributions that supported it packaged it as a source package with an irregular build process. Since qmail had its own unconventional idea of directory structure, some of the distributions had to patch it extensively. This in turn prevented more mainstream patches from being applied correctly to address the many limitations that qmail had, or accumulated over the years due to its lack of maintenance.
If your GUI program is simple and well-designed, then you normally don’t need extensive documentation. However, command-line programs, libraries, etc. do need it, or else users won’t know what to do.
There are many types of documentation: the --help flag, the man page, the README/USAGE/INSTALL files, full-fledged in-depth guides, documents about the philosophy, wikis, etc. If the program is well-designed, then the user should be able to get up and running quickly. Exceptions to this are various highly specialised programs, such as 3-D graphics or CAD programs, that require extensive learning.
If we take Subversion as an example, it has a full book online, several tutorials, and an svn help command that provides help for all the other commands, and a lot more help can be found with a Google search. GNU Arch, on the other hand, only had one wordy tutorial that I didn’t want to read. Most of the other tutorials people wrote became misleading or non-functional as the program broke backwards compatibility.
Vim has an excellent internal documentation system. It’s the first thing you are directed to when you start it. It has a comprehensive tutorial, a full manual, and the ability to search for many keywords, with a lot of redundancy. As a result, one can easily become better and better with vim or gvim, although many people happily use it knowing only the bare essentials.
Emacs’ help, on the other hand, is confusing, disorganised, lacking in explanation and idiosyncratic. It is not invoked by pressing “F1”, the program does not direct you to it when it starts, and most people cannot make head or tail of it. There is a short Emacs tutorial, but it isn’t as extensive as Vim’s. Nor does it explain how to configure Emacs to behave better than its default, which is completely different from what people used to Windows-like or vim-like conventions expect.
It is tempting to believe that by writing a program for one platform, you can gain most of your market share. However, people use many platforms on many different CPU architectures: 32-bit and 64-bit Windows on Intel x86, Itanium or x86-64 machines; Linux on a multitude of platforms; BSD systems (NetBSD, FreeBSD, OpenBSD and others) on many architectures as well; Mac OS X; Sun Solaris (now also OpenSolaris); and more obscure (but still popular) Unix clones like AIX, HP-UX, IRIX, SCO UNIX, Tru64 (formerly Digital Unix), etc. That is to say nothing of more exotic, non-UNIX, non-Microsoft operating systems like Novell NetWare, Digital Equipment Corporation’s VMS or OpenVMS, IBM’s VM/CMS or OS/390 (MVS), BeOS, AmigaOS, Mac OS 9 or earlier, PalmOS, VxWorks, etc.
As a general rule, the only thing that runs on top of all of these systems (in the modern “All the world is a VAX.” world) is a program written in C or something that is hosted on C. [non-vax-like] Most good programs are portable to at least Windows and most UNIXes, and are potentially portable to other platforms.
For example, Subversion made it a high priority to work properly on Windows. On the other hand, many of its early alternatives, especially GNU Arch, could not work there due to their architectures. As a result, many mixed shops, Windows-only shops, and companies where some developers wanted to use Windows as their desktop OS could not use Arch. So Arch saw very little adoption.
The bootstrapping ANSI C compiler of gcc, for example, was written in very portable K&R C, so it could be compiled by any C compiler. This compiler could then be used to compile most of the rest of the GCC compilers.
Compare that to the many compilers for other languages that are written in the language they compile. For example, GHC, the Glasgow Haskell Compiler, is written in Haskell and requires a relatively recent version of itself to compile, so you may need to bootstrap several intermediate compilers to build it.
A high-quality program is secure. It has a relatively small number of security issues, and those that are found are fixed as soon as possible.
Some people believe that security is the most important aspect of software, but it’s only one factor that affects its quality. For example, I once talked with a certain UNIX expert who argued that the Win32 CreateProcess() API function was superior to the UNIX combination of fork() and exec(), just because it made some bugs harder to write. However, some multitasking paradigms are not possible without the fork() system call, which is not present in the Win32 API at all and needs to be emulated (at a high run-time cost) or replaced with thread-based multitasking, which is not equivalent. Finally, it is still possible to get fork()+exec() right, and many modern UNIXes offer a spawn() abstraction.
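To make the comparison concrete, here is a minimal Python sketch of both styles on a POSIX system (it will not run on Windows, where os.fork() does not exist), assuming Python 3.9 or later for os.waitstatus_to_exitcode(). The first function is the classic two-step fork()+exec() pattern; the second uses posix_spawn(), the standardised spawn-style call, via os.posix_spawnp(), which searches PATH. The function names are hypothetical.

```python
import os


def run_fork_exec(argv: list[str]) -> int:
    """Run argv with the classic fork()+exec() pair; return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: this gap between fork() and exec() is where a program can
        # redirect file descriptors, change directory or drop privileges.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed: never fall back into the parent's
            # code path from inside the child.
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


def run_posix_spawn(argv: list[str]) -> int:
    """Run argv with the spawn-style posix_spawnp(); return its exit code."""
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

The os._exit() call in the first version guards against one of the classic fork() bugs: if exec() fails and the child simply returns, two copies of the parent’s program keep running. The spawn-style call avoids that particular mistake, at the price of the flexibility the gap between fork() and exec() provides.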
I don’t mean that you shouldn’t pay attention to security or keep good security practices in mind when coding. I’m saying that it shouldn’t slow down the process by much, prevent too many exciting features from being added, or cause development to stagnate.
A high-quality program maintains as much backward compatibility with its older versions as possible. Some forms of backward compatibility, such as preserving bugs or other misbehaviours that existing code relies on (“bug-to-bug compatibility”), are probably too extreme to consider. But users want to be able to upgrade the software and have all of their programs just continue to work.
A bad example of software that does not maintain backwards compatibility is PHP, where every major version broke compatibility with the previous one: PHP 4 was not compatible with PHP 3, and PHP 5 was not compatible with PHP 4. Furthermore, existing user-land code was sometimes broken in minor releases. As such, maintaining PHP code into the future is a very costly process, especially if you want it to work with a range of versions.
On the other hand, the perl5 developers have maintained backwards compatibility from 5.000, 5.001 and 5.002 up to 5.6.x, 5.8.x and now 5.10.x. Therefore, one can normally expect older scripts to just work. perl5 can also run a lot of Perl 4 code and below, and Perl 4 code can be ported to modern versions of perl5 with relative ease. While scripts, programs or modules were sometimes broken (due to the lack of “bug-to-bug compatibility”), or became slower, upgrading to a new version of Perl is normally straightforward.
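One common way for a library to evolve without breaking existing callers is to keep the old name as a thin alias that still works but warns. A minimal Python sketch of that technique follows; parse_items(), parse_list() and deprecated_alias() are hypothetical names, not part of any real library, and it relies only on the standard functools and warnings modules.

```python
import functools
import warnings


def parse_items(text: str) -> list[str]:
    """Current (hypothetical) API: split a comma-separated list, dropping blanks."""
    return [item.strip() for item in text.split(",") if item.strip()]


def deprecated_alias(new_func, old_name: str):
    """Expose new_func under an old name that keeps working but warns on use."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name}() is deprecated; use {new_func.__name__}() instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this wrapper
        )
        return new_func(*args, **kwargs)

    wrapper.__name__ = wrapper.__qualname__ = old_name
    return wrapper


# Name from an older (hypothetical) release, kept so existing scripts still run.
parse_list = deprecated_alias(parse_items, "parse_list")
```

Old scripts that call parse_list() keep working unchanged, while developers running with warnings enabled are told what to migrate to before the alias is eventually removed.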
A piece of high-quality software has good ways for its users to receive support. Some examples of such channels are:
A mailing list.
IRC (Internet Relay Chat) channels.
An email address for questions.
Web forums.
Wikis.
Without good ways to receive support, users will be unnecessarily frustrated when they encounter a problem that the documentation does not answer. Refer to Joel Spolsky’s “Seven Steps to Remarkable Customer Service” for more information on how to give good support.
The reason I mention performance so late is that it is the quality parameter Mr. Campbell stressed in his argument about “Industrial Strength” Freecell solvers, and I wanted to show that there are other important parameters besides it. However, raw performance is important, too.
If a program is too slow, or generates sub-optimal results, most people will be reluctant to use it and will find it daunting. They will either give up waiting for it to finish or get distracted. If the program’s results are too far from optimal (assuming there is a scale of optimality for them), then people will probably look for alternatives.
As a result, it is important that your software runs quickly and yields good results. There are many ways to make code run faster, and covering them is beyond the scope of this article.
A good example of how much difference such optimisations can make is the memory work done on Firefox between Firefox 2 and Firefox 3, which greatly improved its performance, reduced its memory consumption, and reduced the number of memory leaks. Note that reducing memory consumption can often yield better performance because of fewer cache misses, less process memory swapping, and other such factors.
There are several related aspects of performance that also affect the general quality and usability of a program. One of them is responsiveness, which is often at issue when people complain that a program is “sluggish”. Java programs are especially notorious for this, for some reason, while programs written in Perl and Python often feel more responsive and snappy, even though their interpreters are generally slower than the Java virtual machine.
A related aspect is startup time. Many programs require, or have required, a long time to start, which also makes using them frustrating, even if they are responsive and quick afterwards.
A good program (or web site) or other resource is aesthetically pleasing. Aesthetics in this context does not necessarily mean very “artsy” or having a breath-taking style. But we have probably all run into software (usually for internal use, or one of those very costly, low-quality niche packages) that seemed very ugly and badly designed, with a horrible user interface, etc.
Different types of applications, and those running on different platforms, have different conventions for what is considered aesthetic. In The Art of UNIX Programming, Eric Raymond makes the case for the “Silence is Golden” principle of designing UNIX command-line interfaces. In short, a command-line program should output as little as possible. Now observe the behaviour of aptitude (a unified package-management interface) on Ubuntu Gutsy Gibbon when trying to install a non-existent package name:
root@shlomif-desktop:/home/shlomif# aptitude install this-does-not-exist
Reading package lists... Done
Building dependency tree
Reading state information... Done
Reading extended state information
Initializing package states... Done
Building tag database... Done
Couldn't find any package whose name or description matched "this-does-not-exist"
The following packages have been kept back:
  firefox firefox-gnome-support
0 packages upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
Need to get 0B of archives. After unpacking 0B will be used.
Reading package lists... Done
Building dependency tree
Reading state information... Done
Reading extended state information
Initializing package states... Done
Building tag database... Done
Fifteen lines of output, and only one of them, in the middle, is informative. Why should all this information concern me, especially given that it is all printed in the same monotonous default colour?
On the other hand, here’s what urpmi (a similar package management interface for Mandriva) says on Mandriva Cooker:
[root@telaviv1 ~]# urpmi this-does-not-exist
No package named this-does-not-exist
[root@telaviv1 ~]#
Exactly one line, and it’s informative. While aptitude certainly has its merits, its verbosity makes it much more painful to use than urpmi when I have to work on Ubuntu.
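The “Silence is Golden” behaviour can be sketched in a few lines of Python. install() and KNOWN_PACKAGES are hypothetical stand-ins, not how aptitude or urpmi are actually implemented: on success the function prints nothing, a missing package yields exactly one line on standard error plus a non-zero status, and progress messages appear only when the caller asks for them (for example, through a hypothetical --verbose flag).

```python
import sys
from typing import Iterable, Optional, TextIO

# Hypothetical package index, used only for this illustration.
KNOWN_PACKAGES = {"vim", "gcc", "perl"}


def install(
    names: Iterable[str],
    verbose: bool = False,
    out: Optional[TextIO] = None,
    err: Optional[TextIO] = None,
) -> int:
    """Pretend to install packages; print nothing on success unless verbose."""
    # Resolve the streams at call time so redirection keeps working.
    out = out if out is not None else sys.stdout
    err = err if err is not None else sys.stderr
    status = 0
    for name in names:
        if name not in KNOWN_PACKAGES:
            # The one line the user actually needs, sent to stderr.
            print(f"No package named {name}", file=err)
            status = 1
        elif verbose:
            # Progress chatter only on explicit request.
            print(f"Installing {name}... done", file=out)
    return status
```

Sending the error to standard error and returning a non-zero status also keeps the tool easy to script: a caller can test the status without parsing a screenful of progress messages.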
Back to more visual aesthetics: one of the reasons I wanted to use Linux more than Windows 95 or 98 was that its desktops were truly themable and could be made to look much better with little effort. If I got tired of the same look, I could easily switch. While Windows XP shipped with a more attractive theme, and also had some proprietary and non-gratis theming software, Linux supplied all of that out of the box and with a more attractive theme. The effects supplied by the Linux 3-D desktops, which put the nine-billion-dollar effects of Windows Vista to shame, have convinced some people to install Linux on their computers after seeing them.
There are probably several parameters for software quality that I’m missing. However, the point is that one should evaluate the general quality of the software based on many parameters and not exclusively “security” or “speed” or whatever.
For example, many proponents of BSD operating systems claim that the various BSDs are superior to Linux because they are more secure, because they are (supposedly) faster or easier to manage, because their licence is less problematic than the GPL, etc. However, they forget that Linux has some advantages of its own: it is more popular (so support is easier to get), its kernel supports much more hardware, it has better vendor acceptance, and more software is guaranteed to run with fewer problems on Linux than on the BSDs. [linux-bsd-soft]
I’m not saying the BSDs are completely inferior to Linux, just that Linux still has some cultural and technical advantages. Quality in software is not a linear metric, because it is affected by many parameters. If you’re a software developer, you should aim to get as many of the parameters I mentioned right.
[having_good_code] What? Shouldn’t high-quality software have a good codebase? Surprisingly, no. What would you prefer: a very modular codebase that does something pretty useless, like outputting the string “Hello World”, or a large codebase of relatively low quality with a large amount of useful functionality and relatively few bugs?
I would certainly prefer the latter. That’s because I know I can always refactor that codebase, either by large-scale refactoring, by continuous refactoring, or even by “Just-in-Time” refactoring, done only to add a new feature.
[Licence] For example, I mention later that the more liberal the source licence of a program, the higher its quality. Obviously, a lot of software that is not “Free and Open Source”, and even binary-only applications, are high-quality too. But the use of such software is still more limited than that of open-source software, so it is of lesser quality than identical software that is open-source.
Similarly, libraries or programs that are distributed under the relatively restrictive GNU General Public Licence (GPL) (and which are considered open-source and usable by most Linux distributions) cannot be used in many common situations, and so would be of lesser quality than programs under a more permissive licence.
[Ideology] Assuming there was ever a true ideology that was not also practical.
The way I see it, an ideology (and ethics in general) is a strategy aimed at helping a person lead a better, happier life. If it isn’t, then it’s just a destructive dogma, or just plain stubbornness.
[public-domain] Using pure public-domain licensing terms for your software is problematic because not all countries have a concept of “public domain” similar to that of the United States, because many people misinterpret it, and because it is not clear whether software can be placed in the public domain to begin with (among other such issues).
While quite a lot of important programs have been released into the public domain, and they are doing quite fine, doing so may have some problematic legal implications.
For these reasons, I now prefer the MIT X11 Licence over the “public domain” for software that I originated.
[non-vax-like] I’m fully aware that before C-based UNIX and UNIX-like systems became dominant, there were some more exotic architectures that could not run C comfortably. Prime examples of them are the PDP-10 and the Lisp machines.
However, such “unconventional” architectures are now dead, and no CPU architecture developer in their right mind would want to create a CPU that won’t be able to run C and C-based UNIX or UNIX-like operating systems such as Linux (except perhaps for a relatively niche micro-processor for embedded systems).
Lisp, and similar higher-level languages, run on modern UNIX-based OSes very well, so there’s not a big problem there.
[linux-bsd-soft] Naturally, this stems from the fact that most developers develop on Linux (mostly x86), don’t test their software on other Unix flavours, and are too careless or unaware to write their programs portably enough.
However, it’s still a quality parameter, because it still affects how you use the operating system.