Monday, May 22, 2006

Character Density as a Measure of Programming Power

Having just written a post about how there are really only two sets of languages (work vs. home), I am going to compare the two (in my case C# and Ruby) using the metric I discussed in that article. The idea is that the expressive power of an implementation (and, by extension, of the language used for that implementation) can be measured by how few characters are required to define the exact same programmatic functionality.

I argue that if a program with the same functionality takes fewer characters to produce, then it can be produced with greater ease and in less time. Less typing strain on a programmer's hands is always a good thing. For the purposes of this metric, I chose to remove white space from the count. I admit that this might skew the results towards certain languages; Python comes to mind because of its use of syntactic indentation. Other languages usually have block delimiters that Python doesn't (the C family has {}, Ruby has do..end, etc.).
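To make the counting rule concrete, here is a quick made-up example (the line being counted is arbitrary):

```ruby
line = "total = prices.inject(0) { |sum, p| sum + p }"

puts line.length                 # 45 characters including whitespace
puts line.gsub(/\s/, "").length  # 36 characters once whitespace is stripped
```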

It is a commonly held belief that less code results in fewer bugs. This argument is based on the assumption that the rate of bugs per line of code is roughly constant across large and small programs (for an individual programmer, or for a group of programmers). So, if you have fewer lines of code, you have fewer bugs. I think it is more accurate to say that fewer characters of code means fewer bugs. Measuring by character takes into account the syntactic sugar that a programming language offers, or special constructs it might employ to get the job done. By leaving white space out of the count, we can balance the need for human readability with our desire to minimize the number of characters.
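As a rough illustration of what syntactic sugar buys you (a made-up class, with approximate counts), compare Ruby's attr_accessor with the same accessors written out by hand:

```ruby
# Made-up illustration: the sugared accessor vs. the spelled-out one.
class Sugared
  attr_accessor :name            # roughly 18 non-whitespace characters
end

class SpelledOut
  def name                       # the two methods in this class add up to
    @name                        # roughly 44 non-whitespace characters
  end                            # for the same functionality

  def name=(value)
    @name = value
  end
end

s = Sugared.new
s.name = "Matz"
puts s.name   # => Matz
```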

Another advantage of this metric is that it can be used to simplify existing code in whatever language you are working in. Even if you are working in a generally higher-character-count language, you can still shrink your code by removing unnecessary characters. I would caution that this technique should not be taken to the extreme: program readability and maintainability are far more important than character density.

Character counting is also very easy to implement in a variety of languages. You can grow your own in about 20 minutes, including testing. I did mine in Ruby; it's 11 lines of code including the object definition. Another 3 lines of code were required to instantiate the object and pass it a message to count the characters in my target files. The total character count for my character counter is 411; 174 of those are the class itself, and the rest are the calls to my specific (and rather long) file locations.
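I won't paste my exact class here, but a minimal sketch of the idea looks something like this (the file path is made up):

```ruby
# Minimal sketch of a non-whitespace character counter
# (not my exact 11-line class; the file path is made up).
class CharacterCounter
  def count(path)
    File.read(path).gsub(/\s/, "").length
  end
end

counter = CharacterCounter.new
puts counter.count("some_program.rb")
```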

Domain-specific languages (DSLs) are something that Lispers (and sometimes Ruby users) point to as a major benefit of the flexibility of their languages. Basically, a domain-specific language is one that is purposefully geared to solving your specific domain problem. A good example is SQL, which is uniquely positioned to work with large sets of data. The advantage of a DSL is that it has specific language constructs that help solve your problem (imagine trying to do SQL without UPDATE, for example). These core constructs are given their own syntax, their own indicators (symbols that represent the use of the construct), or both. Since these constructs are meant to be used over and over, they are often represented by short atoms: strings with very few characters. So part of the advantage of a DSL is not only that it provides the constructs you need most often, but that it does so with minimal typing. Reducing the characters makes it easier to use your most important constructs.
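As a toy illustration (nothing to do with real SQL), here is roughly what giving a domain's common constructs their own short names can look like as an internal Ruby DSL:

```ruby
# Toy internal DSL for filtering rows of data: the constructs used most
# often (where, pick) get short, dedicated names. Purely illustrative.
class Table
  def initialize(rows)
    @rows = rows
  end

  def where(&block)
    Table.new(@rows.select(&block))
  end

  def pick(*keys)
    @rows.map { |row| row.values_at(*keys) }
  end
end

people = Table.new([
  { :name => "Ada",  :age => 36 },
  { :name => "Alan", :age => 41 }
])

puts people.where { |r| r[:age] > 40 }.pick(:name).inspect   # => [["Alan"]]
```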

Paul Graham talks about the advantage of renaming the lambda form to fn in his version of Lisp (Arc). That's four fewer characters. He also mentions several other forms he renamed in Arc, and how they make code more readable and easier to produce. If you reduce the number of characters you have to type to do common tasks, you get a net gain in program readability and length (and from the length reduction you get error reduction as well). These improvements range from the renaming of commonly used functions to the simplification of commonly used constructs (in Arc's case, the let construct).
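The same trick is available in Ruby. Purely as an illustration (I wouldn't do this to core classes in shared code), you can alias a frequently typed method to something shorter:

```ruby
# Illustration only: give a heavily used method a one-letter alias.
class Array
  alias_method :m, :map
end

puts [1, 2, 3].m { |x| x * x }.inspect   # => [1, 4, 9]
```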

Basically, fewer characters lead to less work and less code, which leads to fewer bugs. Who doesn't want a quick metric to measure that?

Ruby vs. Python vs. Lisp vs. Smalltalk vs ... X

I have been reading a lot of articles comparing languages on reddit lately. They talk about Ruby, Python, Smalltalk, Lisp (and all its dialects, and all their comparisons), C#, C/C++, Java, et cetera. What strikes me most about these articles is that they are basically doing binary comparisons between two languages in an attempt to set up some kind of hierarchy. I don't think there is, or should be, a hierarchy of languages. I propose a new model, one consisting of two sets of languages. The sets have different entries for different people. The two groups are:

- The languages you use at work
- The languages you use at home

Most everyone can identify with both of these groups. Obviously, different languages are going to show up in different groups depending on the person. Paul Graham can say that he uses Lisp (Arc?) for both home and work. I can say that I use C# for work and Ruby for home. In fact, my groups look like:

Home = (Ruby (learning), C#, Python (learning), Scheme, SQL)
Work = (C#, SQL)

From those lists it seems clear that my home languages are a superset of my work languages. It can be argued that programmers working for themselves are likely to choose the most powerful language they can wield. A business, on the other hand, has to worry about eventually replacing the programmer, and a replacement who knows the powerful language might be hard to find, or very expensive. Home programmers don't have to make this tradeoff; they are the programmer. [Technical note: I define the power of a language as the minimization of characters required to perform a specific task. A future article will explore this metric using C# and Ruby to complete the same simple task.]

Languages fall into two groups: those that are safe for large companies to use on large projects (the work group, e.g. Java, C#) and those that are more powerful but carry more risk if improperly used (the home group, e.g. Python, Ruby, Lisp, Perl). Obviously, there are people using Java and C# at home, and there are (very lucky!) people using Ruby, Python, and Lisp at work. This doesn't negate my assertion that most people don't code at home in the language they use at work.

Because of the existence of these two groups, I think it isn't worthwhile to argue endlessly about whether Python or Ruby is better at some specific syntax or task. I think it would be more worthwhile to use the advantage gained from more powerful and flexible languages (whichever your favorite might be) to produce software that is cleaner, shorter (in terms of code), easier to maintain, and more powerful. We shouldn't let the risk avoidance of businesses get in the way of producing excellent software from more powerful constructs.

Take a chance. Fail. Then do it again. Don't spend your time arguing about which way is least likely to fail (because you are using the most productive language/framework/methodology). Nike got it right: just do it.