Monday, May 22, 2006

Character Density as a Measure of Programming Power

Having just written a post arguing that there are really only two sets of languages (work vs. home), I am going to compare my two (C# and Ruby) using the metric I discussed in that article: the expressive power of an implementation (and, by extension, of the language used for that implementation) can be measured by how few characters are required to define the exact same programmatic functionality.

I argue that if a program with the same functionality takes fewer characters to produce, then that program can be produced with greater ease and in less time. Less typing strain on a programmer's hands is always a good thing. For the purposes of this metric, I chose to remove white space from the count. I admit that this might skew the results towards certain languages; Python comes to mind because of its use of syntactic indentation. Other programming languages usually have delimiters that Python doesn't (C-family languages have {}, Ruby has begin..end, etc.).

It is a commonly held belief that less code results in fewer bugs. This argument rests on the assumption that the rate of bugs per line of code is roughly constant across large and small programs (for an individual programmer, or for a group of programmers). So, if you have fewer lines of code, you have fewer bugs. I think it is more accurate to say that fewer characters of code mean fewer bugs. Measuring by character takes into account the syntactic sugar a programming language offers, and any special constructs it might employ to get the job done. By leaving white space out of the count, we can balance the need for human readability while still preserving our desire to minimize the number of characters.

Another advantage of this metric is that it can be used to simplify existing code in whatever language you are working in. Even if you are working in a generally higher-character-count language, you can still shrink your code by removing unnecessary characters. I would caution that this technique should not be taken to the extreme: program readability and maintainability are far more important than character density.

Character counting is also very easy to implement in a variety of languages. You can roll your own in about 20 minutes, including testing. I did mine in Ruby; it's 11 lines of code including the object definition. Another 3 lines of code were required to instantiate the object and pass it a message to count the characters in my target files. The total character count for my character counter is 411, 174 of which is the actual class; the rest is the calls to my specific (and rather long) file locations.
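For illustration, here is a minimal sketch of such a counter in Ruby. This is not my original 11-line version; the class name and the sample file name are made up for the example. It strips all white space with a regular expression and counts what remains:

    # Counts the non-whitespace characters in a source file.
    class CharacterCounter
      def initialize(path)
        @path = path
      end

      # Strip every white-space character, then count what is left.
      def count
        File.read(@path).gsub(/\s/, "").length
      end
    end

    puts CharacterCounter.new("fizzbuzz.rb").count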

Domain Specific Languages (DSLs) are something that Lispers (and sometimes Ruby users) point to as a major benefit of their language's flexibility. Basically, a domain specific language is one that is purposefully geared to solving your specific domain problem. A good example is SQL, which is uniquely positioned for working with large sets of data. The advantage a DSL gives you is that it has specific language constructs that help solve your problem (imagine trying to write SQL without UPDATE, for example). These core constructs are given their own syntax, their own indicators (symbols that represent the use of the construct), or both. Since these constructs are meant to be used over and over, they are often represented by short atoms, i.e. strings of very few characters. So part of the advantage of a DSL is that it not only provides the constructs you need most often, but does so with minimal typing. Reducing the characters makes it easier to use your most important constructs.
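As a toy illustration of the idea in Ruby (the Query class, where, and pick are all invented for this sketch, not a real library), short domain verbs stand in for longer general-purpose code:

    # A toy query DSL: short, domain-specific verbs mirror SQL's WHERE and SELECT.
    class Query
      def initialize(rows)
        @rows = rows
      end

      # Filter rows with a block, like SQL's WHERE clause.
      def where(&test)
        Query.new(@rows.select(&test))
      end

      # Project out a single field, a short stand-in for SELECT.
      def pick(field)
        @rows.map { |row| row[field] }
      end
    end

    people = Query.new([{ :name => "Ada", :age => 36 }, { :name => "Bob", :age => 17 }])
    puts people.where { |p| p[:age] >= 18 }.pick(:name)   # => Ada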

Paul Graham talks about the advantage of renaming the lambda form to fn in his version of Lisp (Arc). That's four fewer characters. He also mentions several other forms he renamed in Arc, and how they make code more readable and easier to produce. If you reduce the number of characters you have to type to do common tasks, you get a net gain in program readability and a reduction in length (and, by the argument above, a reduction in errors). These improvements range from the renaming of commonly used functions to the simplification of commonly used constructs (in Arc's case, the let construct).
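Ruby already leans this way: collect has the shorter alias map, and detect has find. Adding your own short alias takes one line. In this sketch, up is a hypothetical alias of my own, not a Ruby built-in:

    class String
      # Hypothetical short alias: four fewer characters per call, like lambda -> fn.
      alias_method :up, :upcase
    end

    puts "less typing".up   # => LESS TYPING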

Basically, fewer characters lead to less work and less code, which leads to fewer bugs. Who doesn't want a quick metric for that?
