Literate Programming, Code Readability

December 28, 2020

Recently, I heard about a concept called literate programming. It’s an approach that focuses on writing code for people to read, rather than in the order and form dictated by the language or computer. It interleaves prose and code in a more human-friendly way.

I admittedly know very little about it at the moment, but it seems like an interesting way to try to improve code maintenance. It reminds me of some clean code patterns–specifically, writing methods that stay at the right abstraction level. I’m interested in trying to tie together the two ideas.

An Example

Here’s an example from Physically Based Rendering, which is where I found out about literate programming:

<<Main program>>= 
int main(int argc, char *argv[]) {
    Options options;
    std::vector<std::string> filenames;
    <<Process command-line arguments>> 
    pbrtInit(options);
    <<Process scene description>> 
    pbrtCleanup();
    return 0;
}

While some of this is code, some of it is not, namely the lines with <<>> in them. There are three such lines here: they name the main program, the processing of command-line arguments, and the processing of the scene description. The main program line is a definition, indicated by the trailing =. The other two are references to code that is defined elsewhere.

Here’s the code for processing the scene description:

<<Process scene description>>= 
if (filenames.size() == 0) {
    <<Parse scene from standard input>> 
} else {
    <<Parse scene from input files>> 
}

My understanding is that this code is “inserted” at the appropriate place in the main program via a preprocessing step. The code that gets compiled looks as if you had written it directly in main.

This is not a function call, but (for me) it’s easier to think about it as if it were one.
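To make the “insertion” step concrete, here is a minimal sketch of how such a preprocessing pass could work. It is not how any real literate programming tool is implemented; FragmentExpander and expand are hypothetical names, the main program is abbreviated, and the two parse fragments are placeholders because their bodies are not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the "insert fragments" step; not a real tool.
class FragmentExpander {
    // A reference is a line holding only <<name>>, optionally indented.
    private static final Pattern REFERENCE = Pattern.compile("(\\s*)<<(.+?)>>\\s*");

    // Returns the named fragment with every reference replaced by that fragment's text.
    // Assumes fragments never refer back to themselves; a cycle would recurse forever.
    static String expand(String name, Map<String, String> fragments) {
        String body = fragments.get(name);
        if (body == null) {
            throw new IllegalArgumentException("Undefined fragment: " + name);
        }
        StringBuilder out = new StringBuilder();
        for (String line : body.split("\n")) {
            Matcher reference = REFERENCE.matcher(line);
            if (reference.matches()) {
                // Splice in the referenced fragment, keeping the reference's indentation.
                String indent = reference.group(1);
                for (String inner : expand(reference.group(2), fragments).split("\n")) {
                    out.append(indent).append(inner).append("\n");
                }
            } else {
                out.append(line).append("\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fragments = new HashMap<>();
        // Abbreviated version of the main program fragment.
        fragments.put("Main program",
            "int main(int argc, char *argv[]) {\n"
            + "    <<Process scene description>>\n"
            + "    return 0;\n"
            + "}");
        fragments.put("Process scene description",
            "if (filenames.size() == 0) {\n"
            + "    <<Parse scene from standard input>>\n"
            + "} else {\n"
            + "    <<Parse scene from input files>>\n"
            + "}");
        // Placeholder bodies: these fragments are not shown in this post.
        fragments.put("Parse scene from standard input", "/* placeholder */");
        fragments.put("Parse scene from input files", "/* placeholder */");
        System.out.print(expand("Main program", fragments));
    }
}
```

Running main prints the main function with the if/else block spliced in at the indentation of the reference line, which is the sense in which the compiled code looks as if it had been written in one place.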

The thing that excites me here is that this is a form of abstraction. It’s a more conceptual abstraction than technical abstraction, but it’s still abstraction.

Abstraction Levels

Now, abstraction levels. Whenever we discuss a topic, there are multiple “levels” that we can discuss it at.

The principle here: keep explanations at the same level.

Another way to phrase this: explain what you are doing, not how you are doing it.

A Good Example

For example, to get changed, I do three things:

Select a new outfit.
Take off the clothes I’m wearing.
Put on the clothes I’ve selected.

You now understand what I do to get changed. You’d also understand this code:

void changeClothes() {
	Collection<Garment> newOutfit = selectOutfit();
	takeOffCurrentClothes();
	putOn(newOutfit);
}

Both the English and the code examples stay at the same abstraction level.

A Bad Example

That example might have seemed painfully simple and/or stupid. It helps to contrast against an example where we don’t stay at a single level. Consider this approach to changing clothes:

Go through each garment I own.
    If it’s dirty, ignore it.
    If it’s clean and it’s the first clean garment of that type (shirt, pants, socks, etc.), take it.
    Ignore all the other ones.
Go through each item I’m currently wearing.
    Take it off.
Go through each item I picked out.
    Put it on.

This explanation gets obnoxiously into details that you don’t care about, and the three steps from before are now buried in them. And that’s only going one level deeper: imagine if we tried to explain what “putting on” an individual item means.
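In code, the detailed version might look something like the sketch below. Wardrobe, Garment, and their fields are hypothetical stand-ins, invented only so the example compiles. The behavior matches the three-step description, but the reader has to reconstruct those three steps from the loops.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical types for illustration only.
class Wardrobe {
    static class Garment {
        final String type;   // e.g. "shirt", "pants", "socks"
        final boolean clean;

        Garment(String type, boolean clean) {
            this.type = type;
            this.clean = clean;
        }
    }

    final List<Garment> owned = new ArrayList<>();
    final List<Garment> wearing = new ArrayList<>();

    // Mixed abstraction levels: selecting, undressing, and dressing are all inlined.
    void changeClothes() {
        List<Garment> picked = new ArrayList<>();
        Set<String> typesPicked = new HashSet<>();
        for (Garment garment : owned) {
            if (!garment.clean) {
                continue; // dirty: ignore it
            }
            // Set.add returns true only the first time a type is seen.
            if (typesPicked.add(garment.type)) {
                picked.add(garment);
            }
        }

        Iterator<Garment> worn = wearing.iterator();
        while (worn.hasNext()) {
            worn.next();
            worn.remove(); // take it off
        }

        for (Garment garment : picked) {
            wearing.add(garment); // put it on
        }
    }
}
```

Nothing here is wrong, exactly. It just makes you read every line before you know what the method is for.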

In Practice

In practice, this gets you pretty far. However, there are some scenarios where the right level is hard to determine, or where sticking to it is obnoxious to do.

I have thoughts on this, but getting into it is a bit out of scope. I’m going to move on.

Synthesis

Both of these approaches aim for the same thing: abstraction. Literate programming and function extraction work very differently, but the overall shape and goal are very close.

The ultimate question I’m interested in: what practices should I follow to write maintainable code?

There are sub-questions here as well. Literate programming seems powerful–should I use it? Is it still relevant? If yes, why haven’t I heard about it until now?

For the latter questions, in a nutshell, it seems rather inconvenient to do (based on this source). In particular, it seems like tooling is lacking and literate programming might not adapt well when code is significantly changed.

However, literate programming makes it very obvious what is going on because that’s how you must write your code. That seems like the essence that I’d want to take away.

Hypothesis

I’ll close on what I think should be followed.

The main heuristic: aim for readers to never have to figure out what you’re doing.

Stated differently, I never want to have to figure out what your code is trying to do. I want to already know what a piece of code aims to do. Then, I can decide whether or not it actually does that.

The main strategy for accomplishing this is to make code simple enough to follow. One key technique is the one we already looked at: function extraction. Or to be more precise, naming. The name that we give a piece of code should often explain what it aims to do.

Naming isn’t enough, however. There are times when a name can’t fully capture what a reader has to be aware of. That’s where documentation and comments come in. Notably, I think there should be a comment that explains each step within a method.

I’d also say that the bar should be very low for whether or not something deserves a comment. I believe that only obvious things don’t need comments and that most things aren’t obvious.
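As a rough illustration of what a comment per step could look like, here is a sketch of the outfit selection from earlier. OutfitSelector, Garment, and the selectOutfit signature are hypothetical, chosen only to make the example self-contained; the point is the comments, not the design.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical types for illustration only.
class OutfitSelector {
    static class Garment {
        final String type;   // e.g. "shirt", "pants", "socks"
        final boolean clean;

        Garment(String type, boolean clean) {
            this.type = type;
            this.clean = clean;
        }
    }

    // Picks one clean garment per type, preferring garments that appear earlier in the list.
    static List<Garment> selectOutfit(List<Garment> owned) {
        // Remember which types are already covered so we take at most one of each.
        Set<String> coveredTypes = new HashSet<>();
        List<Garment> outfit = new ArrayList<>();

        for (Garment garment : owned) {
            // Dirty garments are never candidates.
            if (!garment.clean) {
                continue;
            }
            // Set.add returns false for a type we already have,
            // so only the first clean garment of each type is kept.
            if (coveredTypes.add(garment.type)) {
                outfit.add(garment);
            }
        }

        // The outfit may be missing a type entirely if nothing clean of that type exists.
        return outfit;
    }
}
```

The last comment is the kind I mean by a low bar: the code is short, but the fact that an outfit can come back incomplete is not something the name selectOutfit tells you.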