After the more abstract talk I’d like to come back to something concrete. Regular Expressions, or regex, are powerful but often inscrutable. Today let’s see how we could make them easier to use through typography and visualization without diminishing that power.

Regular Expressions are essentially a mini language embedded inside of your regular language. I’ve often seen regex written like this,

new Regex("^\\s+([a-zA-Z]|[0-9])\\w+\\$$") // regex for special vars

then sometimes reformatted like this

// regex for special vars
new Regex(
     "^"                  // start of line
     + "\\s+"             // one or more whitespace characters
     + "([a-zA-Z]|[0-9])" // one letter or digit
     + "\\w+"             // one or more word characters
     + "\\$"              // the literal dollar sign
     + "$"                // end of line
)

The fact that the author had to manually split up the text and add comments simply screams for a better syntax. Something far more readable but still terse. It would seem that readability and conciseness would be mutually exclusive. Anything more readable would also be far longer, eliminating much of the value of regex, right?

Au contraire! We have a special power at our disposal: we can render whatever we want in the editor. The compiler doesn’t care as long as it receives the description in some canonical form. So rather than choosing readable vs terse we can cheat and do both.

Font Styling

Let’s start with the standard syntax but replace delimiters and escape sequences with typographic choices. This regex looks for variable names in a made-up language. After some leading whitespace, a variable must start with a letter or number, followed by one or more word characters, followed by a dollar sign. Here is the same regex using bold italics for all special chars:

image of the regex with special characters in bold italics

Those problematic double-escaped backslashes are gone. We can tell which dollar sign is a literal and which is a magic character thanks to font styling.

Color and Brackets

Next we can turn the literals green and the braces gray, so it’s clear which parts are actual words. We can also replace the ^ and $ with symbols that actually look like the beginning and end of a line: open and close up brackets, or floor brackets.

image of the regex with green literals, gray braces, and line-start and line-end symbols

Now one more thing, let’s replace the w and s magic characters with something that looks even less like plain text: letter boxes. These are actual Unicode characters from the Enclosed Alphanumeric Supplement block.

image of the regex with letter boxes in place of w and s
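Here is a minimal sketch of that rendering step in Python, assuming the editor keeps the canonical regex untouched and only swaps glyphs for display. The render function, the DISPLAY table, and the specific glyphs (floor brackets for the anchors, squared letters for \w and \s) are my own illustrative choices, not a real editor API. It also ignores context, so a ^ inside a character class would get swapped too.

```python
import re

# Display-only substitutions; the canonical regex string is never modified.
# The glyph choices are illustrative assumptions.
DISPLAY = {
    "^": "\u230A",        # LEFT FLOOR, standing in for "start of line"
    "$": "\u230B",        # RIGHT FLOOR, standing in for "end of line"
    r"\w": "\U0001F146",  # squared capital W from the Enclosed Alphanumeric Supplement
    r"\s": "\U0001F142",  # squared capital S from the same block
}

# A token is either a backslash escape (two characters) or any single character.
TOKEN = re.compile(r"\\.|.", re.DOTALL)


def render(pattern: str) -> str:
    """Return a display form of the pattern; the pattern itself is left alone."""
    out = []
    for tok in TOKEN.findall(pattern):
        if tok in DISPLAY:
            out.append(DISPLAY[tok])
        elif tok.startswith("\\") and len(tok) == 2:
            out.append(tok[1])  # an escaped literal such as \$ is shown as a plain $
        else:
            out.append(tok)
    return "".join(out)


CANONICAL = r"^\s+([a-zA-Z]|[0-9])\w+\$$"
PRETTY = render(CANONICAL)
```

The compiler still receives CANONICAL; only PRETTY would ever be drawn on screen. A real editor would distinguish the literal dollar sign with color or weight rather than by dropping the backslash alone.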

Now the regex is much easier to read but still compact. However, unless you are really familiar with regex syntax you may still be confused. You still have a bunch of specific symbols and notation to remember. How could we solve this?

Think Bigger

Let’s invent a second notation that is even easier to read. We already have a clue of how this should work. Good programmers write the regex vertically with comments as an ad hoc secondary notation. Let’s make it official.

If we have two views then the editor could switch between them as desired. When the user clicks on the regex it will expand to something like this:

image of the regex expanded into a detailed, spreadsheet-like view

We get what looks like a tiny spreadsheet where we can edit any term directly, using code completion so we don’t have to remember the exact terms. Furthermore the IDE can show the extra documentation hints only when we are in this "detailed" mode.
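A rough, text-only cousin of this detailed mode already exists in some regex engines. Here is a minimal sketch using Python's re.VERBOSE flag, which lets the pattern be written one term per line with comments. The names SPECIAL_VAR_DETAILED and SPECIAL_VAR_COMPACT are mine, and I've spelled the letter class as [a-zA-Z] so Python accepts it.

```python
import re

# With re.VERBOSE the engine skips unescaped whitespace in the pattern
# and treats "#" as the start of a comment.
SPECIAL_VAR_DETAILED = re.compile(
    r"""
    ^                   # start of line
    \s+                 # one or more whitespace characters
    ([a-zA-Z]|[0-9])    # one letter or digit
    \w+                 # one or more word characters
    \$                  # a literal dollar sign
    $                   # end of line
    """,
    re.VERBOSE,
)

# The same pattern in its usual compact form.
SPECIAL_VAR_COMPACT = re.compile(r"^\s+([a-zA-Z]|[0-9])\w+\$$")
```

Both objects should match the same strings. The difference from the idea above is that here the programmer still picks one form and types it by hand, while an editor with two views could flip between them on a click.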

Even with this though, there is still a problem with regex. Just by looking at it you can’t always tell what it will match and what it won’t. Without a long comment, how do programmers today know what a regex will match? They use their brains. Yes, they actually look at the text and imagine what it will match in their heads. They essentially simulate the regex mentally. This is totally backwards! The whole point of a computer is to simulate things for us, not the other way around. Can’t we make the computer do its job?!

Simulate

Instead of simulating the regex mentally, let’s give it some strings and have the computer tell us what will match. Let’s see what that would look like:

regex tester

We can type in as many examples as we want. Having direct examples also lets us test out the edge cases, which is often where regexes fail.

Hmm. You know… this is basically some unit tests for the regex. If we just add another column like this:

regex unit tests

then we can see at a glance if the regex is working as expected. In this example the last test is failing so it is highlighted in red.
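To make the idea concrete, here is a minimal sketch of such a table in Python. The example strings and the run_cases helper are hypothetical (they are not the ones in the screenshot), and I've spelled the letter class as [a-zA-Z] so Python's re module accepts it.

```python
import re

SPECIAL_VAR = re.compile(r"^\s+([a-zA-Z]|[0-9])\w+\$$")

# (input string, should it match?) -- hypothetical examples
CASES = [
    ("  foo$", True),
    ("  9lives$", True),
    ("foo$", False),   # no leading whitespace
    ("  foo", False),  # missing the trailing dollar sign
    ("  f$", False),   # \w+ needs at least one more character after the first
]


def run_cases(regex, cases):
    """Return (text, expected, actual, passed) rows, like the table in the editor."""
    rows = []
    for text, expected in cases:
        actual = regex.search(text) is not None
        rows.append((text, expected, actual, actual == expected))
    return rows


ROWS = run_cases(SPECIAL_VAR, CASES)
FAILING = [row for row in ROWS if not row[3]]  # these would be highlighted in red
```

An editor could rerun this on every keystroke and redraw the table, so a row turns red the moment an edit to the regex breaks it.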

Most importantly, the unit tests are right next to the regex in the code; right where they are used. If we collapse the triangle then everything goes away. It’s still there, just hidden, until we need it.

This is the power of having a smart IDE and flexible syntax. Because these are only visualization changes it would work with any existing compiler. No new language required.

I’ve talked a lot about ways to improve the syntax and process of writing code. In Typographic Programming Language, Fonts, and Tabs Vs Spaces I've talked about the details of how to improve programming. However, I haven't really talked about my larger vision. Where am I actually going with this?

My real goal is to build the Ultimate IDE and Programming Language for solving problems cleanly and simply. Ambitious much?

Actually, my real goal is to create the computer from Star Trek (the one with Majel Barrett's voice).

Actually, my real goal is to create a cybernetically enhanced programmer.

Okay. Let’s back up a bit.

The Big Picture

When we are programming, what are we really doing? What is the essence? Once we figure that out we should be able to work backwards from there.

Programming is when you have a problem to solve and you tell the computer how to solve it for you. That’s really it. Everything we do comes back to that fundamental. Functions are ways of encapsulating solving stuff in small chunks so we can reason about it. The same thing with objects (often used to "model" what we are solving). Unit tests are to help verify what we are solving, though it often helps to define the problem as well. At the end of the day it's all to serve the goal of teaching the computer how to solve problems for us.

So how could we compare one PL against another?

Metrics

From here out I'm going to use Programming Language, PL, to mean the entire system of IDE, compiler, editor, build tools, etc. The entire system that lets you have the computer solve a problem for you.

How can we say that one PL is better than another? By what metric? Well, how about what lets you solve the problem the fastest, or the easiest, or with the least headache. That sounds nice and is certainly true, but it's rather subjective. We need something more empirical.

Hmm. That's not very helpful. Let's back up again.

Paper and Pencil

What is a pad of paper and a pencil? It’s a thinking device. If you want to add up some numbers you can do it in your head, but after more than a few numbers it becomes tricky to manage them all. So we write them down. We outsource part of the process to some paper.

If we want to do division we can do it with the long division process. This is actually not the most efficient way to divide numbers, but it works well because you can have the paper manage all of the state for you.

What if you need to remember a list of things to do? You write it down on paper. The paper becomes an extension of your brain. It is a tool for thinking. This perhaps explains some people’s fetish over Moleskine-style sketchbooks (not that I would ever invest in a Kickstarter for sketchbooks).

A Tool for Thinking

If we think of a programming language as a tool for thinking, then it becomes clear. The PL helps you tell the computer what to do. So a good PL would help you around the parts of programming that are hard: namely keeping state in your head. A bad PL requires you to remember a lot of things.

For example, suppose you write a function that calls another function named 'foo'. You must remember what this other function does and what it accepts and what it returns. If the function is named well, say 'increment', then the function name itself helps your brain. You have less to remember because the function name carries information.

Now suppose we have type information in our PL. The function increment only takes numbers. Now I don’t have to remember that it only takes numbers; the compiler will enforce it for me. Of course I only find this out when I compile the code. To make the PL give me less to remember the IDE can compile constantly, giving me an immediate warning when I do something bad. Code completion can also help by only suggesting local values that are numbers.
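As a small illustration, here is a sketch in Python with type annotations. Python itself does not enforce annotations when the code runs, so this assumes a separate static checker or an IDE that reads them; narrowing "numbers" to int is also my simplification.

```python
def increment(n: int) -> int:
    """Return n plus one. The annotation records that an integer is expected."""
    return n + 1


total = increment(41)

# increment("41")
# Left commented out on purpose: an annotation-aware checker or IDE would be
# expected to flag this call while you type, before anything runs.
```

The annotation is the state I no longer have to carry in my head. The tool carries it and reminds me at the moment I get it wrong.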

With these types of features the PL acts as a tool for thinking. An extension of the brain by holding state.

So we can say a PL is “better” if it reduces the cognitive load of the programmer. Things like deterministic performance and runtime speed are nice, but they are (or at least should be) secondary to reducing the cognitive load of the programmer. Of course a program which is too slow to be used is a failure, so ultimately it does matter. However computers get faster. The runtime performance is less of a concern than the quality of the code. Perhaps this explains the resurgence in functional programming styles. The runtime hit matters less than it did 30 years ago, so we can afford to waste cycles on things which reduce the cognitive load of the programmer.

Regex

Now, where were we? Oh, right; the Fire Swamp.

Let's look at a more concrete example. Regular expressions. Regexes are powerful. They are concise. But a given regex is not easy to read if you didn’t write it yourself; or even if you did but you didn’t just write it. Look at your regex from 6 months ago sometime. I hope you added good documentation.

A regex conveys a lot of information. You have a lot to load up into your brain. When you look at a regex to see what it does you have to start simulating it in your brain. Your brain basically becomes a crappy computer that executes the regex on hypothetical strings.

That's crazy. Why are we simulating a state machine in our heads? That’s what we have computers for! To simulate things. We should have the PL show the regex to us in an easier to understand form, or perhaps even multiple forms. With unit tests. And visualizers. Or an embedded regex simulator to show how it works. I have a lot to say about regular expressions in an upcoming blog, but for now I'll just say they are some extremely low hanging fruit in the mission to reduce cognitive load.

Expanding Complexity

Now you might think, if we make a PL which sufficiently reduces cognitive load then programmers will have little to do. Programming will become so easy that anyone could do it. True, this might happen to some degree. In fact, I would argue that letting novices program is actually a good thing (though we would call it problem description rather than programming). However, in general this won’t happen. As we decrease the cognitive load we increase the complexity of the programs we can make.

There seems to be a limit to how much complexity the human brain can handle. This limit varies from person to person of course. And it is affected by your health, hunger level, stress, tiredness, etc. (Never code on an empty stomach.) But there is a limit.

Historically better tools have reduced the complexity at hand to something below the cognitive limit of the programmer. So what did we do? We tackled more complex tasks.

Josh’s first postulate of complexity: Just as data always expands to fill available disk space, programming tasks always increase in complexity to fit our brain budget.

Offloading complexity to tools merely allows us to tackle bigger problems. It’s really no different than the brain boost we got from inventing paper and pencil, just separated by a few thousand years. [1]

Now I think we can agree that a PL which reduces more cognitive load of the human is better than one which reduces less. That's our metric. So how can we turn this into something actionable? Does it suggest possible improvements to real world programming languages?

The answer is yes! If we stop focusing on PL implementations but rather the user experience of the programmer, then many ideas become readily available. Everything I’ve discussed in previous blogs is driven by this core principle. Showing a color or image inline reduces cognitive load because you don’t have to visualize in your head what the color or image actually looks like. The editor can just do it for you. This is just the beginning.

My Forever Project

These are ideas I’ve been working on a long time. In fact I recently realized some of these ideas have been in my head for twenty years. In my college papers I found evidence of tools I attempted to write. It’s only recently that everything has started to gel into an expressible form. It’s also only recently that we’ve had the "problem" of computers with too much computational power going to waste. This is my Forever Project.

There’s so much possibility now. If a notepad and a text editor are our current cybernetic enhancements, what else could we build? Could Google Glass help you with programming? How about an Oculus Rift? Or using an iPad as an extra screen? The answer is Yes! We could definitely use these to reduce cognitive load while interacting with a computer. We just might not call all of these tasks "programming" but they are (or will be shortly as software continues to eat the world).

deep breath

My concept summarized: Programming systems should not be thought of as ways to make a computer do tricky things. They are ways to make a computer solve problems for you. Problems you would have to do in your head without them (assuming you could do them at all). Thus PLs are tools for thinking. This gives us a metric. Do particular PLs help us think better and solve problems better? As it turns out, most PLs fail miserably by this metric. Or at least they could be a whole lot better. Many ideas from the 60s and 70s still remain unimplemented and unused.

But don’t be sad. Our field is less than 100 years old. We are actually doing pretty well. It took a few thousand years to invent suspension bridges and they still don’t work 100% of the time.

So from now on let us consider how to make better tools for thinking. Everything Bret Victor has been doing, and Jonathan Edwards, and even the work from the 70s & 80s that Alan Kay did with Smalltalk and the Dynabook come back to the same thing: building better tools for thinking. With better tools we can think better thoughts. With better tools we can solve bigger problems.

So let’s get crackin'!

footnote: [1] Before someone calls me on this, I have no idea when paper and pencil were first invented for the purposes of reducing problem complexity. Probably in ancient Egypt for royal bookkeepers. I'll leave that as an exercise to the reader.

So far my posts on Typographic Programming have covered font choices and formatting. Different ways of rendering the source code itself. I haven’t covered the spacing of the code yet, or more specifically: indentation. Or even more specifically: tabs vs spaces.

Put on your asbestos suits, folks. It’s gonna get hot in this kitchen.

Traditionally source code has been rendered with a monospace font. This allows for manual horizontal positioning with spaces or tab characters. Of course the tab character doesn’t have a defined width (I’ll explain in a moment why) so flame wars have erupted around spaces vs tabs, on par with the great editor wars of the last century. Ultimately these are pointless arguments. Tabs vs spaces is an artifact of trying to render code into a monospace grid of characters. It’s the 21st century! We can do better than our dad's 1970s terminal. In fact, they did better in the 19th century!

In The Beginning

Let’s start at the beginning. Fixed whitespace indenting can be used to line things up so they become pretty, and therefore easier to read. But that's a lot of work. All that pressing of space bars and adjusting when things change.

Instead of manually controlling whitespace what if we used tab stops. I don’t mean the tab character, which is mapped to either 4 or 8 spaces, but actual tab stops. Yes. They used to be a real physical thing.

image of typewriter tab stop

In the olden days, back when we used manual typewriters (I think I was the last high school class to take typing on such machines), there was such a thing as a tabstop. These were vertical brackets along the page (well, along that metal bar at the bottom of the current line). These tiny pieces of metal literally stopped the tabs, thus giving them the name tabstops. We were so creative with names in those days.

When you hit the tab key the cursor (a rapidly spinning metal ball imprinted with the noun: “Selectric”) would jump from the left edge of the paper to the first tabstop. Hit tab again and it will go to the next tabstop. Now of course, these tab stops were adjustable, so you could choose the indenting style you wanted for your particular document.

Let me repeat that. The tab stops could be adjusted to the indenting style of your particular document. Inherent is the concept that there is no “one right way”, but rather the format must suit the needs of the particular document, or part of a document, that you are writing.

When WYSIWYG editors came along they preserved the notion of a tabstop. They even made it better by giving you nice vertical lines to see the effect of changing the tabstop. When you hit tab the text would move to the stop. If you later move the stop then the text aligned with it will magically move as well. Dynamic tabstops! Yay. We can finally rock like it's the 1990s.

Word for Mac, circa 1991

Semantic Indentation

So why do we go back to the 1970s with our text editors? Tabstops are a simple concept for semantically (sorta) indenting our code. Let’s see what some code would look like with simple tabular semantic indenting.

Here’s some code with no formatting other than a standard indent.

image of code with only standard indentation

This is your typical C-ish code with brackets and parameters. It would be nice to line up the parameters with their types. The drawRect code is also similar between lines. We should clean that up too.

Here is code with semantic indenting.

image of the same code with semantic indenting

How would you type in such code? When you hit the tab key the text advances to the next tabstop. These tabstops are dynamic, however. Instead of giving you a line with a ruler at the top, the tabstops automatically expand to fit the text in the column. Essentially they act more like spreadsheet cells than tab stops.

Furthermore, the text will be left aligned at the tabstop by default, but right aligned for text that ends with a comma or other special character. This process is completely automatic and hidden, of course. The programmer just hits the tab key and continues typing; the IDE handles all of the formatting details, as it should be. We humans write the important things (the code) and let the computer handle the busy work (formatting).

The tab stops (or columns if you think of it as a table) don’t extend through the entire document. They only go down as far as the next logical chunk. There could be multiple ways to define a ‘chunk’ semantically, but one indicator would be a double space. If you break the code flow with a double space then it will revert to the document / project wide defaults. This lets us use standard indentation for common structures like functions and flow control bracing, while still allowing for custom indentation when needed.
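Here is a minimal sketch of the column logic in Python. It assumes cells are separated by tab characters, treats a blank line as the break between chunks (my reading of the "double space"), and fakes the layout by padding with spaces, which only lines up in a monospace font. A real editor would set column widths in pixels instead. It also skips the right-alignment rule. The names align and align_chunk are hypothetical.

```python
def align_chunk(lines):
    """Pad tab-separated cells so each column is as wide as its widest cell."""
    rows = [line.split("\t") for line in lines]
    widths = {}
    for row in rows:
        for i, cell in enumerate(row[:-1]):  # the last cell in a row needs no padding
            widths[i] = max(widths.get(i, 0), len(cell))
    out = []
    for row in rows:
        padded = [cell.ljust(widths[i] + 1) for i, cell in enumerate(row[:-1])]
        out.append("".join(padded) + row[-1])
    return out


def align(source: str) -> str:
    """Align each chunk independently; a blank line ends the current chunk."""
    result, chunk = [], []
    for line in source.split("\n"):
        if line.strip() == "":
            result.extend(align_chunk(chunk))
            result.append(line)
            chunk = []
        else:
            chunk.append(line)
    result.extend(align_chunk(chunk))
    return "\n".join(result)


EXAMPLE = "int\tx,\nString\tname,\n\nfloat\ty"
ALIGNED = align(EXAMPLE)
```

In EXAMPLE the first two lines share a column sized to fit "String", while the line after the blank line starts over with its own width. That is the spreadsheet-cell behavior: the columns grow to fit the text, and they stop at the chunk boundary.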

Ludicrous Speed

Furthermore, using semantic indentation could completely remove the need for braces as block delimiters. Semantic indentation can replace where blocks begin and end.

if(x==y) {
     foo
} else {
     bar
}

becomes

if x==y
     foo
else
     bar

and

if(x==y) {
     foo
}

can become

if x==y
     foo

or even

if x==y  foo

using the tab character.

This might be a tad confusing, however, because there is only whitespace between the x==y and the foo. How do we know it's not a space instead? If you hit the tab key, which indicates you are going to the next chunk instead of just a long conditional expression, the editor could draw a light glyph where the tab is. Perhaps a rightward Unicode arrow.

Now I know the Rubyists and Pythoners will say that they've already removed the block delimiters. Quite true, but this goes one step further.

Python takes the choice of whitespace away from the programmer, but the programmer still has to implement it. With semantic indentation the entire job of formatting is taken away. You just type your code and the editor does the right thing. Such a system also opens the door for alternative rendering of the code in particular circumstances.
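Because the compiler only needs a canonical form, the editor could put the braces back before handing the code over. Here is a rough sketch of that translation in Python. It assumes four-space indentation, input that starts at the left margin, and indentation that increases one level at a time; the add_braces name is hypothetical.

```python
def add_braces(source: str, indent: str = "    ") -> str:
    """Turn indentation-delimited blocks into brace-delimited ones."""
    out = []
    depth = 0
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines are dropped to keep the sketch short
        stripped = line.lstrip(" ")
        level = (len(line) - len(stripped)) // len(indent)
        if level > depth and out:
            out[-1] += " {"  # the previous line opens a new block
        while depth > level:
            depth -= 1
            out.append(indent * depth + "}")  # close blocks we have left
        depth = level
        out.append(line)
    while depth > 0:
        depth -= 1
        out.append(indent * depth + "}")
    return "\n".join(out)


BRACED = add_braces("if x==y\n    foo\nelse\n    bar")
```

The output puts else on its own line after the closing brace instead of writing "} else {", which C-style languages accept just the same. The programmer never sees these braces; they exist only in the form the compiler reads.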

Better Fonts

And of course we come to our final advantage. Without manual formatting with spaces we don't need to be restricted to a monospace font anymore. Our code could look like this:

image of the code set in a proportional font

Semantic indenting. Less typing, more readable code. Let’s rock like it’s the 1990s!

The Art of LEGO Design: Creative Ways to Build Amazing Models

by Jordan Schwartz

No Starch Press is really doubling down on their Lego books. Their latest is a stunner. The Art of Lego Design by Jordan Schwartz is less of an art book and more of a hands on guide. It shows actual techniques used by the Lego artists featured in other No Starch books like Mike Doyle’s Beautiful Lego.

As one of the LEGO Group’s youngest staff designers, Jordan Schwartz worked on a number of official LEGO sets. His attention to detail and gift for teaching really come through in the book. Many of the models that seem impossible at first, such as stained glass windows, are actually pretty simple once explained in the book.

The Art of Lego Design is the perfect entry point to the rabbit hole that is Lego sculpture. With Lego-specific jargon like cheese slopes, SNOT (Studs Not On Top), and the Lowell sphere, it would be easy to get lost, but Jordan explains it all clearly, along with general design principles like texture and composition.

The book takes the reader through many styles of construction and technical advantages of different pieces, with a dash of Lego history thrown in. I didn’t know there once was a Lego set with a fuzzy bear rug or a line of big Lego people for the Technic sets.

Read or Read Not? Read!

Get it from No Starch Press

Apparently my last post hit HackerNews and I didn’t know it. That’s what I get for not checking my server logs.

Going through the comments I see that about 30% of people find it interesting and 70% think I’m an idiot. That’s much better than usual, so I'm going to tempt fate with another installment. The general interest in my post (and let’s face it, typography for source code is a pretty obscure topic of discussion), spurred me to write a follow up.

In today’s episode we’ll tour the fonts themselves. If we want to reinvent computing it’s not enough to grab a typewriter font and call it a day. We have to plan this carefully, and that starts with a good selection of typefaces.

Note that I am not going to use color or boxes or any other visual indicators in this post. Not that they aren’t useful, but today I want to see how far we could get with just font styling: typeface, size, weight, and style (italics, small caps, etc.).

Since I'm formatting symbolic code, data, and comments, I need a range of typefaces that work well together. I’ve chosen the Source Pro family from Adobe. It is open source and freely redistributable, with a full set of weights and italics. More importantly, it has three main faces: a monospace font, Source Code Pro; a serif font, Source Serif Pro; and a sans-serif font, Source Sans Pro. All three are specifically designed to work together and have a range of advanced glyphs and features. We won't use many of these features today but they will be nice to have in the future.

Let's start formatting some code. This will be a gradual process where we choose the basic formatting then build on top for different semantic chunks.

For code itself we will stick with the monospace font: Source Code Pro. I would argue that a fixed width font is not actually essential for code, but that’s an argument for another day when we look at indentation. For today we’ll stick with fixed width.

Code comments and documentation will use Source Serif Pro. Why? Well, comments don’t need the explicit alignment of a monospace font, so definitely not Source Code Pro. Comments are prose. The sans serif font would work okay but for some reason when I think "text" I think of serifs. It feels more like prose. More texty.

So I won’t use Source Sans Pro today but I will save it for future use. Using the Source [x] Pro set gives us that option.

Below is a simple JavaScript function set with the default weights of those two fonts. This is the base style we will work from.

image of the function in the base style

So that’s a good start, but I can immediately think of a few improvements. Code (at least in C derived languages) has five main elements: comments, keywords, symbols, literals, and miscellaneous, or what I like to call ‘extraneous cruft’. It’s mainly parentheses and brackets for delimiting functions and procedure bodies. It is possible to design a language which uses ordering to reduce the need for delimiters, or to be rid of them completely with formatting conventions (as I talked about last week). However, today’s job is to just restyle without changing the code so let’s leave them unmolested for now. Instead we will minimize their appearance by setting them in a thin weight. (All text is still in black, though.)

Next up is symbols. Symbols are the parts of a program that the programmer can change. These are arguably the most important part of the program, the parts we spend the most time thinking about, so let’s make them stand out with a very heavy weight: bold 700.

image of the function with thin delimiters and bold symbols

Better, but I don’t like how the string literal blends in with the rest of the code. String literals are almost like prose so let’s show them in serif type, this time with a bolder weight and shrunk a tiny bit (90% of normal).

For consistency I did the same with numeric literals. I’m not sure if ‘null’ is really a literal or a symbol, but you can’t assign a value to it so I’ll call it a literal.

image of the function with literals in serif type

Next up is keywords. Keywords are the part of the language that the programmer cannot change. They are strictly defined by the language and completely reserved. Since they are immutable it doesn’t really matter how we render them. I could use a smiley face for the function keyword and the compiler wouldn’t care. It always evaluates to the same thing. However, unlike my 3yr old’s laptop, I don’t have a smiley face key on my computer, so let’s keep the same spelling. I do want to do something unorthodox though. Let’s put the keywords in small caps.

Small caps are glyphs the size of lower case letters, but drawn like the upper case letters. To do small caps right you can’t just put your text in upper case and shrink it down. It would look strange. Small caps are actually different glyphs designed to have a similar (but not identical) width and a shorter height. They are hard to generate programmatically. This is one place where a human has to do the hard work. Fortunately we have small caps at our disposal thanks to the great contributions by Logos and type designer Marc Weymann. Open source is a good thing.

image of the function with keywords in small caps

Now we are getting somewhere. Now the code has a dramatically different feel.

There’s one more thing to address: the variables. Are they symbols like function names? Yes, but they feel different than function names. They are also not usually prefixed with a parent object or namespace specifier. Really we have three cases. A fully qualified symbol like ‘baz’ in foo.bar.baz, the prefix part (foo.bar), and standalone variables that aren’t qualified at all (like ‘x’). This distinction applies whether or not the symbol is a function or an object reference (it could actually be both in JavaScript).

In the end I decided these cases are related but distinct. Standalone symbols have a weight of 400. Technically this is the default weight in CSS and shouldn’t appear to be ‘bold’, but since the base font is super light, regular will feel heavier against it. The symbol at the end of a qualifier chain will also be bold, but with a weight of 700. Finally the prefix part will be italics to further distinguish it. There really isn’t a right answer here; other combinations would work equally well, so I just played around until I found something that felt right.

This is the final version:

image of the final version

I also shrunk the comments to 80%. Again it just felt right, and serifed fonts are easier to read in longer lines, so the comments can handle the smaller size.
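To pull the choices together, here is a small sketch in Python that writes the style table down and turns it into CSS rules. The class names are hypothetical, and two values are my guesses because the text only says "thin" and "bolder": 200 for the cruft weight and 600 for the literal weight. The real mockup may use different numbers.

```python
MONO = "'Source Code Pro', monospace"
SERIF = "'Source Serif Pro', serif"

# Token category -> CSS declarations. Class names are hypothetical.
STYLES = {
    "cruft": {"font-family": MONO, "font-weight": "200"},      # "thin": 200 is assumed
    "symbol": {"font-family": MONO, "font-weight": "700"},     # end of a qualifier chain
    "standalone": {"font-family": MONO, "font-weight": "400"}, # unqualified, like x
    "prefix": {"font-family": MONO, "font-style": "italic"},   # the foo.bar part
    "literal": {"font-family": SERIF, "font-weight": "600", "font-size": "90%"},  # weight assumed
    "keyword": {"font-family": MONO, "font-variant": "small-caps"},
    "comment": {"font-family": SERIF, "font-size": "80%"},
}


def to_css(styles):
    """Render the style table as one CSS rule per token category."""
    rules = []
    for name, decls in styles.items():
        body = " ".join(f"{prop}: {value};" for prop, value in decls.items())
        rules.append(f".{name} {{ {body} }}")
    return "\n".join(rules)


CSS = to_css(STYLES)
```

Keeping the table as data makes it easy to play around until something feels right, which is mostly how I arrived at these values anyway.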

Here’s a link to the live mockup in HTML and CSS. This design turned out much better than I originally thought it would. We can do a lot without color and spacing changes. Now imagine what we could do with our full palette of tools. But that will have to wait for next time.

BTW, if you submit this to Hacker News or Reddit please let me know via Twitter so I can answer questions.