I need to move on to other projects so I’m wrapping up the rest of my ideas in this blog. Gotta get it outta my brainz first.

The key concept I’ve explored in this series is that the code you see in an editor need not be identical to what is stored on disk, or the same as what is sent to the compiler. If we relax this constraint then a world of opportunity opens up. We’ve been writing glorified text files for 40 years. We can do better. Let’s explore.


Why can’t you name a variable for? Because in many common languages for is a reserved word. You, as the programmer, aren’t allowed to use for because it represent a particular loop construct. The underlying compiler doesn’t actually care of course. It doesn’t care about the name of any of your variables or other words in your code. The compiler just needs them to be unique symbols, some of which are mapped to existing constructs like conditionals and loops.

If the compiler doesn’t care then why can’t we do it? Because the parser (the ‘front end’ of the compiler) does care. The parser needs to unambiguously transform a stream of ASCII text into an abstract syntax tree. It’s the unambiguous part that’s the trouble. The syntax restrictions in most common languages are there to make the parser happy. If the parser was magical and could just "know what we meant" then any syntax could be used. Perhaps even syntax that made more sense to the human rather than the computer.

Fundamentally, this is what typographic programming does. It lets us tell the parser which text is what without using specific syntax rules. Instead we use color or font choices to indicate whether a given chunk of text is a variable or keyword or something else. Of course editing in such a system would be a pain, but we already know how to solve that problem. Graphical word processors are proof that it is possible. Before we get to how we solve it let us consider why. Would such a system have enough benefits to outweigh the cost of building it. What new things could we do?

Nothing’s reserved

If we use typography to indicate syntax, then keywords no longer need to be reserved. Any keyword could be used as a variable and any text string could be used as a keyword. You could swap for with fore or thusly. You could use spaces in keywords as for each of. These aren’t very useful examples but the compiler could easily handle them.

With the syntactic restrictions lifted we are free to explore new control flow constructs. How about forever to mean an infinite loop and 10 times for standard for fixed length loops? It’s all the same to the compiler but the human reading it would better understand the meaning.

Custom Operators

If nothing is reserved then user defined operators become easy. After all; what is an operator but a function with a single letter name from a restricted character set. In Python 4 + 5 is just sugar for add(4,5).

With no syntax rules anything could be an operator. Operators could have multiple letter names, or other symbols from the full unicode set. The only reason operators are given special treatment to begin with is because they represent functions which are so commonly used (like arithmetic) that we want a shorthand. With free syntax we can create a shorthand for the functions that are useful to the task at hand rather than the abstract general purpose tasks the language inventors imagined.

Let’s look at something more concrete. Using complex numbers and vectors is common in graphics programming, but we have to use clumsy and verbose syntax in most languages. This is sad. Mathematics already has invented compact notation for these concepts but we can’t use them due to ASCII limitations. Without these limitations we could add complex numbers with the plus sign like this:

A +B

instead of


To help the programmer remember these are complex numbers they could be rendered in a different color.

There are two ways to multiply vectors: the dot product and the cross product. They have very different meanings. With full unicode we could use the correct symbols like this:

A ⋅ B  // dot product
A ⨯ B // cross product

No ambiguity at all. It would be too much to expect a language to support every possible notation. Much better instead to have a language that lets the programmer create their own notation.

Customization in Practice

So how would this work in practice? At some point the code must be transformed into something the compiler understands. Let’s postulate a hypothetical language called X. X has no syntax rules, only the semantic rules of it’s AST. To tell the complier how to convert the code into the AST we must provide our own rules. Something like this.

fun => function
cross => cross
dot => dot
|x| => magnitude(x)

fun intersection(V,R) {
     return V dot R / |V|;

We have now defined a mini language in X which still compiles to the same syntactic structure.

Of course typing all of these rules in every file (or compilation unit) would be a huge pain, so we could include them much as we include external libraries.

@include math.rules

fun intersection(V,R) {
     return V dot R / |V|;

Most importantly, not only would the compiler understand these rules but so would the editor. The editor can now indicate that V ⋅ R is valid only if they are both vectors. It could enforce the rules from the rule file. Now our code is limited only by the imagination of our rule writers, not the fixed compiler.

In practice, were X to become popular, we would not see everyone making up their own rules. Instead usage would gather around a few popular rulesets much as JavaScript gathered around a few popular libraries like JQuery. We might call each of these rulesets dialects, each a particular flavor derived from the base X language. Custom DSLs would become trivial to implement. It would be common for developers to use one or two "standard" dialects for most of their code but use a special purpose dialect for a particular task.

The important thing here is that the language no longer has a fixed syntax. It can adapt and evolve as needed. All without changing the compiler.

How do you edit?

I hope I’ve convinced you that a flexible syntax delimited by typography is useful. Many common idioms like iteration, accumulation, and data structure traversals could be distilled to concise syntax. And if it has problems then we can tweak it.

There is one big problem though. How would you actually edit code like this?

Fortunately this problem has already been solved by graphical word processors. These tools use color, font, size, weight and spacing to distinguish one element from another. Choosing the mode for a variable is as simple as selecting it with the cursor and using a drop down.

Manually highlighting a entire page of code would quickly grow tedious, of course. For common operations, like declaring a variable, the programmer could type a special symbol like @. This tells the editor that the next letters are a variable name. The programmer ends it with @ or by pressing the spacebar or return key. This @ symbol doesn’t exist in the code. It is simply to indicate to the editor that the programmer wants to be in ‘variable’ mode. Once the mode is finished the @’s go away and the text is rendered with the ‘variable’ font. This is no different than using star word star to indicate bold in Markdown text. The stars never appear in the rendered text.

The choice of the @ symbol doesn't matter as long as it's easy with the user's native keyboard. @ is good for US keyboards. French or Russians might use something else.

Resolving Ambiguity

Even using manual markup might become tedious, though. Fortunately the editor can usually figure out the meaning of any given token by using the dialect rules. If the rules indicate that # equals division then the editor can just do the right thing. Using manual highlighting would only be necessary if the dialect itself introduces an ambiguity. (ex: # means division and also the start of a hex value)

What about multiplying vectors? You could type in either of the two proper symbols, but the average keyboard doesn’t support those directly. You’d have to memorize a unicode code point or use a floating dialog. Alternatively, we could use code completion. If you type * then the editor knows this must be either dot or cross product. It provides only those two choices in a drop down, much as we auto-complete method names today.

Using a syntax free language does not fully remove the need to resolve ambiguity, it just moves the resolution process to edit time rather than compile time. This is good. The human is present at edit time and can explain to the computer was is correct. The human is not there at compile time, so any ambiguity must result in an error that the human must come back and fix. Furthermore, resolving the ambiguity need only happen once, when the human types it, not every time the code is compiled. This will further reduce code regressions when other parts of the system change.

Undoubtedly we would discover more edge cases, but these are all solvable. Modern GUI word processors and spreadsheets prove this. A more challenging issue is version control.


Code changes over time. It must be versioned. I don’t know why it took 40 years for us to invent distributed version control systems like Git, but at least we have it now. It would be a shame to give that up just as we’ve gotten the world on board. The problem is Git and other VCSs don’t really understand code. They just understand text. There are really only two ways to solve this:

1) modify git, and the other tools around it (diff viewers, github’s website, etc.) to support binary diffs specific to our new system.

2) make the on disk format be pure text.

Clearly option 1 is a non-starter. One day, once language X takes over the world, we could ask the GitHub team to add support for X diffs, but that’s a long ways off. We have to start with option 2.

You might think I’m going back on what I said at the start. After all, I stated we should no longer be writing code as text on disk, but that is exactly what I am suggesting. What I don’t want is to store the same thing that we edit. From the VCS’s point of view the editor and visual representation are irrelevant. The only thing that matters is what is the file on disk. X needs a canonical on serialization format. Regardless of what tool you use to edit X, as long as it saves to the same format we are fine. This is no different than SQL or HTML. Everyone has their favorite tool, but they all write to the same format.

Canonical Serialization Format.

X’s serialization format should obviously be plain text. < 128bit ASCII would be fine, though I think we could handle UTF8 easily. Most modern diff tools can work with UTF8 cleanly, so Japanese comments and math symbols would come through just fine.

The X format should also be unambiguous. Variables are marked up explicitly as variables. Operators as operators. There should be no need for the parser to guess at anything or interpret syntax rules. We could use one of the many existing formats like JSON, XML, or even LaTex. It doesn’t really matter since humans will rarely need to look at them.

But.... since we are defining a new serialization format anyways, there are a few useful things we could add.

Code as Graph

Code is really just a graph. Graphs can be serialized in many ways. Rather than using function names inline they could be represented by identifiers which point to a lookup table. Then, if a function is renamed the code only changes in one place rather than at every point in the code where the function is used. This creates a semantic diff that the diff tool could render as ‘function Y renamed to Z’.

v467 = foo
v468 = bar
v469 = baz

fun v467 () {
return v468 + v469;

Semantic diff-ing could be very powerful. Any refactoring should be reducible to its essential meaning: moved X to a new class or extracted Y from Z. Whitespace changes would be ignored (or never stored in the first place). Commit messages could be context aware: changed X in the unit test for Y and added logging to Z. Our current tools just barely understand when a file has been renamed instead of deleted and a new one added. There’s a lot of room for innovation here.


I hope I’ve convinced you there is value in this approach. Building language X still won’t be easy. To be viable we have to make a compiler, useful dialect definitions, and a visual editor; all at the same time. That’s a lot of work before anyone else can use it. Building on top of existing tools like Eclipse or Atom.io would help, but I know it’s still a big hill to climb. Trust me. The view will be worth it.

After the more abstract talk I’d like to come back to something concrete. Regular Expressions, or regex, are powerful but often inscrutable. Today let’s see how we could make them easier to use through typography and visualization without diminishing that power.

Regular Expressions are essentially a mini language embedded inside of your regular language. I’ve often seen regex written like this,

new Regex(“^\\s+([a-Z]|[0-9])\\w+\\$$”) // regex for special vars

then sometimes reformatted like this

//regex for special vars
new Regex(
     “^” // start of line
     +“\\s+” // one or more whitespace
     +“([a-Z]|[0-9])” //one letter or number
     +”\w+” // one word
     +”\\$” // the literal dollar sign
     +”$” // end of line

The fact that the author had to manually split up the text and add comments simply screams for a better syntax. Something far more readable but still terse. It would seem that readability and conciseness would be mutually exclusive. Anything more readable would also be far longer, eliminating much of the value of regex, right?

Au contraire! We have a special power at our disposal: we can render whatever we want in the editor. The compiler doesn’t care as long as it receives the description in some canonical form. So rather than choosing readable vs terse we can cheat and do both.

Font Styling

Let’s start with the standard syntax but replacing delimiters and escape sequences with typographic choices. This regex looks for variable names in a made up language. Variables must start with a letter or number, followed by any word character, followed by a dollar sign. Here is the same regex using bold italics for all special chars:


Those problematic double escaped backslashes are gone. We can tell which dollar sign is a literal and which is a magic variable thanks to font styling.

Color and Brackets

Next we can turn the literals green and the braces gray, so it’s clear which part is actual words. We can also replace the $ and ^ with symbols that actually look like beginning and ending of lines. Open and close up brackets or floor brackets.


Now one more thing, let’s replace the w and s magic characters with something that looks even more different than plain text: letter boxes. These are actual unicode characters from the "Unicode Characters in the Enclosed Alphanumeric Supplement Block".


Now the regex is much easier to read but still compact. However, unless you are really familiar with regex syntax you may still be confused. You still have a bunch of specific symbols and notation to remember. How could we solve this?

Think Bigger

Let’s invent a second notation that is even easier to read. We already have a clue of how this should work. Good programmers write the regex vertically with comments as an adhoc secondary notation. Let’s make it official.

If we have two views then the editor could switch between them as desired. When the user clicks on the regex is will expand to something like this:


We get what looks like a tiny spreadsheet where we can edit any term directly, using code completion so we don’t have to remember the exact terms. Furthermore the IDE can show the extra documentation hints only when we are in this "detailed" mode.

Even with this though, there is still a problem with regex. Just by looking at it you can’t always tell what it will match and what it won’t. Without a long comment, how do programmers today know what a regex will match. They use their brains. Yes, they actually look at the text and imagine what it will match in their heads. They essentially simulate the computer the regex mentally. This is totally backwards! The whole point of a computer is to simulate things for us, not the other way around. Can’t we make the computer do it’s job?!


Instead of simulating the regex mentally, let’s give it some strings and have the computer tell us what will match. Let’s see what that would look like:

regex tester

We can type in as many examples as we want. Having direct examples also lets us test out the edge cases, which is often where regexes fail.

Hmm. You know… this is basically some unit tests for the regex. If we just add another column like this:

regex unit tests

then we can see at a glance if the regex is working as expected. In this example the last test is failing so it is highlighted in red.

Most importantly, the unit tests are right next to the regex in the code; right where they are used. If we collapse the triangle then everything goes away. It’s still there, just hidden, until we need it.

This is the power of having a smart IDE and flexible syntax. Because these are only visualization changes it would work with any existing compiler. No new language required.

So far my posts on Typographic Programming have covered font choices and formatting. Different ways of rendering the source code itself. I haven’t covered the spacing of the code yet, or more specifically: indentation. Or even more specifically: tabs vs spaces.

Put on your asbestos suits, folks. It’s gonna get hot in this kitchen.

Traditionally source code has been rendered with a monospace font. This allows for manual horizontal positioning with spaces or tab characters. Of course the tab character doesn’t have a defined width (I’ll explain in a moment why) so flame wars have erupted around spaces vs tabs, on par with the great editor wars of the last century. Ultimately these are pointless arguments. Tabs vs spaces is an artifact of trying to render code into a monospace grid of characters. It’s the 21st century! We can do better than our dad's 1970s terminal. In fact, they did better in the 19th century!

In The Beginning

Let’s start at the beginning. Fixed whitespace indenting can be used to line things up so they become pretty, and therefore easier to read. But that's a lot of work. All that pressing of space bars and adjusting when things change.

Instead of manually controlling whitespace what if we used tab stops. I don’t mean the tab character, which is mapped to either 4 or 8 spaces, but actual tab stops. Yes. They used to be a real physical thing.

image of typewriter tab stop

In the olden days, back when we used manual typewriters (I think I was the last high school class to take typing on such machines), there was such a thing as a tabstop. These were vertical brackets along the page (well, along that metal bar at the bottom of the current line). These tiny pieces of metal literally stopped the tabs, thus giving them the name tabstops. We were so creative with names in those days.

When you hit the tab key the cursor (a rapidly spinning metal ball imprinted with the noun: “Selectric”) would jump from the left edge of the paper to the first tabstop. Hit tab again and it will go to the next tabstop. Now of course, these tab stops were adjustable, so you could choose the indenting style you wanted for your particular document.

Let me repeat that. The tabs stops could be adjusted to the indenting style of your particular document. Inherent is the concept that there is no “one right way”, but rather the format must suit the needs of the particular document, or part of a document, that you are writing.

When WYSIWYG editors came along they preserved the notion of a tabstop. They even made it better by giving you nice vertical lines to see the effect of changing the tabstop. When you hit tab the text would move to the stop. If you later move the stop then the text aligned with it will magically move as well. Dynamic tabstops! Yay. We can finally rock like its the 1990s.

Word for Mac, circa 1991

Semantic Indentation

So why do we go back to the 1970s with our text editors? Tabstops are a simple concept for semantically (sorta) indenting our code. Let’s see what some code would look like with simple tabular semantic indenting.

Here’s some code with no formatting other than a standard indent.


This is your typical Cish code with brackets and parameters. It would be nice to line up the parameters with their types. The drawRect code is also similar between lines. We should clean that up too.

Here is code with semantic indenting.


How would you type in such code? When you hit the tab key the text advances to the next tabstop. these tabstops are dynamic, however. Instead of giving you a line with a ruler at the top, the tabstops automatically expand to fit the text in the column. Essentially they act more like spreadsheet cells than tab stops.

Furthermore, the text will be left aligned at the tabstop by default, but right aligned for text that ends with a comma or other special character. This process is completely automatic and hidden, of course. The programmer just hits the tab key and continues typing, the IDE handles all of the formatting details, as it should be. We humans write the important things (the code) and let the computer handle the busy work (formatting).

The tab stops (or columns if you think of it as a table) don’t extend the entire document. They only go down as far as the next logical chunk. There could be multiple ways to define a ‘chunk’ semantically, but one indicator would be a double space. If you break the code flow with a double space then it will revert to the document / project wide defaults. This lets us use standard indentation for common structures like functions and flow control bracing, while still allowing for custom indentation when needed.

Ludicrous Speed

Furthermore, using semantic indentation could completely remove the need for braces as block delimiters. Semantic indentation can replace where blocks begin and end.

if(x==y) {
} else {


if x==y
if(x==y) {
can become
if x==y
or even
if x==y  foo

using the tab character.

This might be a tad confusing, however, because there is only whitespace between the x==y and the foo. How do we know its not a space instead? If you hit the tab key, which indicates you are going to the next chunk instead of just a long conditional expression, the editor could draw a light glyph where the tab is. Perhaps a rightward unicode arrow.

Now I know the Rubyists and Pythoners will say that they've already removed the block delimiters. Quite true, but this goes one step further.

Python takes the choice of whitespace away from the programmer, but the programmer still has to implement it. With semantic indentation the entire job of formatting is taken away. You just type your code and the editor does the right thing. Such a system also opens the door for alternative rendering of the code in particular circumstances.

Better Fonts

And of course we come to our final advantage. Without manual formatting with spaces we don't need to be restricted to a monospace font anymore. Our code could look like this:


Semantic indenting. Less typing, more readable code. Let’s rock like it’s the 1990s!

Apparently my last post hit HackerNews and I didn’t know it. That’s what I get for not checking my server logs.

Going through the comments I see that about 30% of people find it interesting and 70% think I’m an idiot. That’s much better than usual, so I'm going to tempt fate with another installment. The general interest in my post (and let’s face it, typography for source code is a pretty obscure topic of discussion), spurred me to write a follow up.

In today’s episode we’ll tour the font themselves. If we want to reinvent computing it’s not enough to grab a typewriter font and call it a day. We have to plan this carefully, and that starts with a good selection of typefaces.

Note that I am not going to use color or boxes or any other visual indicators in this post. Not that they aren’t useful, but today I want to see how far we could get with just font styling: typeface, size, weight, and style (italics, small caps, etc.).

Since I'm formatting symbolic code, data, and comments; I need a range of typefaces that work well together. I’ve chosen the Source Pro family from Adobe. It is open source and freely redistributable, with a full set of weights and italics. More importantly, it has three main faces: a monospace font: Source Code Pro, a serif font: Source Serif Pro, and a sans-serif font: Source Sans Pro. All three are specifically designed to work together and have a range of advanced glyphs and features. We won't use many of these features today but they will be nice to have in the future.

Lets start formatting some code. This will be a gradual process where we chose the basic formatting then build on top for different semantic chunks.

For code itself we will stick with the monospace font: Source Code Pro. I would argue that a fixed width font is not actually essential for code, but that’s an argument for another day when we look at indentation. For today we’ll stick with fixed width.

Code comments and documentation will use Source Serif Pro. Why? Well, comments don’t need the explicit alignment of a monospace font, so definitely not Source Code Pro. Comments are prose. The sans serif font would work okay but for some reason when I think "text" I think of serifs. It feels more like prose. More texty.

So I won’t use Source Sans Pro today but I will save it for future use. Using the Source [x] Pro set gives us that option.

Below is a simple JavaScript function set with the default weights of those two fonts. This is the base style we will work from.


So that’s a good start but.., I can immediately think of a few improvements. Code (at least in C derived languages) has five main elements: comments, keywords, symbols, literals, and miscellaneous — or what I like to call ‘extraneous cruft’. It’s mainly parenthesis and brackets for delimiting functions and procedure bodies. It is possible to design a language which uses ordering to reduce the need for delimiters, or to be rid of them completely with formatting conventions (as I talked about last week). However, today’s job is to just restyle without changing the code so let’s leave them unmolested for now. Instead we will minimize their appearance by setting them in a thin weight. (All text is still in black, though).

Next up is symbols. Symbols the part of a program that the programmer can change. These are arguably the most important part of the program; the parts we spend the most time thinking about, so let’s make them stand out with a very heavy weight: bold 700.


Better, but I don’t like how the string literal blends in to with the rest of the code. String literals are almost like prose so let’s show them in serif type, this time with a bolder weight and shrunk a tiny bit (90% of normal).

For compatibility I did the same with numeric literals. I’m not sure if ‘null’ is really a literal or a symbol, but you can assign values to it so I’ll call it a literal.


Next up is keywords. Keywords are the part of the language that the programmer cannot change. They are strictly defined by the language and completely reserved. Since they are immutable it doesn’t really matter how we render them. I could use a smiley face for the function keyword and the compiler wouldn’t care. It always evaluates to the same thing. However, unlike my 3yr old’s laptop, I don’t have a smiley face key on my computer; so let’s keep the same spelling. I do want to do something unorthodox though. Let’s put the keywords in small caps.

Small caps are glyphs the size of lower case letters, but drawn like the upper case letters. To do small caps right you can’t just put your text in upper case and shrink it down. It would look strange. Small caps are actually different glyphs designed to have a similar (but not identical) width and a shorter height. They are hard to generate programmatically. This is one place where a human has to do the hard work. Fortunately we have small caps at our disposal thanks to the great contributions by Logos and type designer Marc Weymann. Open source is a good thing.


Now we are getting somewhere. Now the code has a dramatically different feel.

There’s one more thing to address: the variables. Are they symbols like function names? Yes, but it feels different than function names. They are also not usually prefixed with a parent object or namespace specifier. Really we have three cases. A fully qualified symbol like ‘baz’ in foo.bar.baz, the prefix part (foo.bar), and standalone variables that aren’t qualified at all (like ‘x’). This distinction applies whether or not the symbol is a function or an object reference (it could actually be both in JavaScript).

In the end I decided these cases are related but distinct. Standalone symbols have a weight of 400. Technically this is the default weight in CSS and shouldn’t appear to be ‘bold’, but since the base font is super light, regular will feel heavier against it. The symbol at the end of a qualifier chain will also be bold, but with a weight of 700. Finally the prefix part will be italics to further distinguish it. There really isn’t a right answer here; other combinations would work equally well, so I just played around until I found something that felt right.

This is the final version:


I also shrunk the comments to 80%. Again it just felt right, and serifed fonts are easier to read in longer lines, so the comments can handle the smaller size.

Here’s a link to the live mockup in HTML and CSS. This design turned out much better than I originally thought it would. We can do a lot without color and spacing changes. Now imagine what we could do will our full palette of tools. But that will have to wait for next time.

BTW, if you submit this to Hacker News or Reddit please let me know via Twitter so I can answer questions.

Allow me to present a simple thought experiment. Suppose we didn’t need to store our code as ASCII text on disk. Could we change the way we write -- and more importantly read -- symbolic code? Let’s assume we have a magic code editor which can read, edit, and write anything we can imagine. Furthermore, assume we have a magic compiler which can work with the same. What would the ideal code look like?

Well, first we could get rid of delimiters. Why do we even have them? Our sufficiently stupid compilers.

Delimiters like quotes are there to let the compiler know when a symbol ends and a literal begins. That’s also why variables can’t start with a number; the compiler wouldn’t know if you meant a variable name or a numeric literal. What if we could distinguish between then using typography instead:

Here’s an example.


This example is semantically equivalent to:

print "The cats are hungry."; //no quotes or parens are needed

Rendering the literal inside a special colored box makes it more readable than the plain text version. We live in the 21st century. We have more typographic options than quotes! Let’s use them. A green box is but one simple option.

Let’s take the string literal example further:


Without worrying about delimiting the literals we don’t need extra operators for concatenation; just put them inline. In fact, there becomes no difference between string concatenation and variable interpolation. The only difference is how we choose to render them on screen. Number formatting can be shown inline as well, but visually separate by putting the format control in a gray box.


Also notice that comments are rendered in an entirely different font, and pushed to the side (with complete Unicode support, of course).

Once we’ve gone down the road of showing string literals differently we could do the same with numbers.


Operators are still useful of course, but only when they represent a real mathematical operation. It turns out there is a separate glyph for multiplication, it’s not just an x, but it’s still visually confusing. Maybe a proper dot would be better.

Since some numbers are actually units, this hypothetical language would need separate types for those units. They could be rendered as prose by using common metric abbreviations.


In a sense, a number with units really is a different thing than plain numbers. It’s nice for the programming language to recognize that.

As long as we are going to have special support for numeric and string literals, why not go all the way:

Color literals


Image literals


Byte arrays


If our IDEs really understood the concepts represented in the language then we could write highly visual but still symbolic code. If only our compilers were sufficiently smart.

I’m not saying we should actually code this way but thought experiments are a good way to find new ideas; ideas that we could then apply to our existing programming systems.

Typography is the study of type, meaning letter forms of the printed word; though in modern usage it includes all manner of non-printed letter forms such as computer screens, eBooks, electronic billboards, and even textiles. Typography is possibly the most important part of design because humans are visual creatures and nothing conveys more information in a smaller space than words. This article will cover the basic terms and anatomy of type, then give you some quick tips to help choose fonts wisely. Unlike my last topic, color, modern typography is actually quite old. Most of modern color theory was developed in the 20th century with the invention of modern dyes, paints and emissive displays. The core typographic theories, however, were in place when Gutenberg made the first printed books over 500 years ago. Certainly a lot has been developed since then, but the universal principles of type like alignment and grids were already developed during the dark ages by monks transcribing bibles by hand. When Gutenberg converted these principles to his mechanical printing press modern typography was born.

Typeface vs Font

First things first: typefaces versus fonts. A typeface, also called a font family, is a set of fonts designed with a stylistic unity, each comprising a coordinate set of glyphs. (wikipedia.org). A font is a complete character set of a typeface at a particular size, weight, and style. In software engineering terms you can think of a typeface as a class and the font as an instance of that class with a particular configuration. The exact meaning of typeface and font have changed subtly over the years as the technology has shifted from mechanical type to computer-set type to purely digital type. Originally fonts from the same typeface were literally different boxes full of chunks of metal shaping each letter at a given size. These days we have scaleable font technology so different sizes, weights and styles can be generated algorithmically or the multiple forms can distributed in a single physical file. Thus font and typeface have somewhat merged, and today you can safely refer to a typeface as a 'font' in general usage, and only use the more precise terminology when referring to particular instances of that font (or when talking to ornery typographers).

Anatomy of a Font


The weight of a font is the thickness of the lines that make up the letter forms. Since this thickness is relative to the size of the letters we refer to these as their weight. You are probably most familiar with bold vs regular, but some fonts have as many as nine weights from thin and light up to bold, heavy, and black. The True Type font format specifies weights on a scale from 100 to 900. CSS can use words or scale numbers to specify the weight. Police-Helvetica.gif Helvetica (wikipedia.org)

Style: AKA Slope, Italic & Oblique

We usually think of the slope of a font as italic or normal but, as with everything in typography, there's more subtlety to it. Italic text really means text which is drawn differently to indicate it is emphasized and should stand out in some way. Italic text can be programmatically generated by leaning the text to the right, called oblique text. Some fonts include a true italics form which actually draws the letters with different shapes. Fancier fonts may lower the fs, adjust the roundness of letters, or lean to the left instead of the right (depending on the language). FirefoxScreenSnapz016.png FirefoxScreenSnapz017.png roman, italic, and oblique (wikipedia.org)


Some fonts have wider and narrower versions of the letterforms. The terms for these versions vary by font maker, but will often be called things like condensed and expanded. These fonts adjust the horizontal width of the letters themselves.


Leading (pronounced like treading or fed-ing) is the spacing between lines of text. Back in the metal typesetting days they used strips of metal lead to separate lines, which is where the name comes from.

Kerning & Tracking

Track_Kern.png Kerning and Tracking (wikipedia.org) Kerning and Tracking are adjustments to the spaces between letters of a proportional font. With tracking the extra space uniformly applied to all letters. With kerning the extra space allocated by letter pairs. This means that certain pairs of letters will have more or less space than other pairs, resulting in a more pleasing look. In some cases letters may vertically overlap to make them look better. Kerning values are very hard to generate algorithmically because it depends on the how the letter forms visually look to the human eye. For this reason well kerned fonts are created by hand by a font expert, and are therefore more expensive. The results can be well worth it, however, because the badly kerned fonts result in keming. Also available as a T-shirt. FirefoxScreenSnapz014.png Keming, from Ironic Sans.

Font Metrics

The actual letter forms have a specific vertical anatomy based on how we measure them. Together these measurements are the font's metrics. We use these measurements to decide how to put letters together to form words, lines, and paragraphs of text. FirefoxScreenSnapz018.png Font Metrics (wikipedia.org) Of the various measurements in this diagram the most important is baseline. The human eye finds baseline aligned text very pleasing, and will notice the tiniest imperfections in this alignment. In GUI software it is important to align UI controls containing text (which is most of the standard controls) along the baseline of the text rather than the visual bounds of the actual control. For example, a button should be aligned with the text inside the button rather than the rectangular edges of the button itself. It may seem like a small detail but it makes a big difference. I once spent a year fixing bugs in the Windows Look and Feel for Swing, and the most visible bugs were vertical alignment issues. Yes, I spent a year of my life turning this: before.png into this after.png All in all, I still think it was a year well spent :)

Odds and Ends

  • Small capitals, aka: smallcaps, is simply smaller forms of the capital letters. Usually these smallcaps have the same height as the lowercase letters of the normal font. Smallcaps may be included in a font or generated algorithmically by shrinking the regular capital letters.
  • Dropcaps or Initials: special forms of the intial letter in a paragraph or chapter. They are often much larger than the regular text on the page and will 'drop down' into the lower lines.
  • Logical font: a font which doesn't actually exist, but will be replaced with a real font at runtime. For example, in webpage you can use the font 'sans-serif'. There is no real font named 'sans-serif', but the webbrowser will pick the user's preferred sans serif font and substitute it at runtime.

Categories of Fonts

In Roman languages (languages which use the 26 letters of the Roman alphabet plus various accent and punctuation marks) there are five major categories of fonts.


Serif letters drawn with features at the ends of their strokes. The serifs are the little feet we see in fonts like Times. These are some of the oldest type designs. The feet along the baseline help guide the eye from left to right, making them very 'readable' fonts. FirefoxScreenSnapz021.png Serif Font

Sans Serif

Sans Serif (french for "without serifs") are letters drawn with straighter lines and no feet. Their larger letterforms make them very legible, but can cause greater eye strain when used in long runs of text. Helvetica is considered the quintessential sans serif font. FirefoxScreenSnapz020.png Sans-serif Font


Script fonts imitate hand written letter forms. They are much harder to read than serif and sans serif fonts, and should never be used for body text.


Ornamental fonts are highly decorative and tend to work only at larger type sizes. They are often called 'display' fonts. They should never be used for body text (I'm looking at you, Comic Sans!). kruffy.png Type set in Baby Kruffy


Monospace fonts have letterforms that are exactly the same width, or are at least equally spaced. These come from the days of typewriters and old computers that could only move fixed distances. The are rarely used today except when the vertical alignment of columns matters. Examples include code editors and laying out tabulated data.

Text Placement

Most of our type terminology in software comes from the world of newspapers. In practice we have three major kinds of text when building webpages and software: body, header, and navigation.
  • Body: this is the bulk of your text organized into paragraphs. It is usually smaller than the other text and should use a simpler font that looks good at smaller sizes.
  • Header: this text in short runs (usually a single line or less) that must stand out. Headers usually appear at the top of a page or paragraph indicating that the reader should move their eyes here to begin reading something. Headers also look good with serif and sans serif fonts, but you have more freedom because the text is typically larger.
  • Navigation: this text is very short and is usually at the top or sides of the page or screen. It indicates the structure of the entire work, such as newspaper sections or different pages in a site. It should also indicate to the user how to navigate to that section. In a newspaper you would use page & section numbers. For software the text would be rendered as something clickable (a button, an underlined link, a hover effect etc.).

Quick Tips

Use the right font for the size

Not all fonts look good at all sizes. Some really shine at small sizes, others at large ones. Use the right font for your size. At smaller sizes you want fonts which are simpler and easier to read such as the basic serif and sans serif fonts. Try Arial, Helvetica, Times, Verdana, and Georgia. They are installed on virtually all computers and were designed to look good on screen at smaller sizes.

Use different font categories for different parts of text

Use a sans serif font for the header and a serif font for the body, or vice versa. It creates a contrast without banging you over the head with it.

Use truly crazy fonts only for top headers

Use the cool and crazy fonts only for the top most headers. In the example below I used a font meant to evoke mid-century Tiki-culture, with a funky light effect to suggest it's alcoholic origins. I only used this font in one place: the top most header of the page. Everything else is done with Helvetica; a standard sans-serif font. FirefoxScreenSnapz022.png Home page of Project MaiTai Use restraint. Pick one crazy font to use for your whole site or application and use it sparingly. If everything is different then nothing is. You can use a fancy font (somewhere between the plain boring fonts of body and crazy fonts of only top most headers) for headers within the page. And use one of the plain boring fonts for body text. Leave the crazy fonts just for the top most header.

Use color and style to create contrast

Use color changes or capitalization style instead of fonts to contrast between the headers, body, and navigation. For example, in recent versions of iTunes the headers in the sidebar are all capitalized and in a slightly lighter color with a hint of etching. iTunesScreenSnapz002.png iTunes 9

Bundle a font in your app

For the body text I still recommend sticking with one of the standards, but for your header font you can choose something a bit more distinctive. Here are some great places to get cheap or free fonts.

Avoid Cool Fonts

Don't use cool / cute / themed fonts unless the subject matter really warrants it. Your website on technology trends really shouldn't use a cutesy bubble font unless you are really targeting the tween market. FirefoxScreenSnapz015.png Sniglet

Just the beginning

Typography is an incredibly large topic. What I've covered here is just the beginning. In the future I'll cover text as part of the overall design and dive into the grid system. If you'd like to read more on typography, here are a few good resources: The 20 Do's and Don'ts of web typography.