Syntax for a Humanist Language

This is part of my series on the humanist programming language I’m building called (currently) HL. Read the rest here.

HL is meant to be a PL (programming language) for thinking, not a way to instruct the computer to do things efficiently. This is a very different philosophy from most PLs. The history of PL design has largely been about giving the computer as much information as possible so it can generate the most efficient code. There have been smatterings of languages that try to augment humans (APL and Smalltalk being excellent examples), but they are not in widespread use today. But enough high-level talk. What would a human-centered programming language actually look like?

I think of humanism first and foremost as preserving human autonomy. In the context of programming languages I think this means a programming experience that lets you work how you want to. The system imposes few rules upon you, letting you build up your own conventions and comfortable workflows. The language should support you, the human, and not hold you back. In practice I think this philosophy results in three requirements:

A natural feeling syntax.
Well thought out APIs that are forgiving and composable.
An environment that removes friction.

Today I’m going to talk about the syntax.

Natural language

So what is a natural feeling syntax? Some PL designers have tried to use an English-like syntax, or something that feels like natural spoken languages. I don’t think this has worked out very well. The resulting code tends to be brittle. It breaks the minute you try to do something slightly different than what the ‘natural syntax’ suggests. Why? I suspect it’s because the computer pretends to be smarter than it is.

If you, as a new user of a programming language, are presented with a language that appears to be like a spoken language, you will then try to program that way. However, you will quickly discover that if you say things slightly differently than expected then the computer doesn’t understand you.

A natural language PL makes a promise that it can’t keep: that the computer can understand you like another human would. I think this is the reason voice assistants like Alexa and Siri are so frustrating to use. A real human understands context. A real human has shared knowledge that no computer program can come anywhere close to. I think that natural language PLs are not viable. But maybe one day.

Math Like Syntax

Instead of natural language I want to go with a syntax closer to mathematics: symbols are combined with operators that go into functions. Math isn’t necessarily any more ‘natural’ than traditional programming languages, but it is something most people already know. Mathematical thinking and equation notation is already taught to children everywhere, so a humanist PL should leverage that knowledge.

But you might say: "Aren’t most PLs already using a math like syntax?" Somewhat, but they could be better.

Many PLs take existing operators like =, which means equality, and repurpose them to be assignment, which is a related but not equivalent concept; or overload functions like + to work with strings, but not other objects. Furthermore, most PLs have strange restrictions on what you can and can’t use in identifiers and numbers, and in what order. Well, strange from the point of view of a non-programmer. I suppose we can all get used to anything given enough time.

I think a lot of these restrictions come from the difficulty of making parsers for languages (and the symbols available on common keyboards), but implementation is not a problem for the user to solve. That’s a problem for the language designer to solve. Pushing the problem back on to the users is fundamentally anti-humanist.

So, in concrete terms here’s what I propose.

Extra whitespace doesn’t matter

From the human’s point of view there is a big difference between zero spaces and one space, but there is almost no difference between one space and two or three or 20 spaces. There’s either a gap or there isn’t. Using indentation to solve nesting just takes a visible problem (that we need nesting) and hides it (making the nesting invisible and hard to debug).
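To make the "there’s either a gap or there isn’t" rule concrete, here is a minimal sketch in Python (not HL’s actual lexer). The function name tokenize is hypothetical, and it assumes tokens are always separated by at least one whitespace character.

```python
def tokenize(source: str) -> list[str]:
    """Split source text into tokens.

    Any run of spaces, tabs, or newlines counts as a single gap,
    so the width of the gap never changes the result.
    """
    return source.split()


# One space, twenty spaces, or a mix of tabs and spaces all produce
# the same token list.
narrow = tokenize("foo << 3")
wide = tokenize("foo        <<\t\t   3")
print(narrow == wide)
```

A real lexer would also need to split tokens that touch with no gap at all, which this sketch deliberately leaves out.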

Arguably the nesting should be handled by the IDE itself. The computer doesn’t actually care. The only reason PLs use parentheses and brackets to begin with is to help out the parser. The rest of the computer works with the AST (Abstract Syntax Tree) that doesn’t need nesting delimiters. If we were all using WYSIWYG structural editors then we wouldn’t need such notation. However, that is a much bigger problem than I can solve today, so let’s stick with some sort of braces for now. Perhaps when I make the inevitable block version of HL I can revisit this.

All identifiers are case insensitive

How many times have you been helping a kid debug a program and discovered they had typed food as a variable in one part of the program and Food in a different part? The compiler says something like variable Food is not found and the kid is frustrated because clearly they had defined food up above. As an experienced programmer you are used to this and your brain automatically accounts for it. You might always capitalize your identifiers, or use camel-case, or some other notation to provide this extra information to the computer and your programmer brain. But...

Let’s back up and ask the question: Why does the case matter anyway? When would a responsible programmer ever want to have Food and food be separate things? Why can’t a language be case insensitive? I ask the same about filesystems as well. I can imagine the underlying reasoning related to parser implementations, or linking to external libraries, or something equally esoteric. Excuses! A human focused PL should do what is as natural as possible for the human. Implementation details are for the implementor to solve.

All underscores are stripped

I also propose letting you put underscores wherever you want. _my_func_ and MyFunc and ____myFunc are all the same. There’s no battle between snake case and camel case. Use whatever notation best works for you; the computer will understand it! I briefly considered allowing spaces in identifiers but that actually does open a larger can of worms. I think it’s possible, but more than I’m willing to bite off right now.
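Case insensitivity and underscore stripping can both be handled by normalizing every identifier before it is stored or looked up. The sketch below is in Python and only illustrates the idea. The names normalize_identifier and Scope are hypothetical and are not part of HL.

```python
def normalize_identifier(name: str) -> str:
    """Reduce an identifier to its canonical form.

    Underscores are removed and letter case is folded, so every
    spelling of the same name maps to the same key.
    """
    return name.replace("_", "").casefold()


class Scope:
    """A variable table keyed by normalized identifiers (illustrative only)."""

    def __init__(self) -> None:
        self._values: dict[str, object] = {}

    def set(self, name: str, value: object) -> None:
        self._values[normalize_identifier(name)] = value

    def get(self, name: str) -> object:
        return self._values[normalize_identifier(name)]


scope = Scope()
scope.set("food", "sandwich")
# Food and food now refer to the same variable.
print(scope.get("Food"))
# All three spellings from the text normalize to the same key.
print({normalize_identifier(n) for n in ["_my_func_", "MyFunc", "____myFunc"]})
```

One case this sketch does not decide is a name made only of underscores, which normalizes to an empty string. A real implementation would have to reject it or give it a meaning.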

Allowing underscores to be stripped also makes number notation easier. In JavaScript when I want to wait for 100 seconds I have to type 100000 since most JS APIs think in terms of milliseconds. As I stare at that number I think, wait, did I put the right number of zeros? Sometimes I’ll write it as 100*1000 to break it up into chunks. It would be nicer if I could just do 100_000. This makes it easier for me, a human who uses commas to split long numbers, and the computer still sees the same thing. What could be more humanist than that?
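The same stripping rule works for numbers. Here is a small Python sketch, assuming integer literals only. The function name parse_number is hypothetical.

```python
def parse_number(literal: str) -> int:
    """Parse an integer literal, ignoring underscores wherever they appear."""
    return int(literal.replace("_", ""))


# 100 seconds expressed in milliseconds, written the readable way.
wait_ms = parse_number("100_000")
print(wait_ms == 100 * 1000)
```

Python’s own int() already accepts single underscores between digits, but it rejects doubled or trailing ones, so the sketch strips them first to match the "put underscores wherever you want" rule.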

Use >> and << for assignment

There’s a wide variety of notations for variable assignment in existing PLs. Many use or modify the equals sign, =, because it seems close to the equality symbol in math. But it’s not the same. Equality is a question. It asks if two things are equal and returns true or false. It has a matching anti-operator of inequality ≠. Assignment means making one thing equal another. Assignment is actually very different than equality. It is only with repeated practice that our brains elide the incongruity.

Instead I propose using something like an arrow to mean that the results of an expression go into the identifier. The nice thing about arrows is that they are directional. We could use both left and right arrows to mean the same thing. foo << 3 is the same as 3 >> foo. They both mean the expression goes into the foo identifier.

The reason I want to use both arrows is because sometimes it’s more natural to think left to right rather than right to left. It depends on the problem you are working on. With arrows for assignment it’s always clear which is the expression and which is the final destination for the data. I think it also reinforces the concrete metaphor of data going from one place to another, which is half of the computation battle anyway. The other half is red and blue lasers.
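To show that the two arrow directions really are the same operation, here is a toy Python sketch of an assignment statement. It is not HL’s parser. The function name assign is hypothetical, and it assumes the expression side is a plain integer literal and the statement contains exactly one arrow.

```python
def assign(statement: str, env: dict[str, int]) -> None:
    """Execute 'name << value' or 'value >> name' against env.

    The arrow always points at the destination, so both forms
    end up storing the value under the same name.
    """
    if "<<" in statement:
        target, expression = statement.split("<<", 1)
    elif ">>" in statement:
        expression, target = statement.split(">>", 1)
    else:
        raise ValueError("expected << or >> in: " + statement)
    # Assumption: the expression is an integer literal.
    value = int(expression.strip().replace("_", ""))
    env[target.strip()] = value


left_env: dict[str, int] = {}
right_env: dict[str, int] = {}
assign("foo << 3", left_env)
assign("3 >> foo", right_env)
# Both directions produce the same binding.
print(left_env == right_env)
```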

The only downside of using << and >> is that they are no longer available for bit shifting. However, I rarely do that these days and for the users of this language it’s a non-issue.

Use arrows for function pipelining

There’s another reason to use arrows. We can reuse them for function pipelining. If you haven’t used functional programming languages very much, pipelining is when the results of one function are passed as an argument to another function. Normally we do this with function application, as in this example to eat lunch.

eat(make_lunch(get_ingredients('jelly','bread')))

As the function calls get nested deeper and deeper the code can become hard to read. Function pipelining lets us do this:

get_ingredients('jelly','bread') >> make_lunch() >> eat()

This means the same thing but is far more readable. In fact, it literally is the same thing. The pipelining is just sugar that the parser transforms back into the old form, but for the human it’s a vastly better experience. And because pipelining and assignment look similar and are used in similar ways, we can mix them with the same notation. If I want to assign the results of my sandwich to a bag to eat later, I can just replace eat() with ‘bag’.

get_ingredients('jelly','bread') >> make_lunch() >> bag

Assignment and pipelining both mean data goes into some other place. These are far closer in meaning than assignment and equality are.
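The "just sugar" claim can be sketched as a text rewrite. The Python below is a rough illustration, not HL’s real parser. The function name desugar is hypothetical, and it makes several assumptions: stages are split naively on >>, the piped value becomes the first argument of the next call, and a bare name in the final position means assignment.

```python
def desugar(pipeline: str) -> str:
    """Rewrite 'a(x) >> b() >> c()' into the nested form 'c(b(a(x)))'.

    A final stage with no parentheses is treated as an assignment
    target and rewritten with the << arrow.
    """
    stages = [stage.strip() for stage in pipeline.split(">>")]
    result = stages[0]
    for stage in stages[1:]:
        if stage.endswith(")"):
            # Function call: insert the value so far as the first argument.
            head, _, args = stage[:-1].partition("(")
            args = args.strip()
            result = f"{head}({result}, {args})" if args else f"{head}({result})"
        else:
            # Bare identifier: the value goes into that name.
            result = f"{stage} << {result}"
    return result


print(desugar("get_ingredients('jelly','bread') >> make_lunch() >> eat()"))
print(desugar("get_ingredients('jelly','bread') >> make_lunch() >> bag"))
```

The first call prints the nested eat(...) form from earlier, and the second prints an assignment into bag. Splitting on >> as plain text would break on a string literal that contains those characters, which is one reason a real implementation would do this on the syntax tree instead.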

And all the rest

I think that’s all I’m going to say about syntax today. HL should still seem familiar to traditional programmers. It still uses quotes for strings, digits for numbers, * and / for multiplication and division. It has units built in (meters, inches, etc.). I could write in depth about these features, but they are mostly the same things you’d find in many other languages, and so not interesting enough to blog about. (Well, maybe I'll do a post on units sometime).

HL won’t look like some crazy APL hell (though it does use a ton of ideas from APL). HL will look like runnable math. And why shouldn’t it? We all already know math. Reusing existing knowledge instead of forcing us to learn new things is an admirable quality in a programming language.

Next time I'll dive into feature 2: well thought out APIs that are forgiving and composable.


Posted January 17th, 2021

Tagged: programming hl

Josh On Design