Thoughts on Build Systems

I'm rebuilding my HTML Canvas Deep Dive book so I need a way to compile various source files into a final thing. I'm not producing an executable but rather a directory full of generated HTML, CSS, and Javascript, and possibly some other stuff; but it's the same basic idea. I need to turn a collection of things into another collection of things. I need a build system. So which should I use?

This brings up a long-standing point. I have always wondered why build systems use declarative, build-specific languages rather than general-purpose ones (yes, I know there are some exceptions, but in general people use these declarative DSLs). Being declarative means you describe what you want to happen, not how it should happen. This keeps your code compact and clear: this thing makes this other thing through this process. These languages also provide common build functionality like generating a dependency graph, determining when files are out of date, and the ability to fetch remote dependencies. All of these are good features.

The problem is that these systems can’t do everything. They are often designed for a specific use case, like compiling a bunch of C code. Once you go beyond that sweet spot, using them becomes painful. They are difficult to extend. At some point you start asking for the features you would get from a traditional, non-declarative, Turing-complete language.

So my question is really: why do we use these limited DSLs instead of using a declarative style of programming in a traditional language to do the same thing? Once we hit the wall of complexity, we still have a real programming language backing us up and we can extend the build system however we want.

I’m not sure this is a good idea. Clearly lots of smart people have thought about this problem for a long time, and we still use declarative systems (though continuing to use makefiles is surely an archaic practice that should be ended immediately). I thought it would be an interesting experiment to see if I could make an understandable build system in a general purpose programming language. Since I’m building a book out of web stuff, using NodeJS is the natural choice.

The first challenge is how to declare a series of dependencies. For B to happen, A must happen first. Well, that part is easy. Just make them functions and have B call A first thing.

function B() {
  A()
  // do something second
}

function A() {
  // do something first
}

A function call graph is a kind of dependency graph. There’s a problem though. What if C also needs A to happen, but A has already been run by B? We don’t want A to happen twice.
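A quick sketch of the problem, using a counter to show the duplicated work (the names match the text; the counter is just for illustration):

```javascript
let aRuns = 0

function A() {
  aRuns++ // do something first
}

function B() {
  A()
  // do something second
}

function C() {
  A()
  // do something else
}

B()
C()
// aRuns is now 2 — A's work happened twice
```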

What we need is for the system to only execute A once, no matter how many times it is called, provided it’s called with the same arguments. Fortunately functional programming long ago provided a solution: memoization. And in modern JS it’s trivial to implement.

function memoize(fun) {
  const memo = new Map()
  return function(...args) {
    // JSON-serialize the arguments so equal values produce equal keys;
    // arrays would be compared by identity, not by value
    const key = JSON.stringify(args)
    if(!memo.has(key)) {
      memo.set(key, fun.apply(this, args))
    }
    return memo.get(key)
  }
}

The memoize function above accepts a function as an argument and returns a new function. This new function will delegate back to the original function and cache the result in a Map using the arguments as a key. If the function is called again with the same arguments then it will use the result from the cache instead of calling the original function.
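To see the caching in action, here is the same memoize helper (restated so the snippet runs on its own) wrapped around an instrumented function. The counter shows that the real work only runs once per distinct argument list:

```javascript
function memoize(fun) {
  const memo = new Map()
  return function(...args) {
    // JSON-serialize the arguments so equal values produce equal keys
    const key = JSON.stringify(args)
    if(!memo.has(key)) {
      memo.set(key, fun.apply(this, args))
    }
    return memo.get(key)
  }
}

let callCount = 0
const square = memoize(function(n) {
  callCount++
  return n * n
})

square(4) // computes: callCount is now 1
square(4) // cache hit: callCount stays 1
square(5) // different arguments: computes again
```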

Now we can rewrite our A,B,C example with memoize like this:

const A = memoize(function() {
  // do the first thing
})

const B = memoize(function() {
  A()
})

const C = memoize(function() {
  A()
})

Great. The next problem is that by itself Javascript doesn’t have threads, and using callbacks for IO will get ugly really fast. Modern JS has a solution for this: async/await. We can make very clean code that does all of the async future callback nonsense for us in the background. Node also recently added Promises for most of the IO functions, so we can use them cleanly together.

For example, suppose we want to read a directory then call a processing function on every file in that directory. This processing function will replace every instance of ‘dog’ with ‘cats are better’. The final step will concatenate the results together and write them to a new file called ‘outfile.txt’.

const { readdir, readFile, writeFile } = require('fs').promises
const path = require('path')

const process = memoize(async function(file) {
  const content = await readFile(file, 'utf8')
  // replace() returns a new string, so return it
  return content.replace(/dog/g, 'cats are better')
})

const doit = memoize(async function(dirname) {
  const files = await readdir(dirname)
  // map() produces an array of Promises, so wait for all of them
  const outputs = await Promise.all(
    files.map(file => process(path.join(dirname, file))))
  await writeFile('outfile.txt', outputs.join('\n'))
})

doit('source')

There we go. With async and memoize the code is pretty compact and, most importantly, it reads clearly. We can see that ‘doit’ does the following things in order: reads the ‘source’ directory, processes each file, then writes the output to ‘outfile.txt’. We keep all of the power of Node and a general-purpose programming language while ending up with something that still feels somewhat declarative.
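One subtlety that makes this combination work: memoizing an async function caches the Promise, not the resolved value. So a second target that asks for the same step while it is still in flight gets the same Promise back instead of kicking the work off again. A sketch (the step name here is made up, and the memoize helper is repeated so the snippet stands alone):

```javascript
function memoize(fun) {
  const memo = new Map()
  return function(...args) {
    const key = JSON.stringify(args)
    if(!memo.has(key)) {
      memo.set(key, fun.apply(this, args))
    }
    // for an async function this is a cached Promise
    return memo.get(key)
  }
}

let runs = 0
const slowStep = memoize(async function(name) {
  runs++ // the real work only ever happens once per name
  return `built ${name}`
})

// Two "targets" request the same dependency concurrently;
// both awaits resolve from the single cached Promise.
async function build() {
  const [x, y] = await Promise.all([slowStep('assets'), slowStep('assets')])
  return { x, y, runs }
}
```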

Again, I’m not sure this is a good idea. I suspect I’m missing something that lots of other smart people have thought of over the last few decades. Would this become too cumbersome as the codebase gets bigger? Or could we just create some better abstractions with functions? What do you think?

Talk to me about it on Twitter

Posted July 21st, 2019

Tagged: rant javascript programming