Concise Computing: a PNG Parser in 20 lines

By what standard should we measure if code is "beautiful"? I argue it should be not lines or speed, but rather conciseness and clarity. Can someone who is not familiar with the language still understand what the algorithm does? Can someone not familiar with the task still get a feel for how it works? This metric favors shorter code over longer, but not at the expense of readability. Beautiful code should be as close to expressing the underlying algorithm as possible. How close is the actual code to the most straight forward pseudo code?

Some will argue that an elegant implementation is useless if it is too slow; but what if the elegant version is only 5% slower? Is that a good tradeoff? Almost certainly. What if it's 10x slower but only runs for 1% of the time? Then the trade off might be worth it. It depends. Certainly it's good to start with the clear and concise version and only move to a more complex implementation if the speed profiling warrants it. This is what I like to call concise computing.

Complexity

Concise code is easier to read, easier to write, and much, much, much easier to maintain. Fewer lines of code means fewer places for bugs to hide. But let me take it from another angle:

If you open up the typical programming library, especially older ones, you will find a bunch of complex code; often difficult to understand. This is the case even if the underlying task should be straight forward. Why should code be like this?

I think there are two reasons: first, the programming language used to write the library wasn't expressive enough to write the implementation in a way that looks like the algorithm. If a task is naturally described using objects then a language with out direct object support will be at a disadvantage. If the task is really about matching patterns, then a language without matching will necessarily represent the algorithm in a less elegant way. And of course once written we try not to rewrite our code for fear of breaking things. Code is instant legacy.

The other reason for complex libraries is speed. While I can describe sound synthesis using pure math... math is expensive to compute. A set of pre-calculated wavetables will be far faster than raw cosine functions. Changing an algorithm to be fast often makes the code hard to understand. But is this really a good excuse?

Our software doesn't have to be fast. It has to be fast enough. But enough depends entirely on context and it changes over time as hardware evolves. Math.sin() used to be hugely expensive. When John Carmack wrote Doom he would massively change code to save a few divides. The code was amazing but came at a cost of obscuring the underlying algorithm and being hard to maintain. It doesn't have to be like that anymore.

Ometa

I did not invent concise computing (though I haven't heard the specific term before). It has been a computing goal for over fifty years. The sweet spot for any implementation has shifted over the years as hardware changes, but I think the time has come for most of our coding to move in the direction of conciseness. We have such an excess of CPU power, even in our mobile devices, that in almost every case the concise version wins over a fast but complex one. However, let me stop with the theoretical discussion and give you a concrete example of what I'm saying.

I recently discovered a language called OMeta. OMeta is actually a extension to let you write DSLs in a general purpose host language. My preferred version is OMeta/JS, which extends JavaScript, but there are implementations for SmallTalk, C#, and others. OMeta really exemplifies my thoughts on concise computing.

Alex Warth, the creator of OMeta, realized many computing tasks, especially writing parsers and compliers, are just different forms of matching. So why not bake it into the language in a way that is convenient but still lets you get at the power of the host language, and treat the parsers as objects that can be extended in various ways. Here's what a simple calculator looks like:

SimpleCalc <: Ometa {
   exp = number | sub | add,
   sub = exp:a "-" exp:b =<  (a-b),
   add = exp:a "+" exp:b =< (a+b),
}

result = new SimpleCalc().matchAll("6+7);  //result is set to 13

Even if you don't know anything about OMeta you can follow what is going on. This says that exp is a number or a pattern called sub or add. Then it defines sub and add with recursive definitions back to exp. The => part says that the matched expression goes to a small Javascript expression, in this case the part that does the actual math.

This code looks a lot like your typical BNF grammar. The magic here is that everything after the => part is regular Javascript. Instead of (a+b) it could be console.log("doing an add"), or call some other function to add in base 16 math, or generate C fucntions. Normally I would have to break out to a special library with weird syntax, or write in a completely different language. OMeta lets us implement algorithms that are naturally expressed with matching, but still remain in our general purpose host language, not some special matching only parser. This is concise computing.

OMeta was created as part of Alex Warth's graduate thesis. You should definitely check it out. For an academic paper it is highly readable. Alex continued his work at VPRI, a research group set up by Alan Kay. Yes, thatAlan Kay. Though Alex's OMeta is an excellent improvement the original idea goes back to Meta II, a meta-compiler created back in 1962!!! The battle for concise computing is far from recent.

A PNG Parser in OMeta

Most public uses of OMeta show small examples like the calculator above or language compilers, which is probably it's most obvious application. (I myself am working on my own cross-language system with OMeta which I'll detail in a future blog post). Earlier this month I showed you a markdown word processor with embedded arithmetic expressions built in OMeta. However, today I want to show you a different kind of task: parsing a binary data PNG file in JavaScript.

PNG is a fairly simple data format yet many parsers are surprisingly complex. As documented in Wikipedia, PNG consists of a header followed by a series of chunks. Each chunk has a length, type, and payload. Most of the chunks contain metadata about the image. Only one, IDAT, contains a bitmap. The example below does not decompress the actual image data (which requires a DEFLATE decompressor) , but rather parses and validates all of the metadata chunks; the most likely use for an external parser.

Parsing binary data in Javascript has always required hacks because JS has no native binary types. Fortunately, the W3C and Khronos recently remedied this situation with Typed Arrays, a new data type built to support textures and vertices for WebGL. Using typed arrays we can fetch a PNG file from a server and convert it into a large buffer; like this:

function fetchBinary() {
    var req = new XMLHttpRequest();
    req.open("GET","icon.png",true);
    req.responseType = "arraybuffer";
    req.onload = function(e) {
        console.log("loaded");
        var buf = req.response;
        if(buf) {
            var byteArray = new Uint8Array(buf);
            console.log("got " + byteArray.byteLength + " bytes");
            var arr = [];
            for(var i=0; i<byteArray.byteLength; i++) {
                arr.push(byteArray[i]);
            }
            processOmeta(arr);
        }
    }
    req.send(null);
}

The Uint8Array is an array of unsigned bytes (8 bit long 'words'). With this array in hand we can parse the entire structure in less than 20 lines of Ometa. Again, even if you don't know how Ometa works you can get a vague idea of what the program does by reading it.

ometa BinaryParser <: Parser {
    //entire PNG stream
    start  = [header:h (chunk+):c number*:n] -> [h,c,n],
   
    //chunk definition
    chunk  = int4:len str4:t apply(t,len):d byte4:crc
        -> [#chunk, [#type, t], [#length, len], [#data, d], [#crc, crc]],
   
    //chunk types
    IHDR :len  = int4:w int4:h byte:dep byte:type byte:comp byte:filter byte:inter
        -> {type:"IHDR", data:{width:w, height:h, bitdepth:dep, colortype:type, compression:comp, filter:filter, interlace:inter}},
    gAMA :len  = int4:g                  -> {type:"gAMA",value:g},
    pHYs :len  = int4:x int4:y byte:u    -> {type:"pHYs", x:x, y:y, units:u},
    tEXt :len  = repeat('byte',len):d    -> {type:"tEXt", data:toAscii(d)},
    tIME :len  = int2:y byte:mo byte:day byte:hr byte:min byte:sec
        -> {type:"tIME", year:y, month:mo, day:day, hour:hr, minute:min, second:sec},
    IDAT :len  = repeat('byte',len):d    -> {type:"IDAT", data:"omitted"},
    IEND :len  = repeat('byte',len):d    -> {type:"IEND"},
   
    //useful definitions
    byte    = number,
    header  = 137 80 78 71 13 10 26 10    -> "PNG HEADER",        //mandatory header
    int2    = byte:a byte:b               -> bytesToInt2(a,b),    //2 bytes to a 16bit integer
    int4    = byte:a byte:b byte:c byte:d -> bytesToInt(a,b,c,d), //4 bytes to 32bit integer
    str4    = byte:a byte:b byte:c byte:d -> toAscii([a,b,c,d]),  //4 byte string
    byte4   = repeat('byte',4):d -> d,
   
   
       
    END
}

This parser returns a JSON structure that we can loop through to find any chunk we care about.

chunk: {"type":"IHDR","data":{"width":33,"height":36,"bitdepth":8,"colortype":2,"compression":0,"filter":0,"interlace":0}}
chunk: {"type":"gAMA","value":55555}
chunk: {"type":"pHYs","x":2835,"y":2835,"units":1}
chunk: {"type":"tEXt","data":"Software0x0 QuickTime 6.5.2 (Mac OS X)0x0 "}
chunk: {"type":"tIME","year":2005,"month":4,"day":5,"hour":17,"minute":6,"second":20}
chunk: {"type":"IDAT","data":"omitted"}
chunk: {"type":"IEND"}

In a sense the code above is not really an algorithm, but rather a definition of the PNG format. A runnable specification. For an even more impressive example, the OMeta guys created a TCP/IP parser which uses the actual ASCII art header definitions from the RFC as the parser code itself.

More

What I've shown you today is merely a hint of what concise computing could be. Imagine an entire operating system from boot to GUI, complete with apps, written in less than 20k lines of code? It sounds crazy yet that is what VPRI has done.

By writing the most clear and concise code possible we can make our software more maintainable and longer lived. Next time I'll show you the same principle applied to audio synthesis. Feel free to ask questions and continue the discussion.

Talk to me about it on Twitter

Posted March 18th, 2013

Tagged: ometa concise