Josh On Design

Thoughts on LLMs and the coming AI backlash

Thu, 13 Feb 2025 00:00:00 +0000

I find Large Language Models fascinating. They are a very different approach to AI than most of the 60 years of AI research and show great promise. At the same time they are just technology. They aren’t magic. They aren’t even very good technology yet. LLM hype has vastly outpaced reality and I think we are due for a correction, possibly even a bubble pop. Furthermore, I think future AI progress is going to happen on the app / UX side, not on the core models, which are already starting to show their scaling limits. Let’s dig in. Better pour a cup of coffee. This could be a long one.

Cut through the hype

First I want to cut through all the hype. I’m dismayed that so much funding is going to ML and almost nothing else. What happened to all of the AR/VR startups? My beloved WebXR? But I digress. Back to ML.

A lot of the hype around LLMs stems from saying that we are just moments away from AGI. This is false. I get that the arguments are seductive. Breathless takes like ‘the fastest adoption of technology ever’ make us think that AGI is just around the corner. Moore’s Law lets technology scale exponentially and LLMs were only possible thanks to fast chips, therefore Moore’s Law gets us AGI. Right? No. No it doesn’t.

We have to remember that Moore’s Law doesn’t apply to anything other than transistors. Other tech will not scale the way chips did, unless you can turn it into a chip problem (which is why the mobile revolution seemed so fast and then petered out). Humans are very good at making the same thing over and over again. LLMs are not like transistors. LLMs require huge computational power to do their thing, but they also require massive amounts of data and clever algorithms to apply them. Those don’t scale exponentially. We are running out of public data. And even exponential growth can’t go on forever in the real world. There are always limits. Nothing is forever.

AGI beyond LLMs

LLMs are powerful but are already starting to show their limits. I don’t think a few more rounds of Moore’s Law are going to get us to AGI, despite what Sam Altman has said. Real AGI is going to take a different approach, combining LLMs with other kinds of systems. LLMs can’t correct themselves because they don’t have logical reasoning. They can generate something, but they can’t know it’s correct beyond a statistical correlation with the corpus of the web they have ingested. The resulting errors are called hallucinations (though perhaps bullshit is a better term). Humans make these mistakes too, but we can then check our thoughts: is the thing I just thought I saw likely to be real? LLMs don’t have the ability to say how confident they are. Humans can. Self correction is possible with code generation, because code can be compiled. You can run it to see if it actually does what it should. Anything where the answer can be easily verified is a good use of an LLM.
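
To make the "easily verified" point concrete, here is a minimal sketch of a generate-then-verify loop. Everything in it is an assumption for illustration: `fakeModel` stands in for a real model call, and `checksOut` stands in for whatever cheap mechanical check applies (a compiler, a test suite, a schema validator).

```typescript
// Hypothetical stand-in for a model call: may return a different candidate per attempt.
type CandidateSource = (prompt: string, attempt: number) => string
// A verifier is anything cheap and mechanical that can say yes or no.
type Verifier = (candidate: string) => boolean

function generateVerified(
    prompt: string,
    generate: CandidateSource,
    verify: Verifier,
    maxAttempts: number
): string | null {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const candidate = generate(prompt, attempt)
        if (verify(candidate)) return candidate
    }
    return null // nothing passed the check: hand it to a human instead of guessing
}

// Fake generator for the demo: the first answer is wrong, the second is right.
const fakeModel: CandidateSource = (_prompt, attempt) =>
    attempt === 0 ? "2 + 2 = 5" : "2 + 2 = 4"

// Verifier for this toy task: redo the arithmetic and compare.
const checksOut: Verifier = (candidate) => {
    const match = /^(\d+) \+ (\d+) = (\d+)$/.exec(candidate)
    return match !== null &&
        Number(match[1]) + Number(match[2]) === Number(match[3])
}

const answer = generateVerified("add two and two", fakeModel, checksOut, 3)
console.log(answer)
```

The loop only works because the check is cheaper and more trustworthy than the generator. Where no such check exists, the retry loop has nothing to stand on.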

The Backlash is Coming

I see a backlash coming in the AI industry, and to avoid it we need to focus more on fail-safes and real value. Companies are making bold claims which will not only disappoint the bulls, but also anger the bears. Along the way we’re going to see the web filled with AI slop. Anything free will be flooded. Systems which depend on determining who is a real person or not are starting to break down. I don’t know of a solution beyond charging money. Maybe that’s a good thing. Massive unemployment due to LLMs probably won’t happen (long term we are in for a human labor shortage as the boomers retire) but if a recession is coming AI companies may get a lot of the blame. The backlash is coming.

A few weeks ago we saw a stock market correction, with a sell off of AI and Nvidia stocks triggered by DeepSeek. This was inevitable. I never bought the idea that early AI companies could build a moat around their models. If one company can strap a bunch of GPUs together to train a model from the web then so can another company. LLMs are a technology that is destined to be commoditized. That doesn’t mean there isn’t money to be made; databases are commodities and Oracle is doing fine.

I think the real problem is that commoditization happened faster than people expected, and at the same time current AI apps just aren’t as useful and profitable as expected. This is okay. It has happened many, many times before in tech. It’s a bubble popping, or at least deflating.

I’m old enough to remember the dot-com bust. That didn’t mean the web was a bad idea. Just that these companies were too early and invested too much in technology that was about to be commoditized (fiber, servers, software). The two decades following the dot-com bust were the most productive our industry had ever seen. I think the same will happen with LLMs. It’s a new tool in the toolbox that we haven’t figured out how to use yet.

This doesn’t mean there isn’t value there. It’s simply that AI is a technology, not a product. It’s going to take longer to find the value than expected. The current LLMs just aren’t reliable enough. Over half of LLM-written news summaries have “significant issues” according to a recent BBC analysis.

Things I’m Not Worried About

I’m genuinely not worried about excessive power usage from AI. DeepSeek is showing that LLMs can be trained with less computation and energy. Overall power usage for the economy will continue to go up over the next decade, but at a rate consistent with historical averages. AI is a blip and the power company stocks will fall back to earth soon. Power usage is an engineering problem. It’s the kind of problem we know how to solve. Very little work has been put into making these AI systems power efficient yet. I expect that to change over the next few years.

I’m not actually too worried about LLMs being controlled by just a small number of companies. The technology is proving easy to duplicate at scale. Even if we don’t have open models from the big US companies, we likely will have them floating on the Internet sourced by Chinese companies, and efficiency gains will let us run them locally.

I’m also not worried about AI alignment and robots taking over. It’s important to remember that these things aren’t alive or conscious in any meaningful way. I’m not worried about ‘alignment’ because the question itself assumes intent and intelligence that these things don’t actually have. I am worried about ‘bias’, though, the same as I’m worried about bias in psychological studies whose core sample population is Ivy League psych majors. We already have alignment problems with other non-human processes. They are called corporations. What does it mean for a business to be ‘aligned’ with human interests? AIs will need to be subject to regulation the same way businesses are. Imperfect, to be sure, but not an existential crisis.

What we should be working on

LLMs are a technology, just like transistors and lasers. They are not products. We haven’t seen the real AI products yet. I feel like the current problems are largely UI and product definition issues. LLMs work best where the cost of failure is low and where a human can review the results.

The problems come when we trust them to work without supervision. We aren’t designing these things to fail, meaning to properly handle failure. Failure will always come. If you don’t handle it you are just doing bad engineering. Consider a company replacing a call center with a chat bot which can’t actually solve the 5% of problems people call human support centers for. It’s not allowed to. Or insurance review boards replacing reviewers with LLMs that fail in some cases, with no way to go back to a human to address the failures. LLMs work best where the cost of failure is low. Is the generated essay too wordy? Just generate it again. Does this scan indicate cancer? Maybe, but we need to delegate to a human for the next step.
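
One way to picture "designing for failure" is to make escalation a first-class outcome instead of an afterthought. The sketch below is purely illustrative: the `Outcome` type, the `knownAnswers` table, and `handleRequest` are hypothetical names, and a real system would sit a model behind them.

```typescript
// Every request ends in one of two explicit outcomes; "escalated" is not an error path.
type Outcome =
    | { status: "resolved", reply: string }
    | { status: "escalated", reason: string }

// Hypothetical bot knowledge: the few requests it is actually allowed to answer.
const knownAnswers: Record<string, string> = {
    "reset password": "Use the reset link on the sign-in page.",
    "opening hours": "We are open 9 to 5 on weekdays.",
}

function handleRequest(request: string): Outcome {
    const reply = knownAnswers[request.trim().toLowerCase()]
    if (reply !== undefined) return { status: "resolved", reply }
    // The bot never promises what it can't deliver: anything else goes to a person.
    return { status: "escalated", reason: `no verified answer for "${request}"` }
}

console.log(handleRequest("Reset password").status)
console.log(handleRequest("My claim was denied").status)
```

Because the caller has to handle both branches of `Outcome`, the path back to a human can't be quietly left out.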

Right now we put too much faith in these systems. The computer should never make a promise it can’t keep. Above all, the engineers need humility. Solve the small part of the problem that you can, and be humble that you can’t solve it all and account for it. It’s going to require a lot of work. A lot of experimentation to find the cases where AI provides real value instead of cheap-to-generate slop. Things may get a lot worse before they get better.

Conclusion

We haven’t seen the real AI products yet. Most tools are still just generative. They can’t advise me on what I already have. They don’t have judgment. Where are the tools which analyze my code dependencies and figure out what to swap out with slimmer replacements? Where is the LLM powered security analyzer that can find bugs in my code beyond the heuristics? What would it look like to have a word processor for thinking? Something that helps me organize my thoughts, not just rewrite them. An AI that was a trusted research partner. AI as a collaborator, not a replacement. We are a ways away from that still, but we will get there. We need to augment humans, not replace them.

Remembering Mixed Reality

Thu, 23 Jan 2025 00:00:00 +0000

Working in the Mixed Reality group at Mozilla was the most fun I ever had in my 30 year career. Helping to usher a new technology into the world. One with such potential. I’d really love to go back there. Unfortunately it seems there isn’t really a there to go back to. Cue memory fade.

WebXR

Mixed Reality (MR) refers to the spectrum of interfaces between and including Virtual Reality (VR: fully opaque) and Augmented Reality (AR: fully transparent). The Mixed Reality Group was a team at Mozilla researching MR, building web standards around it, and making an MR-first browser for multiple MR devices. At one point we had tech running on the Oculus Go, Oculus Quest (1), Microsoft’s HoloLens, and some Android-derived devices like the Pico (which I just learned was acquired by ByteDance).

At the center of all of this was WebXR: a set of web standards (actual W3C standards) for building and securing mixed reality applications on the web. And I got to be a part of it. My team and I built developer tools, epic demos, wrote tutorials (since scrubbed from the web), and promoted WebXR content. It was a thrilling time and a highlight of my career.

The failure of WebXR

After the WebXR group was shut down at Mozilla a lot of other MR projects have been canceled, scaled back, or put on the back burner. In the following five years MR for the open web hasn’t happened and WebXR is largely DOA.

At Mozilla, after laying off most of the Emerging Technologies division in 2020, Hubs survived but mostly for the non-immersive cases, and even that was cut last summer. Everything else is gone. Our blogs are gone. The github repos have been archived. Almost wiped clean from the face of the web.

Apple supports the WebXR standard but only on their VR headset (more on that in a bit). We (Mozilla) had AR-based WebXR running smoothly on iPhones, but Apple never unlocked the API in Safari. In fact, Apple’s native ARKit efforts seem to have completely dropped off the face of the earth. They are still supporting it and occasionally add new features, but don’t highlight apps built with it or use it in any way I can see.

Microsoft was a WebXR contributor, but they killed off their Windows MR initiative. HoloLens 2 is stuck on an OS that will soon exit support, and they have shown no indication there will ever be a version 3.

Google abandoned Daydream and has done nothing with Android’s AR APIs. They also shut down multiple web-based systems that supported AR (visual tours of maps, a 3d model store, and I think a few others). Chrome itself still supports WebXR, presumably on any PC based VR headsets, but they don’t seem to do much work on it anymore.

Meta still supports WebXR on the Quest, but I never hear about websites being built to use it anymore. They don’t advertise it or seem to encourage development.

So MR on the web hasn’t happened. Or it happened and then it died. This makes me sad. Just writing this section of the article makes me sad. I really wanted to build the metaverse as an open network of websites that visitors could jump between. To make immersive a native part of the web. In the end, maybe the world just didn’t want it.

Beyond the Web

The non-web MR platforms have slowly improved but have not matched what we dreamed about. Meta shipped the Quest 3 (and the expensive Quest Pro, for some reason) and it has a few popular games and exercise apps, but it hasn’t grown into a thriving platform beyond games. They also tried to build a VR social network but if none of my VR using teenagers have even heard of it then clearly it flopped.

Sony has lost interest in the PlayStation VR2, but that was always a closed platform anyway. Pico has continued to make headsets but I don’t think they are sold in the US. And of course Microsoft and Google shut down their own offerings.

Google recently announced Android XR, but I question if it will go anywhere. Google has a habit of starting big initiatives and then abandoning them when they get bored. At least that’s what it looks like from the outside. I imagine internal politics are different. I’d love more information here.

Apple Vision Pro

And finally, a latecomer to the game (or maybe still too early) is the Apple Vision Pro. It’s amazing hardware but doesn’t do much and is very expensive. How is it that the best designers at the wealthiest company in the world can’t come up with better uses for AR than 3D photo galleries and VNC? And if that’s the best they’ve got, then why did they release it at all?

Usually when Apple introduces a new technology they have some amazing showcase uses to drive adoption and excite developers. However, sometimes it’s simply impossible to nail the killer uses without a bunch of real people trying it out in the real world and seeing what works. The Apple Watch was like this.

The first versions of the Apple Watch were slow, had terrible battery life, and couldn’t do very much. In most ways they were worse than everything else on the market. The one true bit of value they added was notifications from your phone on your watch, which ended up being easy for others to replicate anyway thanks to BLE. But Apple didn't give up.

Apple’s Watch OS team pivoted the UI and core uses several times before nailing the wellness functions around version 5. So why did they ship before then? Why did watches 1-4 exist outside the lab? Because without them they could never get to Watch 5.

I think, or at least I hope, Apple Vision Pro is like this. There will have to be a few crappy versions until we get to the good one that has the right mix of weight, features, and battery life; and has a few (or even just one) truly killer use that justifies the product.

I’ll say this about Apple. They don’t give up. They keep iterating until they nail it. Of course, once they find product market fit they sometimes let products atrophy, but that’s a story for another day.

A different path forward

Maybe evolving VR to AR was the wrong approach. Maybe instead we should focus on regular glasses with a heads up display and actual useful functions before trying to nail the real world overlay and occlusion problems. Meta’s RayBan smart glasses are far more interesting to me than either of their separate VR or AI initiatives.

Maybe we need an AI agent looking over your shoulder analyzing what you are doing. Being the guy in the chair to our superhero selves. Secure AI agents could be huge here.

My Areas of Interest for 2025

Thu, 16 Jan 2025 00:00:00 +0000

I’ve been quiet lately, not because I’ve had nothing to say but because I’ve not felt well enough to say it. That changes today.

I’m starting to get a handle on my adult onset Type 1 Diabetes (T1D for the cool kids), including taking more control of my medication and standardizing my food choices (it’s less what I eat and more about consistency of amount and timing).

I’m not quite ready to look for a new job but I am ready to look back at my career and find patterns. Which areas of tech have I enjoyed the most? What new areas actually interest me? So I’m starting this article series to document my personal research on the state of the industry. I hope you’ll join me for the journey, and hopefully we’ll learn some cool stuff along the way.

Going Back in Time

I’ve worked in a bunch of [different areas](https://www.linkedin.com/in/joshmarinacci/) of tech during my 30 years of being a professional software engineer (Holy carp! I’m old!). From UI toolkits to network attached storage, from mobile applications to low latency networking. Though the jobs have been diverse, there are some common themes, especially human computer interaction. This should not be a surprise as Graphics, Visualization and Usability was my specialization in college. Still, it’s interesting that I keep coming back to the area over and over. It’s surprising that my initial instincts at 18 were actually correct.

Let’s go through some current tech areas that are heavy with HCI and are growing.

Artificial Intelligence

I am not an AI maximalist. I don’t believe AIs are conscious, nor do I think we are anywhere close to AGI. Sam Altman seems to think we are, but he carefully hedged his bets by saying we will have super intelligence within a few thousand days, which is a clever way of saying “at least a decade away”. I personally think it will be several times that, assuming it is possible at all. In either case, it doesn’t affect my job hunt today.

However, intelligence and consciousness are different things. It is entirely possible to have intelligent machines that are not conscious, and that is probably preferred anyway. I’m not even talking about future tech. We have yet to absorb the possibilities of current LLMs, much less the ones coming down the pipe. Lasers were first built in 1960 and we are still discovering new uses for them.

I feel that the next steps for AI are largely a user interface problem. These things are powerful but flawed. Making them applicable to more situations, and making them actually help humans instead of replacing them, is going to require some deep UI work. This is something I want to work on. If you are in the AI industry and looking to hire someone on the UX side, give me a call.

I’m going to explore this deeper in its own post.

XR: Augmented and Virtual Reality

Working in the Mixed Reality group at Mozilla for three years was probably the best work experience I’ve ever had. The end of that group was devastating for me. I got to build cool demos, developer tools, write blogs, as well as be on the W3C standard committee to make XR widely available and defend against attacks that could fill many a Black Mirror episode. It was a thrilling position and I really miss it.

Since 2020, when Mozilla dissolved that division (along with essentially every other group outside of the core browser), XR has largely been a disappointing industry. Meta has continued to improve the Quest but it hasn’t broken out of gaming. Microsoft shut down Windows MR. Google abandoned Daydream. And finally Apple’s Vision Pro has been a huge disappointment. While the hardware is incredible, how is it that the best designers in the world can’t come up with better uses for AR than 3D photo galleries and VNC?

Maybe evolving VR into AR was the wrong approach. Maybe instead we should focus on smart glasses with a heads up display and actual useful functions before trying to nail the real world overlay and occlusion problems.

I’m cautiously excited to see Google return to the immersive space with Android XR, and Meta’s RayBan smart glasses are far more interesting to me than either of their separate VR or AI initiatives.

Embedded Hardware

The capabilities of modern embedded systems amaze me. Anyone can make fairly complex hardware using free circuit layout software, embedded programming languages like CircuitPython, easy to acquire components (I'm looking at you RP2040), and cheap PCB manufacturing services. Common standards like USB-HID make it even better. And yet..

Theoretically the low barrier of entry should trigger a Cambrian explosion of bespoke hardware, but I haven’t seen much of that yet. All keyboards look the same. All phones look the same. Where's the crazy hardware?! Yes “hardware is hard” but it’s a lot easier than it used to be. Where's my LLM powered micro-robots? How are vacuums and airpods the only new consumer hardware category of the past decade?

I suspect there are a lot of interesting things happening that just aren't visible yet. One of my personal product ideas is a LEGO compatible macropad, which I’ve prototyped and am considering bringing to market. I’m also looking at building a programmer alarm clock, pixel style wall displays, and other products that will improve my industrial design skills. None of these ideas are likely big enough to be my full time job, but they’d make for some interesting side projects. More on this later.

Desktop Operating Systems

I spent 5 years on the desktop Java UI team at Sun, and at least another five years before that building lots of desktop software. I remember the Gnome vs KDE wars. I eagerly awaited every new release of OSX. Those days are over.

From a UI point of view desktop/laptop OS dev is dead. All of the effort is targeted at mobile while the desktop OSes have degraded from increasing monetization, API neglect, and centralized control; all at the cost of usability and productivity. A new OS from scratch could be an order of magnitude more productive with far fewer resources if it was designed around simpler messaging APIs and a database filesystem.

Sadly, while I feel there is desire for a desktop OS that is as usable and productive as, say, early OSX and Windows XP, with modern architectures and hardware; I don’t think there is a way to sell it. I don’t know how to build a business or open source project that could fund professional development of a new operating system (even if it used the guts of Linux to get started). I have, however, continued some personal research on this topic which I’ll cover in my deep dive on OS and UX dev. I just wish I could find a way to turn it into an actual job.

No Code / Low Code

AKA: coding for the masses. Once upon a time there was a lot of research and product development in software to let non-programmers be productive with computers. To let normal people create their own solutions. Hypercard. Visual Basic. For a variety of reasons little progress has been made for several decades. It’s sad to me that spreadsheets are still the best end user programming system we’ve come up with. That Apple's addition of math evaluation to Notes is such big news represents how lackluster things have become.

LLMs may be the game changer here. They provide a fundamentally new way of interacting with computers that I think could let people really solve their own problems. I don’t know how to build a business out of it, but there’s clearly something interesting going on here.

To be clear, I don’t think that all programmers will be unemployed because LLMs can generate code for you. Code generation is not the answer. Building a secure online application to collect data is far more complicated than it should be. However, asking an LLM to write the same code that a human programmer would is a sure fire way to get an insecure unmaintainable mess. I suspect the right answer is some new model of computation more amenable to the fuzziness of small bespoke applications, and perhaps a new kind of programming language to go with it.

Again, I don't know how to make a business around this, but there's some interesting kernel to be explored here. Contact me if you'd like to talk about it.

Medical Software

The modern medical software that patients interact with (health record systems, patient portals, etc) is a mess. It is some of the worst software I’ve ever had to use (and sadly I’ve had to use it a lot lately). Unfortunately, I strongly suspect that the root cause is that the medical system itself (at least in America) is a mess. *Software reflects the structure of the organization that built it*. I’d love to help fix medical software, but faster billing systems won’t fix broken business models. I don’t know what the solution is here but as someone currently wading through a new diagnosis I’d love to help fix it. Please contact me if you’ve got some medical startup opportunities.

Technical Documentation

Writing technical documentation is still harder than it should be and too locked into proprietary software. Writing good docs was the core of my last job, so I’m familiar with a lot of the problems. While I love the idea of GitBook, the reality of WYSIWYG editing Markdown for large documentation sets is still buggy and frustrating. I have some product ideas and GUI prototypes here that I want to explore in future posts, so I’ll save it for then.

Next Steps

So where am I going with all of this?

Right now my plan is to explore some of these ideas through a small side company (I have a certificate of existence!), while also continuing to get healthier and look for a new full time position.

IdealOS Thinking

Tue, 25 Jun 2024 17:42:52 +0000

One of my original IdealOS blog posts from 2017 showed up on the front page of Hacker News the other day (comments here). This got me thinking about IdealOS again. I haven’t worked on it in a couple of years, but as I read through the comments and links to articles by people with similar ideas, I came to a realization. I am still working on it. Maybe not directly, but I’m still exploring ideas that are needed to build IdealOS. So with that in mind let’s take a look at what I’ve been working on lately.

Tiny Apps

One of the keys to IdealOS actually being “ideal” is having apps which are tiny. They should have very little code. The less code there is the less space there is for bugs to hide. Tiny apps also tend to run faster and be easier to maintain. If we assume an always accessible database, then theoretically a lot of the complexity of writing an app can go away, at least if we have the right abstractions.

To that end I’ve been prototyping a React library that lets you define a data schema, then use it as the core data structure in your React app without having to manage state updates, and automatically persisting to local storage, resulting in an extremely compact app.

Runtime Types

The core idea I'm prototyping is that you make a schema of what you want to represent using prototype objects then clone those for your actual instances. This lets us do all of the type calculations at runtime. Schema objects contain all of the information for other code to read the entire schema, generate UIs based on those schemas, load and save to JSON, and do other useful things.

The schema is composed of a few core object types: atoms, lists, maps, and.. Actually, it would be easier if I just show you.

Here’s a simple example of a todo list item:

const TodoItem = makeMap({
    title: makeString("untitled"),
    completed: makeBoolean(false),
})
type TodoItemType = typeof TodoItem
const TodoList = makeList<TodoItemType>(TodoItem)
type TodoListType = typeof TodoList

Now let’s create a list with two items in it.

const data:TodoListType = TodoList.clone()
data.push(TodoItem.cloneWith({
    completed: false,
    title: "make breakfast"
}))
data.push(TodoItem.cloneWith({
    completed: true,
    title: "buy milk"
}))

From this list we can pull out the data, manipulate it, loop over it, and all of the other things we normally do with data structures. All items have a built in toString() method so we can easily print it as well.

console.log(data.toString())

List:type_40070(make breakfast,false), 
     type_40070(buy milk,true)

Now let’s create a simple React component to view the list of items, and add, edit, or delete them.

export function TodoListExample() {
    const [selected, onSelect] = useState(()=>data.get(0))
    const addItem = () => data.push(TodoItem.cloneWith({
                                         completed: false,
                                         title: ""}))
    const deleteItem = () => data.deleteAt(data.indexOf(selected))
    return <VBox>
        <HBox>
            <Button title={"Add"} onClick={addItem}/>
            <Button title={"Delete"} onClick={deleteItem}/>
        </HBox>
        <ListView data={data}
                  selected={selected}
                  onSelect={onSelect}
        />
    </VBox>
}

That’s it. That is all of the code that we need to make a simple todo list. Rendered it looks like this:

Events and change propagation all happen internal to the structure, so your app doesn’t need to manage updates when new items are added. It will just do the right thing.
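
As a rough illustration of how that kind of internal propagation could work (this is not the library's actual code; `ChangeSource`, `WatchedMap`, and `WatchedList` are made-up names), each node can keep a set of listeners and re-fire its children's changes upward:

```typescript
type Listener = () => void

// Base class: anything that can announce "I changed".
class ChangeSource {
    private listeners = new Set<Listener>()
    // Register a callback; the returned function removes it again.
    onChange(fn: Listener): () => void {
        this.listeners.add(fn)
        return () => { this.listeners.delete(fn) }
    }
    protected fire(): void {
        this.listeners.forEach(fn => fn())
    }
}

class WatchedMap<T extends Record<string, unknown>> extends ChangeSource {
    constructor(private values: T) { super() }
    get<K extends keyof T>(key: K): T[K] { return this.values[key] }
    set<K extends keyof T>(key: K, value: T[K]): void {
        this.values[key] = value
        this.fire()
    }
}

class WatchedList<I extends ChangeSource> extends ChangeSource {
    private items: I[] = []
    push(item: I): void {
        this.items.push(item)
        // Re-fire child changes so listeners on the list hear about deep edits.
        item.onChange(() => this.fire())
        this.fire()
    }
    get(index: number): I { return this.items[index] }
    size(): number { return this.items.length }
}

const list = new WatchedList<WatchedMap<{ title: string, completed: boolean }>>()
let redraws = 0
list.onChange(() => { redraws += 1 }) // a view component would re-render here
list.push(new WatchedMap({ title: "make breakfast", completed: false }))
list.get(0).set("completed", true)
console.log(redraws) // one redraw for the push, one for the deep edit
```

A component only has to subscribe to the top-level list. The sketch leaves out unsubscribing children when they are removed, which a real implementation would need.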

ListView

The ListView component knows about these smart objects, so it can render and select elements from the list right away. When something changes in the list, or a property deep down inside the list’s elements changes, the ListView will be notified and redraw itself automatically.

ListView does not know, however, what our TodoItem object actually is, so it cannot render the items the way we want. By default it will call item.toString() which recursively calls toString() on the child properties. That is the output we saw above.

The next step is to tell the ListView how to render TodoItems using a renderer. If we just want a different string representation we can give it a StringRenderer like this:

const SimpleTodoItemRenderer:StringRenderer<TodoItemType>
 = (item:TodoItemType) => {
    return item.get('completed') + " " + item.get('title')
}

This works if we just want text output, but more likely we want to be able to edit the todo item from the view. In that case we create a ListItemView renderer with a checkbox for the completed property and an EditableLabel for the title.

const ComplexRenderer: ListItemRenderer<TodoItemType> 
   = (item: TodoItemType) => {
    return <>
        <BooleanCheckbox value={item.get('completed')}/>
        <EditableLabel value={item.get('title')}
                    strikethrough={item.get('completed')}/>
    </>
}

Again, these components are aware of the smart objects. BooleanCheckbox knows how to toggle the boolean item.completed property. The EditableLabel knows how to edit a text property. All of the event propagation is handled internally.
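
Since the type information exists at runtime, a generic form could go one step further and pick an editor for each property on its own. The sketch below assumes a hypothetical `FieldSchema` shape and made-up editor names; the real schema objects may look quite different.

```typescript
// Hypothetical runtime description of one property in a schema.
type FieldSchema =
    | { kind: "string", defaultValue: string }
    | { kind: "boolean", defaultValue: boolean }
    | { kind: "number", defaultValue: number }

type EditorKind = "text-input" | "checkbox" | "number-input"

// Choose an editor purely from the runtime type of the field.
function editorFor(field: FieldSchema): EditorKind {
    switch (field.kind) {
        case "string": return "text-input"
        case "boolean": return "checkbox"
        case "number": return "number-input"
    }
}

// Walk the schema and describe the form a generic UI would build.
function describeForm(schema: Record<string, FieldSchema>): string[] {
    return Object.keys(schema).map(
        fieldName => `${fieldName}: ${editorFor(schema[fieldName])}`
    )
}

const todoSchema: Record<string, FieldSchema> = {
    title: { kind: "string", defaultValue: "untitled" },
    completed: { kind: "boolean", defaultValue: false },
}
const formRows = describeForm(todoSchema)
console.log(formRows.join("\n"))
```

In a React app the strings would be components instead, but the lookup is the same: the schema, not the app code, decides which editor appears.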

With the Right Datastructure the Code Just Falls Out

So where are we now? We have the ability to define a datastructure out of lists, maps, and typed atoms like strings and numbers. Then we have smart React components that know how to intelligently render the data structure and generally do the right thing. This is all pretty cool but we could have done this with regular TypeScript objects. The magic is that we have the types available at runtime. We have a schema. Because of the schema other features fall out for free:

Serialize

Save to JSON with data.toJSON() which recursively saves the data structure to a JSON object.

Deserialize

Restore from a JSON object with TodoList.fromJSON(json).

Property Defaults

Provide default values when adding new fields to a type.

const TodoItem = makeMap({
    title:makeString('untitled'),
    completed:makeBoolean(false),
    cost:makeNumber(10)
})

When restoring from JSON, if the stored object doesn’t have a cost property, the system will automatically use the default value of that property. Calling cloneWith on Map objects will also use the defaults if the property is omitted.

// completed and cost are set to defaults
const todo2 = TodoItem.cloneWith({title:"hello"})
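
To show how serialization, deserialization, and defaults can all come from the same schema, here is a small self-contained sketch. The classes `AtomNode` and `RecordNode` are assumptions made for illustration and are not the library's real internals. Lists are left out to keep it short.

```typescript
type Json = string | number | boolean | Json[] | { [key: string]: Json }

// Every schema node doubles as a prototype: it can clone itself,
// save itself, and build a new instance from stored JSON.
interface SchemaNode {
    clone(): SchemaNode
    toJSON(): Json
    fromJSON(json: Json): SchemaNode
}

class AtomNode implements SchemaNode {
    constructor(public value: string | number | boolean) {}
    clone(): AtomNode { return new AtomNode(this.value) }
    toJSON(): Json { return this.value }
    fromJSON(json: Json): AtomNode {
        // No type validation here; a real implementation should check.
        return new AtomNode(json as string | number | boolean)
    }
}

class RecordNode implements SchemaNode {
    constructor(public fields: { [key: string]: SchemaNode }) {}
    clone(): RecordNode {
        const copy: { [key: string]: SchemaNode } = {}
        for (const key of Object.keys(this.fields)) copy[key] = this.fields[key].clone()
        return new RecordNode(copy)
    }
    toJSON(): Json {
        const out: { [key: string]: Json } = {}
        for (const key of Object.keys(this.fields)) out[key] = this.fields[key].toJSON()
        return out
    }
    fromJSON(json: Json): RecordNode {
        const stored = json as { [key: string]: Json }
        const restored: { [key: string]: SchemaNode } = {}
        for (const key of Object.keys(this.fields)) {
            const proto = this.fields[key]
            // A property missing from the stored JSON falls back to the prototype's default.
            restored[key] = key in stored ? proto.fromJSON(stored[key]) : proto.clone()
        }
        return new RecordNode(restored)
    }
}

// A newer schema that has gained a cost field with a default of 10.
const TodoItemV2 = new RecordNode({
    title: new AtomNode("untitled"),
    completed: new AtomNode(false),
    cost: new AtomNode(10),
})
// JSON written by an older version of the app, before cost existed.
const oldJson: Json = { title: "buy milk", completed: true }
const restoredItem = TodoItemV2.fromJSON(oldJson)
console.log(JSON.stringify(restoredItem.toJSON()))
```

The restore loop walks the schema rather than the stored data, which is why a missing field picks up its default without any migration code.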

History / Undo & Redo

We can monitor all changes to a datastructure by attaching a change handler to the top node. This means we can record every change in memory and undo those changes with fully generic code. I created an AHistory object which does exactly that.

const history = new AHistory<TodoListType>(data)
// add item
data.push(TodoItem.clone())
// undo the add
history.undo()

Constraints

The runtime schema also gives us constraints that can’t be expressed directly in the type system. Suppose we are making a drawing program and want to express that the radius of a circle must be a positive numeric value. We can do that with a constraint function that must return true for the change to be allowed.

const GreaterThanZero:Constraint<number> = (v) => v>0

const Circle = makeMap({
    x:makeNumber(0),
    y:makeNumber(0),
    radius:makeNumber(1,{
        cons: [GreaterThanZero]
    }),
})

Now the Circle type will never allow the radius to be set to something less than or equal to zero. UI components can use this information as well. A smart number editor could use a numeric input field with min set to 1 to disallow entering an invalid value.

The Missing Piece

One thing I haven’t talked about is: where does the data come from? For something document oriented like a drawing program, we can imagine the document is represented as a tree structure which is persisted to a JSON file or exported as an image.

One of the core features of IdealOS was that it was database oriented. Sure, we could store a document as a single object in the database, but part of the power of a db is the ability to search and link. Right now the TodoList object wraps an array in program memory, but it doesn’t have to. What if, instead, TodoList wrapped a live query in the database? Then the program wouldn’t have to consider persistence at all. Data comes transparently from the system database into the UI. Editing a field would automatically update the database. Most applications would not need code for saving to disk or interacting with the database. The data structures would keep themselves up to date, letting the app focus on doing one thing well: helping the human interact with data.

Conclusion

The core idea of a schema, or runtime type information, is very powerful. We can build reusable components that automate away much of the busy work involved in making a GUI. The GUI can customize itself to the structure of the data rather than vice versa. Combined with a built in database, the applications for IdealOS can be tiny and incredibly easy to build. We can even imagine letting users build their own apps visually for IdealOS, resulting in an incredibly flexible and hackable system.

You can find the source to this unnamed db object library at https://github.com/joshmarinacci/tool-toolkit.

Circuit Python Watch Status

Tue, 25 Jul 2023 19:22:58 +0000

It's been a bit since I've posted on my round screen watch project. Most of my time has been taken up by work, travel, and family stuff, but I did have a few seconds to add a feature or two.

New Screens

The Waveshare board I'm using has a LiPo battery charger inside it, but until now I hadn't exposed it. There is now a screen that shows the current battery percentage, though I think my calculations might be off. What do you think?

I also added a screen with a timer. The actual timer itself works, but it won't alert you when the timer ends if you switch to another screen. I need to figure out some sort of a background task.

I also added a moon phase indicator using some calculations I found on the web. Again I'm not sure it's accurate, but the fun part is the star field particle effects behind the text. The hard part was figuring out how to draw lots of separate pixels (one for each star) without forcing the entire screen to refresh. The secret is to call bitmap.refresh() after each star is moved, rather than after drawing all the stars. If you do all the stars then you get one giant dirty rectangle that fills the whole screen and will be slow. If you call refresh after only changing two pixels (undrawing the old, then drawing the new), then in most cases you'll have a dirty rect that is only 2 or 4 pixels in size, which is super fast to draw. Even with 50 particles, the total pixels drawn with my incremental approach is still a fraction of the full 240x240 screen size.
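To put rough numbers on that, here is a small sketch in plain Python (it runs on a desktop, no display needed). It assumes, as described above, that everything changed between two refreshes gets merged into one bounding rectangle. The star positions and the helper name bounding_box_area are made up for illustration; this is not the code from the watch.

```python
SCREEN_SIZE = 240  # the round LCD is 240x240 pixels


def bounding_box_area(points):
    """Area in pixels of the smallest rectangle covering all the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)


# 50 made-up star positions, each moving one pixel diagonally this frame
old_positions = [((i * 37) % 239, (i * 91) % 239) for i in range(50)]
new_positions = [(x + 1, y + 1) for x, y in old_positions]

# refresh after each star: one tiny dirty rectangle per star
per_star_total = sum(
    bounding_box_area([old, new])
    for old, new in zip(old_positions, new_positions)
)

# refresh once after moving every star: one rectangle covering all changes
batched_total = bounding_box_area(old_positions + new_positions)

full_screen = SCREEN_SIZE * SCREEN_SIZE
print("per-star refresh:", per_star_total, "pixels")
print("batched refresh: ", batched_total, "pixels")
print("full screen:     ", full_screen, "pixels")
```

With a diagonal move each star dirties a 2x2 rectangle, so 50 stars cost 200 pixels in total, while the single batched rectangle grows toward the size of the whole screen.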

Sleeping

The final big feature is making the watch sleep, though it's only partially working. The screen will turn off after N seconds of no input. This more than doubles the watch lifetime on a single charge. However, the touch sensor and CPU are still running at full power; only the screen is off. The device does have a more advanced low power mode that will wake up the main CPU when a tap is detected, but unfortunately it requires monitoring a specific interrupt pin, which CircuitPython doesn't support. There is some async IO support in the form of a counter API, which I plan to investigate next.
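The screen-off part can be sketched as a simple idle timer. This is only an illustration of the idea, not the watch's actual code: the IdleTimer class and its method names are hypothetical, and the timestamps are passed in so the logic can be tried without hardware. On the device you would presumably feed it time.monotonic() and turn the backlight off when should_sleep() returns True.

```python
class IdleTimer:
    """Tracks the time of the last input and reports when the timeout passes."""

    def __init__(self, timeout_seconds, now):
        self.timeout = timeout_seconds
        self.last_input = now

    def touch(self, now):
        # call this whenever the user taps the screen or presses a button
        self.last_input = now

    def should_sleep(self, now):
        # True once `timeout` seconds have passed with no input
        return now - self.last_input >= self.timeout


# simulated timeline, in seconds (hypothetical values)
timer = IdleTimer(timeout_seconds=10, now=0)
print(timer.should_sleep(now=5))   # still awake
timer.touch(now=8)                 # user tapped, the countdown restarts
print(timer.should_sleep(now=15))  # only 7 seconds since the tap
print(timer.should_sleep(now=18))  # 10 seconds idle, time to sleep
```

Passing the clock in as an argument keeps the sketch testable; it does nothing about the CPU and touch sensor power draw mentioned above.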

Pausing

I'm happy with the progress of my watch, but I'm pausing this project for now while I work on a few other electronic projects. I recently built a set of animatronic eyes based on Will Cogley's video tutorial, which I plan to expand into a singing tiki head. I'm also experimenting with the new synthio CircuitPython API to build a step sequencer in a Game Boy-like form factor.

As always, the full source is on github.

CSS Text Style Builder

Wed, 12 Jul 2023 20:16:07 +0000

It’s a truism of the web that when something becomes free it turns to crap. This is largely due to advertising, tracking, and SEO hackers getting their crappy version of something to the top of the search index. When this has happened to a tool I want to use I can’t fix the underlying problem, but I can make my own non-tracking version and share it with the web, doing my little part to make the world a better place. Today it’s a CSS Text Style builder.

For my smart watch project I want some cool looking fonts for the watch face. For the body text of the interface I can use nice fonts from Google Fonts and convert them to bitmaps (my tutorial here). However, for the watch face itself, I want text with interesting full color effects like drop shadows, outlines, and 3D borders. Furthermore I don’t want an entire font, just the numbers and maybe a colon. Memory is precious.

At first I thought of writing drawing code to render these directly, but I realized I already have a great way to style text: CSS. The problem is the crappy SEOed tools I mentioned above. Being me, I solved it by building my own tool. Behold: CSS Text Effects

While still a pretty ugly UI, this tool does a couple of interesting things. The text style is actually a JS object with properties and constraints using Zod, a TypeScript library. Using these constraints the interactive sliders can be autogenerated from the spec. Furthermore the style is mirrored in the URL, so you can share a text style you make with anyone else by just pasting a link.

CSS styling text is still more limited than box styles, but with some creative coding it is possible to generate outlines, fading 3d effects, and blurry fire. After doing a ton of research from great CSS design sites, I included some presets to show you the possibilities.

Finally, in addition to giving you the CSS for the style you’ve generated, you can also export a PNG with only the characters you care about using the Export PNG tab.

My re-creation of CSS styles in Canvas isn't perfect, but it's pretty good. I hope this tool will be useful to you. The source is available on GitHub in case you’d like to fork it. Feel free to contact me with feature requests and bug reports.

Fonts and Icons on Circuit Python

Sun, 25 Jun 2023 18:37:12 +0000

I’m continuing to work on a little smartwatch prototype using a little round LCD and I want to have nice looking text. The default font for CircuitPython is fine, but it’s very tiny. This LCD has a pretty high DPI compared to other hobbyist screens( > 200 ppi), so I need to find a new font. CP has a way of importing new fonts, but there are a few pitfalls and tricks I discovered, so that’s what I’m covering today.

Adafruit has a guide to using custom fonts on CircuitPython. CP only supports bitmap fonts, so you’ll need a font converted to BDF (ascii) or PCF (binary) format. Modern fonts are vectors instead of bitmaps, usually in TrueType (TTF) or related font formats, so we'll need a converter. The command line program otf2bdf is such a converter. If you’ve already downloaded a TrueType font locally you can convert with these instructions. I’m using MacOS but similar should work on Windows and Linux.

First install otf2bdf with brew, apt-get, or the install program of your choice

brew install otf2bdf

Now convert the font to bitmap at a specific point height. I’m using 16pt.

otf2bdf SomeFont.ttf -o cpfont.bdf -p 16

Now copy the output bitmap font to your CircuitPython device and use it in Python code like this:

from adafruit_bitmap_font import bitmap_font
from adafruit_display_text.bitmap_label import Label
font = bitmap_font.load_font("cpfont.bdf")
label = Label(
    font=font,
    text='Text in a nice Font',
)

And that’s it. Pretty easy.

Icon Fonts

Now let’s suppose you want some nice icons. You could download each icon as PNG or SVG and then convert to CircuitPython’s preferred BMP format, but icon fonts are a thing. Let’s use them. Google created the Material icon font for use across all of Android and the Web. They are completely open source and have icons for pretty much everything. We can’t use it directly, however, because it’s in what’s known as a variable webfont format which can have multiple weights and styles in a single file. Instead we want the static truetype fonts from the GitHub repo. I chose the font called MaterialIcons-Regular.ttf.

When you download a single Material icon font you’ll see that it is huge. 348KB for a single weight! That’s because it contains over 3000 icons. 348KB is too big for our little RP2040 that has less than 200KB, and of course we don’t need all 3000 icons. In most cases we only want five or ten of them. So we need a way to subset the font.

We will use a Python program called pyftsubset that knows how to subset fonts. It is part of fonttools. Install fonttools with pip, then run pyftsubset to make sure it’s installed correctly.

pip install fonttools
pyftsubset --help

Now let’s create a new icon font with a subset of the Material Symbol Font containing only the glyphs we want. If you select an icon back on the Material Symbol Font webpage a sidebar will appear on the right. At the very bottom it will list the code point of that particular icon. That’s what you need. Write down this code point, and the points of any other icons you want to extract. For my example I want the icons for Settings (e8b8), Battery Full (e1a4), and Timer (e425).

pyftsubset MaterialIcons-Regular.ttf --unicodes="U+e8b8,e1a4,e425" --output-file=icons.ttf

You can add the --no-ignore-missing-unicodes option to print an error if you typed in a code that doesn’t exist in the font. The resulting font is still in TrueType format, so we need to convert to bitmap to use it.

otf2bdf icons.ttf -o icons.bdf -p 16

And now we can use it in our program. To use the icons we have to specify the codes for each icon with unicode escapes:

# load the subset icon font we just converted
icons = bitmap_font.load_font("icons.bdf")
time_label = Label(
    font=icons,
    text="\ue8b8\ue1a4\ue425",
)
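Typing the same code points twice (once for pyftsubset, once as escapes in the label text) is easy to get wrong, so here is a small sketch that derives both from one table. The ICON_CODES dictionary and the two helper names are made up for illustration; only the three code points come from the example above.

```python
# code points copied from the icon sidebar (Settings, Battery Full, Timer)
ICON_CODES = {
    "settings": 0xE8B8,
    "battery_full": 0xE1A4,
    "timer": 0xE425,
}


def unicodes_argument(codes):
    """Build the value for pyftsubset's --unicodes option."""
    return ",".join("U+{:04X}".format(code) for code in codes)


def icon_text(codes):
    """Build the string to hand to a Label that uses the icon font."""
    return "".join(chr(code) for code in codes)


codes = [ICON_CODES["settings"], ICON_CODES["battery_full"], ICON_CODES["timer"]]
print(unicodes_argument(codes))
print(icon_text(codes) == "\ue8b8\ue1a4\ue425")
```

The first helper is meant to run on your computer when subsetting; the second uses only chr and join, so it should also work on the device itself.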

Here it is on my prototype device.

C’est magnifique

Performance Limitations of CircuitPython's DisplayIO Graphics

Mon, 12 Jun 2023 21:23:01 +0000

For my next project with the WaveShare Round LCD I want to create an animation of Dr Strange's Eye of Agamotto. Here is my first attempt in JS (looks more like the comics than the movies). Not bad, I think.

When it animates the eye lids come down. I built them with quadratic curves that change over time using Canvas. Canvas is fast enough to let me make the animation very smooth. Once ported to CircuitPython on an embedded device, however, well... it's pretty damn slow.

Animation in CircuitPython

For my first attempt I captured the JS animation as a video, scaled it down to 240 x 240, then converted it to an animated gif. On the device the gif renders at 2 or 3 fps. A far cry from the smooth 60fps of my browser.

For my second attempt I exported a background image of the disk, drew the eye on top, then animated the eye lids as rectangles that move from the top and bottom to the center. It was still rough, but seemed to be maybe 6 or 7 fps.

What's going on here? While this device is slow, it's still a 133MHz processor. I had far faster animation back on my old Pentium, even when running Java code instead of optimized C. The problem must be in how the display is updated. After spending the next 8 hours researching how these sorts of displays work, and how CircuitPython exposes them to the programmer, I've come to this conclusion: Tiny TFT displays are just slow by nature, even with raw C code, but there are ways we can speed it up. That's what the rest of this post is about.

How CircuitPython Does Graphics

CircuitPython can be quite fast because all of the performance sensitive work is implemented in optimized C code. This includes drawing. The displayio system lets you use bitmaps with indexed colors to save memory, which means copying bitmaps back and forth can be fast. Drawing shapes can be almost as fast as C because we are just setting pixels in an in memory bitmap, and of course things like vertical lines and rectangles can be optimized using memcpy routines. Actually drawing to the screen is a different story. It can be very slow.

No matter how you draw into the framebuffer, that buffer has to be uploaded to the display as a bitmap. Most of the little hobbyist displays use a serial bus called SPI. Looking at the source to the driver for my board, the only thing custom was the init code. All uploading is a standard FourWire connection, which is a type of SPI connection.

I’m using a 240x240 16bit color screen connected over SPI. It has no indexed modes or accelerated drawing routines (as far as I can tell none of these hobbyist screens do). This means a full screen refresh requires sending two bytes for every pixel on the screen, over a serial port, one bit at a time. The DisplayIO routines are smart and try to use batches, but even if you set the entire screen to a single color, that is still a lot of data to transfer.

240 x 240 x 2 bytes per pixel equals 115200 bytes per screen, or 921600 bits per frame. For 30fps that’s over 27 million bits per second with zero overhead. In practice I’m guessing we get half of that. A lot of SPI ports only run at 10 or 20 MHz, so getting a consistent 30fps just isn’t going to happen. That’s just the nature of SPI. Desktop computer screens use protocols that can send more than one bit at a time, and operate much faster than 10 MHz. SPI just can't.
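Here is that back-of-the-envelope math as a few lines of Python, in case you want to plug in a different screen size or bus speed. The function names are made up for illustration, and the model assumes one bit per SPI clock cycle with no protocol overhead at all, so real numbers will be lower.

```python
def frame_bits(width, height, bytes_per_pixel=2):
    """Bits needed to send one full frame of 16-bit pixels."""
    return width * height * bytes_per_pixel * 8


def required_bits_per_second(width, height, fps):
    """Raw bandwidth needed to refresh the whole screen at a given fps."""
    return frame_bits(width, height) * fps


def max_fps(spi_hz, width, height):
    """Best-case full-screen fps, assuming one bit per clock and no overhead."""
    return spi_hz / frame_bits(width, height)


print(frame_bits(240, 240))                     # bits in one frame
print(required_bits_per_second(240, 240, 30))   # bits per second for 30fps
print(round(max_fps(10000000, 240, 240), 1))    # ceiling on a 10 MHz bus
print(round(max_fps(20000000, 240, 240), 1))    # ceiling on a 20 MHz bus
```

Even the ideal 20 MHz case lands well below 30fps for a full screen, before any overhead is counted.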

There is one saving grace, however. These screens have an internal memory buffer and support partial refreshes. This means you can update rectangular subsections of the screen containing just the changes between the last frame and the next. It won’t help for full screen animation, but for typical GUI work where only a few things change we can make it quite performant. We just have to go back to rendering algorithms from the 90s (like the Java Swing toolkit that I worked on for 10 years). This is in fact what displayio does.

The inability to do smooth full screen refreshes does mean certain types of effects, like slideshow animations or fading the entire view to black, are impossible. (Well, maybe we could animate the backlight?)

Earlier I said there are no acceleration commands, like RLE encoding or bitblts, or indexed color palette swaps. This is true. However, this particular chip does have vertical scroll offsets, which might make it possible to do smooth vertical scrolling of at least part of the screen. I’ll have to look into that later.

Improving Performance

So to start with, the only perf improvement we can do is set the SPI bus to the fastest possible speed. In the sample MicroPython code that comes with this screen I *think* they are setting the speed to 100 MHz but the C code seems to be 40 MHz and the Arduino code 66 MHz. 🤷 Switching from 20 MHz to 100 MHz seems to speed up the animation I’m working on, but really I need a proper benchmark to measure it.

I created a little script to redraw a fullscreen bitmap over and over, with different SPI speeds. At 1 MHz I’m getting 690ms per frame or about 1.45fps. As I increase the speed the fps improves linearly until it tops out at 64 MHz for a frame time of 144ms or just under 7fps. Multiplying the size of a frame in bits by the fps gives me 6.4Mbps, which sounds correct for an uncompressed video stream. This seems to be the max we can get.
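The conversion from frame time to fps and throughput is simple enough to sketch in Python. The two frame times are the measurements quoted above; the function names are made up for illustration.

```python
FRAME_BITS = 240 * 240 * 2 * 8  # one full 16-bit frame, in bits


def fps_from_frame_time(frame_ms):
    """Frames per second for a given time per frame in milliseconds."""
    return 1000 / frame_ms


def throughput_bps(frame_ms):
    """Bits per second actually pushed to the screen at that frame time."""
    return FRAME_BITS * fps_from_frame_time(frame_ms)


# the slowest and fastest frame times measured in the benchmark
for frame_ms in (690, 144):
    fps = fps_from_frame_time(frame_ms)
    mbps = throughput_bps(frame_ms) / 1000000
    print(frame_ms, "ms per frame:", round(fps, 2), "fps,", round(mbps, 2), "Mbps")
```

The 144ms case works out to the 6.4Mbps figure, which is a small fraction of the 64 MHz clock setting, so something other than the raw bus clock is limiting the rate at that point.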

Incidentally if I leave off the speed setting it seems to default to 20 MHz. If I leave off the target framerate of 60 it drops to 3.66fps. If I set the target framerate to 10 (which is still higher than our max) it drops to 5.38fps. Setting the target to 60 lets us get back to our max fps of 7. So I’m not sure what exactly the target framerate is doing but it seems to have some inaccuracies. Perhaps it’s allowing for more GCs? Setting it to 100 actually seems to increase the fps slightly to 7.09. Setting it to 1000 increases it some more, but only to 7.3. So clearly there is some tradeoff here but if it’s impacting battery or heat then it’s not worth it for that tiny speed improvement.

Partial Refresh

Now let’s try filling just part of the screen and see if we can get speedups.

240x240 = 7.09 fps

240x120 = 13.89 fps

120x120 = 25.48 fps

Sure enough: halving the pixels roughly doubles the frame-rate. Cutting them to a quarter gives us a 4x speedup. So 25fps seems pretty good, right? Yes and no. The numbers say it is pushing that many frames per second, but visually it does not seem to actually be rendering that fast. In fact there is a lot of tearing. Recording with my slow-mo camera shows that it takes about 70-100ms to render a frame, or in the range of 10-14fps, even though we are sending significantly more frame data than that. Technically we could refresh a 64x64 rectangle at over 60fps, assuming we had no overhead, but that’s only about a 14th of the pixels. And visually it doesn’t look that fast, plus processing that much data in code for something like a video would be insane.
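One way to sanity check these numbers is to turn each measurement into pixels per second and see how constant it stays. This is a rough sketch: the three measurements are the ones listed above, the function names are made up, and the 64x64 estimate assumes the full-screen pixel rate carries over with no per-frame overhead.

```python
# (width, height, measured fps) from the partial refresh test
MEASUREMENTS = [(240, 240, 7.09), (240, 120, 13.89), (120, 120, 25.48)]


def pixels_per_second(width, height, fps):
    """Pixel throughput implied by one measurement."""
    return width * height * fps


def estimated_fps(width, height, pixel_rate):
    """Frame rate for a region if the pixel rate stayed constant."""
    return pixel_rate / (width * height)


rates = [pixels_per_second(w, h, fps) for w, h, fps in MEASUREMENTS]
for (w, h, fps), rate in zip(MEASUREMENTS, rates):
    print(w, "x", h, "at", fps, "fps =", round(rate), "pixels per second")

best_rate = max(rates)
print("64x64 estimate:", round(estimated_fps(64, 64, best_rate), 1), "fps")
```

The pixel rate drops a little as the region shrinks, which hints at some fixed cost per frame, so the small-region estimate is an upper bound rather than a prediction.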

Still, this does give us something to work with. A full screen refresh is around 7fps, but if we only want to refresh part of the screen at a time then we can get 15 to 30fps fairly easily.

Let’s try doing several circles rotating. I can get a calculated fps of over 30 fairly consistently. In fact, if I set the target framerate to 20 and the speed to 200Mhz I get an extremely consistent exactly 32fps. Something must be syncing up nicely here. I'm not doing anything clever in my code. I created some circles and move them every frame. DisplayIO must be doing some dirty tracking underneath and only sending the smallest number of changed pixels to the screen. That's how we get 30fps. But again, visually it does not seem as smooth and there is some tearing. I suspect the refresh rate of the physical LCD is lower than how fast we can pump out screens. It also looks a little choppy because my circles are restricted to integer coordinates. I can't move a circle to be on the edge of two pixels and get anti-aliased drawing like I would on HTML canvas. But for what I'm building I can live with these constraints.

Next Time

So what have we learned?

All of my research taught me that the SPI bus is the bottleneck, and there's no way to get more than 7fps for a full screen refresh. However, partial screen refreshes can be quite fast and the displayio API was designed to enable this. They've done a great job. I don't think I could do better without hardware changes.
Make sure you set the SPI bus to its max speed. The default may be much lower than the component is capable of.
Design your graphics and animation around updating the fewest number of pixels.
Use the backbuffer to your advantage. You can make complex graphics as long as the number of pixels changed per frame is small. Cool particle effects should be possible with this technique.

Next time I'll show you some graphics examples I've been working on as well as how to access the touch events to make a little painting app. All of the source code for this project, including my performance testing, is in this GitHub repo.

Playing with a Pico based Round LCD screen

Thu, 08 Jun 2023 20:57:00 +0000

I'm enamored of all of the cheap and hackable screens and embedded CPUs coming onto the market. I'm frustrated, however, that despite embedded computing being more accessible than ever, mass-manufactured products are all becoming more and more the same. Where are the phones with oval screens? Where are the steam punk pocket calculators? We have the technology to make devices that are ever more niche and increasingly creative and different, yet what you can actually buy is more glowing rectangles. I'd like to fix this.

Recently I found this cool microcontroller with a round LCD made by Waveshare. It's built around the RP2040 chip that sits at the heart of the Raspberry Pi Pico, making it very compatible with Arduino and Python. It has some built in sensors, 4MB of flash, and a LiPo charger circuit. I ordered one last week and it just arrived. Let's dive in.

Though the official website only mentions C and MicroPython support, there is a beta CircuitPython firmware build available to download here, which is what I'll be using today. I also found a Rust crate for a related variant of the board, so I might try hacking on that later.

First boot

After plugging it in the device shows a boot screen, photo of some greenery, and then this live sensor output screen; highlighting the built in accelerometer, gyroscope, and battery sensor.

In the photo you can see just how small this thing is. 1.28 inches across! That's a USB-C connector at the bottom there. The screen has a 240x240 pixel display that I found to be pretty sharp. It looks a little blurry here because I haven't taken off the plastic protective film yet.

On the back there are two buttons for boot and reset. When I plugged it into my computer it did not come up as a local drive. When I held down boot while plugging it in my Mac asked to trust the device and a tiny drive called RPI-RP2 popped up. The bootloader text shows:

UF2 Bootloader v3.0
Model: Raspberry Pi RP2
Board-ID: RPI-RP2

So far so good. After dragging on the CircuitPython UF2 file to the device it rebooted and came up as a CIRCUITPY drive. Using screen /dev/tty.usbmodem2101 got me to the python shell. Success!

Drawing to the Screen

I struggled to draw to the screen for a few hours. Some research indicated this device uses the gc9a01 graphics driver, which is supported by the latest CircuitPython release. None of the sample code I found worked, however. The backlight would come on but nothing was drawn. By comparing to older MicroPython code by Alasdair Allan, I confirmed I was using the correct pins. Even the main spec page says I am.

After some more searching I came across this other data page specifically for the touch version. Nothing seemed awry, but looking at the schematic diagram I could see the pins are slightly different than the non-touch version, and different from the docs in the demo code.

In the end having the touch version meant one pin was different. Below is the code I ended up using. Notice that reset is not set to LCD_RST as would be on the non-touch device. Instead we have to use pin 13.

import board
import busio
import displayio
import gc9a01

# release any display CircuitPython set up at boot
displayio.release_displays()
spi = busio.SPI(clock=board.LCD_CLK, MOSI=board.LCD_DIN)
# LCD_RST is 12 in regular version
# but we need 13 for the touch version
display_bus = displayio.FourWire(spi,
   command=board.LCD_DC,
   chip_select=board.LCD_CS,
   reset=board.GP13)
display = gc9a01.GC9A01(display_bus,
   width=240,
   height=240,
   backlight_pin=board.LCD_BL)

And with that everything worked. Here's a photo of the LCD showing an indexed bitmap of Earth.

Next Time

Now that I have the board working it's time for some fun. Next up I want to print a case for it, access the accelerometer, and build some sort of interaction app. Maybe a puzzle game.

Canvas Scaling and Smoothing Tricks

Sat, 15 Apr 2023 17:06:27 +0000

I love HTML Canvas. I even wrote a book about it once. Canvas is a good drawing API and it runs everywhere. However, despite the magic that is the Canvas API, it can still be tricky to use when it interacts with CSS. I often have people ask me how to make their canvas fill the screen, or resize with the window, or to have a fixed aspect ratio but still scale to fit inside the window without overlap. All of these require understanding some internal details about how Canvas works. So let’s dive in.

Canvas is a replaced element in HTML. This means it is a rectangle with a fixed internal size. Essentially the browser treats a Canvas element as an Image. This means that some CSS styles meant for images also work for Canvas! You can tell the canvas to be a certain pixel size and use CSS to change its visual size. For example you could make a canvas that is 256x256 pixels, then scale it up to be 2560x2560. The canvas doesn’t know the difference; it draws into its buffer just like it was an image.

In the case below a 256px magenta circle is drawn on a white square canvas that is resized to be 8000 pixels across. The height is auto so it will maintain the same aspect ratio, just like an image.

<canvas width="256" height="256"></canvas>
<style type='text/css'>
    canvas {
        width: 8000px;
        height: auto;
    }
</style>

A really big image

Adapting for High DPI Screens

Now let’s consider HiDPI. Suppose you want a 500 x 500 pixels canvas, but if the viewer has a display with 2x HiDPI support then it should scale it up to look the same, but sharper. We can do this by increasing the pixels in the canvas and using CSS to constrain it back to its 500 x 500 pixel size. Remember that in CSS pixels are virtual pixels. They are scaled by the browser automatically. So on a 2x screen we’d want the canvas to be 1000x1000 image pixels but scaled back to 500x500 virtual pixels, which will be scaled back up by the browser to 1000x1000 physical screen pixels.

<canvas id="can" width="1000" height="1000"></canvas>
<style type='text/css'>
    canvas {
        width: 500px;
        height: auto;
    }
</style>
<script type='text/javascript'>
    const ctx = document.getElementById('can').getContext('2d')
    ctx.fillStyle = 'red'
    ctx.beginPath()
    ctx.arc(250,250,250,0,Math.PI*2)
    ctx.fill()
</script>

Left is without the fix, right is with the fix.

Notice in this zoom how the pixels are sharper in the lower image.

Of course the ratio might not be 2x. There are some mobile devices with 3x screens. Also, the canvas now thinks it has 1000px of space, but if your drawing code assumes 500px it will only draw in the upper left corner of the canvas. Let’s scale the size of the canvas to use the real device’s DPI in code, and then reverse the scale for drawing the circle. The property window.devicePixelRatio will give us the correct scaling factor.

<canvas id="can" width="500" height="500"></canvas>
<script type='text/javascript'>
    const canvas = document.getElementById('can') 
    let dpi = window.devicePixelRatio
    const WIDTH = 500
    const HEIGHT = 500
    canvas.width = WIDTH*dpi
    canvas.height = HEIGHT*dpi
    canvas.style.width = `${WIDTH}px`
    canvas.style.height = `${HEIGHT}px`
    const ctx = canvas.getContext('2d')
    ctx.save()
    ctx.scale(dpi,dpi)
    ctx.fillStyle = 'white'
    ctx.fillRect(0,0,WIDTH,HEIGHT)
    ctx.beginPath()
    ctx.arc(WIDTH/2,HEIGHT/2, WIDTH/2,0, Math.PI*2)
    ctx.fillStyle = 'magenta'
    ctx.fill()
    ctx.restore()
</script>

Preserving Chonky Pixels

We’ve seen that scaling up a canvas to handle high resolution is easy. But what if we need the opposite? Suppose we were making a game with retro-style pixel art. Then we want to see the chunky pixels. Unfortunately if we just scale it up the pixels get blurry, like this:

To fix this we need to tell the browser to not smooth the pixels. CSS already has a property for this, image-rendering: pixelated and it works for canvases too.

With that one css property it looks like this:

Note that if you draw scaled images inside the canvas they may still be smoothed. You can fix this by setting imageSmoothingEnabled to false on the drawing context.

Big but not too big

Now that we have a retro game canvas, say a 160 x 120 pixel screen, we might want it to grow as big as possible to fill the extra space on the page. We could use width:100% but the bottom might get cut off if the window is wide but short. Setting both width and height to 100% would make it fit but stretch and squish the pixels. Instead we want the canvas to be as big as possible while maintaining its aspect ratio, but not so big that any part of it is hidden. This turns out to be a common need for images as well, so there’s already a CSS property that does it: object-fit. With object-fit:contain the canvas will shrink to be completely contained within the styled size.

In this example a 160x120px canvas is stretched to 500x500.

canvas {
  width:500px;
  height:500px;
  image-rendering: pixelated;
  background-color: black;
}

If we set object-fit to contain then we get this, with black bars (the background of the canvas) filling the extra space.

And if we set object-fit to cover it will expand to fill all space, but possibly crop the edges off. Not good for our retro game but it could be useful for other cases where you don't know the rendered size until the page is loaded.
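If you are curious what the browser is computing for contain and cover, the arithmetic is short. The sketch below is in Python only to show the math; it is not how you would do it on a page (CSS does this for you), and the function names are made up. It assumes the simple case of a content size scaled uniformly and centered inside a box.

```python
def contain_size(content_w, content_h, box_w, box_h):
    """Largest size that fits entirely inside the box, keeping aspect ratio."""
    scale = min(box_w / content_w, box_h / content_h)
    return content_w * scale, content_h * scale


def cover_size(content_w, content_h, box_w, box_h):
    """Smallest size that fills the whole box, keeping aspect ratio."""
    scale = max(box_w / content_w, box_h / content_h)
    return content_w * scale, content_h * scale


# the 160x120 retro canvas inside the 500x500 styled box from the example
fit_w, fit_h = contain_size(160, 120, 500, 500)
print("contain:", fit_w, "x", fit_h)
print("bar height top and bottom:", (500 - fit_h) / 2)

fill_w, fill_h = cover_size(160, 120, 500, 500)
print("cover:", round(fill_w, 1), "x", round(fill_h, 1))
```

With contain the game screen ends up 500x375 with a 62.5px black bar above and below; with cover it grows wider than the box, which is where the cropped edges come from.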

The MDN page for object-fit describes all of the possible values. This CSS property is really for images, but remember: anything an image can do a canvas can do better!

Canvas to fill the screen

So far we have used a canvas with a fixed size. We might scale the size up, but it has a definite size that doesn’t depend on the size of the browser window or any content next to the canvas. Sometimes we do want the canvas to adapt to its surroundings. Consider a drawing tool. It needs to resize when the user resizes the window. We could give the canvas a width and height of 100%, which would make it stretch to fill the available space, but, just like an image, that no longer preserves the aspect ratio. The pixels would get squished. Instead we need the canvas to change both its layout (css pixels) and its drawing surface (image pixels) to handle the current conditions. To do this we’ll need some code.

First we need to figure out how to get anything to fill the page. Let’s start with a div wrapping a canvas. Setting the width and height of the div to 100% does this:

Hmm. Not what we want. The problem is that the body element is special. It always takes up the full width of the window’s viewport, but the height depends on the content. What we want is for height to be exactly 100% of the viewport’s height. Enter CSS vh and vw units. Setting the div's size like this

div {
    width: 100vw;
    height: 100vh;
}

gives us this

Depending on your browser you might get some scroll bars. This just means there’s extra space we need to get rid of. Set the padding and margins to 0 for both the div and body, and set box-sizing to border-box to account for the 5px colored borders. Now we get this, and it resizes properly with the window.

Now that the div is sizing properly, we just need to update the canvas itself when the window is resized. Again we will use a little bit of code. On every resize it will set the canvas size to the size of the div, minus the borders. Depending on your browser you might also need to set overflow: hidden; on the body element.

    function calcCanvasSize() {
        // get the wrapper size
        const wrapper = document.getElementById('wrapper')
        let rect = wrapper.getBoundingClientRect()
        const can = document.getElementById('can')
        // set the canvas size to wrapper, minus the 5px borders
        can.width = rect.width-5*4
        can.height = rect.height-5*4
        const WIDTH = can.width
        const HEIGHT = can.height
        // redraw
        const ctx = can.getContext('2d')
        ctx.fillStyle = 'white'
        ctx.fillRect(0,0,WIDTH,HEIGHT)
        ctx.beginPath()
        ctx.arc(WIDTH/2,HEIGHT/2, WIDTH/2,0, Math.PI*2)
        ctx.fillStyle = 'magenta'
        ctx.fill()
    }
    calcCanvasSize()
    window.addEventListener('resize',calcCanvasSize)

And now we get a beautifully resizing canvas.

Conclusion

I hope today you've seen how easy it is to style the canvas for different uses. As browsers introduce more css properties for images, almost all of them will work for canvas elements. Go Canvas!

Code for everything in this blog is available on this git repo.

One Hour Pong

Thu, 12 Jan 2023 23:03:25 +0000

I’ve been teaching my nephew to code for the past few years and we did a game jam together over the winter break. He said to me: “Uncle Josh, you sure can code fast.” I reminded him that I’ve been programming for 30+ years, and one day he’ll be even better than I am.

My nephew struggles to come up with ideas and often goes down rabbit holes on one feature or tool instead of working on the rest of the game. I told him that time constraints can make a game better because it forces you to focus. Ever the clever child, he suggested I do a pong game in one hour. From scratch. With no existing code or game frameworks. Challenge accepted!

Here’s what I built last Tuesday.

Live Demo

Construction

To build the game I first spent about 15 min doing setup. Point and Bounds classes. Ball, Paddle, Game state. And of course a game loop with functions for input, collisions, draw, check for game over. However, I didn’t actually write these functions or classes yet. I just stubbed in the names and left the implementations for later. My nephew asked why I was doing it this way. “Having the stubs helps to organize your brain and your code. I’ll fill in the code as I need it. That way you don’t waste time building parts that you don’t end up needing.”

By the 20 minute mark I had basic drawing working, and movement with keyboard inputs by 25 min. Collisions and physics ended up being the hardest part. I wrote a Bounds class with an intersection method, but that wasn’t enough. To do collisions correctly the direction of the ball's motion matters. I ended up doing a more brute force method where I check the ball against the bottom of the top wall, the top of the bottom wall, each paddle, etc. Not as clean as I’d like, but far simpler.
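Here is a rough sketch of why direction matters, using just the top and bottom walls. This is not the game's actual TypeScript; it is written in Rust purely for illustration, and the Ball struct, its fields, and bounce_off_walls are all hypothetical names.

```rust
// Hypothetical ball state; the real game's types are not shown in this post.
#[derive(Debug, Clone, PartialEq)]
struct Ball {
    y: f64,      // vertical center of the ball
    vy: f64,     // vertical velocity, positive means moving down
    radius: f64,
}

/// Bounce the ball off the top (y = 0) and bottom (y = height) walls.
/// The velocity is flipped only when the ball overlaps a wall AND is moving
/// toward it, so a ball that is already heading away is left alone.
fn bounce_off_walls(ball: &mut Ball, height: f64) {
    let hit_top = ball.y - ball.radius <= 0.0 && ball.vy < 0.0;
    let hit_bottom = ball.y + ball.radius >= height && ball.vy > 0.0;
    if hit_top || hit_bottom {
        ball.vy = -ball.vy;
    }
}
```

Calling bounce_off_walls once per frame after moving the ball is enough. A second call while the ball still overlaps the wall does nothing, because the ball is already moving away. A plain intersection test without the direction check would flip the velocity again and leave the ball stuck in the wall. The paddles would get the same treatment on the x axis.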

I managed to add sound and fade effects in the last 5 min. With 30 seconds remaining I remembered I still needed to make it "pretty", which ended up being just changing colors to a clean grayscale with red for the ball.

Once the challenge was over I did spend another 20 minutes fixing the fading code. It worked before but was wrong. I also tweaked the size of the canvas, added wall bumpers, and a click event to trigger audio. (I always forget that part).

What you see above is with the extra 20 min.

Tools used:

Typescript. Using Typescript with a good IDE (WebStorm) can be magical.
For sound effects I used the JSFXR web tool and embedded it in the game using a string encoding instead of downloading wave files.
I started using BeepBox to create a music composition for the background, but ran out of time. You can hear the little bit I managed to make here.

Conclusion.

Doing a one hour game was really fun. The time constraint pushed me to think fast and balance planning with YAGNI. I’m really impressed with how far you can get with modern canvas and javascript, no frameworks required.

You can view the source here.

Why you can’t build a web browser and why you should anyway.

Wed, 14 Dec 2022 16:01:16 +0000

In the last couple of years I’ve seen a lot of lamenting about the browser mono-culture. I even wrote about it myself. Some complaints focus on how complicated the web specs have become. So big that only a few companies can implement a browser from scratch. I think these complaints are misplaced. Even if the web platform didn’t have such a large API surface it still wouldn’t matter. You can’t build a large scale browser with large marketshare. The browser market would still be a monoculture. You can't solve a business problem with a technology solution. I also don't think that replacing the web with something smaller like Gemini is the answer.

But you know what? We should build new browsers anyway! And to prove it I just made one in less than 1000 lines of code! Let's dive in.

No One Can Build a Browser

WebKit vs Chromium. Firefox is the last independent browser. Blah blah blah. This is a lot of whining without getting to the root cause. Yes, no one can make a new browser anymore, but so what? No one but a few people invest in building a new browser because there is no motivation for it unless you control the OS or a search engine. There is no money in the browser itself. WebKit is driven by Apple's Safari team which is driven by the selling of Mac hardware. Chromium is driven by the Chrome team which is driven by Google’s search engine. Of course no one can build a new browser.

Building a browser means you're competing with two of the richest companies in the history of the world. Of course they are the only ones who can invest billions in a commodity product that has no direct revenue. Even Microsoft couldn’t do it; Edge is their Hail Mary Pass to not be displaced on their own platform. Even if somehow you could make a new full spec browser it still wouldn’t matter because you’d never get marketshare from the platform owners controlled by, again, some of the largest companies ever. Even if you win you’d lose.

Side note: When I was at Mozilla I urged senior management to switch the desktop browser to WebKit (to lower costs since no one cares what the engine is) and focus on bringing FF to new platforms (like standalone VR headsets).

The Problems with a Browser Monoculture

So yeah, no one can build a browser. No one can start from scratch and keep up with Apple & Google. Even Microsoft couldn’t. But why does this matter?

First, it's a problem because it hinders development of alternative operating systems. A new OS isn't usable in the modern world without a full spec browser. It's the same issue as launching an OS without an existing App Store. Embracing the smol web or creating new protocols won’t fix any of these issues. Building a new browser won't help. This is a real problem, I agree. It does stop new platforms from emerging because the platform under you is always an intermediator. This is one reason why Facebook has invested so much in VR and doesn't use the Chrome browser on the Quest.

We will never beat the browser makers at their own game. Let’s play a different game instead. Let’s go back to the beginning and think smol.

Gemini?

I like the idea of the SMOL web and Gemini. But I think their approach is wrong. The Gemini protocol throws the baby out with the bathwater. It doesn't even allow anchors so you can jump to a specific part of a document. This was a feature of the very first version of the web back in the mythical golden days that the SMOL web longs for.

In the Gemini FAQ they say: Why not just use a subset of HTTP and HTML?

The problem is that deciding upon a strictly limited subset of HTTP and HTML, slapping a label on it and calling it a day would do almost nothing to create a clearly demarcated space where people can go to consume *only* that kind of content in *only* that kind of way.

I agree that this is Solutionism at its worst. The Gemini protocol is really beside the point. What matters is creating a part of the Internet where tracking and spam are not possible. The Gemini community is enforcing this with a protocol, but it’s the tracker-less part of the web that is the point. That's where the value is.

I like Gemini's goal, but it has problems. First, it’s exclusionary. By making it different it becomes harder to get there. The Gemini part of the Internet will always be insular and separate. Second, it doesn’t solve any real problems. It doesn’t address the reasons The Web became a cesspool of spam like this in the first place. Essentially Gemini is giving up on the world and saying "We'll just move elsewhere and build our own tiny web without you!" Fine if you want that, but it’s awfully depressing and unproductive. I don’t think it actually helps anyone.

Ultimately Gemini isn’t really a technology or protocol, it’s a social statement. It’s about building a community that says "no tracking and spam is tolerated here", and they are using the protocol to enforce this. I like it. I just think there’s more we could do that would reuse existing technology in better ways, and benefit more people.

Let's Build a Browser Anyway

The Gemini FAQ claims that their spec is easy enough to implement in 50 to 100 LOC. I don’t know if that’s a good metric. Certainly HTML & CSS is more complicated, but it’s not that much more complicated. I've found that a lot of things that seem hard or impossible become solvable or even easy if you are willing to relax your constraints.

A high performance Javascript VM: hard. A slow JS interpreter: easy.

A full spec compliant PDF parser: hard. A PDF 1.1 parser (which is the part most people want): far easier.

To prove this point I wrote a browser implementation in less than 1000 lines of code. (I was shooting for 500 lines, but currently I’m stuck around 750). Here’s what it looks like:

This is 750 LOC of typescript. The parsers for HTML and CSS are not spec compliant, but they do handle a useful subset and each is tiny! I mean tiny on the order of 20 lines for the grammar thanks to the magic of OhmJS, a PEG toolkit. This prototype example was not meant to actually be a usable browser, but to prove that it can be done.

My mini browser really does parse and render HTML and CSS the way it would have been done 15 years ago. It has links, lists, paragraphs and images. It supports basic CSS properties and the cascade. It does not support all of the CSS spec, but it is written in such a way that it would be clear where you could add them.

Okay, I’m sort of cheating a bit because I did it in a browser, but it’s not that much of a cheat. The only thing I’m using from the host browser is the graphics and network layers, both of which a standalone app would leverage the OS for anyway, so I think it’s fair. The implementation is simple enough that I could see this ported to something like a Raspberry Pico (not a Pi, a Pico) with a bit of supporting C/C++/Rust code.

And yes it has some rendering glitches. I said it works, not that it’s bug free. But this proves that in a thousand lines of code we can make something that can actually browse the web. And if I could get this far in less than a thousand lines of Typescript, what could be built with 10k of Rust code? You could have a very fast and lean browser that provides real value for a lot of people. It could even support images and video links *without* including Javascript support. (The audio and video elements are a thing for a reason).

Uses for a smaller independent Web Browser engine

Is it possible to make a new full browser that is so complete it can run Facebook or Google docs? No. We'd need to fork an existing engine for that. But, can you make a browser that is useful? Yes! There’s lots of the web you can render. The web standards are designed to degrade gracefully. Just because most pages use JS doesn’t mean the browser has to render it. The Web !== React SPAs. There’s so much more.

A new lightweight browser that’s a full native desktop app with zero JS support. It would be *fast* and use very little memory. It could even run on retro computers.
A replacement for Electron that includes the styling and layout parts of the web without the overhead of a second runtime.
a library for building News and feed readers.
In game rendering of 2d rich text and images (think catalog and in game news updates)
PDF generation
local app development. Imagine how much faster Electron could be if it didn’t need to support every possible browser API?
offline bookmarks reader
eBook renderer
web browser or UI for tiny embedded systems like a Raspberry Pico
A knowledge archive app

Such a browser should focus on speed and new uses. It should focus on good experiences. When you need a real browser for something, then just call out to the user’s preferred full browser.

The Web Scales Down Too

We must remember that the web was designed to scale. Just because the spec details it doesn’t mean you have to implement it. From the very beginning graceful degradation was a key value point of The Web. You can make a mini browser. It doesn’t have to be fully compliant to be useful. There are tons of things you could do with a browser that isn’t spec compliant. In fact, you can do them without any JS support at all (which cuts out probably 75% of what a full browser does).

I hope this little prototype inspires you to go build something amazing.

Mini Browser repo

Canvas Computing Prototype 1

Sat, 03 Dec 2022 06:34:51 +0000

I've been working for a while on some new ideas around the future of programming, but I haven't done a great job of sharing these with the world. It's not science if you don't publish your results, so here we go.

I have long wanted a large infinite canvas for computing. A place where I can sketch, draw, and drop in code, all positioned exactly how I want. I should be able to divide my program into little chunks and algorithms, mixed with interactive controls and visualizers for the output. While the idea seems similar to boxes and lines visual programming environments, mine is still firmly built around a text based language (in this prototype, anyway). I'm imagining something more like a spreadsheet where you can place your data and your functions wherever you want, tied together with dependencies so any change will automatically recalculate the results.

Okay, that's all pretty abstract. Let's discuss a very simple, but concrete, example.

I want to draw a square. I have a canvas view for the output, and a tiny text editor containing a few lines of code. To specify the color I don't want to put it in the code, but instead use a little color picker button. When I change the color the code should automatically execute and update the canvas. This week I built a prototype that can do this:

Here's what it looks like:

Each of the panels can be dragged around in an infinite canvas. Not shown is a debugging panel and little menu bar.

Functionally it works. Everything can be moved around, the code executes, and there are dependencies so that when you change the color the code runs and updates the canvas. Most importantly, the whole deal is persisted as a JSON document into the browser's local storage and reloaded on page load.

Now let's discuss what the problems were.

Dependency management

I built the prototype using React and Typescript. I tried to have a base class for handling events so that the different coding nodes can communicate with each other. As part of this I want the type system to only let you listen for events that the source will actually send. Making this work with inheritance under Typescript is tricky. At one point two versions of what I *thought* were the same type were incompatible and tried to insert a 'never' type. This blog post shows a better approach that I'll try next time: Events and listeners in TypeScript (Christopher G. Jennings).

Next, the ceremony of which node is watching which is complex. It's currently done with the aforementioned event system. There has to be a simpler way to express dynamic dependencies. Also, the system itself needs the ability to monitor all of the watching links and show them in the debugger. How can we find out which objects A is listening to, and which event names? So, a new design is needed here.

Persistence

The persistence system is flaky. We need an object model that can be persisted more reliably. Maybe empty constructors. The problem is that the constructors of the classes do computation, but it's a lot harder to call those same constructors programmatically when restored from JSON. Instead I'd like to be able to allocate the object with the right prototype, restore all of the property values, then invoke some sort of 'init' method to make the object live. Only then will it be shown on screen and start propagating events.

Another problem with persistence is that the dependencies are not preserved across persistence cycles. And I have to register a fixed list of classes and views to make persistence work correctly. Is there a better way to automate this part so it needs less overhead? Also the debug console and menu shouldn’t be persisted since they aren't really a part of the user's document.

UI improvements

Instead of a menubar I'd like to be able to create an object by right clicking, selecting from a menu, and the new object appears there. This lets me create an object and give it a position at the same time.

A few other issues:

The snippet editor needs syntax highlighting.
The color constant is too big. It needs a different, maybe non-resizable, window view. Still must be draggable, though.
The persistence engine needs to know when nodes change and when they move, but other nodes only want to know when a dep changes, not when it moves or is resized.
The system needs a way to add notes & comments and headers & captions.

Next steps

For the next iteration I want to address the issues above, but with a more complex use case. I'm considering some floating constants that are for numbers used to draw a fractal tree. Then you can just move some sliders to change the parameters to the fractal.

Make Rects Fast, in Rust

Thu, 29 Sep 2022 19:47:17 +0000

How can you draw a filled rectangle fast? By making it not slow! Great if you are using a GPU to accelerate rectangle drawing for you, but what if you are doing it oldskool with an in memory frame-buffer? You'd probably write some code like this:

// assume width is the width of your frame buffer
// assume color is a u32 of ARGB 
let ry = rect.y as usize;
let rx = rect.x as usize;
let rh = rect.h as usize;
let rw = rect.w as usize;
for y in 0..rh {
    for x in 0..rw {
        self.buffer[((ry + y) * width) + rx + x] = color
    }
}

Now, this will work of course. It's in fact the simplest thing to do, but that's a lot of calculations we are doing for every pixel even though they are all getting the same value. Really we are just copying a 32 bit integer over and over into adjacent memory. If this was in C we'd use memcpy() but we are using Rust and want to be safe, so what can we do?

Safely Modifying Vectors in Rust

In Rust, you can represent a memory buffer by a Vector of u8 or u32 numbers. Since my project is for an ARGB display, I'm using Vec<u32>. While you can set values individually with indexes, this is slower than in C because it has to do runtime bounds checking to make sure you don't run off the end of the allocated memory.

For the same reason Rust really doesn't want you to copy only parts of one vector to another because that also needs bounds checking. Instead Rust provides functions like copy_from_slice which let you copy only the entire vector at once, which is fast and only checks bounds once. That's fine if we want to manipulate the entire vector, but what if we only want to work with just a part of the vector?

Vector has a few other interesting methods like split_at_mut and chunks_mut, which give you access to subsections of the vector as mutable slices. Want to copy into just half the vector? Split it! Need to chop it up into a bunch of uniform sized chunks? Chunk it!
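To make that concrete, here is a small sketch of both ideas on a plain slice of u32. The function names fill_second_half and number_rows are made up for this example; split_at_mut, chunks_exact_mut, and fill are the standard library slice methods.

```rust
/// Overwrite only the second half of `buf` with `value`.
fn fill_second_half(buf: &mut [u32], value: u32) {
    let mid = buf.len() / 2;
    // split_at_mut returns two non-overlapping mutable slices:
    // indexes [0, mid) and [mid, len)
    let (_first, second) = buf.split_at_mut(mid);
    second.fill(value);
}

/// Treat `buf` as rows of `width` pixels and write each row's index into it.
/// Note: chunks_exact_mut panics if `width` is 0.
fn number_rows(buf: &mut [u32], width: usize) {
    // Each chunk is exactly `width` long; a shorter leftover tail is skipped.
    for (row_index, row) in buf.chunks_exact_mut(width).enumerate() {
        row.fill(row_index as u32);
    }
}
```

Passing a `&mut Vec<u32>` to either function works because a vector dereferences to a slice. In number_rows, a 7 element buffer with a width of 3 gets two full rows written and the last element is left untouched.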

The Plan

We are going to compose our solution for drawing rectangles in sections. First we will divide the buffer up into rows for each scan line, and then figure out the part of the rect inside that row. Then we can finally fill it in. Let's go.

First, let's calculate how big the buffer is in buffer_bounds and what part of the rectangle will be drawn in fill_bounds. In case the rectangle isn't fully inside the buffer we must calculate the intersection first.

let buffer_bounds = Rect::from_ints(0, 0, width as i32, height as i32);
let fill_bounds = buffer_bounds.intersect(rect);

Now let's create a vector representing a single row of the rectangle. Think of it as a rubber stamp that we can use over and over. Let's fill it with the color.

let mut row = vec![0; fill_bounds.w as usize];
row.fill(color);

Now we need to access each row of the buffer as a separate slice. We can do this with chunks_exact_mut. All the chunks methods return an iterator over the slices. The exact version ensures we only get chunks of the exact size we want (any remainder will be ignored), and mut means the slices will be mutable. For each row_slice we first check if it intersects with the rectangle we want to fill. If not then just continue.

for (j, row_slice) in self.buffer.chunks_exact_mut(width).enumerate() {
    let j = j as i32;
    if j < fill_bounds.y {
        continue;
    }
    if j >= fill_bounds.y + fill_bounds.h {
        continue;
    }

Now we can split the row of the buffer into three parts, before, after, and the middle part. The middle is the only part we want to draw. We can pull these parts out using split_at_mut. Now we can finally copy from our row to the current slice, just the part we need.

    let (_, after) = row_slice.split_at_mut((fill_bounds.x) as usize);
    let (middle, _) = after.split_at_mut((fill_bounds.w) as usize);
    middle.copy_from_slice(&row);
}

That's it. Now you can draw a rectangle or other shapes faster than setting each pixel individually. You can see this code in action as part of the next release of IdealOS Clogwench here.
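Putting the pieces together, a self-contained version might look like the sketch below. It makes a few assumptions: the Rect type is a hypothetical stand-in for this example (the real one in Clogwench may differ), fill_rect takes the buffer as a plain slice instead of self.buffer, and it swaps the continue checks and split_at_mut calls for skip/take and a range index. The idea is the same: one stamped row, one copy per scan line.

```rust
/// Minimal rectangle for this sketch (hypothetical, not the Clogwench type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect { x: i32, y: i32, w: i32, h: i32 }

impl Rect {
    /// Overlap of two rectangles; width and height clamp to 0 when they miss.
    fn intersect(&self, other: Rect) -> Rect {
        let x0 = self.x.max(other.x);
        let y0 = self.y.max(other.y);
        let x1 = (self.x + self.w).min(other.x + other.w);
        let y1 = (self.y + self.h).min(other.y + other.h);
        Rect { x: x0, y: y0, w: (x1 - x0).max(0), h: (y1 - y0).max(0) }
    }
}

/// Fill the part of `rect` that lies inside a width x height ARGB buffer.
fn fill_rect(buffer: &mut [u32], width: usize, height: usize, rect: Rect, color: u32) {
    let buffer_bounds = Rect { x: 0, y: 0, w: width as i32, h: height as i32 };
    let fill = buffer_bounds.intersect(rect);
    if fill.w == 0 || fill.h == 0 {
        return; // nothing visible, and this also guards against width == 0
    }
    let (x, w) = (fill.x as usize, fill.w as usize);
    let (y, h) = (fill.y as usize, fill.h as usize);
    // The "rubber stamp": one row of the fill color
    let row = vec![color; w];
    // Visit only the scan lines the rectangle touches
    for row_slice in buffer.chunks_exact_mut(width).skip(y).take(h) {
        row_slice[x..x + w].copy_from_slice(&row);
    }
}
```

For example, filling a rect at (2, 1) with size 5x5 into a 4x3 buffer clips to the 2x2 block in the lower right corner, and a rect that is entirely off screen changes nothing.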

Ideal OS Mark 6

Wed, 21 Sep 2022 18:43:38 +0000

Yeah, it’s been a while, but not forgotten! In between moving homes and jobs and pets I found time to do a new release of my IdealOS prototype. And not just a new release, but an actual full rewrite! Let’s take a look at what’s new in IdealOS Mark 6!

Rewrites

The central core process is now entirely in Rust. No more Javascript! The central core handles message passing, the on-disk database, and access to all actual hardware. This means the central core has to be fast, secure and reliable. This made it a good fit for Rust.

The central core now has a database built around JSON objects. Apps can create, update, delete, and query these objects over a simple network api. Currently the database is loaded from test data files and changes are not persisted to disk, but this will change in the future. The database powers the Music and People (contacts) apps, and eventually almost everything in the system. The central core now also has an audio service which can play MP3s off of disk. This service is implemented with existing Rust libs, since rewriting an MP3 implementation from scratch is way beyond the scope of IdealOS.

The window manager and mouse/keyboard input are also in Rust now. While the Mac version still uses SDL2 underneath, the Linux implementation directly uses the Linux Frame Buffer API and mouse & keyboard input APIs. This gives IdealOS the lowest level access to the kernel, which will give us more flexibility and power on the Raspberry Pi in the future.

On the client side, apps are still in JS, but they are now using a new UI toolkit called Thneed-Gfx that actually works, for a change. It’s a very traditional style of tree-based UI toolkit, think Swing or UIKit, but simple to understand and extremely portable. It’s also built in TypeScript so maintenance should be easier. It uses a hacky bitmap font of my own design, and a central theming system that's ugly as sin, but it works! It can also run on the web in any HTML Canvas impl, making it useful for other things and easier to test.

What does it look like?

Here ya go. Runs on Mac in a window or Linux full screen. That photo is on my Raspberry Pi 400 with the official keyboard and mouse.

It’s very ugly and very slow. Very slow. That’s because all IPC is local web sockets. When an app wants to draw it sends a drawing command to the central, encoded as JSON, over a web socket. Central parses it, determines where the message should go, re-encodes back to JSON, and sends it to the window manager process, which parses it again, draws to the window’s back buffer, then actually redraws the screen. Whew. That's a lot. One day this will all be done lightning fast through shared GPU memory, but for now it’s good enough. Make it work first, then make it right, then make it fast. We’ll get there.

All the source is on GitHub in the Clogwench repo.

What’s next?

My plan is to continue working from this base. The core is Rust and apps are in Typescript. No more complete rewrites. The next few tasks are:

fix the central build so you can make and run the whole system with a single command.
move apps to a separate pure JS repo, also with a single build script.
heavy work on the multi-line text editor component. Editing text is such a fundamental part of an OS that it has to be good.
improve the database to actually persist to disk and have higher level semantics like versioning, security, and domains.

I hope you had a great summer. It’s time to get back to school.

Life Moves Fast

Tue, 02 Aug 2022 23:19:07 +0000

It’s been a while since I’ve talked tech and much longer since I’ve talked about anything personal. There's a lot of updates to share, so buckle up.

The last two years

In mid 2020 I was laid off from Mozilla along with the rest of the research division. I obviously thought this was a poor decision on senior management's part, but that's a topic for another day. About 6 months later my wife and I separated and I moved to a nearby apartment where my son lives with me half time. About six months after that my division at Lyft was acquired by Toyota, and then shortly before Christmas of 2021 my mother passed away.

Needless to say, I've had a lot of change and turmoil in my personal life in the last two years. Fortunately I’ve been working with a good therapist and I have a very supportive family.

After my mother's memorial service in April I realized I need to quit my job and take some time off. I've been on a turbo jet treadmill for the past few years. I need time to really recover and refresh my brain and soul. If I keep going at this pace I'll wind up with a nervous breakdown or worse.

At the end of May I gave my notice at Toyota. While I'll miss my co-workers it was definitely the right decision. I've spent the last two months doing things with family, reading for pleasure, and taking a whole lot of naps. It's really amazing how fuzzy the human brain can be without enough sleep.

Right now I'm focusing on friends and family. With my mom gone my dad really needs me. My niece and nephew need me as well. I don’t know what my future holds but I know that this is what I should be doing right now. I am where I should be.

Two months probably wasn’t enough to fully heal but it’s been a good start. For the first time in years I’m feeling optimistic about the future. I got to reconnect with family I haven’t seen in years. I’ve been learning about topics I never had the time to study, like robotics and machine learning. And I’m actually feeling creative again.

New Job

Yesterday I started a new job. I am now a senior engineering manager at Markforged, the makers of some seriously amazing industrial 3D printers. Below is possibly the strongest Benchy ever, printed in nylon reinforced with carbon fiber.

I’m excited to be working for a non-California company for the first time. Markforged is based in Boston. In fact, I’m visiting there next week. If any of you are in the Boston area let me know. We can grab a drink.

I’m going to start blogging soon about my robotics and machine learning work, probably with some 3d printing thrown in. I might even start a YouTube channel. The focus will follow much of my previous personal research on how to enhance human intellect and cognition, but this time I’m gonna do it with some different tools. It’s gonna be fun.

Next Steps

First: don’t forget your mental health. Even if you didn’t lose a family member recently, the last two years have been hard on all of us. It’s OK to not be 100%. We really can’t do everything. Take some time for yourself. Relax. Maximum productivity is not the right goal. Nothing matters more than friends and family. Life is made of people.

I hope you are keeping cool this summer.

Why are Browser Engine Monocultures a bad thing?

Thu, 14 Apr 2022 22:33:07 +0000

It is often mentioned in Hacker News comments and the Twitters that it’s a tragedy that the web ecosystem is now dependent on only three renderers: Chromium, WebKit, and Gecko. Every time a new browser is announced I see comments like:

"The world needs less Chrome-clones and Google-backed browsers (which to a certain extent includes Firefox), and more independent ones." [link]

and

"The web really needs a truly cleansheet, non-invasive, open source browser written in a modern language. It’s disheartening to see the consolidation in the web space on Chrome. We need an alternative. There has to be someone who can counter Chrome’s dominance with a modern, fast browser engine." [link]

I understand this sentiment but I want to challenge it. No non-programmer cares what libraries make up their favorite browser. They don’t care whose JPEG decoder it is inside, or which CSS parser. Why does it make a difference which large browser engine is inside either?

All three major browser engines are open source. Anyone can see how it works, submit patches, or make a fork. Yes, some companies have undue power over browsers and web standards, but that is due to OS lock-in and market share, not because of the internal rendering library. Adding a fourth browser engine wouldn’t change that.

So I ask you:

Why do you care that the browser engine landscape is becoming a mono-culture? Why is a browser engine mono-culture a bad thing? And if the real issue is browser product marketshare dominance, then wouldn’t reusing an existing browser engine be a better choice than using something else?

Gameboy Emulator Progress

Wed, 23 Feb 2022 18:40:39 +0000

After a few weeks of work I’ve been able to get Tetris to boot and play. I can also run Dr Mario to enter the play screen but all of the pieces are hidden for some reason.

I’m not going to walk through all of the code I’ve written (it would be both long and boring) but I do want to touch on a few things I’ve learned, suggestions for using Rust better, and tips that will make the process easier for those who come after me. (full code in the repo)

Making an emulator is hard because there are so many pieces that must work together perfectly. If one piece is wrong you may see a failure in an entirely different section of the emulator, making it almost impossible to debug. To address this you need three things: unit tests, test data, and a debugger.

Unit Tests

I knew building the CPU would be hard, so I started making unit tests. Load up two bytes of memory with an instruction, execute the instruction, check that the registers hold the right values. This worked pretty well for the first few instructions but started to become incredibly repetitive. The GameBoy’s Z80 derived CPU has over 500 op codes. That’s a lot of unit tests, and many of them are very similar. After ten tests I could tell this wouldn’t scale so I took a different approach: building a DSL using Rust enums.

Most instructions are in the pattern of Load, source, destination, or math operation, source, destination. There are many possible combinations, but all of the sources and destinations are always a register, an immediate value (meaning it’s in memory right after the instruction), or somewhere in memory pointed to by a value in a register. Knowing this pattern I created a series of Rust enums for the different possible sources, destinations, and operations, which I can assemble into every pattern I need, almost like a DSL (domain specific language).
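As a rough illustration of that idea, here is a minimal sketch of source and destination enums with one read and one write function. The names (Reg8, Src8, Dst8, Cpu, read, write) and the memory layout are assumptions made for this example and are not the emulator's real types.

```rust
/// The 8-bit registers, in an arbitrary order used as array indexes below.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Reg8 { A, B, C, D, E, H, L }

/// Where an 8-bit value can come from.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Src8 {
    Reg(Reg8), // a register
    Imm,       // the byte in memory right after the opcode
    MemHL,     // the byte at the address held in the H and L registers
}

/// Where an 8-bit value can go.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Dst8 { Reg(Reg8), MemHL }

/// Hypothetical CPU state: seven registers, a program counter, flat memory.
struct Cpu { regs: [u8; 7], pc: u16, mem: Vec<u8> }

impl Cpu {
    fn new(mem: Vec<u8>) -> Cpu { Cpu { regs: [0; 7], pc: 0, mem } }
    fn get_reg(&self, r: Reg8) -> u8 { self.regs[r as usize] }
    fn set_reg(&mut self, r: Reg8, v: u8) { self.regs[r as usize] = v; }
    /// Combine H (high byte) and L (low byte) into a memory address.
    fn hl(&self) -> usize {
        ((self.get_reg(Reg8::H) as usize) << 8) | self.get_reg(Reg8::L) as usize
    }
    /// One function reads from any kind of source...
    fn read(&self, src: Src8) -> u8 {
        match src {
            Src8::Reg(r) => self.get_reg(r),
            Src8::Imm => self.mem[self.pc as usize + 1],
            Src8::MemHL => self.mem[self.hl()],
        }
    }
    /// ...and one function writes to any kind of destination.
    fn write(&mut self, dst: Dst8, v: u8) {
        match dst {
            Dst8::Reg(r) => self.set_reg(r, v),
            Dst8::MemHL => { let addr = self.hl(); self.mem[addr] = v; }
        }
    }
}
```

With this in place every 8-bit load is a read followed by a write, whatever the combination. A unit test becomes: put an opcode and its operand in memory, do the read and write, then check the register or memory cell.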

Let’s look at an example. Op code 6A loads the contents of register D into register L. I started with a match statement of every opcode.

0x006A => {
  let v = cpu.get_reg_D();
  cpu.set_reg_L(v);
  cpu.pc += 1;
}

This is easy to understand but when the next instruction comes along that is the same, but with register B instead of D, we will start to see a lot of repetitive code. Next I created enums for the different registers so I could load them like this:

0x006A => {
  let v = cpu.get_reg(Reg8::D);
  cpu.set_reg(Reg8::L, v);
  cpu.pc += 1;
}

This works but doesn’t really shorten things and doesn’t help me when I come to an instruction that wants an immediate value instead of a register. Eventually I realized that any 8 bit source and destination was interchangeable, so I could use a register enum or one representing an immediate 8 bit value like this:

0x006A => {
  // the source could just as easily be Source::Im8() for an immediate value
  let v = get_source(Source::SrcR8(Reg8::D));
  cpu.set_reg(Dest::DstR8(Reg8::L), v);
  cpu.pc += 1;
}

Since most load operations take the same number of cycles and move the PC (program counter) the same amount, I could condense it all into a DSL like this:

op_table.load8(0x6A, DstR8(L), SrcR8(D));
op_table.load8(0x6B, DstR8(L), SrcR8(E));
op_table.load8(0x16, DstR8(D), Im8());

Each call to load8 creates an entry in the op table with all of the information needed to perform each instruction. The actual code is then a match on the enums instead of opcodes, of which there are far fewer, so the code is far smaller.

The implementation for all load ops now looks like this.

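match op {
  Load8(dst, src) => {
    let val = src.get_value();
    dst.set_value(val);
  }
  // ... arms for the other operations
}
// every instruction advances the program counter by its own length
cpu.pc += op.len;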
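To make the pieces concrete, here is a minimal, self-contained sketch of how the enums, the op table, and the match could fit together. It is not the code from the repo: `Instr`, `OpTable`, `Cpu::step`, and `build_table` are names invented for illustration, only 8 bit loads are covered, and the immediate source is written as a plain `Im8` variant.

```rust
use std::collections::HashMap;

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Reg8 { B, C, D, E, H, L, A }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Src { SrcR8(Reg8), Im8 }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Dst { DstR8(Reg8) }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Op { Load8(Dst, Src) }

// One table entry: what to do and how many bytes the instruction occupies.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Instr { op: Op, len: u16 }

struct OpTable { ops: HashMap<u8, Instr> }

impl OpTable {
    fn new() -> OpTable { OpTable { ops: HashMap::new() } }

    // Register an 8 bit load under the given opcode.
    fn load8(&mut self, code: u8, dst: Dst, src: Src) {
        // An immediate operand adds one byte after the opcode.
        let len = match src { Src::Im8 => 2, Src::SrcR8(_) => 1 };
        self.ops.insert(code, Instr { op: Op::Load8(dst, src), len });
    }
}

struct Cpu { regs: [u8; 7], pc: u16, mem: Vec<u8> }

impl Cpu {
    fn get_reg(&self, r: Reg8) -> u8 { self.regs[r as usize] }
    fn set_reg(&mut self, r: Reg8, v: u8) { self.regs[r as usize] = v; }

    // Look up the opcode at pc, then match on the operation, not the opcode.
    fn step(&mut self, table: &OpTable) {
        let code = self.mem[self.pc as usize];
        let instr = *table.ops.get(&code).expect("unknown opcode");
        match instr.op {
            Op::Load8(dst, src) => {
                let val = match src {
                    Src::SrcR8(r) => self.get_reg(r),
                    Src::Im8 => self.mem[self.pc as usize + 1],
                };
                match dst {
                    Dst::DstR8(r) => self.set_reg(r, val),
                }
            }
        }
        self.pc += instr.len;
    }
}

// The three loads from the DSL example above.
fn build_table() -> OpTable {
    let mut t = OpTable::new();
    t.load8(0x6A, Dst::DstR8(Reg8::L), Src::SrcR8(Reg8::D)); // LD L,D
    t.load8(0x6B, Dst::DstR8(Reg8::L), Src::SrcR8(Reg8::E)); // LD L,E
    t.load8(0x16, Dst::DstR8(Reg8::D), Src::Im8);            // LD D,d8
    t
}
```

In this sketch, adding another register-to-register load is one more line in `build_table`; `step` does not change.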

A side benefit of using enums for everything is that we can add extra methods on the enums for debugging and pretty printing assembly code, which is critical for the next part:

You’ll have to write a debugger

Yep. It's true. As you debug your emulator you’ll find yourself trying to print each step the CPU goes through to find the errors. Load instruction, Load memory, Add, store memory, etc. Over and over again. Eventually it will be too much for println debugging. You’ll have to write a debugger. Fun!
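Printing each step is where methods on the enums pay off. The sketch below implements `Display` for a cut-down set of operand enums so that an operation prints as an assembly mnemonic. The enum names mirror the DSL shown earlier, but the definitions here are assumptions made for illustration, and the immediate operand is printed as the placeholder `d8` rather than its actual value.

```rust
use std::fmt;

#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Reg8 { A, B, C, D, E, H, L }

#[derive(Clone, Copy)]
enum Src { SrcR8(Reg8), Im8 }

#[derive(Clone, Copy)]
enum Dst { DstR8(Reg8) }

#[derive(Clone, Copy)]
enum Op { Load8(Dst, Src) }

impl fmt::Display for Reg8 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            Reg8::A => "A",
            Reg8::B => "B",
            Reg8::C => "C",
            Reg8::D => "D",
            Reg8::E => "E",
            Reg8::H => "H",
            Reg8::L => "L",
        };
        write!(f, "{}", name)
    }
}

impl fmt::Display for Src {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Src::SrcR8(r) => write!(f, "{}", r),
            Src::Im8 => write!(f, "d8"), // placeholder for the immediate byte
        }
    }
}

impl fmt::Display for Dst {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Dst::DstR8(r) => write!(f, "{}", r),
        }
    }
}

impl fmt::Display for Op {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Op::Load8(dst, src) => write!(f, "LD {},{}", dst, src),
        }
    }
}
```

With this in place a trace line can be as simple as `println!("{:04x}: {}", pc, op)`.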

I created an interactive debugger using the console Rust crate. It lets you write command line programs that take single key text input. I built a simple program to wait for a keystroke, then execute the next instruction and print out the current registers. If I press certain keystrokes it will execute 16 instructions ahead, or an unlimited number until it hits the next VBlank (vertical blank).
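The core of such a debugger is a small mapping from keystrokes to commands. The sketch below uses only the standard library and takes the key as a plain `char`; in a real tool the key would come from the terminal. The key bindings, the `Command` enum, and the `Machine` stand-in are all invented for illustration, and `Machine` pretends that every instruction takes exactly one scanline, which is not how the hardware behaves.

```rust
#[derive(Debug, PartialEq)]
enum Command {
    Step(u32),
    RunToVBlank,
    Quit,
    Unknown,
}

// Map a single keystroke to a debugger command (bindings are made up).
fn parse_key(key: char) -> Command {
    match key {
        ' ' => Command::Step(1),
        'J' => Command::Step(16),
        'v' => Command::RunToVBlank,
        'q' => Command::Quit,
        _ => Command::Unknown,
    }
}

// Stand-in for the emulator: one "instruction" advances pc and one scanline.
struct Machine {
    pc: u16,
    scanline: u8,
    steps: u64,
}

impl Machine {
    fn step(&mut self) {
        self.pc = self.pc.wrapping_add(1);
        self.scanline = (self.scanline + 1) % 154; // 154 lines per frame
        self.steps += 1;
    }

    // Lines 144..=153 are the vertical blank period.
    fn in_vblank(&self) -> bool {
        self.scanline >= 144
    }
}

// Apply one command, print the state, and report whether to keep running.
fn run_command(m: &mut Machine, cmd: &Command) -> bool {
    match cmd {
        Command::Step(n) => {
            for _ in 0..*n {
                m.step();
            }
        }
        Command::RunToVBlank => {
            // Leave the current VBlank first, then run until the next begins.
            while m.in_vblank() {
                m.step();
            }
            while !m.in_vblank() {
                m.step();
            }
        }
        Command::Quit => return false,
        Command::Unknown => println!("unknown key"),
    }
    println!("pc={:04x} line={} steps={}", m.pc, m.scanline, m.steps);
    true
}
```

Keeping `parse_key` separate from `run_command` means the commands can be exercised in tests without a terminal attached.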

As development went along I ended up adding more and more features to the debugger, including dumping VRAM to a PNG so I could see sprites, viewing the current status of the hardware registers, and running until particular interrupts are hit. Without a debugger it would have taken me many times longer to even get Tetris working.
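Dumping VRAM as an image mostly comes down to decoding tiles. Each Gameboy tile is 8x8 pixels stored in 16 bytes: two bytes per row, the first holding the low bit of each pixel's color index and the second holding the high bit. The sketch below decodes one tile into color indices; writing the PNG itself is left out, and the `shade` helper assumes a simple identity palette, which a real game may not use.

```rust
// Decode one 8x8 tile (16 bytes, 2 bits per pixel) into color indices 0..=3.
fn decode_tile(tile: &[u8; 16]) -> [[u8; 8]; 8] {
    let mut pixels = [[0u8; 8]; 8];
    for row in 0..8 {
        let lo = tile[row * 2];     // low bits of the eight pixels in this row
        let hi = tile[row * 2 + 1]; // high bits of the same eight pixels
        for col in 0..8 {
            let bit = 7 - col; // bit 7 is the leftmost pixel
            let low_bit = (lo >> bit) & 1;
            let high_bit = (hi >> bit) & 1;
            pixels[row][col] = (high_bit << 1) | low_bit;
        }
    }
    pixels
}

// Map a color index to a grayscale value, assuming an identity palette
// where 0 is white and 3 is black.
fn shade(index: u8) -> u8 {
    255 - (index & 3) * 85
}
```

Running `decode_tile` over every 16 byte chunk of the tile data area and laying the results out in a grid gives a sprite sheet that makes rendering bugs much easier to spot.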

Test ROMs

Once you have a debugger and your basic emulator up and running you’ll want to try running actual Gameboy programs. I do not recommend trying to run a full game like Pokemon or Zelda. They are too big and you’ll never figure out where they fail. Remember you are emulating a CPU that does hundreds of thousands of operations a second. Full games are too much. Instead start with Tetris or Dr Mario since they were some of the simplest and earliest games written and don’t use memory bank switching or complex interrupts. Even better, try some test ROMs specifically designed for verifying your emulator works.

I started with the CPU instruction ROMs by Blargg. The main ROM requires memory banking, but the individual tests do not. Each test ROM will print results to the screen, and also to the serial port, so it’s easy to see where it fails. Even once you get Tetris running these ROMs are still helpful because they exhaustively test every op code, even the ones that are rarely used.
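The serial output is handy because you can capture it before any graphics work. These test ROMs send each character by writing it to the serial data register (0xFF01) and then writing 0x81 to the serial control register (0xFF02). The sketch below hooks memory writes to collect that text. The `Bus` struct and its methods are hypothetical, and the `passed` check assumes the ROM reports success with the word "Passed".

```rust
const SB: u16 = 0xFF01; // serial transfer data
const SC: u16 = 0xFF02; // serial transfer control

// Hypothetical memory bus that logs whatever the program sends over serial.
struct Bus {
    mem: Vec<u8>,
    serial_log: String,
}

impl Bus {
    fn new() -> Bus {
        Bus { mem: vec![0; 0x10000], serial_log: String::new() }
    }

    fn write8(&mut self, addr: u16, val: u8) {
        self.mem[addr as usize] = val;
        // 0x81 requests a transfer using the internal clock; treat it as
        // "the byte currently in SB has been sent".
        if addr == SC && val == 0x81 {
            self.serial_log.push(self.mem[SB as usize] as char);
        }
    }

    // Assumption: a successful test run prints the word "Passed".
    fn passed(&self) -> bool {
        self.serial_log.contains("Passed")
    }
}
```

Printing `serial_log` after a fixed number of instructions gives a quick pass or fail signal even when nothing is drawn to the screen yet.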

That’s it for today. Have an 8bit week!

Starting a Gameboy Emulator

Mon, 24 Jan 2022 17:19:16 +0000

A week or so ago I ran across a video called The Ultimate Gameboy Talk, and indeed it was. Inspired by the simplicity and elegance of the original Gameboy, I decided to try my hand at building an emulator. A week later I have this:

Why

I'm fully aware that my obsession with this project is a manifestation of my own mental instability. You see, in my day job I work on software, but nothing is in my control. I can't control the deadlines. I can't control my staff. I have to go with what happens and constantly fight fires. It's okay, that's why I'm paid to do my job, but it is in no way relaxing.

Building a Gameboy emulator is the exact opposite. The problem is constrained. The hardware is fully documented and there are no more games being made. The specs won't change half way through. It's sort of like a jigsaw puzzle. I *know* that every piece has a final resting place. That's why I find it simultaneously stimulating and relaxing.

Rust

For this project I'm using Rust. There already exist several Rust Gameboy emulators, but that's not the point. This will force me to learn how to use Rust better, and to learn how low level bit twiddling is done. Programming languages are tools. The more time you invest in the tool the better you can use it. Since I think Rust is one of the most important languages of the last 20 years, I want to invest a lot.

Over the next few weeks I'll keep adding more until I can at least play Tetris. I'm a long way from that point. I can barely execute a simple test rom, as you can see above. Even that little bit was an achievement. I had to build over 200 Z80 opcodes and learn about memory access and bank switching just to get to this point. I think I have a pretty good handle on it now, though, and IntelliJ's Rust refactoring tools are far better than I expected. Hopefully visual progress will accelerate soon.

Roku IDK on Mac

Sat, 20 Nov 2021 03:25:35 +0000

I have long loved my series of TV streaming Roku devices. The UI isn’t as fancy as the Apple TV, but it’s very stable, very responsive, and far easier to use than Apple’s insane remote. Historically the Roku SDK was really only targeted at streaming TV apps (which makes sense), and there wasn’t a way to write games or other high performance apps using anything but Roku’s own BrightScript SDK. However, about a month ago Roku announced the Independent Developer Kit that lets anyone build and side load apps onto their own Rokus using C++ and OpenGL. I was thrilled, but bummed that you have to run the SDK on Linux. However, it’s just some command line scripts, so maybe we could run it in a Docker container. After a little experimentation I figured out how. Let's dig in.

Setup Docker

First install Docker for Mac

Next, from the terminal on your Mac, create a Linux Ubuntu container image.

docker run -it --name ubuntu ubuntu:xenial bash

Now you should have Ubuntu Linux running inside of a Docker container, and a shell into this container as root.

From within the linux container, install the standard GCC build chain

apt update && apt install -y build-essential

Setup the IDK

Back on your host macOS terminal, download the IDK and then copy the tarball into your linux container.

docker cp ~/Downloads/10.5.517-IDK.tar ubuntu:/home

Back in the container shell, go to /home and unpack the tarball

cd /home && tar -xvf 10.5.517-IDK.tar

go to the gles2 example

cd /home/10.5.517-IDK/samples/gles2

and build it

make

Now it will compile the test app and create the file:

idk_sample_gles2.squashfs.bin

Copy this file back to macOS by typing this into a terminal on the mac side

docker cp ubuntu:/home/10.5.517-IDK/samples/gles2/idk_sample_gles2.squashfs.bin .

Now you’ve got a compiled app and you just need to install it on your Roku.

Roku Developer Mode

Now enable developer mode on your Roku following these instructions. As part of the setup it will give you a URL on your local network, and a username. Write these down. Set a password as part of the setup, and write that down too. The setup process will reboot your Roku.

After the reboot navigate to that URL (something like http://10.0.0.121/) in your browser and put in the user/pass. You are now at the app install page. Upload the binary and click install.

Enjoy!

Done. If you have a supported device the app will launch. Otherwise it will tell you that “IDK Applications are not supported on this device.” I thought my device was supported because it’s listed in the table of devices that are no longer manufactured but can still update to the latest OS. However, you need to look for the IDK support row, and you’ll see it’s only some of the current devices. I have a Roku Ultra, and there have been several versions with that name, but mine doesn’t support the IDK. I suspect it has to do with the particular ARM chip inside, which has clearly changed over time. In any case, I ordered a new 4K streaming stick + (2021 edition) for $30, so in a few days I’ll be able to test my apps.

I'm looking forward to messing around with the SDK and building a snowflake simulator for my annual Christmas Countdown clock. Ciao!