Josh On Design: Art, Design, and Usability for Software Engineers. Wed Jan 21 2015 20:09:18 GMT+0000 (UTC)

Why You *Can* Build a Smartphone.

In what was by far my most popular post of 2013, <a href=''>Why You Can’t Build A Smartphone</a>, I explained why building a new smartphone platform was futile. Today, like any good author, I’m going to <i>completely contradict</i> myself. Yes, it <b>is</b> possible to create a new smartphone platform. You just have to follow a few constraints. <p> <a href=''>Recent coverage</a> of Google’s Project Ara modular smartphone made me think back to my webOS days. </p> <p> Oh, we were so young and naive, thinking we could make a dent in the coming mobile platform duopoly (sorry MS). Palm, of course, had Handspring in its history. <a href=''>The Visor</a> was the original modular mobile device, with a Game Boy-like swappable hardware port. </p> <p> While at Palm (before we were HP) I pushed the idea of bringing back swappable hardware, though as more of a standard dock connector with kung-fu grip. Unfortunately it was infeasible when trying to compete in mass market carrier stores. The last thing the carriers wanted was more SKUs to manage when they were already killing us with the Droid. Customization would have come at the software level. A true hardware modular phone would be DOA. </p> <p> All of that said, I think creating a new mobile platform, even one with modular hardware, is very doable today. There are just a few constraints. It’s true you can’t create a new successful mainstream smartphone platform, but if you are willing to compromise on a few things, there is plenty of room for new entrants. </p> <p> </p> <h3 id="id95478">What is a Smartphone Platform?</h3> <p> </p> <p> First let’s define our terms. A smartphone platform is: </p> <p> </p> <ul><li>Something that makes phone calls. I’m actually willing to fudge on this one now that Skype is everywhere. 
Let's just say, something with a SIM card.</li> <li>It has a cellphone contract and is sold in carrier stores. That means dealing with carrier sales guys.</li> <li>Has mass market pricing. No one will buy a $2000 phone. It’s got to be less than $500 (without subsidies, for a base model).</li> <li>Sells millions of units. High volume is how you can make a phone with mass market pricing.</li> <li>Has a complete app store with all of the apps most people need. This is a struggle even for Microsoft.</li> </ul> <p> Nailing all of these is <b>required</b> to be a successful smartphone platform today. As you can imagine, this is practically impossible to do from scratch. Smartphones are now a rich man’s game. You must be prepared to spend upwards of a few billion dollars a year just to have a seat at the table. Only a crazy person with too much cash would try it. Come on, <a href=''>Larry!</a> </p> <p> </p> <h3 id="id79994">Compromise</h3> <p> However, smartphones ain’t what they used to be. Android, the open source project (AOSP), is a good core with excellent driver support from chipset vendors. The flood of cheap Chinese phones means there are a bunch of factories who would love to make a device for you. Factories that can handle orders smaller than a million per run. </p> <p> All of this means that if you are willing to compromise on one or more of the points above, you can make money. It won’t be a ‘smartphone platform’ in the traditional sense and you won’t make billions, but you can still be profitable. Success doesn’t have to mean taking a 10% share of the global market. There are other ways to make money. </p> <p> The key is to <i>not</i> build a smartphone, but rather a device built <i>with smartphone components</i>. It may still effectively be a smartphone (just as smartphones are effectively handheld computers), but don’t <b>call it</b> a smartphone. There are lots of markets underserved by current smartphones. 
</p> <p> Here are a few approaches: </p> <p> </p> <ul><li>Fork Android. Create a custom skin, replace the Google services, and build your own app store. This is the approach Amazon and Xiaomi have taken and it’s worked pretty well for them (if we ignore the Fire Phone). They each have their own market with satisfied customers. Neither sells in a carrier store or has contracts. Forking Android is still a lot of work, but it’s perfectly viable for some markets and getting cheaper every day.</li> </ul> <p> </p> <ul><li>Crazy hardware. Build a phone with a gigantic hi-res screen. Or a tiny projector in the side. Or medical sensors. You won’t sell millions, but there are underserved markets that are prepared to pay a lot more than a typical phone for these features.</li> </ul> <p> </p> <ul><li>Cheap hardware. Build a no-frills device with hardware from two years ago. Moore’s law gives you an incredibly steep discount on components. You can now build a flagship from a few years ago for under $100, or even as low as $30. Performance won’t be great, but depending on the audience it will be good enough for many uses.</li> </ul> <p> </p> <ul><li>Dedicate yourself to an underserved app market, like quality educational software. Many of the educational devices you’d find at a typical Toys 'R Us take this approach. They are essentially large phones (or small tablets) which can’t make phone calls. They are completely skinned and come with their own app stores. The key is they aren’t *at all* in the same market as regular smartphones. They are replacing existing educational devices that have far fewer features.</li> </ul> <p> </p> <ul><li>Modular hot-swappable hardware. This brings us back to Project Ara. </li> </ul> <p> The big question is: who would want a modular phone? That is the wrong question to ask. The right question is: who would want a modular device built out of phone parts? I think the answer is: a lot of people. 
</p> <p> </p> <h3 id="id67168">Project Ara</h3> <p> Don’t think of Ara as mass customization. Very few people want an everyday phone with swappable parts. However, a lot of people would like a custom <b>non-phone</b> device built on a production run of 1. </p> <p> Some things we could build with an Ara device: </p> <p> </p> <ul><li>A medical scanner that can target the particular disease you are fighting this week, and a different disease next week. In the field. In India. Where your only connectivity is a weak cellphone signal. And data tracking with a laptop would take too much power.</li> </ul> <p> </p> <ul><li>Inventory scanning. Those guys who stock the shelves at Walmart would love something like this. They can add the latest tracking technology by just mailing a small module to each store.</li> </ul> <p> </p> <ul><li>UPS package tracking. When the people in brown drop off your packages, they scan them with what is essentially a smartphone with a custom screen and scanner. This device costs a lot more than Ara would. UPS would love it.</li> </ul> <p> </p> <ul><li>A phone with a real gamepad attached to it for serious gaming, then taken off when you go out to dinner.</li> </ul> <p> </p> <ul><li>A phone integrated into a GoPro for Xtreme Sporting.</li> </ul> <p> </p> <ul><li>A smart digital film camera for indie filmmakers. They want the cool software and connectivity of a smartphone, but with a real camera sensor and large swappable lenses.</li> </ul> <p> </p> <ul><li>The portable research lab. Today my phone is a digital microscope. Tomorrow it becomes a projector, then a media server. I would use this every day.</li> </ul> <p> </p> <ul><li>An Arduino breakout board. Now the flexibility and ease of programming from Arduino comes to your smartphone.</li> </ul> <p> </p> <h3 id="id13083">End Game</h3> <p> Ara isn’t a PC. Don’t think in terms of upgrading RAM or graphics cards. Those are red herrings, <a href=''>like communism</a>. 
</p> <p> The value of Ara is building something completely different out of smartphone parts. This is already happening with smaller runs of custom Android devices (in the 100k range). Ara will let you build a custom device with a <b>unit scale of one</b>. Ara is the democratization of smartphone technology taken to its inevitable conclusion. </p> <p> So you can take your mega-smartphone platforms to the bank. I’ve got a <a href=''>tricorder</a> to build. </p>

Ideal OS Part III: User Attention Is Sacred

In the first two (<a href=''>1</a>, <a href=''>2</a>) installments of this essay I covered overall system design, the window manager, and applications. I talked about how the user will communicate with the system, but I haven’t discussed much about how the system communicates back to the user. This brings us to the next big problem of today’s operating systems: notifications and concentration. <p> <a href=' '>Jef Raskin</a> famously said: "User data is sacred". We can say the same about the user’s time and attention. </p> <p> The computer must never waste the user’s time. The computer must never break the user’s attention. While these rules are impossible to keep absolutely, they are excellent guidelines. If the computer must interrupt your current task, it better have a damn good reason to do so. And that brings us to notifications. </p> <p> </p> <p> </p> <h2 id="id86358">Notifications: The Big Bugaboo</h2> <p> </p> <p> Notifications have become the bane of my existence. I get notifications for everything: new emails, VIP emails, tweets, likes, +1s, stack overflows. Everything sends me a notification. Every app I install on my phone wants to notify me of its newest stuff. </p> <p> None of these notifications is really spam. Each source is providing me with genuine information. The problem is <i>timing</i>. Notifications are too frequent. They appear at times when I need to concentrate. They are too aggressive in their interruptions. 
</p> <p> They are also not <i>sorted</i>. While I eventually want to receive all notifications at certain times, I want only the "important" ones now. To solve this we really need to solve two problems: finding less disruptive ways to notify me, and defining what counts as <i>important</i>. </p> <p> Let’s look at other solutions. </p> <p> </p> <h3 id="id61277">iOS 7/8</h3> <p> Apple’s solution in iOS 7/8 is an improvement over iOS 6, but not great. You get a giant list of apps. For each app, on a separate screen, you can configure whether you want notifications at all, and where they should be displayed. On the lock screen? With sounds? All of this may be configured <i>for each application</i>. This is too much work, so most people just accept the defaults. A system no one uses is less than worthless. I usually love Apple’s design, but iOS notifications are abysmal. I don’t know how it got past design review. </p> <p> One possible solution is to position the apps relative to colored lines. Everything below the red line may not notify you in any way beyond its app icon. Everything above the line can notify you with a dropdown notification that quickly fades away. Above the red line is a green line. Anything above the green line can notify you even when Do Not Disturb is turned on. These are only for the very important things. To adjust an app you simply drag it up and down. The list becomes a hierarchy of which apps do more urgent things. The position indicates its setting. This is far easier to manage than a per-app settings screen. </p> <p> The problem with this plan is that an "app" is too granular. My email client receives both urgent and non-urgent messages, yet it is treated as one "app". We need a more fine-grained approach. </p> <p> One idea is for the app generating a notification to set the priority level, but this complicates the app config screen. Would you have one entry for 'email: low priority' and a second one for 'email: high priority'? 
This will quickly become confusing. There has to be a better way. </p> <p> </p> <h3 id="id5203">Content based Prioritization</h3> <p> Manually managing notifications in a preferences app simply won’t scale. I can manage them individually, but it’s death by a thousand cuts. We need to solve this problem as a whole, just like we did with memory management and security. We need content-based analysis to determine the priority. </p> <p> Content-based notifications are a complex and under-researched area, but it should be possible. Ten years ago we were drowning in spam, but Bayesian filters have pretty much put an end to it. Something similar should be possible for prioritizing notifications. </p> <p> Setting priority should be based on the content of the message, which app it comes from, and which person is trying to communicate with me. A message from my wife or boss is always higher priority than messages from Google or a recruiter. </p> <p> Obviously prioritizing would have to be done by a trusted part of your computer, since it literally sees your entire life stream; but something like this will eventually have to be built. It might require training. Something like a star button to indicate if something is important, so the system can learn for next time. Not trivial, but it <b>is</b> possible. Spam filters prove it. </p> <p> </p> <h3 id="id91464">Zen Mode</h3> <p> The other big problem with notifications is making sure they are not only important but arrive at the right time. Sometimes I am in communication mode. I don’t mind having lots of interruptions. I’m just doing email, chatting, paperwork, etc. Other times I’m in deep programming or research mode. I need several hours of uninterrupted time to do ‘real work’. </p> <p> The computer should have a ‘zen’ mode which lets me work full screen with just a single app, or with the set of apps I need for that particular task. Apps can be notified of zen mode so they can modify their UI. 
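To make the earlier content-based prioritization idea concrete, here is a toy sketch of a scorer combined with a zen-mode gate. Everything in it is an assumption for illustration: the sender weights, the keyword list, and the thresholds are invented, not any real OS notification API.

```python
# Toy notification prioritizer: score by sender and content keywords,
# then gate delivery on zen mode. All weights and thresholds are
# invented for illustration.

URGENT_KEYWORDS = {"outage", "deadline", "emergency"}

def priority(sender_weight, text):
    """Combine who sent the message with what it says into a score in [0, 1]."""
    words = set(text.lower().split())
    content_boost = 0.4 if words & URGENT_KEYWORDS else 0.0
    return min(1.0, sender_weight + content_boost)

def should_interrupt(score, zen_mode):
    # In zen mode only the very highest priority gets through;
    # everything else is collated for later.
    threshold = 0.9 if zen_mode else 0.5
    return score >= threshold

boss_msg = priority(0.6, "server outage please call")   # trusted sender + urgent word
promo_msg = priority(0.1, "50% off shoes today")        # unknown sender, no urgency
```

A real system would learn the weights from the user's star-button feedback rather than hard-coding them, the same way a Bayesian spam filter trains on marked messages.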
Zen mode also blocks all notifications except the highest level. The rest are collated so I can deal with them when I have free time. </p> <p> The important thing is that Zen mode is a system-wide feature. All apps must respect it, and are in fact forced to respect it. An app can alert all it wants, but in Zen mode the alert won’t actually be able to interrupt me. </p> <p> </p> <h3 id="id61769">Presence</h3> <p> The key to zen mode is that the computer needs to know a lot about you. This is called presence detection. It needs to know your history. It needs to know your current state. Are you at your desk? Did you really leave the office, or just step out to get coffee? Are you concentrating hard or just messing around on Reddit? The computer should know this stuff. </p> <p> My desktop computer has access to my phone, plus its own sensors like a microphone and camera, but most OSes don’t use them. Google makes a phone that’s always listening for you to say ‘Hey Google’. Why can’t my desktop computer, which has far more computing power than a phone, do the same? </p> <p> Your computer monitors you across devices to establish the context. Context is whether you can be interrupted. Your mood. Your concentration level. Your health. Are you present, reading, moving, or typing? The computer should have the camera and microphone on all the time. It recognizes when you are on the network, and others around you. It knows what devices you have with you. Context detection and management must be a central part of the operating system. </p> <p> Presence should be just another data service event. It should use all inputs available to the computer, including networked resources. Use the phone and camera. If I’m at my computer, all notifications come to the computer. Project the screen too, if needed. VNC is a built-in protocol supported by all apps. Imagine if Apple’s AirPlay worked with all devices and apps, not just the blessed few. 
Apple actually built this with their Continuity system in Yosemite. If only it weren’t so buggy I could actually use it. </p> <p> As a side note, why can’t I use my phone as a peripheral device for my desktop? Why can’t I set it up on a tripod or microscope stand and use it just like a directly connected webcam in any application on my computer? </p> <p> Again, it must be said, the security and privacy implications of this are huge. Anonymized and secure database algorithms are still <a href=' '>an open area of research</a>. One key will be keeping calculations as local as possible. The farther away your data gets, the easier it is to steal. </p> <p> </p> <p> </p> <h2 id="id47486">Real Customization</h2> <p> Now on to a new topic. What if our computers could really be customized? I mean <b>really</b> customized, not just changing the theme or setting keybindings (though having system-wide editable keybindings would be wonderful; if they existed from the beginning, all apps would use them). But on to real customization. </p> <p> Customization isn’t just tweaking defaults. It’s about reconfiguring the computer to your needs for the particular task you are doing. With something like a Raspberry Pi, the cost is so low I might have several of these running clones of each other. I should be able to do the following with just a few lines of code, or ideally a single visual diagram without anything we could today call ‘coding’. </p> <p> </p> <ul><li>Become a digital picture frame for (Facebook stream | Flickr | directory of photos)</li> <li>Edit a document with live code snippets from the Internet, and add microscope photos directly from my smartphone camera. </li> <li>Turn a smartphone into a time-lapse computer. 
As photos come in they are added to a stream on the desktop.</li> <li>Make my computer play a sound when I press a button on my watch.</li> <li>Set up a 3D text viewer with effects that shows the latest tweet and plays music at a party.</li> <li>Turn a laptop into a photo booth. Full screen interface, choose an effect, snaps a photo when you press a physical button, prints it out immediately.</li> <li>Use a game pad to move a pan-tilt-zoom camera, with the computer hooking them together.</li> <li>Build a party mp3 player that lets you advance the song with a button and control the volume with a knob.</li> <li>Browse a list of DVDs held in a library.</li> <li>Project white noise into a speaker in another specific room.</li> </ul> <p> </p> <p> All of the above cases are really just snapping a few components together, yet today this would be extremely challenging for most people to build. Even as a professional engineer, I’d spend a lot of time dealing with interoperability issues like data formats and app permissions. It shouldn’t be this hard. Computers were supposed to be a general-purpose tool. Why aren’t they accessible to everyone? </p> <p> </p> <h3 id="id78113">Everyone should code.</h3> <p> This flows into my belief that everyone should learn to code. I’m not saying we all need to be programmers, or that we all even need to learn a programming language, but everyone should be comfortable with computational thinking. Everyone should understand what an algorithm is and how to make a device do something for you through instruction with loops and conditionals. </p> <p> It does sound a bit crazy to say that everyone should learn to think computationally, but it was just as crazy to say everyone should learn to read or do arithmetic a few hundred years ago. This really is a topic for its own full essay. (One day…) For today let’s just say that you should be able to reconfigure your computer with simple tools. 
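As a sketch of what "snapping a few components together" could feel like, here is a toy event-wiring layer. Every name in it (`Button`, `Speaker`, `wire`) is hypothetical; no real OS exposes this API today. The point is only how little glue the watch-button-plays-a-sound case above should need.

```python
# Toy component wiring: a watch button triggers a sound on the computer.
# Button, Speaker, and wire are all invented for illustration.

class Button:
    """Stand-in for a physical button (e.g. on a watch)."""
    def __init__(self):
        self._listeners = []
    def on_press(self, fn):
        self._listeners.append(fn)
    def press(self):                      # simulate the physical press
        for fn in self._listeners:
            fn()

class Speaker:
    """Stand-in for the computer's audio output."""
    def __init__(self):
        self.played = []
    def play(self, clip):
        self.played.append(clip)          # a real one would emit sound

def wire(button, speaker, clip):
    """The 'few lines of code' the essay asks for."""
    button.on_press(lambda: speaker.play(clip))

watch_button, desk_speaker = Button(), Speaker()
wire(watch_button, desk_speaker, "ding.wav")
watch_button.press()
```

The interesting design question is not the classes themselves but that `wire` works across devices and vendors, which is exactly the interoperability problem the essay complains about.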
</p> <p> </p> <p> </p> <h2 id="id37543">A few notes on implementation</h2> <p> This essay series isn’t really about implementations. The user experience is what matters, not how it’s built; but tools and formats <b>do</b> matter, especially if you want a long-term computing platform. So, a few things must be said. </p> <p> </p> <h3 id="id66192">Theming</h3> <p> Look and feel theming. When people say "the look and feel" they usually mean the <i>look</i>, even though the feel is far more important. The feel must be carefully designed, and then carefully customized for the user in very specific and limited ways. Applications should have no control over this. The user (and their agent, the OS, not apps) is the final arbiter. </p> <p> On the look side we really do mean theming. I guess it’s okay, but I’d prefer not to support theming, at least not with a public API. In theory it’s great, but in practice I have never seen a 3rd party theme as good as the default from the OS vendor. Perhaps it’s a market failure. Perhaps good designers have better things to do with their time than build free themes. Perhaps it’s hard to judge the quality of a theme from a screenshot. Perhaps end users interested in theming have no taste. I don’t know. But it’s definitely a low priority. </p> <p> </p> <h3 id="id70459">Device Drivers</h3> <p> In short, I hate them. They break all the time, and in the computer’s most vulnerable spot: the kernel. Ideally access to the hardware would be completely virtualized. OS-provided hooks connect the real hardware to the ‘device drivers’, which are isolated user-level modules that understand the details of that particular hardware. If a driver crashes, the service it was providing to the user will stop, but the whole OS won’t come down. </p> <p> We’ve invented amazing hardware virtualization technology for the server room. Let’s bring that back to our desktops. 
With modern GPUs even the screen should be virtualizable without a significant speed hit. </p> <p> </p> <h3 id="id16687">Message bus format</h3> <p> In general the ideal OS would be based on a simple messaging system. And I do mean simple. Simple. Simple! Text or a text equivalent like JSON. It <b>must</b> have multiple implementations. It <b>must</b> be language agnostic. Just messages, not remote objects. We aren’t rebuilding CORBA here. Something more like REST. It doesn’t have to be high performance. This is for system components to communicate at slightly faster than human speeds. When they need to do bulk data transfers they can use other existing systems, like files and shared memory. </p> <p> </p> <h2 id="id50796">In Memoriam</h2> <p> Throughout this essay series I’ve highlighted the way our desktop operating systems, which are destined to be used by at least 10% of us, are horrible. While amazing technically, they fail us at every step. We still don’t have useful features that were designed in the 1970s, and we still have technical underpinnings that <b>were</b> designed in the 1970s. Seriously. We can do better. </p> <p> I really hope in 300 years, when we are all computing on the Starship Enterprise, we won’t still struggle to combine applications and deal with segfaults. Let’s fix it in this century. </p>

Ideal OS Part II: The User Interface

In the future, touch interfaces will take over most computing tasks, but 10% of people will still need ‘full general purpose computers’. We can’t let the interface stagnate. This white paper represents a decade of my thinking on what is wrong with desktop-style (WIMP) operating systems, and proposed solutions. PCs are not obsolete. They just need improvements to become ‘workstations’ again. <p> <a href=''>Last time</a> I gave you an overview of what an Operating System would look like if we took away all the bad parts, leaving not too much left, and started building replacements. But what would these new parts look like? 
How would you start programs and manage windows? Without a filesystem, how would desktop folders work? For the answers to these questions and just so much more, keep reading. </p> <p> </p> <p> </p> <h3 id="id93828">Desktop Folders</h3> <p> Since the filesystem now becomes a database, finding your files would be done through queries. Creating a folder is essentially creating a new saved query. A list of all audio files marked as being songs. A list of all code files marked as being part of a particular project. Folder contents can be read-only queries based only on the attributes of documents (like the list of songs in an album), or they could be ad hoc, where the user drags files into the folder. In this case a file receives a tag referring to that folder, thus simulating the old kind of folder. Unlike traditional directories, however, a file can be in any number of folders at once. </p> <p> </p> <h3 id="id39184">Command line</h3> <p> The new OS should have a command line. Part of the magic of Unix is being able to pipe simple commands together at the shell. We still need that, but the pipes would carry streams of simple objects instead of bytes. How much better would the ImageMagick operators be if they could stream proper metadata? Building new commands that talk with the old ones would be trivial. </p> <p> With a single command line you could do complex operations like: find all photos taken in the last four years within 50 miles of Yosemite that have a star rating of 3 or higher, resize them to be 1000px on the longest side, then upload them to a new Flickr album called “Best of Yosemite”, and link to the album on Facebook. This could all be done with built-in tools; no custom coding required. Just combining a few primitives on the command line. </p> <p> Of course a traditional command line is still difficult for novice users. Even with training you need to memorize a lot of commands. A better solution is a hybrid. 
In the short (default) form you can chain commands with pipes, using auto-completion to help remember commands and arguments. In the expanded form you can chain commands together with a visual drawing-like tool, similar to OS X’s Automator. Switching between the modes is always possible. </p> <p> </p> <p> </p> <h3 id="id85962">Windows</h3> <p> Windows are still a good thing. Sometimes you need to resize them to see multiple things at once. However, they could be a lot more powerful and flexible than they are today. </p> <p> First, every window should be a tab. <b>Every window</b>. Snap any window to any other window, just like Chrome tabs. Who cares if the tabs aren’t from the same application? We don’t have applications anymore anyway. If the user wants to snap a todo list window onto an email window, <b>let them</b>. User desires trump ancient technical architecture. </p> <p> Second, windows should be pluggable. We’ve covered how an email app is really multiple pieces, including an inbox view and a message view. Sometimes you might want those views connected, as with a traditional email client. Other times you might want them separate. This should be as trivial as snapping them together or dragging them apart. </p> <p> For more complex layouts the system should have patterns like ‘master view’ and ‘vertical accordion’. The user can pull out a new empty pattern then drop the views where they like. We already do similar things in Wordpress and other web editors. Let’s make it universal. </p> <p> At first this might seem to come with some challenges. What if the user accidentally creates multiple inbox views? Well, so what? If I want an inbox on each screen of my computer, I can do that. Maybe I want an inbox that just shows work email on my first screen, and personal email on my second. Maybe I want an inbox that just shows emails around a particular project, or from a particular person. 
These are all just different database queries, so why not? If I want to set up my windows that way, I should be able to do so. The computer must adapt to how the human works, not the other way around. </p> <p> </p> <p> </p> <h3 id="id4041">The Window Manager</h3> <p> Now the window manager itself. A WM has a bunch of duties. It must render windows (obviously). It must let you move and resize them. It must handle notifications. It must manage the graphics card. It (usually) implements transparency and special effects. It shows dialog boxes. It starts and stops applications. There really are a lot of things required of a modern window manager. </p> <p> Because of this complexity, many operating systems divide this task up in various ways. Some move app launching to a separate launcher system but still close apps with the window manager. Some split the drawing of windows from the moving. Some put notifications in a separate process. All of these are good ideas, but they don’t take things far enough. For the IdealOS we should explicitly chop the window manager into different pieces. When I say <i>explicitly</i> I mean actual separate processes that communicate with fully documented APIs. Documented means hackable, and hackable means we can start extending the desktop in interesting ways. </p> <p> </p> <h3 id="id21673">Proposed Window Manager Architecture</h3> <p> <b>Compositor</b>: Individual applications draw to an offscreen buffer or directly to a texture in the OpenGL context. The compositor draws these buffers and textures to the real screen. Since only the compositor has access to the real screen, only the compositor can do interesting effects like Mac OS X’s Exposé system. For hackability, the compositor must expose a (protected) API to manipulate windows and apply shader effects. </p> <p> <b>Window Manager</b>: This draws the actual window controls and handles moving and resizing. 
It should be easily swappable to support theming and playing around with interaction ideas. The window manager is also in charge of deciding where new windows go when they are created. We’ll get back to this in a second. </p> <p> <b>Launcher</b>: This is an interface for launching apps. It is a separate program, but it still starts apps using the app service. It has no extra privileges. Any other program could launch an app just the same. This means we can have multiple launchers at the same time, e.g. a big dock bar and a global search field (like the new Spotlight system in OS X Yosemite). </p> <p> <b>Notification Manager</b>: The actual processing of notifications can happen in a separate notification manager, but creating the notifications on screen should happen in the window manager, because it’s the thing which decides where windows go and when. We’ll cover the details of notifications in a moment. </p> <p> </p> <h2 id="id703">The Fun Begins</h2> <p> </p> <h3 id="id45881">Window Placement</h3> <p> What interesting things can we do with our new system? The first thing is to let the window manager be smarter about placing windows. <a href=''>xmonad</a> has some good ideas about tiling windows. When you are doing a lot of work it’s common to want multiple windows at once, laid out without overlapping. Sometimes you might want a grid. Other times two columns. Those can be just a keypress away with an xmonad-style window manager. </p> <p> The window manager should be smarter about placing dialogs. Ideally we wouldn’t have dialogs at all, but they are sometimes needed. The WM should be smart about placing them so they don't obscure the content. </p> <p> Current OSes have three strategies for window placement: attaching dialogs to the apps which opened them (save/open dialogs), centering the dialog on the screen, or simply not using dialogs (90% of the iPad solution). A few WMs will take into account the available empty space on screen, but this is still very primitive. 
They consider the <i>size</i> of the new window but not its <i>content</i>. </p> <p> <a href=' '>This paper</a> by Ishak & Feiner, called Content Aware Layout, has some great ideas. </p> <p> If the window manager considers the contents of windows, then it can be smart about placing new ones. If a background window has large blank spaces, then use that area for the new window, possibly with some transparency. </p> <p> When searching your operating system for a particular word, you can search the contents of windows too. The window manager can zoom in on just those windows. Even better, it can show just a subset of each window: the part containing the found word. Windows are just bitmaps. We can slice and dice them to do all sorts of cool things. </p> <p> Here’s another example: when copying text from one document to another, the WM could help the user maintain their mental state by showing the windows involved. Move to window A. Select and copy some text. Now move to window B. The WM knows you are in the middle of a copy and paste action, so it can shrink but not hide window A. That way you are always aware of where your clipboard content came from. The WM can also show you the current clipboard contents in a floating window. </p> <p> This brings us to another horrible pain point of desktop operating systems: <b>the clipboard</b>. </p> <p> </p> <h2 id="id56697">Copy, Paste, and the Clipboard</h2> <p> </p> <p> Why should the human have to remember what is stored in a hidden data structure called <i>the clipboard</i> (or <i>the pasteboard</i> for you old school mac-enzies)? We should <b>make it visible</b> and relieve the human of this burden. This visible clipboard could also show previously copied contents. The clipboard should be a persistent, infinitely long data structure, not just a single slot. </p> <p> Did you copy something the other day but can’t remember what or where? Just look in the clipboard’s history. Copying multiple things at once becomes trivial. 
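The single-slot-versus-history distinction is small enough to sketch. Here is a toy in-memory version of a history clipboard; a real one would persist to disk and hold rich data types, not just strings, and the class and method names are invented for illustration.

```python
# Toy persistent-history clipboard: every copy is kept, nothing is lost.

class Clipboard:
    def __init__(self):
        self.history = []              # oldest first, never overwritten

    def copy(self, item):
        self.history.append(item)      # a copy appends instead of replacing

    def paste(self, index=-1):
        """Default: the most recent item, like today's single-slot clipboards."""
        return self.history[index]

clip = Clipboard()
for chunk in ["quote", "chart", "photo", "url"]:   # gather from four sources
    clip.copy(chunk)

# ...then place everything into the destination at once.
document = [clip.paste(i) for i in range(len(clip.history))]
```

Note that the default `paste()` behaves exactly like the current clipboard, so the history model is a strict superset of what we have now.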
Grab content from four different sources, then paste them together into a new document. The clipboard isn’t a hidden box that holds only one chunk of data. Now it becomes a shelf or tray that holds the many things you are working with right now. Pick up what you need, then place it all in the final destination. Gather and place, not copy and paste. </p> <p> </p> <h2 id="id74643">Working Sets</h2> <p> Finally, the window manager should implement working sets. A working set is a set of documents, resources, applications, or whatever else the human is currently using to do something. </p> <p> In the <b>ideal</b> world, when faced with a task like making "a presentation on Disneyland", the human would search through the library for some books, find some photos on the web, read a few articles, then distill all of this into a single document. When done, the human puts everything away, prints out the final document, and moves on to the next task. Very neat and orderly. </p> <p> Of course, in the <b>real</b> world that doesn’t happen. We build up a collection of notes over a few months, and probably a stack of books related to the problem. Over a few days we read the books and articles, collect the quotes and images, then put it all on hold as other projects come up. You might be in the middle of writing when a phone call comes in, or a screaming child needs lunch, and then finally come back to your office wondering what you were in the middle of. </p> <p> When the project is finally over, the books and notes hang around the office until the annual cleanup. Real world work is messy and full of constant interruptions. Our tools should <i>accept this reality</i> and help, not hinder, it. </p> <p> A good window manager would let you group windows by topic and help you focus on a single task when you need to focus. One way to do this is by having multiple virtual screens where each screen is dedicated to a particular project. 
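</p> <p> Saving and restoring such a project screen doesn’t require anything exotic; a working set can be a plain document in the system database. A hedged sketch of what the window manager might persist (all field names here are invented for illustration): </p>

```javascript
// A working set is just data: the window manager serializes it, stores it
// in the document database, and restores it months later.
function captureWorkingSet(name, windows) {
  return {
    type: 'working-set',
    name: name,
    savedAt: new Date().toISOString(),
    // Each window records the module that owns it plus enough state to
    // reopen it: a document id, a saved query, a position on screen.
    windows: windows.map(function(w) {
      return { module: w.module, docId: w.docId, bounds: w.bounds };
    })
  };
}

var talk = captureWorkingSet('Disneyland presentation', [
  { module: 'editor', docId: 'outline-42',           bounds: { x: 0,   y: 0, w: 800, h: 600 } },
  { module: 'mail',   docId: 'query:tag=disneyland', bounds: { x: 800, y: 0, w: 400, h: 600 } }
]);

// Restoring is the reverse: read the document back and hand each window
// record to the module that owns it.
var restored = JSON.parse(JSON.stringify(talk));
console.log(restored.windows.length + ' windows restored');
```

<p>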
The screen can contain not just windows of active documents, but also all of the research files collected for that project. It will contain all of the emails and chat windows related to that project and no others. </p> <p> A project screen is really a topic-specific slice of everything on your computer, across all applications and data types. Furthermore, such a screen can be saved and reloaded later, possibly months later. Remembering where you were in that open source project after a 6 week hiatus will be easy. Just load up the workspace and everything related to the project, even emails and GitHub issues, will come up in a single screen. </p> <p> <a href=' '>This paper</a> by Keith Edwards has some great ideas on the topic. </p> <p> </p> <p> </p> <h2 id="id53596">Special Effects</h2> <p> By giving the window manager full control of the screen, combined with a good API, we can do amazing things on a modern GPU that were infeasible just a decade ago. After all, a window is simply a bitmap in GPU RAM. It can be manipulated and distorted like any other texture. We could make an area of the screen a black hole with all windows stretched as they approach it. We could render a window with icicles or dust on it to indicate how long it has been since the user interacted with it. </p> <p> Snowflakes and other particle effects are trivial to implement on the GPU, but the real power will be in manipulating windows automatically for the human. A window that would be partly off the screen could be distorted hyperbolically instead. While the text would be squished, it would still be readable enough for the user to get the gist of it. When the user wants that window they just click on it and it stretches back to normal size. </p> <p> How about window zooming? In a web browser I can zoom any page with + or - buttons. Any web-based app can do the same. I often have multiple sizes of text at once in different browser windows. What if this wasn’t restricted to just web pages? 
Any app should be able to respond to a zoom event to increase its base font size. If all layout and windows are derived from the base font (as they should be) then the app will zoom just like a webpage. For apps which don’t support zooming, for whatever reason, the texture itself could be zoomed. While this would result in some blurriness, modern GPUs do a very good job of smoothing zoomed textures, and it won't be an issue at all with the new HiDPI screens just arriving on the market. </p> <p> </p> <h3 id="id90803">Distorting Input</h3> <p> You may have seen effects like those I've described on Linux desktops using Compiz, a compositing window manager for X windows. The effects look cool but they are largely useless because of a major flaw in the X windows design. The window manager can control the output of windows - the actual bitmaps - but it cannot control the input. No matter how the windows are distorted, the apps themselves will still receive input events normally. This means clicking on what appears to be a scaled button may instead send the mouse event to another part of the window. To fix this problem both input and output must go through the window manager so that it can keep them in sync. </p> <p> The sad thing is: these problems were identified and solved <b>decades ago</b>. <a href=' '>This research paper</a> I helped with as my senior project in 1997 talks about the problem and the solution. Window modification must apply to both input and output. </p> <p> </p> <p> </p> <h2 id="id2193">Could we really build this?</h2> <p> We would have to write everything from scratch. By not being backward compatible with anything, we can’t reuse existing programs. I think that’s okay, actually. iOS was built with all new apps too. Existing code doesn’t matter as much as we think. It’s the ideas and protocols that matter. That’s what we get to reuse. </p> <p> </p> <h3 id="id15630">Versioning</h3> <p> How do we version modules? 
If your editor experience is actually the combination of 10 different modules working together, how are they upgraded? Do we have a fixed API between them that never changes? Could one module upgrade break the rest? Have we reinvented classpath hell? This new OS design doesn’t fix these issues, but it does make them explicit. We already have these problems today, but they are solved in ad hoc, inconsistent ways. The new OS would make dependencies in the system explicit, forcing us to deal with them. I expect we’d end up with a system like Firefox's, where you have different channels to get the modules from depending on the amount of risk you are comfortable with. Probably with NPM-like <a href=''>semantic versioning</a>. </p> <p> So could we really build this? Yes, I think we could. However, rather than trying to reinvent absolutely everything, we should start with a bare Linux system; similar to Android but without the Java stuff. Then add a good lightweight document database (CouchDB?) and a hardware accelerated scene graph (Amino?). Then we need an object-stream-oriented programming system. I’d suggest Smalltalk or Self, but NodeJS is more widely supported and has tons of modules. Perhaps ZeroMQ as the IPC mechanism. </p> <p> The exact building blocks don’t matter very much, as we will probably change them over time anyway. The important thing is that we build an OS with a cohesive vision and consistent metaphors. Let’s bring back the idea that the users and their data are the central parts of a computing environment, not applications and system plumbing. Let’s make machines for getting stuff done, not babysitting hardware. Let’s make work stations again. </p>\nIdeal OS: An Epic Tale of IrrationalityNote: Parts <a href=''>II</a> and <a href=''>III</a> are up now. <p> In the art world there is this idea of <a href=' '>anti-art</a>. The goal is to do all of the things backwards or wrong so that you can discover new rights. 
You have to tear down the world before you can build it again. I’m not entirely sure how it works, but they seem happy with it, so I figured I’d give it a go with something that really needs shaking up: the <b>desktop operating system</b>. </p> <p> Be forewarned: this post is an epic. Not epic in a "that movie was awesome" sort of way. It's epic in a "3000-stanza poem you had to read in English class" way. Just FYI. </p> <p> </p> <p> Since this is <b>my</b> blog I’ll start by getting rid of everything I <b>personally</b> hate about operating systems. </p> <p> </p> <h3 id="id11823">Fixed Font Sizes</h3> <p> I’m getting old. My eyes don’t work as well as they used to. I should be able to resize any window, or the entire OS, by hitting cmd-+. It works for the web, why not the desktop? </p> <p> </p> <h3 id="id98785">Filesystems</h3> <p> They crash. They have weird restrictions. They have a single hierarchy. They are slow to search. And the folder metaphor is sooo 1980s I feel like I need a Flock of Seagulls haircut (kids, ask your parents). Let’s just get rid of filesystems right away. In the bin. </p> <p> </p> <h3 id="id71182">Device Drivers</h3> <p> Device drivers are the crap software written by hardware companies. They fix just enough bugs to ship, then move on to the next project. Installing them is a pain. Updating them is a pain. Uninstalling them is nearly impossible. Device drivers need to just go. Seriously. Just go out the way you came in. </p> <p> </p> <h3 id="id55965">Backing up your computer</h3> <p> So annoying! Built-in backup software sucks and 3rd party software sucks even more. Backups are slow, we forget to schedule them, and they magnify the real problem: the filesystem. First you’ve got a filesystem. Then you’ve got a backup of the filesystem. Now you’ve got two problems. Just go. </p> <p> </p> <h3 id="id42386">System utilities</h3> <p> These are the redheaded stepchildren of OS applications. Apple spends a lot of time polishing Mail. 
The system console? Not so much. Oh look, there’s a whole folder of these poor guys. Who knew? </p> <p> </p> <h3 id="id79622">Window managers</h3> <p> I sorta like window managers, but the existing ones kinda suck. I like application <b>windows</b>. It’s the <b>management</b> part I hate. All of that moving and resizing is annoying and error prone. If only I had some sort of automated tool to manage windows for me. We could call it a window manager. Nah, that’ll never work. </p> <p> </p> <p> </p> <h3 id="id97857">Inputs</h3> <p> I like the mouse and keyboard, but my computer has a camera and microphone too. Yet the OS itself doesn’t use them for interaction. No voice recognition. No visual gesture support. They should really just call the camera a Skype accessory since that’s all it gets used for. We should find a use for these or get rid of 'em. The NSA wouldn’t be happy, but that’s life. </p> <p> </p> <h3 id="id65482">Applications</h3> <p> This is the big one. I could learn to live with all of the other problems, but applications are too big to skirt around. "Your application is so fat, when it walks around the house it really walks around the house" (kids, ask your parents). </p> <p> Supposedly we buy computers to run applications, but that’s just not true. We buy computers to get some work done. In theory applications help us do that, but in practice they often get in the way. </p> <p> Each application looks and works differently. Each has its own file format, or worse, an unreliable internal database. They are supposed to work together, but in reality they just share files on disk, and that only works half the time. Drag and drop would seem like a solution, but it’s really just shorthand for sharing files. </p> <p> Each application is its own silo. A world unto itself. In the age of the app store this isolation is just getting worse. Every year another brick in the wall (kids, ask your parents). Applications definitely have to go. 
</p> <p> </p> <h3 id="id24377">What's Left?</h3> <p> So, after throwing away applications, filesystems, device drivers, system utilities, and everything else, what does that leave us with? Not much. Just the kernel for managing CPU and memory, an empty bitmapped screen, and, um... some memory buffers, I guess. Now what? </p> <p> Well, if we were willing to add the filesystem back in we’d pretty much have Unix, or at least old Unix before X. A bunch of tiny tools, not applications, that connect together in different ways to get work done. Quite productive. The only problem is we have to time travel back to the 70s. Going with pure Unix means giving up graphics, typography, modern networking, cameras and microphones and mice. In other words, giving up all of the progress of the last 40 years. </p> <p> Surely there is some sort of middle ground? </p> <p> Yes! Yes, there is! We <b>can</b> rebuild the desktop computer. We have the technology. (kids, ask your... oh, just YouTube it!). </p> <p> The creators of Unix had the right idea: little tools that you combine to perform different tasks. The problem is the interchange. Unix tools share data through a single stream of bytes. Great for 1970 but not enough for today. And because that wasn’t enough to do everything you’d want to do, they continued to hack ioctls and control characters on top. When that didn’t work they added X-windows, a horrible abomination against heaven and nature that somehow crawled out of the primordial slime and refused to die when evolution gave up on it. (Kids, read a book). </p> <p> </p> <h3 id="id46332">Simple Tools</h3> <p> What we need are simple tools that communicate using a stream of small objects. They could be events or records or something else. I’d go with JSON, but that’s just me. The important thing is they must be structured and human-parseable; higher level than pure byte streams. </p> <p> If object streams form the primary way our tools communicate, then how do they store things? 
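</p> <p> Before we answer that: such object-stream tools are easy to prototype today with newline-delimited JSON. A small sketch of one Unix-style filter in Node (the tag-filtering behavior is just an invented example): </p>

```javascript
// A tool in this world reads a stream of small JSON objects and writes
// another one. Composition works like Unix pipes, but the records are
// structured instead of raw bytes.
function makeFilter(predicate) {
  return function(inputLines) {
    return inputLines
      .map(function(line) { return JSON.parse(line); })
      .filter(predicate)
      .map(function(obj) { return JSON.stringify(obj); });
  };
}

// Example tool: keep only records tagged "urgent".
var urgentOnly = makeFilter(function(obj) {
  return (obj.tags || []).indexOf('urgent') !== -1;
});

var input = [
  '{"id":1,"tags":["urgent"]}',
  '{"id":2,"tags":[]}'
];
console.log(urgentOnly(input).join('\n'));

// Wired to real pipes this would read process.stdin line by line and write
// to process.stdout, so tools chain: checkmail | spamfilter | inboxview
```

<p>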
We got rid of the filesystem, remember? </p> <p> How about a document database? Databases have gotten really good in the 40 years since SQL was invented. They can store structured documents, byte streams, and metadata. They can give you live queries. They can do full text search. Let’s finally use a system that allows a file to appear in more than one folder at a time. Hierarchical filesystems in 2014 are madness. It’s time for something better. </p> <p> Now then. Armed with a database and streams of objects we can rebuild the world. Let's start with graphics. </p> <p> </p> <h2 id="id95574">Graphics</h2> <p> We are more than a decade into the 21st century. We can assume a proper GPU will be available. That means our base graphics layer can be a real hardware-accelerated OpenGL scenegraph, not bitmaps layered with software. </p> <p> Which scenegraph? Who cares. There’s a ton to choose from, or we can build a new one. The important thing is hardware acceleration is mandatory. <b>Mandatory</b>. Something with shaders. OpenGL ES 2.0 or higher. And I don't care about X-Windows network transparency. That's an idea that died along with Network Computers and SunRays. </p> <p> So we have three things: a database to hold stuff, object streams to send data around, and a GPU to draw it. Now we can start to build some tools. But we can’t just start building applications. Applications are bad, remember? Let’s think about tasks we want to complete, then talk about how to build the tools to accomplish those tasks. </p> <p> </p> <h2 id="id16270">Modular Email</h2> <p> Let’s start with something simple: checking your email. There are actually many pieces to this. First you have to set up your email. The computer needs to know the server where you receive email, any proxies or security layers (SSL, TLS, etc.), and of course your username and password (assuming that is the authentication mechanism your email provider uses). Then the computer needs to speak the server’s protocol. 
Let’s keep it simple and assume we only support IMAP. Now the computer needs to download your email messages and store local copies (in our <b>real database</b>, not files). Then the computer must show the new messages to the user in some sort of sortable list. Finally, when the user selects an email message the computer must display it. </p> <p> So far so good. We know how an email is viewed. The problem is that in a traditional desktop operating system this is all done by a single application called the "email client", even though many of these tasks have little to do with one another. In our new OS we can split these out. </p> <p> </p> <ul><li>a setup wizard for capturing email account information</li> <li>a service to check for new emails and download them into the database</li> <li>an inbox view, which is just a saved database query rendered in a particular way (namely, the 'from' and 'subject' fields from messages displayed in a standard sortable list view)</li> <li>and finally a message view, which can display a single message.</li> </ul> <p> By splitting up these functions into separate modules we make it a lot easier to write email support for our new OS. It also means we can change and add functionality in one part of the system without modifying the others. Let’s consider a few additions: </p> <p> A <b>spam filter</b> doesn’t need to know about how emails are viewed or downloaded into the system. It just needs a list of new messages. In other words: a saved database query. Whenever a new message comes in, the filter analyzes it and adds a <code>spam</code> tag if it’s spam. Then the inbox viewer can display spam messages in a different color, or under a spam folder, or not display them at all. Separating out functions makes everything easier. </p> <p> Adding a <b>new email</b> service is simple. We need a new wizard to create the "account" document stored in the database. 
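</p> <p> To make the spam filter above concrete: the whole module is a subscription and a tag. A toy sketch with an in-memory stand-in for the document database (this database API is invented for illustration): </p>

```javascript
// Minimal stand-in for the document database: modules only ever see
// queries and documents, never each other.
var db = {
  docs: [],
  listeners: [],
  insert: function(doc) {
    this.docs.push(doc);
    this.listeners.forEach(function(fn) { fn(doc); });
  },
  onNew: function(fn) { this.listeners.push(fn); }
};

// The entire spam filter module: watch for new messages, analyze, tag.
// The inbox view decides how tagged messages are displayed.
function looksLikeSpam(msg) {
  return /free money|act now/i.test(msg.subject);
}
db.onNew(function(msg) {
  if (msg.type === 'email' && looksLikeSpam(msg)) {
    msg.tags = (msg.tags || []).concat('spam');
  }
});

db.insert({ type: 'email', subject: 'FREE MONEY inside' });
db.insert({ type: 'email', subject: 'Lunch tomorrow?' });
console.log(db.docs.map(function(m) { return m.tags || []; }));
```

<p> A trigger-rule module would look identical: subscribe, match the subject, act. </p> <p>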
When the email checker runs it looks for all account documents, then connects to each one looking for new messages to download. If the new mail service uses a different API (say, GMail’s new API), then we add a new email checker implementation. The rest of the email modules don’t have to change at all. They don't care. Proper separation of concerns. Brooks was right. </p> <p> <b>Trigger rules</b>: Want to control your music by email? Add a filter that looks for messages with a subject like “play music, beatles random”. When it finds such an email it marks it as deleted, then plays the music. Nothing else in the system changes. </p> <p> By splitting the email task into separate modules the system becomes more flexible and hackable. A trigger rule can be written in any programming language. It can have an interface or be headless. As long as it can talk to the database it can do its thing. Modules with standard communication give us flexibility and power. </p> <p> </p> <h2 id="id24995">The Other Apps</h2> <p> Imagine if we apply this modularity principle to everything else on your computer. The applications I use every day include iTunes, Evernote, Mail, Things (a todo list), Contacts, Calendar, Messages, iPhoto, Coda (a code editor), and Safari. Except for Safari, everything else is essentially an editable view into a custom database. They could all be broken up into modules that combine in different ways to replace the existing functions, and let us do so much more. </p> <p> <b>iTunes</b>: becomes a few sortable list views of database queries, a music playback service which supports local and remote resources, and a visualizer. All separate and swappable. iTunes is pretty much a galaxy unto itself these days and is in danger of collapsing into a black hole. Time to split it out. </p> <p> <b>Evernote</b>: a text editor and a DB query, plus a background service to sync with Evernote’s servers. </p> <p> <b>Mail</b>: As discussed before, many different modules. 
</p> <p> <b>Things</b>: a tiny text editor (single line) and a DB query of tiny documents. </p> <p> <b>Contacts</b>: a view into the database of address entries, plus a syncing service. </p> <p> <b>Calendar</b>: a view into the database of event entries, plus a syncing service. </p> <p> <b>Messages</b>: background modules to connect to various chat services. Conversations become a view into the database for chat events. </p> <p> <b>iPhoto</b>: again a view into the database, but this time with an image service to perform fast scaling and a view to edit photos. Syncing to iCloud, Facebook, and Flickr becomes a set of background services. </p> <p> <b>Coda</b>: a fancy text editor. It would be the least changed under a new OS, but could benefit from file syncing (replacing the SFTP client) and from easily managed editor plugins for theming, syntax highlighting, and keybindings. Wouldn’t it be nice to define your keybindings once for the OS rather than in each application? Wouldn’t it be nice if all applications could use SFTP to edit remote files instead of duplicating this feature in each app? </p> <p> <b>Safari</b>: The many parts of a web browser probably need to remain tightly coupled, but at least bookmarks and plugins could become modules in the system instead of being tied directly to the browser. Even the renderer could be a module. In a modern Mac we already have these things (web views and bookmarks databases), but in the new OS the modularity would be explicit and usable from any language. </p> <p> </p> <h2 id="id83155">Only Data Matters</h2> <p> There is another, arguably more important trend here. By breaking up applications into pieces we shift the focus from the application to the data. What the human wants to do with the data is the only important part. </p> <p> Our computers can again become "machines to do work", not places to manipulate applications. They become workstations. We interact with our data. 
The applications are simply small tools that we rearrange as needed. It's more like a craftsman's workbench than the office desk of today. </p> <p> </p> <h2 id="id75584">Next Time</h2> <p> Sadly for you, this is just part one of my series. Next time I'll dig into the actual UI and window manager. How will people actually interact with the system? How will we handle the ever-growing information deluge of the 21st century? </p> <p> Until next time, keep tearing things down. </p>\nGetting Started with NodeJS and ThrustI’ve used a lot of cross platform desktop toolkits over the years. I’ve even built some when I worked on Swing and JavaFX at Sun, and I continue development of Amino, an OpenGL toolkit for JavaScript. I know far more about toolkits than I wish. You would think the hardest part of making a good toolkit is the graphics or the widgets or the API. Nope. It’s deployment. Client-side Java failed because of deployment: how do you actually get the code running on the end user’s computer 100% reliably, no matter what system they have? Deployment on the desktop is hard. Perhaps some web technology can help. <p> </p> <h3 id="id72777">Thrust</h3> <p> Today we’re going to play with a new toolkit called <a href=''>Thrust</a>. Thrust is an embeddable web view based on Chromium, similar to Atom-Shell or node-webkit, but with one big difference: the Chromium renderer runs in a separate process that your app communicates with over a simple JSON-based RPC pipe. This one architectural decision makes Thrust far more flexible and reliable. </p> <p> Since the actual API is over a local connection instead of C bindings, you can use Thrust with the language of your choosing. Bindings already exist for NodeJS, Go, Scala, and Python. The bindings just speak the RPC protocol, so you could roll your own bindings if you want. This split makes Thrust far more reliable and flexible than previous WebKit embedding efforts. 
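</p> <p> To make that split concrete: every API call reduces to a small JSON message over a local pipe. This is a toy illustration of the idea only, not Thrust’s actual wire format: </p>

```javascript
// Toy model of the Thrust-style split: the app side never links against
// Chromium; it serializes calls and matches replies by id.
function encodeCall(id, method, args) {
  return JSON.stringify({ _id: id, _action: 'call', method: method, args: args }) + '\n';
}
function decodeReply(line) {
  return JSON.parse(line);
}

// What a binding call like window.focus() might reduce to on the wire:
var wire = encodeCall(1, 'focus', {});
console.log(wire.trim());

// The renderer process answers with a message carrying the same id:
var reply = decodeReply('{"_id":1,"_action":"reply","result":"ok"}');
console.log(reply.result);
```

<p> Because the pipe is just serialized JSON, any language that can read and write a socket can implement a binding. </p> <p>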
Though it’s still buggy, I’ve already had far more success with Thrust than I did with Atom-Shell. </p> <p> For this tutorial I’m going to show you how to use Thrust with NodeJS, but the same principles apply to the other language bindings. This tutorial assumes you already have Node installed and a text editor available. </p> <p> </p> <p> </p> <h3 id="id82521">A simple app</h3> <p> First create a new directory and node project. </p> <pre><code>mkdir thrust_tutorial cd thrust_tutorial npm init</code></pre> Accept all of the defaults for <code>npm init</code>. <p> Now create a minimal <code>start.html</code> page that looks like this. </p> <pre><code>&lt;html> &lt;body> &lt;h1>Greetings, Earthling&lt;/h1> &lt;/body> &lt;/html></code></pre> <p> Create another file called <code>start.js</code> and paste this in it: </p> <pre><code>var thrust = require('node-thrust'); var path = require('path'); thrust(function(err, api) { var url = 'file://'+path.resolve(__dirname, 'start.html'); var window = api.window({ root_url: url, size: { width: 640, height: 480, } }); window.focus(); });</code></pre> <p> This launches Thrust with the <code>start.html</code> file in it. Notice that you have to use an absolute URL with a <code>file:</code> protocol, because Thrust acts like a regular web browser. It needs real URLs, not just file paths. </p> <p> </p> <h3 id="id3777">Installing Thrust</h3> <p> Now install Thrust and save it to the package.json file. The node bindings will fetch the correct binary parts for your platform automatically. </p> <pre><code>npm install --save node-thrust</code></pre> <p> </p> <p> Now run it! </p> <pre><code>node start.js</code></pre> <p> You should see something like this: </p> <p> <img src='' alt='text'/> </p> <p> A real local app in just a few lines. Not bad. If your app has no need for native access (loading files, etc.) then you can stop right now. You have a local page up and running. 
It can load JavaScript from remote servers, though I’d copy those scripts locally for offline use. </p> <p> However, you probably want to do something more than display a static page. The advantage of Node is the amazing ecosystem of modules. My personal use case is an Arduino IDE. I want the Node side for compiling code and using the serial port. The web side is for editing code and debugging. That means the webpage side of my app needs to talk to the node side. </p> <p> </p> <h3 id="id69180">Message Passing</h3> <p> Thrust defines a simple message passing protocol between the two halves of the application. This is mostly hidden by the language binding. The node function <code>window.focus()</code> actually becomes a message sent from the Node side to the Chromium side over an internal pipe. We don’t need to care about how it works, but we do need to pass messages back and forth. </p> <p> On the browser side, add this code to <code>start.html</code> to send a message using the <code>THRUST.remote</code> object like this: </p> <pre><code>&lt;script type='text/javascript'> THRUST.remote.listen(function(msg) { console.log("got back a message " + JSON.stringify(msg)); }); THRUST.remote.send({message:"I am going to solve all the world's energy problems."}); &lt;/script></code></pre> <p> Then receive the message and respond on the Node side with this code: </p> <pre><code> window.on('remote', function(evt) { console.log("got a message " + JSON.stringify(evt)); window.remote({message:"By blowing it up?"}); });</code></pre> <p> The messages may be any JavaScript object that can be serialized as JSON, so you can't pass functions back and forth, just data. </p> <p> If you run this code you'll see a bunch of debugging information on the command line, including the <code>console.log</code> output. 
</p> <pre><code>[55723:1216/] REMOTE_BINDINGS: SendMessage got a message {"message":{"message":"I am going to solve all the world's energy problems."}} [55721:1216/] [API] CALL: 1 remote [55721:1216/] ThrustWindow call [remote] [55721:1216/141202:INFO:CONSOLE(7)] "got back a message {"message":"By blowing it up?"}", source: file:///Users/josh/projects/thrust_tutorial/start.html (7)</code></pre> <p> Notice that both ends of the communication are here: the Node and HTML sides. Thrust automatically redirects <code>console.log</code> from the HTML side to standard out. I did notice, however, that it doesn't handle the multiple-arguments form of <code>console.log</code>, which is why I use <code>JSON.stringify()</code>. Unlike in a browser, doing <code>console.log("some object", obj)</code> would print only the <code>some object</code> text, not the structure of the actual object. </p> <p> </p> <p> Now that the UI can talk to node, it can do almost anything: save files, talk to databases, poll joysticks, or play music. Let’s build a quick text editor. </p> <p> </p> <h3 id="id12933">Building a text editor</h3> <p> </p> <p> Create a file called <code>editor.html</code>. </p> <pre><code>&lt;html> &lt;head> &lt;script src="" type="text/javascript" charset="utf-8">&lt;/script> &lt;script src=''>&lt;/script> &lt;style type="text/css" media="screen"> #editor { position: absolute; top: 30px; right: 0; bottom: 0; left: 0; } &lt;/style> &lt;/head> &lt;body> &lt;button id='save'>Save&lt;/button> &lt;div id="editor">function foo(items) { var x = "All this is syntax highlighted"; return x; }&lt;/div> &lt;script> var editor = ace.edit("editor"); editor.setTheme("ace/theme/monokai"); editor.getSession().setMode("ace/mode/javascript"); $('#save').click(function() { THRUST.remote.send({action:'save', content: editor.getValue()}); }); &lt;/script> &lt;/body> &lt;/html></code></pre> <p> This page initializes the editor and creates a handler for the save button. 
When the user presses save it will send the editor contents to the node side. Note the <code>#editor</code> CSS to give it a size. Without a size, an Ace editor will shrink to 0. </p> <p> </p> <p> This is the new Node side code, <code>editor.js</code>: </p> <pre><code>var fs = require('fs'); var thrust = require('node-thrust'); thrust(function(err, api) { var url = 'file://'+require('path').resolve(__dirname, 'editor.html'); var window = api.window({ root_url: url, size: { width: 640, height: 480, } }); window.focus(); window.on('remote',function(evt) { console.log("got a message " + JSON.stringify(evt)); if(evt.message.action == 'save') return saveFile(evt.message); }); }); function saveFile(msg) { fs.writeFileSync('editor.txt', msg.content); console.log("saved to editor.txt"); }</code></pre> <p> </p> <p> Run this with <code>node editor.js</code> and you will see something like this: </p> <p> </p> <p> <img src='' alt='text'/> </p> <p> Sweet. A real text editor that can save files. </p> <p> </p> <p> </p> <h3 id="id52801">Things I haven't covered</h3> <p> You can control native menus with Thrust. On Windows or Linux this should be done in the app view itself with the various CSS toolkits. On Mac (or Linux with Unity) you will want to use the real menubar. You can do this with the API documented <a href=''>here</a>. It's a pretty simple API, but I want to mention something that might bite you: menus in the menubar will appear in the order you create them, not the order you add them to the menubar. </p> <p> </p> <p> Another thing I didn’t cover, since I’m new to it myself, is <a href=''>webviews</a>. Thrust lets you embed a second webpage inside the first, similar to an iframe but with stricter permissions. This webview is very useful if you need to load untrusted content, such as an RSS reader might do. The webview is encapsulated, so you can run code that might crash in it. 
This would be useful if you were making, say, an IDE or application editor that must repeatedly run some (possibly buggy) code and then throw it away. I’ll do a future installment on webviews. </p> <p> The other thing I didn't cover is packaging. Thrust doesn’t handle packaging with installers. It gives you an executable and some libraries that you can run from a command line, but you still must build a native app bundle / deb / rpm / msi for each platform you want to support if you want a nicer experience. Fortunately there are other tools to help you do this, like <a href=''>InnoSetup</a> and <a href=''>FPM</a>. </p>\nBeautiful Lego 2: DarkIn the follow-up to last year’s Beautiful Lego, Mike Doyle brings us back for more of the best Lego models from around the world. This time the theme is Dark. As the book explains it: “destructive objects, like warships and mecha, and dangerous and creepy animals… dark fantasies of dragons and zombies and spooks”. I like the concept of a theme, as it helps focus the book. The theme of Dark was stretched a bit to include banks and cigarettes, and vocaloids (mechanical Japanese pop stars), but it’s still 300+ gorgeous pages of the world’s best Lego art. Beautiful Lego 2 is filled to the brim with Zerg-like insect hordes, a <b>lot</b> of Krakens, and some of the cutest mechs you’ve ever seen. <p> </p> <p> </p> <p> </p> <p> <img src='' alt='text'/> </p> <p> Unlike the previous book, Beautiful Lego 2 is a hardcover. I guess the first book was popular enough that No Starch Press could really invest in this one, and it shows. It’s a thick book with proper stitching, a dust jacket, and quality paper. The book lies flat when open, like a good library edition would. This is a book that will still look new in a few decades. </p> <p> Beautiful Lego 2 is a true picture book, and well worth the price for the Lego fan in your family. I know you can get an electronic edition, but this is the sort of book that lets us know why physical books still exist. BTW, 
you can get 30% off right now with the coupon code SHADOWS. </p> <p> Buy it now at <a href=''></a> </p>\nAmino: Now with 33% less C++\nMy hatred of C and C++ is world-renowned, or at least it should be. It's not that I hate the languages themselves, but the ancient build chain: a hack of compilers and #defines that has to be modified for every platform. Oh, and segfaults and memory leaks. The usual. Unfortunately, if you want to write fast graphics code you're pretty much going to be stuck with C or C++, and that's where Amino comes in. <p> <a href=''>Amino</a>, my JavaScript library for OpenGL graphics on the Raspberry Pi (and other platforms), uses C++ code underneath. NodeJS binds to C++ fairly easily so I can hide the nasty parts, but the nasty is still there. </p> <p> As the Amino codebase has developed, the C++ bindings have gotten more and more complicated. This has caused a few problems. </p> <p> </p> <h3 id="id63379">3rd Party Libraries</h3> <p> First, Amino depends on several 3rd-party libraries, libraries the user must already have installed. I can pretty much assume that OpenGL is installed, but most users don't have FreeType, libpng, or libjpeg. Even worse, most don't have GCC installed. While I don't have a solution for the GCC problem yet, I want to get rid of the libpng and libjpeg dependencies. I don't use most of what the libraries offer and they represent just one more thing to go wrong. Furthermore, there's no point in dealing with file paths on the C++ side when Node can work with files and streams very easily. </p> <p> So, I ripped that code completely out. Now the native side can turn a Node buffer into a texture, regardless of where that buffer came from. JPEG, PNG, or in-memory data are all treated the same. To decompress JPEGs and PNGs I found two image libraries, <a href=''>NanoJPEG</a> and <a href=''>uPNG</a> (each a single file!), to get the job done with minimal fuss. 
This code is now part of Amino so we have fewer dependencies and more flexibility. </p> <p> I might move to pure JS for image decoding in the future, as I'm doing for my <a href=''>PureImage library</a>, but that might be very slow on the Pi, so we'll stick with native for now. </p> <p> </p> <h3 id="id41582">Shader Debugging</h3> <p> GPU shaders are powerful but hard to debug on a single platform, much less multiple platforms. As the shaders have developed I've found more differences between the Raspberry Pi and the Mac, even when I use ES 2.0 mode on the Mac. JavaScript is easier to change and debug than C++ (and faster to compile on the Pi), so I ripped out all of the shader init code and rewrote it in JavaScript. </p> <p> The new JS code initializes the shaders from files on disk instead of inline strings, as before. This means I don't have to recompile the C++ side to make changes. This also means I can change the init code on a per-platform basis at runtime rather than with #defines in C++ code. For speed reasons the drawing code which <b>uses</b> the shaders is still in C++, but at least all of the nasty init code is in easier-to-maintain JS. </p> <p> </p> <h3 id="id52820">Input Events</h3> <p> Managing input events across platforms is a huge pain. The key codes vary by keyboard and particular locale. Further complicating matters, GLFW, the Raspbian Linux kernel, and the web browser all use different values for different keys, as well as for mouse and scroll events. Over the years I've built key-munging utilities over and over. </p> <p> To solve this problem I started moving Amino's keyboard code into a new Node module: <a href=''>inputevents</a>. It does not depend on Amino and will, eventually, be usable in plain browser code as well as on Mac, Raspberry Pi, and wherever else we need it to go. Eventually it will support platform-specific <a href=''>IMEs</a>, but that's a ways down the road. 
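For illustration, here is the kind of lookup-table normalization such a module has to perform. The table and function names below are hypothetical, purely a sketch of the idea rather than the actual inputevents API, though the raw codes shown are the real values each source reports for the same three physical keys.

```javascript
// Each input source reports its own raw code for the same physical key,
// so we map every source into one shared set of logical key names.
// NOTE: KEYMAPS/normalizeKey are illustrative names, not the inputevents API.
var KEYMAPS = {
    // raw code -> logical name, per event source
    glfw:    { 263: 'ArrowLeft', 262: 'ArrowRight', 257: 'Enter' }, // GLFW_KEY_* tokens
    linux:   { 105: 'ArrowLeft', 106: 'ArrowRight',  28: 'Enter' }, // Linux input KEY_* codes
    browser: {  37: 'ArrowLeft',  39: 'ArrowRight',  13: 'Enter' }  // legacy DOM keyCode values
};

function normalizeKey(source, rawCode) {
    var map = KEYMAPS[source];
    if (!map || !(rawCode in map)) return { key: 'Unknown', raw: rawCode };
    return { key: map[rawCode], raw: rawCode };
}

// All three sources collapse to the same logical event:
console.log(normalizeKey('glfw', 263).key);    // ArrowLeft
console.log(normalizeKey('linux', 105).key);   // ArrowLeft
console.log(normalizeKey('browser', 37).key);  // ArrowLeft
```

The win is that application code only ever sees the logical names, so the per-platform mess stays in one table.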
</p> <p> </p> <h3 id="id61570">Random Fixes and Improvements</h3> <p> Since I was in the code anyway, I fixed some alpha issues with dispmanx on the Pi, made opacity animatable, added more unit tests, turned clipping back on, built a new PixelView retained buffer for setting pixels under OpenGL (ex: software generation of fractals), and started using node-pre-gyp to make building the native parts easier. </p> <p> I've also started on a new rich text editor toolkit for both Canvas and Amino. It's extremely beta right now, so don't use it yet unless you like third-degree burns. </p> <p> That's it for now. <a href=''>Enjoy, folks.</a> </p> <p> Carthage and C++ must be destroyed. </p>\nMulti-threaded fractals with Amino and NodeJS\nI recently added the ability to set individual pixels in Amino, my NodeJS-based OpenGL scene graph for the Raspberry Pi. To test it out I thought I'd write a simple Mandelbrot generator. The challenge with CPU-intensive work is that Node only has one thread. If you block that thread your UI stops. Dead. To solve this we need a background processing solution. <p> </p> <h3 id="id6769">A Simple Background Processing Framework</h3> <p> While there are true threading libraries for Node, the simplest way to put something into the background is to start another Node process. It may seem like starting a process is heavyweight compared to a thread in other languages, but if you are doing something CPU-intensive the cost of the exec() call is tiny compared to the rest of the work you are doing. It will be lost in the noise. </p> <p> To be really useful, we don't want to just start a child process, but actually communicate with it to give it work. The <code>child_process</code> module makes this very easy. <code>child_process.fork()</code> takes the path to another script file and returns an event emitter. We can send messages to the child through this emitter and listen for responses. 
Here's a simple class I created called <code>Workman</code> to manage the processes. </p> <pre><code>var child = require('child_process');

var Workman = {
    count: 4,
    chs: [],
    init: function(chpath, cb, count) {
        if(typeof count == 'number') this.count = count;
        console.log("using thread count", this.count);
        for(var i=0; i&lt;this.count; i++) {
            this.chs[i] = child.fork(chpath);
            this.chs[i].on('message', cb);
        }
    },
    sendcount: 0,
    sendWork: function(msg) {
        this.chs[this.sendcount % this.chs.length].send(msg);
        this.sendcount++;
    }
}</code></pre> <p> Workman creates <code>count</code> child processes, then saves them in the <code>chs</code> array. When you want to send some work to it, call the <code>sendWork</code> function. This will send the message to one of the children, round-robin style. </p> <p> Whenever a child sends an event back, the event will be handed to the callback passed to the <code>workman.init()</code> function. </p> <p> Now that we can talk to the child processes it's time to do some drawing. </p> <p> </p> <h3 id="id30890">Parent Process</h3> <p> This is the code to actually talk to the screen. First the setup. <code>pv</code> is a new PixelView object. A PixelView is like an image view, but you can set pixel values directly instead of using a texture from disk. <code>w</code> and <code>h</code> are the width and height of the texture in the GPU. </p> <pre><code>var pv = new amino.PixelView().pw(500).w(500).ph(500).h(500);
root.add(pv);
stage.setRoot(root);
var w =;
var h =;</code></pre> <p> Now let's create a Workman to schedule the work. We will submit work for each row of the image. When work comes back from the child process the <code>handleRow</code> function will handle it. 
</p> <pre><code>var workman = Workman;
workman.init(__dirname+'/mandle_child.js', handleRow);

var scale = 0.01;
for(var y=0; y&lt;h; y++) {
    var py = (y-h/2)*scale;
    var msg = {
        x0: (-w/2)*scale,
        x1: (+w/2)*scale,
        y: py,
        iw: w,
        iy: y,
        iter: 100,
    };
    workman.sendWork(msg);
}</code></pre> Notice that the work message must contain all of the information the child needs to do its work: the start and end values in the x direction, the y value, the length of the row, the index of the row, and the number of iterations to do (more iterations make the fractal more accurate but slower). This message is the only communication the child has from the outside world. Unlike with threads, child processes do not share memory with the parent. <p> Here is the <code>handleRow</code> function which receives the completed work (an array of iteration counts) and draws the row into the PixelView. After updating the pixels we have to call <code>updateTexture</code> to push the changes to the GPU and screen. <code>lookupColor</code> converts the iteration counts into a color using a look-up table. </p> <pre><code>function handleRow(m) {
    var y = m.iy;
    for(var x=0; x&lt;m.row.length; x++) {
        var c = lookupColor(m.row[x]);
        pv.setPixel(x, y, c[0], c[1], c[2], 255);
    }
    pv.updateTexture();
}</code></pre><pre><code>var lut = [];
for(var i=0; i&lt;10; i++) {
    var s = (255/10)*i;
    lut.push([0, s, s]);
}
function lookupColor(iter) {
    return lut[iter%lut.length];
}</code></pre> <p> </p> <h3 id="id72131">Child Process</h3> <p> Now let's look at the child process. This is where the actual fractal calculations are done. It's your basic Mandelbrot. For each pixel in the row it calculates a complex number until the value exceeds 2 or it hits the maximum number of iterations. Then it stores the iteration count for that pixel in the <code>row</code> array. 
</p> <pre><code>function lerp(a, b, t) {
    return a + t*(b-a);
}

process.on('message', function(m) {
    var row = [];
    for(var i=0; i&lt;m.iw; i++) {
        var x0 = lerp(m.x0, m.x1, i/m.iw);
        var y0 = m.y;
        var x = 0.0;
        var y = 0.0;
        var iteration = 0;
        var max_iteration = m.iter;
        while(x*x + y*y &lt; 2*2 && iteration &lt; max_iteration) {
            var xtemp = x*x - y*y + x0;
            y = 2*x*y + y0;
            x = xtemp;
            iteration = iteration + 1;
        }
        row[i] = iteration;
    }
    process.send({row: row, iw: m.iw, iy: m.iy});
})</code></pre> <p> After every pixel in the row is complete it sends the row back to the parent. Notice that it also sends an iy value. Since the children could complete their work in any order (if one row happens to take longer than another), the iy value lets the parent know which row this result is for so that it will be drawn in the right place. </p> <p> Also notice that all of the calculation happens in the <code>message</code> event handler. This will be called every time the parent process sends some work. The child process just waits for the next message. The beauty of this scheme is that Node handles any overflow or underflow of the work queue. If the parent sends a lot of work requests at once they will stay in the queue until the child takes them out. If there is no work then the child will automatically wait until there is. Easy-peasy. </p> <p> Here's what it looks like running on my Mac. Yes, Amino runs on Mac as well as Linux. I mainly talk about the Raspberry Pi because that's Amino's sweet spot, but it will run on almost anything. I chose Mac for this demo simply because I've got 4 cores there and only 1 on my Raspberry Pi. It just looks cooler to have four bars spiking up. :) </p> <p> <img src='' alt='text'/> </p> <p> </p> <p> This code is now in the <a href=''>aminogfx</a> repository under <code>demos/pixels/mandle.js</code>. </p>\nThoughts on APL and Program Notation\nA post about <a href=' '>Arthur Whitney and kOS</a> made the rounds a few days ago. 
It concerns a text editor Arthur made with four lines of K code, and a complete operating system he’s working on. These were all built in <a href=''>K</a>, a vector-oriented programming language derived from <a href=''>APL</a>. This reminded me that I really need to look at APL after all of the language ranting I’ve done recently. <p> Note: For the purposes of this post I’m lumping K, J, and the other APL-derived languages in with APL itself, much as I’d refer to Scheme or Clojure as Lisps. </p> <p> After reading up, I’m quite impressed with APL. I’ve always heard it can do complex tasks in a fraction of the code of other languages, and be super fast. It turns out this is very true. Bernard Legrand's <a href=''>APL – a Glimpse of Heaven</a> provides a great overview of the language and why it’s interesting. </p> <p> APL is not without its problems, however. The syntax is very hard to read. I’m sure it becomes easier once you get used to it, but I still spent a lot more time analyzing a single line of code than I would in any other language. </p> <p> APL is fast and compact for some tasks, but not others. Fundamentally it’s a bunch of operations that work on arrays. If your problem can be phrased in terms of array operations then this is awesome. If it can’t then you start fighting the language and eventually it bites you. </p> <p> I found anything with control structures to be cumbersome. This isn’t to say that APL can’t do things that require an <code>if</code> statement, but you don’t get the benefits. <a href=' '>This code to compute a convex hull</a>, for example, seems about as long as it would be in a more traditional language. Within a factor of 2, at least. It doesn’t benefit much from APL’s strengths. </p> <p> Another challenge is that the official syntax uses non-ASCII characters. I actually don’t see this as a problem. We are a decade and a half into the 21st century and can deal with non-ASCII characters quite easily. 
The challenge is that the symbols themselves are unfamiliar to most people. I didn’t find it hard to pick up the basics after reading a half-hour tutorial, so I think the real problem is that the syntax scares programmers away before they ever try it. </p> <p> I also think enthusiasts focus on how much better APL is than other languages, rather than simply showing someone why they should spend the time to learn it. They need to show what it can <b>do</b> that is also <b>practical</b>. While it’s cool to be able to calculate all of the primes from 1 to N in just a few characters, that isn’t going to sell most developers because that’s not a task they actually need to accomplish very often. </p> <p> APL seems ideal for solving mathematical problems, or at least a good subset of them. The problem for APL is that Mathematica, MATLAB, and various other tools have sprung up to do that better. </p> <p> Much like Lisp, APL seems stuck between the past and the future. It is too general for the things it’s really good at; more recent specialized tools do the job better. And APL isn't general enough to be good as a general-purpose language; many general-purpose languages have added array-processing support (often through libraries) that makes them good enough for the things APL is good at. Java 8 streams and lambda functions, for example. Thus it remains stuck in a few niches like high-speed finance. This is not a bad niche to be in (highly profitable, I’m sure) but APL will never become widely used. </p> <p> That said, I really like APL for the things it’s good at. I wish APL could be embedded in a more general-purpose language, much like regular expressions are embedded in JavaScript. I love the concept of a small number of functions that can be combined to do amazing things with arrays. This is the most important part of APL &mdash; for me at least &mdash; but it’s hidden behind a difficult notation. 
</p> <p> I buy the argument that any notation is hard to understand until you learn it, and with learning comes power. Certainly this is true for reading prose. </p> <p> Humans are good pattern recognizers. We don’t read by parsing letters. Only children just learning to read go letter by letter. The letters form patterns, called words, that our brains recognize in their entirety. After a while children's brains pick up the patterns and process them whole. In fact our brains are <b>so</b> good at picking up patterns that we can read most English words with all of the letters scrambled <a href=''>as long as the first and last letters are correct</a>. </p> <p> I’m sure this principle of pattern recognition applies to an experienced APL programmer as well. They can probably look at this </p> <pre><code>x[⍋x←6?40]</code></pre> <p> and think: <code>pick six random numbers from 1 to 40 and return them in ascending order</code>. </p> <p> After a time this mental processing would become natural. However, much like with writing, code needs spacing and punctuation to help the symbolic "letters" form words in the mind of the programmer. Simply pursuing compactness for the sake of "mad skillz props" doesn’t help anyone. It just makes for <a href=''>write-only code</a>. </p> <p> Were I to reinvent computing (in my fictional JoshTrek show where the computer understands all spoken words with 200% accuracy), I would replace the symbols with actual meaningful words, then separate them into chunks with punctuation, much like sentences. </p> <p> This </p> <pre><code>x[⍋x←6?40]</code></pre> <p> would become </p> <pre><code>deal 6 of 1 to 40 => x, sort_ascending, index x</code></pre> <p> The symbols are replaced with words and the ordering swapped, left to right. It still takes some training to understand what it means, but far less. It’s not as compact but far easier to pick up. </p> <p> So, in summary, APL is cool and has a lot to teach us, but I don’t think I’d ever use it in my daily work. 
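To make the trade-off concrete, here is roughly what that one-liner does, spelled out in plain JavaScript. The `deal` helper below is my own sketch, not part of any APL toolkit.

```javascript
// Roughly what x[⍋x←6?40] does: deal 6 distinct numbers
// from 1 to 40, then return them in ascending order.
function deal(count, max) {
    var pool = [];
    for (var i = 1; i <= max; i++) pool.push(i);
    var picked = [];
    for (var j = 0; j < count; j++) {
        var idx = Math.floor(Math.random() * pool.length);
        picked.push(pool.splice(idx, 1)[0]);  // remove so numbers can't repeat
    }
    return picked;
}

var x = deal(6, 40);
var sorted = x.slice().sort(function(a, b) { return a - b; });
console.log(sorted);  // e.g. [ 3, 11, 17, 24, 31, 40 ]
```

What APL says in ten characters takes a couple dozen lines here, which is exactly the compactness the notation buys, for better and for worse.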
</p> <p> </p> <p> addendum: </p> <p> Since writing this essay I discovered <a href=' '>Q</a>, also by Arthur Whitney, which expands K’s terse syntax, but I still find it harder to read than it should be. </p>\nElectron 0.4 beta 3\nI am unhappy to announce the release of Electron 0.4 beta 3. <p> What's that? <b>unhappy</b>?! Well...... </p> <p> I haven't done a release in quite some time. Part of this delay is from a complete refactoring of the user interface, but another big chunk of time comes from trying to build Electron with Atom Shell. </p> <p> <a href=''>AtomShell</a> is a tool that bundles WebKit/Chromium and NodeJS into a single app bundle. This means developers can download a single app with an icon instead of running Electron from the command line. It might even let us put it into the various app stores some day. </p> <p> Unfortunately, the switch to AtomShell hasn't been as smooth as I would like. The Mac version builds okay but I have yet to get Windows to work. There seems to be some conflict between the version of Node that the native serial port module uses and the version of Node inside of AtomShell. While I'm sure these are solvable problems, I don't want to hold back the rest of Electron. It's still useful even if you have to launch it from the command line. So... </p> <p> </p> <h3 id="id21384">Electron 0.4 beta 3</h3> <p> You can download a Mac .app bundle <a href=''>from here</a>, or <a href=''>check out the source</a> and run <code>node electron</code> to start it from the command line. The new file browser works, as do the various dialogs. Compiler output shows up in the debug panel. You can upload through the serial port, but the serial port console is still disabled (due to other bugs I'm still working through). </p> <p> Undoubtedly many things are still broken during the transition from the old UI to the new. Please, please, please <a href=''>file issues on github</a>. I'll get to them ASAP. </p> <p> Thanks, Josh </p>