GPU Computing: the Mac Pro and the Raspberry Pi.

Now that Apple has given us final specs and cost for the redesigned Mac Pro, I’ve heard complaints that it is underpowered and non-expandable, especially for the price. The Pro comes with reasonably beefy CPUs, but they will be out of date in a few years. The buyer can only expand the RAM and disk, and not so much on the disk side given the lack of available space. So how can this be worth the $3000 entry price Apple is charging?

First we must realize that the Mac Pro isn’t for everyone. It really is for creative professionals who spend a lot of time in Logic, Aperture, Final Cut Pro, Maya, and other pro apps. These people need the maximum RAM and processing power possible, and will pay for it. Expandability of storage isn’t a problem because they don’t care about internal storage anyway. Anyone who buys one of these will be using a stack of external drives or a NAS. I can buy a 3TB drive at Costco for under 200 bucks! Thus the nice collection of Thunderbolt and USB ports on the Mac Pro’s backside.

More importantly, however, the CPUs aren’t the real focus of the new Mac Pro. Apple is betting that the future of high speed computation is GPU computing. Apple is right.

I recently went to the International Super Computing conference when it was held here in Oregon. At least 50% of the talks were about how to restructure computing tasks to take advantage of GPUs. GPUs are the future of almost all high performance computing. GPUs are not as general purpose as a modern CPU, but if you can structure your problem in a way that a GPU can compute, then you can get a 5x to 10x performance boost per watt (or per dollar). Intel and Nvidia are happy to sell you a stack of GPUs without video connectors. These cards exist purely for GPU computation. Daisy-chained together, a stack of GPUs will beat any traditional supercomputer.
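To make "structure your problem in a way that a GPU can compute" a bit more concrete, here is a minimal sketch in plain Python. It runs on the CPU and touches no GPU library at all; the names saxpy_kernel and launch are made up for illustration. What matters is the shape: the work is written as a tiny function for one element, and no element depends on any other. The running_sum function is the counter-example, where each step needs the previous one.

```python
# Illustrative only: plain Python on the CPU, no GPU involved.
# saxpy_kernel and launch are hypothetical names, not part of any real API.

def saxpy_kernel(i, a, x, y, out):
    """Work for ONE element. It touches only index i, so every i is independent."""
    out[i] = a * x[i] + y[i]

def launch(kernel, n, *args):
    """Stand-in for a GPU launch: a GPU would run these n calls in parallel."""
    for i in range(n):  # the order does not matter, which is what makes it parallel
        kernel(i, *args)

def running_sum(x):
    """Counter-example: each step needs the previous total, so this loop
    cannot be split into independent per-element work as written."""
    out, total = [], 0.0
    for v in x:
        total += v
        out.append(total)
    return out

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
result = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, result)
print(result)
```

Problems that look like the first function map well onto a GPU. Problems that look like the second need to be reworked before they do.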

Of course, with the GPUs doing the heavy lifting, the challenge becomes how to get your data *to* the GPU quickly. That’s why Apple’s Mac Pro site spends so much time talking about the IO bus and memory bandwidth. Internal storage? CPU upgrades? Who cares! The Mac Pro is all about moving data in and out of beefy GPUs as fast as possible.

Apple has been working on this for a while. Initially they started shifting graphics work to the GPU with Quartz Extreme. This enabled the OS X compositing window manager to run smoothly on older hardware. Later Apple introduced full Mac support for OpenCL, a computation companion API to OpenGL. When you write some code in OpenCL, the Mac can shift the computation dynamically between the CPU and the GPU. Powerful GPUs can make up for weak CPUs.

And this brings me to the Raspberry Pi, my favorite cheap ARM based mini-computer -- so cheap I’ve seen hard drives with Pis glued to the side of them as file servers. At 700 MHz the Raspberry Pi’s CPU is anemic, but the GPU is surprisingly powerful. Broadcom’s VideoCore IV not only supports OpenGL ES 2.0, meaning real shader support, it also has H.264 video encoding and decoding in hardware. This $35 computer can decode 1080p video in real time. The CPU just has to stream the compressed video file to memory; the GPU will take care of the rest.
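Some back-of-the-envelope arithmetic shows why handing the decode to the GPU matters so much. The sketch below compares the compressed stream the CPU has to move with the raw frames the decoder produces. The frame rate, pixel format, and bitrate are my own assumptions for illustration, not measurements from a Pi.

```python
# Rough numbers only. The frame rate, pixel format, and compressed bitrate
# are assumptions for illustration, not measurements from a Raspberry Pi.

WIDTH, HEIGHT = 1920, 1080   # one 1080p frame
FPS = 30                     # assumed frame rate
BYTES_PER_PIXEL = 1.5        # assumed YUV 4:2:0 output (12 bits per pixel)
COMPRESSED_MBIT_PER_S = 10   # assumed H.264 bitrate

# Raw frames coming out of the decoder, in bytes per second.
raw_bytes_per_s = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS

# Compressed stream the CPU has to feed in, in bytes per second.
compressed_bytes_per_s = COMPRESSED_MBIT_PER_S * 1_000_000 / 8

print(f"decoded output:   {raw_bytes_per_s / 1e6:.1f} MB/s")
print(f"compressed input: {compressed_bytes_per_s / 1e6:.2f} MB/s")
print(f"ratio:            {raw_bytes_per_s / compressed_bytes_per_s:.0f}x")
```

Under these assumptions the CPU handles only a small fraction of the data that the decoder ends up producing.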

The Pi’s GPU also has an interesting API called dispmanx. While it is almost entirely undocumented, I’ve learned that this API lets you set up an almost unlimited number of hardware layers in the GPU. You can have one layer with 3D content from OpenGL while a second layer plays video and a third shows images. Most importantly, each of these layers can be resized and alpha blended entirely by the GPU. This means we can create a full compositing window manager like OS X and Windows 7 have, all on our tiny 700 MHz computer. These guys are already working on a port of the composited Wayland/Weston library to the Raspberry Pi.
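For a feel of what "alpha blended by the GPU" means, here is a small sketch of the per-pixel math behind stacked layers, written in plain Python. To be clear, this is not the dispmanx API. The names blend_over and composite are hypothetical, and I'm assuming a simple blend with one opacity value per layer. The GPU does this kind of work for every pixel on screen, every frame, without bothering the CPU.

```python
# Illustrative only: the per-pixel math behind alpha-blended layers.
# This is NOT the dispmanx API; blend_over and composite are hypothetical names.

def blend_over(top, bottom, alpha):
    """Blend one RGB pixel over another. alpha is the top layer's opacity, 0.0 to 1.0."""
    return tuple(round(alpha * t + (1.0 - alpha) * b) for t, b in zip(top, bottom))

def composite(background, layers):
    """Stack layers from bottom to top. Each layer is a (pixels, alpha) pair
    with the same number of pixels as the background."""
    frame = list(background)
    for pixels, alpha in layers:
        frame = [blend_over(p, f, alpha) for p, f in zip(pixels, frame)]
    return frame

# A two-pixel "screen": an opaque video layer, then a half-transparent UI layer on top.
background = [(0, 0, 0), (0, 0, 0)]
video = ([(200, 100, 0), (200, 100, 0)], 1.0)
ui = ([(0, 0, 200), (0, 0, 200)], 0.5)

frame = composite(background, [video, ui])
print(frame)
```

Every pixel here is computed independently of its neighbors, which is exactly the kind of work a GPU is built for.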

While the Raspberry Pi does not support OpenCL, it is possible to use the GPU for accelerated JPEG decompression, and there are ongoing efforts to directly target the VideoCore’s internal APIs for SIMD processing.

All of this power comes from shifting computation from the general purpose CPU to the special purpose GPU. This is a long term trend. Over time more and more work will be shifted. GPUs can’t do all computational tasks, of course, but if you can transform your problem into something the GPU can handle (preferably something highly parallel), then you’re golden. He who controls the GPU... controls the world! Now let’s get some cheese, Pinky.


Posted November 12th, 2013

Tagged: gpu raspberrypi mac