The complexity of large software: an AT&T example

I purchased an iPhone yesterday and tried to activate it. Foolish, I know, but I'm crazy like that. Activating my iPhone today (starting last night, really) proved to be a multi-hour ordeal involving a lot of Googling and over an hour on hold with AT&T representatives. The root cause was that the Atlanta area code of my phone number doesn't match the ZIP code of the town where I live in Oregon. The registration software didn't make this clear, instead giving me obscure errors and eventually sending me an email asking me to call a 1-800 number. Adding to the confusion, Apple's own email program put the message into my junk folder. Had I not thought to search my junk folder a few hours later, I might still be holding an iBrick. Now, after two hours with customer service and almost 24 hours after receiving my phone, I am finally syncing some music.

This blog post isn't really about the iPhone or AT&T in particular. It's about software. The horrible, unspeakable complexities of software. And my question is: what can we do about it? Please allow me, gentle reader, to dive deeper.

Somewhere inside of AT&T is a database. This database isn't a single instance of what we would think of as a database (a collection of indexed SQL tables). It's most likely a variety of different systems from different manufacturers (and different decades!) pulled together with scripts, web services, and duct tape to form a collective repository of everything that makes up AT&T (which is itself a Frankensteinian glob of various companies assimilated by AT&T over the past decade).

My problem with registration ultimately stemmed from the fact that my billing address is no longer inside the area code of my phone number. This simple change so confuses their automated systems that my case required manual intervention. But why is it an issue? Most likely there was a time when AT&T had different plans for certain areas of the country, or contract agreements with local providers, or FCC regulations that prevented them from servicing certain areas. Whatever the reasons used to be, they are gone, but the software that enforced those rules is not. In big companies the software never changes; it is only built upon.
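To make that guess concrete, here is a minimal sketch of what such a leftover rule might look like. Everything in it is hypothetical: the lookup table, the `legacy_region_check` function, the error code, and the sample numbers are all invented for illustration. I have no idea how AT&T's systems are actually written.

```python
from dataclasses import dataclass

# Hypothetical table mapping an area code to the ZIP prefixes it once
# "belonged" with. Invented for illustration; real data would be far larger.
AREA_CODE_ZIP_PREFIXES = {
    "404": ("303",),  # Atlanta area code, Atlanta ZIP prefix
}


@dataclass
class Activation:
    phone_number: str  # digits only, e.g. "4045550123"
    billing_zip: str


class ActivationError(Exception):
    pass


def legacy_region_check(req: Activation) -> bool:
    """Assumed leftover rule: billing ZIP must sit inside the number's region."""
    area_code = req.phone_number[:3]
    prefixes = AREA_CODE_ZIP_PREFIXES.get(area_code, ())
    # str.startswith accepts a tuple; an empty tuple never matches.
    return req.billing_zip.startswith(prefixes)


def activate(req: Activation) -> str:
    if not legacy_region_check(req):
        # The rule fails, but the message says nothing about area codes
        # or ZIP codes, so the customer cannot tell what went wrong.
        raise ActivationError("Error 1042: unable to complete activation")
    return "activated"
```

An Atlanta number with an Atlanta billing ZIP sails through, while the same number with an Oregon ZIP gets nothing but an opaque error code. The business reason for the check can vanish completely and the check itself will keep running until someone finds it and deletes it.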

The most frustrating part of this ordeal is that my Googling turns up many people with the same problem, most of them dating from the evening of the first iPhone activations three months ago. In other words, AT&T and Apple have known about this problem for months and still have not fixed it. The software still hasn't changed, even though it has cost them several hours of customer support time and left them with frustrated customers.

I recall reading cyberpunk books in the early nineties featuring arcane software systems with holes breached by other arcane software systems. There was often a super character who would earn big bucks by navigating the lost and forgotten pathways of software to crack a system, defuse a bomb, or otherwise solve the plot point at hand. At the time I thought: "This is totally unrealistic. Why would anyone write software that allowed (or required) such hacks? Why wouldn't we just build better, simpler software?" Well, now I know these scenarios aren't as fictional as I thought.

So why is software like this? Why does it grow and grow in complexity, with obsolete features still active and still causing pain? Why aren't these systems replaced? Why can't we make software that evolves as the organization changes? Is this such a difficult problem that we will never solve it, or can we imagine a time when software becomes a fluid and flexible part of the organization that built it?


Posted October 7th, 2007