Call for a Data Bill of Rights

Early open source pioneer Brian Behlendorf famously said, “the most important requirement [in open source] is the right to fork.” He wisely observed that the right to fork source code generally ensured it would never actually need to be done. The mere threat of forking creates an incentive for good behavior. Most open source communities are able to self-police well enough that true forking is a rarity.

I would like to update this thesis for the year 2015: the most important requirement today is the right to take your data and leave. Security, privacy, and tractability of data collection can and will be solved by free markets as long as the incentives are right.  It all hinges on what I’m calling a Data Bill of Rights.

The Internet of Chattering Things

The future will be a busy place with an endless array of devices collecting data about us. Cisco says they will number 50 billion. Their estimate is a bit high for my taste, but the number will surely be large.

A better way to think about devices is per-capita. With an eventual world population of ~10 billion that’s 5 devices per person. Hmm.. that doesn’t seem so unrealistic now. I know I’ve got at least 20 things in my house that can collect data in some form or another, and it’s only a matter of time before they attach to the network. Having 5 per person by mid-century, or even 50, is quite feasible.

Such a future is both awesome and terrifying. The power will be amazing. We already have the abilities of supermen. We can transmit a thought around the world to anyone who’ll listen. We can summon a car without a word, have any product shipped to our homes in two days, and control machines with a gesture.

Yes, future technology will be powerful but, as always, power has its downsides. Constant communication through products and services which aren’t ours raises many questions. Is our data private? Is our data secure? Which device is talking to which? Can my devices from different vendors even communicate? However, I’m not actually worrying about any of those things. I’m more worried about a deeper issue: the underlying incentives.

The Social Network

Let’s consider a future theoretical social network. We’ll call it WhatTwitFace or WTF for short. WTF lets people share their private thoughts and emotions through the eternal medium of the selfie. Each photo is tagged with time, location, and mental state (using a neural sensor embedded in the user’s Apple Watch 4).

Users of WTF love the service but they worry about their privacy. What if WTF is hacked? What if WTF sells their data to advertisers? WTF might use neural targeting to subconsciously influence their purchasing habits. How do we know what WTF is really doing?

WTF has terms of service, but we have no way of knowing if they are really abiding by those rules, which they could change at a moment’s notice anyway. How do we make sure they behave?

One option is to create a regulatory environment that sets out specific privacy and security rules. A set of laws specifying what companies can and can’t do. I don’t think this will work.

Here's the problem: technology laws age like milk. They proscribe specific behavior that is barely relevant by the time they are passed, much less a few decades in the future. Just look at the DMCA. New laws also cut off many valid business models that aren’t anticipated at the time they are passed. And even if the laws are perfect, how do we know a company hasn’t found a clever way around them? Enforcement is hard, and humans are sneaky. Fortunately we already have a solution. Instead of proscribing behavior we can use rules to create a market.

Data Ownership

Too often people say “free markets” when they only mean the second part: the market, a place to exchange. We often forget the ‘free as in freedom’ part, which is really more important. A market can find good solutions only when it moves freely. If there is only one player in the market, the market won’t work. If not all companies play by the same rules, the market won’t work.

Let’s turn this into an engineering problem again. What laws do we need (preferably the minimum number required) for the free market to work? We need trust. Trust comes from transparency and ownership. B knows what A is doing because B can see it. This is transparency. B knows that even if A is sneaky or lies he can still leave because he owns his stuff. It is this ‘ownership’ that is crucial.

We must each own our data.  Data ownership lowers the barrier to switching, which promotes competition so the players act better, so actual switching is less needed. It’s insurance.

Ye Olde Dumb Phones

Let’s talk about a few real world examples. I couldn’t easily switch cell phone carriers in the US until the mid-2000s. There was very little competition for a given customer. Barriers to switching were very high. You not only had to buy a new phone and pay to exit your contract, you also needed a new phone number. When number portability came, carriers became a lot better, but they still owned our phones to some extent. The phone itself really wasn’t the issue. It’s the data.

Today, can I realistically switch from one phone to another? It used to be a lot of work to move your address book from one phone to another. In theory SIM cards supported this, but in practice it was a huge pain so most people just typed it all in again. This is a key problem Apple and Google have solved with Google accounts and Apple IDs. I can type my ID on a new device and most of my stuff just moves over like magic. My contacts, my calendars, my apps. Almost everything. It’s not perfect but it’s a lot better than what we had. We gained more ownership of our data.

Here's another example. When I was at Palm we had a particularly tricky problem. We couldn’t fully sync your address book with Facebook’s contacts because they imposed specific rules about how this data had to be treated, including not caching it for more than a certain time. I thought this was ridiculous. The data belongs to the users, not Facebook. If the user wants to download it all to their device they should be able to do so, but we were bound by Facebook’s rules. With a strong data ownership framework this would be easy to solve.

Today we experience the same problem in other areas. Chat apps, for example. We started with AIM and ICQ and Yahoo Messenger, then began moving in the right direction with Jabber and other open protocols. But now we are back to interoperability problems with Facebook, Google+ and Apple’s iMessage (and Slack, and Flowdock, and Snapchat, and...). This wouldn’t be as much of an issue if I could extract my data (contacts list, history, attachments) and switch to another service. Data portability promotes competition and improves the behavior of the big actors.

A Counterexample

To see the benefits of data ownership, consider WordPress. WordPress is awesome. I can set up a blog easily on their site but they don’t lock me in. I can download all of my data (the pages, posts, and comments) into a single big zip file. This file is complex. As an end user I can’t do anything with the file directly but it is easily parseable by tools. Other blog providers can import WordPress data. Some developers have written tools to work with the zip. The users own their data and can always take it with them.

This openness actually benefits WordPress as well. First, the insurance of being able to leave means I am less likely to actually do it. It keeps WordPress on its best behavior. Second, I can easily migrate from WordPress’s hosted version to my own private install. Or, as my volume grows and I don’t want to maintain it anymore, I can move to one of WordPress’s professional plans hosted by their experts. Data ownership enables portability, which enables new business models.

A Data Bill of Rights

By now I hope I’ve convinced you we really need a Data Bill of Rights.   I propose the following:

1) I own my data in the legal sense. I have some say over what happens to it. I can take a violator to court over it. I own it.

2) I can delete my data at any time.

3) I can get all of my data in a non-obfuscated format at any time. All of it. At any time. For free. Easily. In a world of on-demand cloud services this is not hard to implement. Most downloads can be implemented with a three-line Node script.

4) I can get realtime incremental access to my data by date or other criteria. No forcing me into a single download or archive only.

5) The data format must not hinder switching. No patents. No extra requirements. My data is my data.
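
To make rights 2 through 4 a little more concrete, here is a rough sketch of what they might look like from a service's side. It is Python rather than Node, and everything in it is made up for illustration: the Record type, the export_all, export_since, and delete_all functions, and the in-memory list standing in for a real database. A real service would put this behind an authenticated API, but the shape of the problem is about this small.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class Record:
    """One piece of user data held by a hypothetical service."""
    owner: str
    created: date
    kind: str   # e.g. "post", "contact", "photo"
    body: str


def export_all(records, owner):
    """Right 3: everything the user owns, as plain readable JSON."""
    mine = [asdict(r) for r in records if r.owner == owner]
    # default=str turns the date objects into ISO strings like "2015-10-08"
    return json.dumps(mine, default=str, indent=2)


def export_since(records, owner, since):
    """Right 4: incremental access, only records created on or after `since`."""
    return [r for r in records if r.owner == owner and r.created >= since]


def delete_all(records, owner):
    """Right 2: drop everything the user owns and return what is left."""
    return [r for r in records if r.owner != owner]


# A toy data store with two users (sample data, not from any real service).
store = [
    Record("alice", date(2015, 9, 1), "post", "Hello world"),
    Record("alice", date(2015, 10, 8), "photo", "selfie.jpg"),
    Record("bob", date(2015, 10, 1), "post", "Bob's post"),
]

archive = export_all(store, "alice")                       # full download
recent = export_since(store, "alice", date(2015, 10, 1))   # incremental
store = delete_all(store, "alice")                         # take it and leave
```

The archive is ordinary JSON with no proprietary encoding, which is roughly what right 5 asks for: a competing service could import it without asking anyone's permission.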

Note that this Data Bill of Rights does not prevent:

1) Sharing my data with others.

2) Letting a service aggregate my data with other users to analyze trends (even in the deletion case).

3) Signing a service contract. If I’m on the hook for 2 years of cell service with Verizon then I’m on the hook for it. But, I can take my data with me at any time.

Strong data ownership laws will let the market solve most of the other problems. Vendors have an incentive to let two services work together if they know you can jump ship at any time. A company that has taken good care of your data earns your trust. You are more likely to buy newer and better services from this company.

With a Data Bill of Rights a company that relies on data aggregation for its business model can still do so, while allowing individuals their freedom. Yes, we can have our cake and eat it too, but only as long as individuals get to own their data, and that is why we need a Data Bill of Rights.

Talk to me about it on Twitter

Posted October 8th, 2015

Tagged: data privacy