After 1984

iTunes Genius

iTunes 8 has added a new feature called “Genius” that harnesses the collective behavior of iTunes Music Store shoppers to generate “perfect” playlists.

Had an interesting email exchange with my mom earlier today about Monica Hesse’s story Bytes of Life. The crux of the story is that more and more people are self-monitoring and collecting data about themselves, in many cases, because, well, it’s gotten so much easier, so, why not?

Well, yes, it is easier, but the fact that it is easier doesn’t automatically mean that one should do it, so let’s look at this a little more deeply.

First, my mom asked about the amount of effort involved in tracking all this data:

I still have a hard time even considering all that time and effort spent in detailing every moment of one’s life, and then the other side of it which is that it all has to be read and processed in order to “know oneself”. I think I like the Jon Kabat-Zinn philosophy better: just BE in the moment. Being mindful of each second doesn’t require one to log or blog it, I don’t think. Just BE in it.

Monica didn’t really touch on too many tools that we use to self-monitor. It’s true that, depending on the kind of data we’re collecting, the effort will vary. But so will the benefits.

If you take a look at MyMileMarker’s iPhone interface, you’ll see how quick and painless it is to record this information. Why bother? Well, for one thing, over time you get to see not only how much fuel you’re consuming, but how much it’s going to cost you to keep running your car in the future:

View my Honda Civic - My Mile Marker

Without collecting this data, you might guess at your MPG, or take the manufacturer’s rating as given, but when you record what actually is happening, you can prove to yourself whether filling up your tires really does save you money (or the planet).
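To make that concrete, here is a minimal sketch of the arithmetic behind a fuel log. It assumes you fill the tank completely each time, so the gallons added at one stop are the gallons burned since the previous stop. The `FillUp` record, the function names and the numbers are all made up for illustration; this is not how MyMileMarker itself works.

```python
from dataclasses import dataclass


@dataclass
class FillUp:
    """One stop at the pump (hypothetical record layout)."""
    odometer_miles: float    # odometer reading when you filled up
    gallons: float           # fuel needed to fill the tank again
    price_per_gallon: float  # what you paid per gallon


def miles_between(previous: FillUp, current: FillUp) -> float:
    """Distance driven between two consecutive fill-ups."""
    return current.odometer_miles - previous.odometer_miles


def mpg_between(previous: FillUp, current: FillUp) -> float:
    """Miles per gallon, assuming full-tank to full-tank fill-ups."""
    return miles_between(previous, current) / current.gallons


def cost_per_mile(previous: FillUp, current: FillUp) -> float:
    """What each mile of that stretch cost in fuel."""
    spent = current.gallons * current.price_per_gallon
    return spent / miles_between(previous, current)


# Made-up log: the third fill-up comes after inflating the tires.
log = [
    FillUp(odometer_miles=10000.0, gallons=10.0, price_per_gallon=4.00),
    FillUp(odometer_miles=10300.0, gallons=10.0, price_per_gallon=4.00),
    FillUp(odometer_miles=10620.0, gallons=10.0, price_per_gallon=4.00),
]

mpg_before = mpg_between(log[0], log[1])
mpg_after = mpg_between(log[1], log[2])

print(f"MPG before: {mpg_before:.1f}, after: {mpg_after:.1f}")
print(f"Cost per mile before: ${cost_per_mile(log[0], log[1]):.3f}")
print(f"Cost per mile after:  ${cost_per_mile(log[1], log[2]):.3f}")
```

Two fill-ups are hardly proof of anything, of course. The point is that the comparison only becomes possible once the readings are written down.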

On the topic of the environment, recording my trips on Dopplr gives me an actual view of my carbon footprint (pretty damning, indeed):

DOPPLR Carbon

As my mom pointed out, perhaps having access to this data will encourage me to cut back excess travel — or to consolidate my trips. Ross Mayfield suggests that he could potentially quit smoking if his habit were made more plainly visible to him.

What’s also interesting is how passive monitors, or semi-passive monitoring tools, can also inform, educate or predict — and on this point I’m thinking of Last.fm where of course my music taste is aggregated, or location-based sites like Brightkite, where my locative behavior is tracked (albeit, manually — though Fire Eagle + Spot changes that).

My mom’s other point about the ability to just BE in the moment is also important — because self-tracking should ideally be non-invasive. In other words, it shouldn’t be the tracking that changes your behavior, but your analysis and reflection after the fact.

One of the stronger points I might make about this is that with data, especially when it is collected regularly and the right indicators are recorded, you can strip away a great deal of the distortion that comes from your self-serving biases. Monica writes:

“We all have the tendency to see our behaviors in a little bit of a halo,” says Jayne Gackenbach, who researches the psychology of the Internet at Grant MacEwan College in Alberta, Canada. It’s why dieters underestimate their food intake, why smokers say they go through fewer cigarettes than they do. “If people can get at some objective criteria, it would be wonderfully informative.” That’s the brilliance, she says, of new technology.

So that’s great and all, but all of this, at least for my mom, raises the spectre of George Orwell’s ubiquitous and all-knowing “Big Brother” from Nineteen Eighty-Four, and of neo-Taylorism:

I do agree that people lie, or misperceive, and that data is a truer bearer of actualities. I guess I don’t care. Story telling is an art form, too. There’s something sort of 1984ish about all this data collection, as if the accumulated data could eventually turn us all into robotic creatures too self-programmed to suck the real juice out of life.

I certainly am sympathetic to that view, especially because the characterization of life in 1984 was so compelling and visceral. The problem is that this analogy invariably falls short, especially in other conversations when you’re talking about the likes of Google and other web-based companies.

In 1984, Big Brother symbolized the encroachment of the government on the life of the private citizen. Since the government had the ability to lock you up or take you away based on your behavior, you can imagine that this kind of dystopic vision would resonate in a time when increasingly fewer people probably understand the guts of technology and yet increasingly rely on it, shoveling more and more of their data into online repositories, or having it collected about them as they visit various websites. Never before has the human race had so much data about itself, and yet (likely) so little understanding.

The difference, as I explained to my mom, comes down to access to — and leverage over — the data:

I want to write more about this, but I don’t think 1984 is an apt analogy here. In the book, the government knows everything about the citizenry, and makes decisions using that data, towards maximizing efficiency for some unknown (or spiritually void) end. In this case, we’re flipping 1984 on its head! In this case we’re collecting the data on OURSELVES, empowering ourselves to know more than the credit card companies and banks! It’s certainly a daunting and scary thought to realize how much data OTHER people have about us, but what better way to get a leg up than to start looking at ourselves, and collecting that information for our own benefit?

I used to be pretty skeptical of all this too… but since I’ve seen the tools, and I’ve seen the value of data — I just don’t want other people to profit off of my behaviors… I want to be able to benefit from it as well — in ways that I dictate — on my terms!

In any case, Tim O’Reilly is right: data is the new Intel inside. But shouldn’t we be getting a piece of the action if we’re talking about data about us? Shouldn’t we write the book on what 2014 is going to look like so we can put the tired 1984 analogies to rest for a while and take advantage of what is unfolding today? I’m certainly wary of large corporate behemoths usurping the role the government played in 1984, but frankly, I think we’ve gone beyond that point.

Privacy, publicity and open data

This one should be a quickie.

A fascinating article came out of CNN today: “Intelligence deputy to America: Rethink privacy”.

This is a topic I’ve had opinions about for some time. My somewhat pessimistic view is that privacy is an illusion, and that more and more historic vestiges of so-called privacy are slipping through our fingers with the advent of increasingly ubiquitous and promiscuous technologies, the results of which are not all necessarily bad (take a look at just how captivating the Facebook Newsfeed is!).

Still, the more reading I’ve been doing lately about international issues and conflict, the more I agree with Danny Weitzner that there needs to be a robust dialogue about what it means to live in a post-privacy era, and what demands we must place on those companies, governments and institutions that store data about us, about the habits to which we’re prone and about the friends we keep. He sums up the conversation space thus:

Privacy is not lost simply because people find these services useful and start sharing location. Privacy could be lost if we don’t start to figure what the rules are for how this sort of location data can be used. We’ve got to make progress in two areas:

  • technical: how can users’ sharing and usage preferences be easily communicated to and acted upon by others? Suppose I share my location with a friend but don’t want my employer to know it. What happens when my friend, intentionally or accidentally, shares a social location map with my employer or with the public at large? How would my friend know that this is contrary to the way I want my location data used? What sorts of technologies and standards are needed to allow location data to be freely shared while respecting users’ usage limitation requirements?
  • legal: what sort of limits ought there to be on the use of location data?
      • can employers require employees to disclose real time location data?
      • is there any difference between real-time and historical location data traces? (I doubt it)
      • under what conditions can the government get location data?

There’s clearly a lot to think about with these new services. I hope that we can approach this from the perspective that lots of location data will be flowing around and realize that the big challenge is to develop social, technical and legal tools to be sure that it is not misused.

I want to bring some attention to his first point about the technical issues surrounding New Privacy. This is the realm where we play, and this is the realm where we have the most to offer. This is also an area that’s the most contentious and in need of aggressive policies and leadership, because the old investment model that treats silos of data as gold mines has to end.

I think Tim O’Reilly is really talking about this when he lambasts Google’s OpenSocial, proclaiming, “It’s the data, stupid!” The problem of course is what open data actually means in the context of user control and ownership, in terms of “licensing” and in terms of proliferation. These are not new problems for technologists as permissioning dates back to the earliest operating systems, but the problem becomes infinitely complex now that it’s been unbounded and non-technologists are starting to realize a) how many groups have been collecting data about them and b) how much collusion is going on to analyze said data. (Yeah, those discounts that that Safeway card gets you make a lot more money for Safeway than they save you, you better believe it!)

With Donald Kerr, the principal deputy director of national intelligence, taking an equally pessimistic (or Apocalyptic) attitude about privacy, I think there needs to be a broader, eyes-wide-open look at who has what data about whom and what they’re doing with it. Perhaps more importantly, we need to look at how the people about whom the data is being collected can get in on the game and get access to this data, in the same way you’re guaranteed access to your credit report and the ability to dispute it. The same thing should be true for web services, the government and anyone else who’s been monitoring you, even if you’ve been sharing that information with them willingly. In another post, I talked about the value of this data, calling it “Data Capital”. People need to realize the massive amount of value that their data adds to the bottom line of so many major corporations (not to mention Web 2.0 startups!) and demand ongoing and persistent access to it. Hell, it might even result in better or more accurate data being stored in these mega-databases!

Regardless, when representatives from the government start to say things like:

“Those two generations younger than we are have a very different idea of what is essential privacy, what they would wish to protect about their lives and affairs. And so, it’s not for us to inflict one size fits all,” said Kerr, 68. “Protecting anonymity isn’t a fight that can be won. Anyone that’s typed in their name on Google understands that.”

“Our job now is to engage in a productive debate, which focuses on privacy as a component of appropriate levels of security and public safety,” Kerr said. “I think all of us have to really take stock of what we already are willing to give up, in terms of anonymity, but [also] what safeguards we want in place to be sure that giving that doesn’t empty our bank account or do something equally bad elsewhere.”

…you know that it’s time we started framing the debate on our own terms… thinking about what this means to the Citizen Centric Web and about how we want to become the gatekeepers for the data that is both rightfully ours and that should willfully be put into the service of our own needs and priorities.

Data capital, or: data as common tender

Wikipedia states that legal tender is payment that, by law, cannot be refused in settlement of a debt denominated in the same currency. Currency, in turn, is a unit of exchange, facilitating the transfer of goods and/or services.

I was asked a question earlier today about the relative value of open services against open data served in open, non-proprietary data formats. It got me thinking whether — in the pursuit of utter openness in web services and portability in stored data — that’s the right question. Are we providing the right incentives for people and companies to go open? Is it self-fulfilling or manifest destiny to arrive at a state of universal identity and service portability leading to unfettered consumer choice? Is this how we achieve VRM nirvana, or is there something missing in our assumptions and current analysis?

Mary Jo Foley touched on this topic today in a post called Are all ‘open’ Web platforms created equal? She asks whether Microsoft’s PC-driven worldview can be modernized to compete in the network-centric world of Web 2.0, which no single player dominates and which is instead made up of best-of-breed APIs and services from across the Web. The question she alludes to is a poignant one: even if you go open (and Microsoft has, by any estimation), will anyone care? Even if you dress up your data and jump through hoops to please developers, will they actually take advantage of what you have to offer? Or is there something else to the equation that we’re missing? Some underlying truism that is simply refracting falsely in light of the newfound sexiness of “going open”?

We often tell our clients that one of the first things you can do to “open up” is build out an API, support microformats, adopt OpenID and OAuth. But that’s just the start. That’s just good data hygiene. That’s brushing your teeth once a day. That’s making sure your teeth don’t fall out of your head.

There’s a broader method to this madness, but unfortunately, it’s a rare opportunity when we actually get beyond just brushing our teeth to really getting to sink them in, going beyond remedial steps like adding microformats to web pages to crafting just-in-time, distributed open-data-driven web applications that actually do stuff and make things better. But as I said, it’s a rare occasion for us because we’ve all been asking the wrong questions, providing the wrong incentives and designing solutions from the perspective of the silos instead of from the perspective of the people.

Let me make a point here: if your data were legal tender, you could take it anywhere with you and it couldn’t be refused if you offered to pay with it.

Let me break that down a bit. The way things are today, we give away our data freely and frequently, in exchange for the use of certain services. Now, in some cases, like Pandora or Last.fm, the use of the service itself is compelling and worthwhile, providing an equal or greater exchange rate for our behavior or taste data. In many other cases, we sign up for a service and provide basic demographic data without any sense of what we’re going to get in return, often leaving scraps of ourselves to fester all across the internet. Why do we value this data so little? Why do we give it away so freely?

I learned of an interesting concept today while researching legal tender called “Gresham’s Law” and commonly stated as: When there is a legal tender currency, bad money drives good money out of circulation.

Don’t worry, it took me a while to get it too. Nicolas Nelson offered the following clarification: if high quality and low quality are forced to be treated equally, then folks will keep good quality things to themselves and use low quality things to exchange for more good stuff.

Think about this in terms of data: if people are forced (or tricked) into thinking that the data they enter into web applications is not being valued (or protected) by the sites that collect it, well, eventually they’ll either stop entering the data (heard of social network fatigue?) or they’ll start filling in the forms with bogus information. “Bad data” then drives the “good data” out of the system, ultimately leading to a kind of data inflation, where suddenly the problem is no longer just getting people to sign up for your service, but getting them to provide good data of some value as well. And this is where data portability, or data as legal tender, starts to become interesting and allows us to start seeing through the distortion of the refraction.

Think: Data as currency. Data to unlock services. Data owned, controlled, exchanged and traded by the creator of said data, instead of by the networks he has joined. For the current glut of web applications to sustain itself, we must move to a system where people are in charge of their data, where they garden and maintain it, and where they are free to deposit and withdraw it from web services like people do money from banks.

If you want to think about what comes next, what the proverbial “Web 3.0” is all about, it’s not just about a bunch of web applications hooked up with protocols like OAuth that speak in microformats and other open data tongues back and forth to each other. That’s the obvious part. The change comes when a person is in control of her data, and when the services that she uses firmly believe that she not only has a right to do as she pleases with her data, but that it is in their best interest to spit her data out in whatever myriad format she demands and to whichever myriad services she wishes.

The “data web” is still a number of years off, but it is rapidly approaching. It does require that the silos popular today open up and transition from repositories to transactional enterprises. Once data becomes a kind of common tender, you no longer need to lock it; in fact, the value comes from its reuse and circulation in commerce.

To some degree, Mint and Wesabe are doing this retroactively for your banking records, allowing you to add “data value” to your monetary transactions. Next up, Google and Microsoft will do this for your health records. For a more generic example, Swivel is doing this today for the OECD but has a private edition coming soon. Slife/Slifeshare, i use this, and RescueTime do this for your use of desktop apps.

This isn’t just attention data that I’m talking about (though the recent announcements in support of APML are certainly positive). This goes beyond monitoring what you’re doing and how you’re spending your time. I’m talking about access to all the data that it would take to reconstitute your entire digital existence. And then I’m talking about the ability to slice, dice, and splice it however you like, in pursuit of whatever ends you choose. Or choose not to.


I’ll point to a few references that influenced my thinking: Social Capital To Show Its Worth at This Week’s Web 2.0 Summit, What is Web 2.0?, Tangled Up in the Future – Lessig and Lietaer, Intentional Economics Day 1, Day 2, Day 3.

Civil libertarians should get hip to personal data harvesting

Despite my tongue-in-cheek title, I wanted to take a moment to respond to this article, because, though it is likely well-intentioned and in fact rather truthful, it glosses over a more important discussion that should be going on.

Whether anonymous Internet usage will ever exist is not important. What is important is that companies become aware that Internet activity is easy to monitor from a variety of locations, even when data encryption is in use.

In context:

There are several jokes and cartoons out there that play on the idea of the “anonymous” Web, an Internet where you can be whatever and whoever you want. Most mainstream computer users willingly buy into this concept, deceived by the ability to adopt cryptic usernames and e-mail addresses.

Anonymous Internet usage is an appealing concept to many people, but whether it’s actually possible is a different matter. Generally speaking, it’s relatively simple to intercept–and at the least, monitor–the transmission of digital information.

Every time you transmit data from a computer to or from somewhere else using the Internet, literally dozens of places can exist that are monitoring the transmission. Clear-text protocols offer no built-in protection from eavesdropping. In addition, the transmission leaves traces of “evidence” on your computer–regardless of if you use data encryption or one of those software “evidence eliminator” packages.

An anonymous Internet, if such a thing existed, would be immune to eavesdropping entirely, and it would have no record of a communication ever existing. Anonymous Internet usage is like a “cash” form of communication: It would leave no traceable evidence.

In certain countries, the government restricts and/or controls Internet use. For example, China has one of the most extensive Web proxy server and monitoring capabilities in the world, aptly dubbed the “Great Firewall of China.”

The Chinese government controls, monitors, and censors Internet access at will. Dissidents and those opposed to the Chinese government, including other governments, constantly try to bypass the censors, but the Great Firewall soon discovers and blocks these noncensored “anonymous” proxy servers.

So it’s understandable why some people see the benefits in leaving no traces of any communication, especially when there’s a fear of reprisal from a government or other organizations. It would be as if the transmission never happened. There’s no record of it ever occurring, and therefore it doesn’t exist.

But, however appealing this concept may be to some, the fact remains that it isn’t realistic. Companies and individuals alike need to be aware that there really is no such thing as anonymous Internet usage. If someone wants to determine what a computer is doing on the Internet, there’s always a trail to follow.

Computer users leave traces of information with almost every data transmission. In fact, an entire computer subindustry has evolved to deal with removing these traces of information, but these companies can only remove what’s on a computer. There are so many other points that can record the “digital footprints” of Internet activity that it’s impossible to completely guarantee anonymity.

Whether anonymous Internet usage will ever exist is not important. What is important is that companies become aware that Internet activity is easy to monitor from a variety of locations, even when data encryption is in use.

Jonathan Yarden is the senior UNIX system administrator, network security manager, and senior software architect for a regional ISP.

If we take the author’s premise as a given (that anonymous internet usage will never ever exist), then the important discussion to have is what information should be collected about you, and if collected, who has control over it and what can you, as the source of that information, do to control its use, administration and distribution?

If one persists with a blanket notion that personal information collected about one’s behavior on the internet is bad, the future will be very difficult to cope with. The fact is that more and more companies, big and small, are amassing huge databases of information about people. Frankly, if you’re really concerned about this kind of thing, you should stop using your ATM and credit cards because as it is now, it’s easier to track your behavior through your purchases than through your web browser.

But that is going to change. And the dangers are such that, unless a cogent counter-argument is made that fairly deals with the benefits that come with the harvesting of this data, it will be increasingly difficult to take back control or change corporate policies once they’re instated (just as a civil liberty, once lost, is nearly impossible to get back).

So what am I driving at? Well, I think that a more realistic and proactive attitude is needed from the civil libertarian camp that shows its understanding of the value in this kind of data. I also think that a more nuanced attitude towards privacy is desperately needed because all or nothing is not going to cut it as technology gets simpler and better at collecting information about you. I also believe that civil libertarians can benefit from this kind of data collection in ways that I don’t think have been realized. Once we start to see data collection as a strategic tool rather than as an invasion of our private space, we may indeed become powerful enough to take back control over our data.