privacy – Factory Joe

BBC Digital Planet podcast featuring OpenID

Update: The BBC has posted a write-up of the report called Easy login plans gather pace.

I was interviewed by Gareth Mitchell last week about OpenID for the BBC’s Digital Planet podcast.

Our conversation lasted about 10 minutes — of which only about two minutes survived (mirrored here as they currently do not keep an archive of previous episodes).

It was a familiar conversation for me, since the primary concerns Gareth expressed had to do with privacy, identity and the notion that “someone else” could “own” another’s identity on the web. His premise sounded familiar: “Won’t OpenID make my identity more hackable?”

The answer, of course, isn’t that straightforward, and depends on a lot of mitigating factors. However, the fundamental take-away is that OpenID really is no more insecure than email, and even then, provides a future-facing design that leads to many kinds of protection that email, in practice, does not.

. . .

I’ve also noticed over the past several years that Europeans harbor much greater sensitivities to privacy issues while Americans tend to concentrate on matters concerning “property” (physical, personal and intellectual). This is evidenced by yesterday’s blow-up around Facebook’s changes to their Terms of Service. On the one hand, there’s this weird American outcry against Facebook owning your data (in common, at least) forever. From the European side, it seems like the concern is centered more around what the changes mean to one’s privacy, rather than whether Facebook can perpetually “make money” off your stuff.

I bring this up because it’s immensely relevant with regards to the conversation I had with Gareth (given that he’s based in the UK).

With the current case, I’m sympathetic to Facebook, because I know that this will be the year that people have their “mindframes” bent around new conceptions of personal privacy and control and ownership of data. I believe (as Facebook purports to) that people’s desire to share will overcome their desire for control over their personal data, and that they will gradually realize that sharing will require letting go. It is this reality — the reality of networked data in the cloud — that necessitated Facebook’s change to their terms of service — not some nefarious desire to steal your first born (or your data).

In other words, the conditions and kind of thinking that led to the backlash against Plaxo known as Scoblegate will cease to exist in the future. Facebook’s change is merely a recognition of this new environment.

It remains unclear to me whether the pundits in this space realize that this shift will occur, and will occur naturally (as it has already begun — consider the integration of Facebook and Flickr in iPhoto ’09), or whether they just want to scream and holler when they notice something that seems astray.

. . .

Last December, I spent time talking to Boaz Sender of HTML Times at length about several of these topics (including discussing the intellectual property issues surrounding many of the technologies that are helping to ensure that the web remain an open playing field) in an interview about Identity in the Network. In juxtaposition to my interview with the BBC, I think this interview gets into some of the deeper issues at work here that must also be considered when it comes to the future of online identity, privacy and data control and (co)-ownership.

Responding to criticisms about OpenID: convenience, security and personal agency

Chris Dracket responded to one of my tweets the other day, saying that “OpenID should be dead… it’s way over-rated”. I’ve of course heard plenty of criticisms of OpenID, but hadn’t really heard that it was “overrated” (which implies that people have a higher opinion of OpenID than it merits).

Intrigued, I replied, asking him to elaborate, which he did via email:

I don’t know if overrated is the right word.. but I just don’t see OpenID ever catching on.. I think the main reason is that its too complex / scary of an idea for the normal user to understand and accept.

In my opinion the only way to make OpenID seem safe (for people who are worried about privacy online) is if the user has full control over the OpenID provider. While this is possible for people like you and me, my mom is never going to get to this point, and if she wants to use OpenID she is going to have to trust her sensitive data to AOL, MS, Google, etc. I think that people see giving this much “power” to a single provider as scary.

Lastly I think that OpenID is too complex to properly explain to someone and get them to use it. People understand usernames and passwords right away, and even OAuth, but OpenID in itself I think is too hard to grasp. I dunno, just a quick opinion.. I think there is a reason that we don’t have a single key on our key rings that opens our house, car, office and mailbox, not that that is a perfect/accurate analogy, but its close to how some people I’ve talked to think OpenID works.

Rather than respond privately, I asked whether it’d be okay if I posted his follow-up and replied on my blog. He obliged.

To summarize my interpretation of his points: OpenID is too complex and scary, potentially too insecure, and too confined to the hands of a few companies.

The summary of my rebuttals:

OpenID will become a necessary convenience in cloud computing.
OpenID can be incrementally secured and, combined with OAuth, helps to defeat the password-anti-pattern.
OpenID is about more than just accounts and fewer passwords — it’s a building block for online identity, and therefore personal agency for web citizens.

Convenience

OpenID should not be judged by today’s technological environment alone, but rather should be considered in the context of the migration to “cloud computing”, where people no longer access files on their local hard drive, but increasingly need to access data stored by web services.

All early technologies face criticism based on current trends and dominant behaviors, and OpenID is no different. At one time, people didn’t grok sending email between different services (in fact, you couldn’t). At one time, people didn’t grok IMing their AOL buddies using Google Talk (in fact, you couldn’t). At one time, you had one computer and your browser stored all of your passwords on the client-side (this is basically where we are today) and at one time, people accessed their photos, videos, and documents locally on their desktop (as is still the case for most people).

Cloud computing represents a shift in how people access and share data. Already, people rely less and less on physical media to store data and more and more on internet-based web services.

As a consequence, people will need a mechanism for referencing their data and services as convenient as the c: prompt. An OpenID, therefore, should become the referent people use to indicate where their data is “stored”.

An OpenID is not just about identification and blog comments; nor is it about reducing the number of passwords you have (that’s a by-product of user-centered design). Consider:

if I ask you where your photos are, you could say Flickr, and then prove it, because Flickr supports OpenID.
if I ask you where your friends are, you might say MySpace, and then prove it, because MySpace will support OpenID.
if you host your own blog or website, you will be able to provide your address and then prove it, because you are OpenID-enabled.

The long-term benefit of OpenID is being able to refer to all the facets of your online identity and data sources with one handy — ideally memorable — web-friendly identifier. Rather than relying on my email addresses alone to identify myself, I would use my OpenIDs, and link to all the things that represent me online: from my resume to my photos to my current projects to my friends, web services and so on.

The big picture of cloud computing points to OpenIDs simplifying how people access, share and connect data to people and services.

Security

I’ve heard many people complain that if your OpenID gets hacked, then you’re screwed. They claim that it’s like putting all your eggs in one basket.

But that’s really no different than your email account getting hacked. Since your email address is used to reset your password, any or all of your accounts could have their passwords reset and changed; worse, the password and the account email address could be changed, locking you out completely.

At minimum, OpenID is no worse than the status quo.

At best, combined with OAuth, third-parties never need your account password, defeating the password anti-pattern and providing a more secure way to share your data.

Furthermore, because securing your OpenID is outside of the purview of the spec, you can choose an OpenID provider (or set up your own) with a level of security that fits your needs. So while many OpenID providers currently stick with the traditional username and password combo, others offer more sophisticated approaches, from client-side certificates and hardware keys to biometrics and image-based password shields (as in the case of my employer, Vidoop).

One added benefit of OpenID is the ability to audit and manage access to your account, just as you do with a credit card account. This means that you have a record of every time someone (hopefully you!) signs in to one of your accounts with your OpenID, as well as how frequently sign-ins occur, from which IP addresses and on what devices. From a security perspective, this is a major advantage over basic usernames and passwords, as collecting this information from each service provider would prove inconvenient and time-consuming, if even possible.
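To make the audit idea a little more concrete, here is a minimal sketch in Python of the kind of sign-in log a provider could keep and let you review. Everything in it is an assumption made for illustration: the `SignIn` record, its field names, and the `summarize` and `unfamiliar` helpers are hypothetical, and the OpenID spec defines none of them.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SignIn:
    """One sign-in event as a provider might record it (hypothetical shape)."""
    relying_party: str   # the site that was signed in to
    when: datetime
    ip_address: str
    device: str


def summarize(log):
    """Count sign-ins per relying party and collect the IPs seen for each."""
    counts = Counter(entry.relying_party for entry in log)
    ips = {}
    for entry in log:
        ips.setdefault(entry.relying_party, set()).add(entry.ip_address)
    return counts, ips


def unfamiliar(log, known_ips):
    """Return sign-ins from IP addresses the account holder does not recognize."""
    return [entry for entry in log if entry.ip_address not in known_ips]


# Made-up sample data using documentation-only hostnames and IP ranges.
log = [
    SignIn("flickr.example", datetime(2008, 12, 1, 9, 0), "192.0.2.10", "laptop"),
    SignIn("flickr.example", datetime(2008, 12, 2, 9, 5), "192.0.2.10", "laptop"),
    SignIn("blog.example", datetime(2008, 12, 2, 23, 40), "203.0.113.7", "unknown"),
]

counts, ips = summarize(log)
suspicious = unfamiliar(log, known_ips={"192.0.2.10"})
```

The point of the sketch is only that one log, held in one place, answers questions that would otherwise mean asking every service separately.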

Given this benefit, it’s worth considering that identity technologies are being pushed on the government. If you’re worried about putting all your eggs in one basket, would you think differently if the government owned that basket?

OpenID won’t force anyone to change their current behavior, certainly not right away. But wouldn’t it be better to have the option to choose an alternative way to secure your accounts if you wanted it? OpenID starts with the status quo and, coupled with OAuth, provides an opportunity to make things better.

We’re not going to make online computing more secure overnight, but it seems like a prudent place to start.

Personal agency for web citizens

Looking over the landscape of existing social software applications, I see very few (if any) that could not be enhanced by OpenID support.

OpenID is a cornerstone technology of the emerging social web, and adds value anywhere users have profiles, accounts or need access to remote data.

Historically, we’ve seen similar attempts at providing a universal login account. Microsoft even got the name right with “Passport”, but screwed up the network model. Any identity system, if it’s going to succeed on the open web, needs to be designed with user choice at its core, in order to facilitate marketplace competition. A single-origin federated identity network will always fail on the internet (as Joseph Smarr and John McCrea like to say of Facebook Connect: We’ve seen this movie before).

As such, selecting an identity provider should not be relegated to a default choice. Where you come from (what I call provenance) has meaning.

For example, if you connect to a service using your Facebook account, the relying party can presume that the profile information that Facebook supplies will be authentic, since Facebook works hard to ferret out fake accounts from its network (unlike MySpace). Similarly, signing in with a Google Account provides a verified email address.

Just like the issuing country of your passport may say something about you to the immigration official reviewing your documents, the OpenID provider that you use may also say something about you to the relying party that you’re signing in to. It is therefore critical that people make an informed choice about who provides (and protects) their identity online, and that the enabling technologies are built with the option for individuals to vouch for themselves.

In the network model where anyone can host their own independent OpenID (just like anyone can set up their own email server), competition may thrive. Where competition thrives, an ecosystem may arise, developed under the rubric of market dynamics and Darwinian survivalism. And in this model, the individual is at the center, rather than the services he or she uses.

This is the citizen-centric model of the web, and each of us is a sovereign citizen of the web. Since I define and host my own identity, I do not need to worry about services like Pownce being sold or I Want Sandy users left wanting. I have choice, I have bargaining power, and I have agency, and this is critical to the viability of the social web at scale.

Final words

OpenID is not overrated, it’s just early. We’re just getting started with writing the rules of social software on the web, and we’ve got a lot of bad habits to correct.

As cloud computing goes mainstream (evidenced in part by the growing popularity of Netbooks this holiday season!), we’re going to need a consumer-facing technology and brand like OpenID to help unify this new, more virtualized world, in order to make it universally accessible.

Fortunately, as we stack more and more technologies and services on our OpenIDs, we can independently innovate the security layer, developing increasingly sophisticated solutions as necessary to make sure that only the right people have access to our accounts and our data.

It is with these changes in mind that we must evaluate OpenID: not as a technology for 2008’s problems, but as a formative building block for 2009 and the future of the social web.

After 1984

iTunes 8 has added a new feature called “Genius” that harnesses the collective behavior of iTunes Music Store shoppers to generate “perfect” playlists.

Had an interesting email exchange with my mom earlier today about Monica Hesse’s story Bytes of Life. The crux of the story is that more and more people are self-monitoring and collecting data about themselves, in many cases, because, well, it’s gotten so much easier, so, why not?

Well, yes, it is easier, but just because it is easier, doesn’t automatically mean that one should do it, so let’s look at this a little more deeply.

First, my mom asked about the amount of effort involved in tracking all this data:

I still have a hard time even considering all that time and effort spent in detailing every moment of one’s life, and then the other side of it which is that it all has to be read and processed in order to “know oneself”. I think I like the Jon Cabot Zinn philosophy better — just BE in the moment, being mindful of each second doesn’t require one to log or blog it, I don’t think. Just BE in it.

Monica didn’t really touch on too many tools that we use to self-monitor. It’s true that, depending on the kind of data we’re collecting, the effort will vary. But so will the benefits.

If you take a look at MyMileMarker’s iPhone interface, you’ll see how quick and painless it is to record this information. Why bother? Well, for one thing, over time you get to see not only how much fuel you’re consuming, but how much it’s going to cost you to keep running your car in the future:

Without collecting this data, you might guess at your MPG, or take the manufacturer’s rating as given, but when you record what actually is happening, you can prove to yourself whether filling up your tires really does save you money (or the planet).
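The arithmetic behind that claim is simple enough to sketch. The Python below is my own illustration, not MyMileMarker’s method: the `FillUp` record and both helper functions are hypothetical, the numbers are made up, and it assumes you fill the tank completely every time, so the fuel added at each stop equals the fuel burned since the previous one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FillUp:
    """One tank fill-up (hypothetical record, assumes a full tank each time)."""
    odometer_miles: float
    gallons: float
    price_per_gallon: float


def average_mpg(fill_ups):
    """Miles between the first and last fill-up, divided by the fuel added
    after the first one (the first fill-up's fuel covers earlier driving)."""
    if len(fill_ups) < 2:
        raise ValueError("need at least two fill-ups")
    miles = fill_ups[-1].odometer_miles - fill_ups[0].odometer_miles
    gallons = sum(f.gallons for f in fill_ups[1:])
    return miles / gallons


def cost_per_mile(fill_ups):
    """Money spent on the fuel counted above, per mile driven."""
    if len(fill_ups) < 2:
        raise ValueError("need at least two fill-ups")
    miles = fill_ups[-1].odometer_miles - fill_ups[0].odometer_miles
    spent = sum(f.gallons * f.price_per_gallon for f in fill_ups[1:])
    return spent / miles


# Made-up log: three fill-ups, 300 miles apart, 10 gallons each.
fill_ups = [
    FillUp(10_000, 10.0, 4.00),
    FillUp(10_300, 10.0, 4.00),
    FillUp(10_600, 10.0, 4.00),
]
mpg = average_mpg(fill_ups)      # 600 miles / 20 gallons
cost = cost_per_mile(fill_ups)   # 80 dollars / 600 miles
```

Run the same calculation over the weeks before and after inflating your tires and you have your own answer, rather than the manufacturer’s.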

On the topic of the environment, recording my trips on Dopplr gives me an actual view of my carbon footprint (pretty damning, indeed):

As my mom pointed out, perhaps having access to this data will encourage me to cut back excess travel — or to consolidate my trips. Ross Mayfield suggests that he could potentially quit smoking if his habit were made more plainly visible to him.

What’s also interesting is how passive monitors, or semi-passive monitoring tools, can also inform, educate or predict — and on this point I’m thinking of Last.fm where of course my music taste is aggregated, or location-based sites like Brightkite, where my locative behavior is tracked (albeit, manually — though Fire Eagle + Spot changes that).

My mom’s other point about the ability to just BE in the moment is also important — because self-tracking should ideally be non-invasive. In other words, it shouldn’t be the tracking that changes your behavior, but your analysis and reflection after the fact.

One of the stronger points I might make about this is that with data, especially when it is collected regularly and the right indicators are recorded, you can reduce a great amount of the distortion that comes from your self-serving biases. Monica writes:

“We all have the tendency to see our behaviors in a little bit of a halo,” says Jayne Gackenbach, who researches the psychology of the Internet at Grant MacEwan College in Alberta, Canada. It’s why dieters underestimate their food intake, why smokers say they go through fewer cigarettes than they do. “If people can get at some objective criteria, it would be wonderfully informative.” That’s the brilliance, she says, of new technology.

So that’s great and all, but all of this, at least for my mom, raises the spectre of George Orwell’s ubiquitous and all-knowing “Big Brother” from Nineteen Eighty-Four and neo-Taylorism:

I do agree that people lie, or misperceive, and that data is a truer bearer of actualities. I guess I don’t care. Story telling is an art form, too. There’s something sort of 1984ish about all this data collection – – as if the accumulated data could eventually turn us all into robotic creatures too self-programmed to suck the real juice out of life.

I certainly am sympathetic to that view, especially because the characterization of life in 1984 was so compelling and visceral. The problem is that this analogy invariably falls short, especially in other conversations when you’re talking about the likes of Google and other web-based companies.

In 1984, Big Brother symbolized the encroachment of the government on the life of the private citizen. Since the government had the ability to lock you up or take you away based on your behavior, you can imagine that this kind of dystopic vision would resonate in a time when increasingly fewer people probably understand the guts of technology and yet increasingly rely on it, shoveling more and more of their data into online repositories, or having it collected about them as they visit various websites. Never before has the human race had so much data about itself, and yet (likely) so little understanding.

The difference, as I explained to my mom, comes down to access to — and leverage over — the data:

I want to write more about this, but I don’t think 1984 is an apt analogy here. In the book, the government knows everything about the citizenry, and makes decisions using that data, towards maximizing efficiency for some unknown — or spiritually void — end. In this case, we’re flipping 1984 on its head! In this case we’re collecting the data on OURSELVES — empowering ourselves to know more than the credit card companies and banks! It’s certainly a daunting and scary thought to realize how much data OTHER people have about us — but what better way to get a leg up then to start looking at ourselves, and collecting that information for our own benefit?

I used to be pretty skeptical of all this too… but since I’ve seen the tools, and I’ve seen the value of data — I just don’t want other people to profit off of my behaviors… I want to be able to benefit from it as well — in ways that I dictate — on my terms!

In any case, Tim O’Reilly is right: data is the new Intel inside. But shouldn’t we be getting a piece of the action if we’re talking about data about us? Shouldn’t we write the book on what 2014 is going to look like so we can put the tired 1984 analogies to rest for a while and take advantage of what is unfolding today? I’m certainly wary of large corporate behemoths usurping the role the government played in 1984, but frankly, I think we’ve gone beyond that point.

Musings on Chrome, the rebirth of the location bar and privacy in the cloud

Imagine a browser of the web, by the web, and for the web. Not simply a thick client application that simply opens documents with the http:// protocol instead of file://, but one that runs web applications (efficiently!), that plays the web, that connects people across the boundaries of the silos and gives them local-like access to remote data.

It might not be Chrome, but it’s a damn near approximation, given what people are used to today.

Take a step back. You can see the relics of desktop computing in our applications’ file menus… and we can intuit the assumptions that the original designer must have made about the user, her context and the interaction expectations she brought with her:

This is not a start menu or a Dock. This is a document-driven menubar that’s barely changed since Netscape Communicator.

Indeed, the browser is a funny thing, because it’s really just a wrapper for someone else’s content or someone else’s application. That’s why it’s not about “features”. It’s all about which features, especially for developers.

It’s a hugely powerful place to insert oneself: between a person and the vast expanse that is the Open Web. Better yet: to be the conduit through which anyone projects herself on to the web, or reaches into the digital void to do something.

So if you were going to design a new browser, how would you handle the enormity of that responsibility? How would you seize the monument of that opportunity and create something great?

Well, for starters, you’d probably want to think about that first run experience: what it’s like to get behind the wheel for the very first time with a newly minted driver’s permit, with the daunting realization that you can now go anywhere you please…! Which is of course awesome, until you realize that you have no idea where to go first!

Historically, the solution has been to flip-flop between portals and search boxes, and if we’ve learned anything from Google’s shockingly austere homepage, it comes down to recognizing that the first step of getting somewhere is expressing some notion of where you want to go:

The problem is that the location field has, up until recently, been fairly inert and useless. With Spotlight-influenced interfaces creeping into the browser (like David Watanabe’s recently acquired Inquisitor Safari plugin — now powered by Yahoo! Search BOSS — or the flyout in Flock that was inspired by it) it’s clear that browsers can and should provide more direction and assistance to get people going. Not everyone’s got a penchant for remembering URLs (or RFCs) like Tantek’s.

This kind of predictive interface, however, has only slowly made its way into the location bar, like fish being washed ashore and gradually sprouting legs. Eventually they’ll learn to walk and breathe normally, but until then, things might look a little awkward. But yes, dear reader, things do change.

So you can imagine, having recognized this trend, Google went ahead and combined the search box and the location field in Chrome and is now pushing the location bar as the starting place, as well as where to do your searching:

This change to such a fundamental piece of real estate in the browser has profound consequences on both the typical use of the browser as well as security models that treat the visibility of the URL bar as sacrosanct (read: phishing):

The URL bar is dead! Long live the URL bar!

While cats like us know intuitively how to use the location bar in combination with URLs to get us to where we want to go, that practice is now outmoded. Instead we type anything into the “box” and have some likely chance that we’re going to end up close to something interesting. Feeling lucky?
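For the curious, here is a rough sketch in Python of the sort of decision a combined box has to make with whatever you type. To be clear, this is not how Chrome does it: the `resolve` function, its three rules and the `search.example` address are all invented for illustration, and a real browser would weigh history, bookmarks and suggestions as well.

```python
from urllib.parse import quote_plus, urlsplit

# Placeholder search engine address, not a real service.
SEARCH_URL = "https://search.example/?q="


def resolve(text):
    """Turn whatever was typed into the box into a URL to load."""
    text = text.strip()
    # An explicit scheme such as http:// means the user typed an address.
    if urlsplit(text).scheme in ("http", "https", "file"):
        return text
    # A single token containing a dot looks like a hostname.
    if " " not in text and "." in text and not text.startswith("."):
        return "http://" + text
    # Everything else is treated as a search query.
    return SEARCH_URL + quote_plus(text)


typed_address = resolve("http://example.com/")
typed_host = resolve("example.com")
typed_words = resolve("openid spec")
```

Even this toy version shows the trade: the third rule only works if what you typed is handed to a search provider.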

But there’s something else behind all this that I think is super important to realize… and that’s that our fundamental notions and expectations of privacy on the web have to change or will be changed for us. Either we do without tools that augment our cognitive faculties or we embrace them, and in so doing, shim open a window on our behaviors and our habits so that computers, computing environments and web service agents can become more predictive and responsive to them, and in so doing, serve us better. So it goes.

Underlying these changes are new legal concepts and challenges, spelled out in Google’s updated EULA and Privacy Policy… heretofore places where few feared to go, least of all browser manufacturers:

5. Use of the Services by you

5.1 In order to access certain Services, you may be required to provide information about yourself (such as identification or contact details) as part of the registration process for the Service, or as part of your continued use of the Services. You agree that any registration information you give to Google will always be accurate, correct and up to date.

. . .

12. Software updates

12.1 The Software which you use may automatically download and install updates from time to time from Google. These updates are designed to improve, enhance and further develop the Services and may take the form of bug fixes, enhanced functions, new software modules and completely new versions. You agree to receive such updates (and permit Google to deliver these to you) as part of your use of the Services.

It’s not that any of this is unexpected or Draconian: it is what it is, if it weren’t like this already.

Each of us will eventually need to choose a data broker or two in the future and agree to similar terms and conditions, just like we’ve done with banks and credit card providers; and if we haven’t already, just as we’ve done in embracing webmail.

Hopefully visibility into Chrome’s source code will help keep things honest, and also provide the means to excise those features, or to redirect them to brokers or service providers of our choosing, but it’s inevitable that effective cloud computing will increasingly require more data from and about us than we’ve previously felt comfortable giving. And the crazy thing is that a great number of us (yes, including me!) will give it. Willingly. And eagerly.

But think one more second about the ramifications (see Matt Cutts) of Section 12 up there about Software Updates: by using Chrome, you agree to allow Google to update the browser. That’s it: end of story. You want to turn it off? Disconnect from the web… in the process, rendering Chrome nothing more than, well, chrome (pun intended).

Welcome to cloud computing. The future has arrived and is arriving.

Thoughts on dynamic privacy

A highly touted aspect of Facebook Connect is the notion of “dynamic privacy”:

As a user moves around the open Web, their privacy settings will follow, ensuring that users’ information and privacy rules are always up-to-date. For example, if a user changes their profile picture, or removes a friend connection, this will be automatically updated in the external website.

Over the course of the Graphing Social Patterns East conference here in DC, Dave Morin and others from Facebook’s Developer Platform have made many a reference to this scheme but have provided frustratingly scant detail on how it will actually work.

In a conversation with Brian Oberkirch and David Recordon, it dawned on me that the pieces for Dynamic Privacy are already in place and that, to some degree, it seems that it’s really just a matter of figuring out how to effectively enforce policy across distributed systems in order to meet user expectations.

MySpace actually has made similar announcements in their Data Availability approach, and if you read carefully, you can spot the fundamental rift between the OpenSocial and Facebook platforms:

Additionally, rather than updating information across the Web (e.g. default photo, favorite movies or music) for each site where a user spends time, now a user can update their profile in one place and dynamically share that information with the other sites they care about. MySpace will be rolling out a centralized location within the site that allows users to manage how their content and data is made available to third party sites they have chosen to engage with.

Indeed, Recordon wrote about this on O’Reilly Radar last month (emphasis original):

He explained that MySpace said that due to their terms of service the participating sites (e.g. Twitter) would not be allowed to cache or store any of the profile information. In my mind this led to the Data Availability API being structured in one of two ways: 1) on each page load Twitter makes a request to MySpace fetching the protected profile information via OAuth to then display on their site or 2) Twitter includes JavaScript which the browser then uses to fill in the corresponding profile information when it renders the page. Either case is not an example of data portability no matter how you define the term!

Embedding vs sharing

So the major difference here is in the mechanism of data delivery and how the information is “leased” or “tethered” to the original source, such that, as Morin said, “when a user deletes an item on Facebook, it gets deleted everywhere else.”

The approach taken by Google Gadgets, and hence OpenSocial, for the most part, has been to tether data back to the source via embedded iframes. This means that if someone deletes or changes a social object, it will be deleted or changed across OpenSocial containers, though the containers won’t even notice the difference since they never had access to the data to begin with.

The approach that seems likely from Facebook can be intuited by scouring their developer’s terms of service (emphasis added):

You can only cache user information for up to 24 hours to assist with performance.

…

2.A.4) Except as provided in Section 2.A.6 below, you may not continue to use, and must immediately remove from any Facebook Platform Application and any Data Repository in your possession or under your control, any Facebook Properties not explicitly identified as being storable indefinitely in the Facebook Platform Documentation within 24 hours after the time at which you obtained the data, or such other time as Facebook may specify to you from time to time;

2.A.5) You may store and use indefinitely any Facebook Properties that are explicitly identified as being storable indefinitely in the Facebook Platform Documentation; provided, however, that except as provided in Section 2.A.6 below, you may not continue to use, and must immediately remove from any Facebook Platform Application and any Data Repository in your possession or under your control, any such Facebook Properties: (a) if Facebook ceases to explicitly identify the same as being storable indefinitely in the Facebook Platform Documentation; (b) upon notice from Facebook (including if we notify you that a particular Facebook User has requested that their information be made inaccessible to that Facebook Platform Application); or (c) upon any termination of this Agreement or of your use of or participation in Facebook Platform;

2.A.6) You may retain copies of Exportable Facebook Properties for such period of time (if any) as the Applicable Facebook User for such Exportable Facebook Properties may approve, if (and only if) such Applicable Facebook User expressly approves your doing so pursuant to an affirmative “opt-in” after receiving a prominent disclosure of (a) the uses you intend to make of such Exportable Facebook Properties, (b) the duration for which you will retain copies of such Exportable Facebook Properties and (c) any terms and conditions governing your use of such Exportable Facebook Properties (a “Full Disclosure Opt-In”);

2.B.8) Notwithstanding the provisions of Sections 2.B.1, 2.B.2 and 2.B.5 above, if (and only if) the Applicable Facebook User for any Exportable Facebook Properties expressly approves your doing so pursuant to a Full Disclosure Opt-In, you may additionally display, provide, edit, modify, sell, resell, lease, redistribute, license, sublicense or transfer such Exportable Facebook Properties in such manner as, and only to the extent that, such Applicable Facebook User may approve.

This is further expanded in the platform documentation on Storable Information:

Per the Developer Terms of Service, you may not cache any user data for more than 24 hours, with the exception of information that is explicitly “storable indefinitely.” Only the following parameters are storable indefinitely; all other information must be requested from Facebook each time.

…

The storable IDs enable you to keep unique identifiers for Facebook elements that correspond to data gathered by your application. For instance, if you collected information about a user’s musical tastes, you could associate that data with a user’s Facebook uid.

However, note that you cannot store any relations between these IDs, such as whether a user is attending an event. The only exception is the user-to-network relation.
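To make that caching rule concrete, here's a rough Python sketch of how an application might honor it. This is my own illustration, not anything from Facebook's documentation: the `ProfileCache` class, its method names and the field names are all hypothetical, and I'm assuming `uid` is the only storable field, since it's the only one named above.

```python
import time

# Assumption: the real list of "storable indefinitely" fields lives in the
# platform documentation; "uid" is the only one named in the text above.
STORABLE_INDEFINITELY = {"uid"}
MAX_CACHE_SECONDS = 24 * 60 * 60  # the 24-hour limit from the Developer Terms


class ProfileCache:
    """Hypothetical cache: keeps storable IDs, drops everything else after 24 hours."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so expiry can be tested
        self._permanent = {}     # (uid, field) -> value
        self._expiring = {}      # (uid, field) -> (value, fetched_at)

    def put(self, uid, field, value):
        if field in STORABLE_INDEFINITELY:
            self._permanent[(uid, field)] = value
        else:
            self._expiring[(uid, field)] = (value, self._clock())

    def get(self, uid, field):
        """Return the cached value, or None if it must be requested again."""
        key = (uid, field)
        if key in self._permanent:
            return self._permanent[key]
        entry = self._expiring.get(key)
        if entry is None:
            return None
        value, fetched_at = entry
        if self._clock() - fetched_at >= MAX_CACHE_SECONDS:
            del self._expiring[key]  # too old: remove the copy entirely
            return None
        return value
```

Data the application gathers itself (the musical tastes in the example above) would sit in the application's own tables, keyed by the stored `uid`.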

I imagine that Facebook Connect will work by “leasing” or “sharing” information to remote sites and requiring them, through agreement and compliance with its terms, to check in periodically (or to receive directives through a push mechanism) for changes to data, and then to flush caches of stored data every 24 hours or less.
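Since this is only my guess at the mechanics, here's an equally speculative sketch of what the remote site's half of a “lease” could look like. Nothing here is a real Facebook Connect API: `RemoteSite`, `check_in`, `on_revoke` and the `fetch` callback are names I made up, and the 24-hour lease length is borrowed from the caching rule above.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

LEASE_SECONDS = 24 * 60 * 60  # assumed maximum lease, mirroring the 24-hour rule


@dataclass
class Lease:
    data: dict
    granted_at: float


@dataclass
class RemoteSite:
    """Hypothetical remote site holding leased copies of data that lives elsewhere."""
    fetch: Callable[[int], Optional[dict]]  # returns None once access is withdrawn
    clock: Callable[[], float] = time.time
    leases: dict = field(default_factory=dict)

    def read(self, uid):
        lease = self.leases.get(uid)
        if lease is None or self.clock() - lease.granted_at >= LEASE_SECONDS:
            return self.check_in(uid)  # lease missing or expired: go back to the source
        return lease.data

    def check_in(self, uid):
        """Ask the provider for current data; drop the copy if access is gone."""
        data = self.fetch(uid)
        if data is None:
            self.leases.pop(uid, None)
            return None
        self.leases[uid] = Lease(data=data, granted_at=self.clock())
        return data

    def on_revoke(self, uid):
        """Push directive from the provider: flush the copy immediately."""
        self.leases.pop(uid, None)
```

The point of the sketch is that the provider stays the source of truth: the remote copy either expires on its own or is flushed on request.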

In either model there is still a central provider and store of the data, but the question for implementation really comes down to whether a remote site ever has direct access to the data, and if so, how long it is allowed to store it.

Of note is the OpenSocial RESTful API, which provides a web-friendly mechanism for addressing and defining resources. Recordon pointed out to me that this API affords all the mechanisms necessary to implement the “leased” model of data access (rather than the embedded model), but leaves it up to the OpenSocial applications and containers to set and enforce their own data access policies.

…Which is a world of difference from Facebook’s approach to date, for which there is neither code nor a spec nor an open discussion about how they’re thinking through the thorny issues bound up in making decisions around data access, data control, “tethering” and “portability”. While folks like Plaxo and Yahoo are actually shipping code, Facebook is still posturing, telling us to “wait and see”. With something so central and so important, it’s disheartening that Facebook’s “Open” strategy is anything but open, and everything less than transparent.

Data portability and thinking ahead to 2008

So-called data portability and data ownership is a hot topic of late, and with good reason: with all the talk of the opening of social networking sites and the loss of presumed privacy, there’s been a commensurate acknowledgment that the value is not in the portability of widgets (via OpenSocial et al) but instead, (as Tim O’Reilly eloquently put it) it’s the data, stupid!

Now, Doc’s call for action is well timed, as we near the close of 2007 and set our sights on 2008.

Earlier this year, ZDNet predicted that 2007 would be the year of OpenID, and for all intents and purposes, it has been, if only in that it put the concept of non-siloed user accounts on the map. We have a long way to go, to be sure, but with OpenID 2.0 around the corner, it’s only a matter of time before building user prisons goes out of fashion and building OpenID-based citizen-centric services becomes the norm.

Inspired by the fact that even Mitchell Baker of Mozilla is talking about Firefox’s role in the issue of data ownership (In 2008 … We find new ways to give people greater control over their online lives — access to data, control of data…), this is going to be the issue that most defines 2008, or at least the early part of the year. And frankly, we’re already off to a good start. So here are the things that I think fit into this picture and what needs to happen to push progress on this central issue:

Economic incentives and VRM: Doc is right to phrase the debate in terms of VRM. When it comes down to it, nothing’s going to change unless 1) customers refuse to play along anymore and demand change and 2) there’s increased economic benefit to companies that give back control to their customers versus those companies that continue to either restrict or abuse/sell out their customers’ data. Currently, this is a consumer rights battle, but since it’s being fought largely in Silicon Valley where the issues are understood technically while valuations are tied to the attractiveness a platform has to advertisers, consumers are at a great disadvantage since they can’t make a compelling economic case. And given that the government and most bureaucracy is filled up with stakeholders who are hungry for more and more accurate and technologically-distilled demographic data, it’s unlikely that we could force the issue through the legal system, as has been approximated in places like Germany and the UK.
Reframing of privacy and access permissions: I’ve harped on this for a while, but historic notions of privacy have been outmoded by modern realities. Those who do expect complete and utter control need to take a look at the up and coming generation and realize that, while it’s true that they, on the whole, don’t appreciate the value and sacredness of their privacy, and that they’re certainly more willing to exchange it for access to services or to simply dispense with it altogether and face the consequences later (eavesdroppers be damned!), their apathy indicates the uphill struggle we face in credibly making our case.
Times have changed. Privacy and our notions of it must adapt too. And that starts by developing the language to discuss these matters in a way that’s obvious and salient to those who are concerned about these issues. Simply demanding the protection of one’s privacy is now a hollow and unrealistic demand; now we should be talking about access, about permissions, about provenance, about review and about federation and delegation. It’s not until we deepen our understanding of the facets of identity, and of personal data and of personal profiles, tastestreams and newsfeeds that we can begin to make headway on exploring the economic aspects of customer data and who should control it, have access to it, and can create, read, update, and delete it.
Data portability and open/non-proprietary web standards and protocols: Since this is an area I’ve been involved in and am passionate about, I have some specific thoughts on this. For one thing, the technologies that I obsess over have data portability at their center: OpenID for identification and “hanging” data, microformats for marking it up, and OAuth for provisioning controlled access to said data… The development, adoption and implementation of this breed of technologies is paramount to demonstrating both the potential and need for a re-orientation of the way web services are built and deployed today. Without the deployment of these technologies and their cousins, we risk web-wide lock-in to vendor-specific solutions like Facebook’s FBML or Google’s OpenSocial, greatly inhibiting the potential for market growth and innovation. And it’s not so much that these technologies are necessarily bad in and of themselves, but that they represent a grave shift away from the slower but less commercially-driven development of open and public-domained web standards. Consider me the frog in the lukewarm water recognizing that things are starting to get warm in here.
Citizen-centric web services: The result of progress in these three topics is what I’m calling the “citizen-centric web”, where a citizen is anyone who inhabits the web, in some form or another. Citizen-centric web services are, of course, services provided to those inhabitants. This notion is what I think is going to, and should, drive much of the thinking in 2008 about how to build better citizen-centric web services, where individuals identify themselves to services, rather than recreating themselves and their so-called social-graph; where they can push and pull their data at their whim and fancy, and where such data is essentially “leased” out to various service providers on an as-needed basis, rather than on a once-and-for-all status using OAuth tokens and proxied delegation to trusted data providers; where citizens control not only who can contact them, but are able to express, in portable terms, a list of people or companies who cannot contact them or pitch ads to them, anywhere; where citizens are able to audit a comprehensive list of profile and behavior data that any company has on file about them and to be able to correct, edit or revoke that data; where “permission” has a universal, citizen-positive definition; where companies have to agree to a Creative Commons-style Terms of Access and Stewardship before being able to even look at a customer’s personal data; and that, perhaps most important to making all this happen, sound business models are developed that actually work with this new orientation, rather than in spite of it.
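To give the vocabulary above (access, permissions, review, revocation) a slightly more tangible shape, here's a toy Python sketch of a citizen-side data store. It's purely illustrative: `PersonalDataStore` and its methods are invented for this example, and a real system would sit on top of things like OpenID and OAuth rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field

ACTIONS = {"create", "read", "update", "delete"}


@dataclass
class PersonalDataStore:
    """Hypothetical citizen-side store: the owner grants, audits and revokes access."""
    records: dict = field(default_factory=dict)    # key -> value
    grants: dict = field(default_factory=dict)     # service -> set of allowed actions
    audit_log: list = field(default_factory=list)  # (service, action, key, allowed)

    def grant(self, service, *actions):
        unknown = set(actions) - ACTIONS
        if unknown:
            raise ValueError(f"unknown actions: {sorted(unknown)}")
        self.grants.setdefault(service, set()).update(actions)

    def revoke(self, service):
        self.grants.pop(service, None)

    def access(self, service, action, key, value=None):
        """Perform one action on behalf of a service, logging every attempt."""
        allowed = action in self.grants.get(service, set())
        self.audit_log.append((service, action, key, allowed))
        if not allowed:
            raise PermissionError(f"{service} may not {action} {key}")
        if action in ("create", "update"):
            self.records[key] = value
        elif action == "delete":
            self.records.pop(key, None)
        return self.records.get(key)
```

Every attempt, allowed or not, lands in `audit_log`, which is the piece that would let a citizen review who has touched what.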

So, in grandiose terms I suppose, these are the issues that I’m pondering as 2008 approaches and as I ready myself for the challenges and battles that lie ahead. I think we’re making considerable progress on the technology side of things, though there’s always more to do. I think we need to make more progress on the language, economic, business and framing fronts, though. But, we’re making progress, and thankfully we’re having these conversations now and developing real solutions that will result in a more citizen-centric reality in the not too distant future.

If you’re interested in discussing these topics in depth, make it to the Internet Identity Workshop next week, where these topics are going to be front and center in what should be a pretty excellent meeting of the minds on these and related topics.

Privacy, publicity and open data

This one should be a quickie.

A fascinating article came out of CNN today: “Intelligence deputy to America: Rethink privacy”.

This is a topic I’ve had opinions about for some time. My somewhat pessimistic view is that privacy is an illusion, and that more and more historic vestiges of so-called privacy are slipping through our fingers with the advent of increasingly ubiquitous and promiscuous technologies, the results of which are not all necessarily bad (take a look at just how captivating the Facebook Newsfeed is!).

Still, the more reading I’ve been doing lately about international issues and conflict, the more I agree with Danny Weitzner that there needs to be a robust dialogue about what it means to live in a post-privacy era, and what demands we must place on those companies, governments and institutions that store data about us, about the habits to which we’re prone and about the friends we keep. He sums up the conversation space thus:

Privacy is not lost simply because people find these services useful and start sharing location. Privacy could be lost if we don’t start to figure what the rules are for how this sort of location data can be used. We’ve got to make progress in two areas:

technical: how can users sharing and usage preferences be easily communicated to and acted upon by others? Suppose I share my location with a friend by don’t want my employer to know it. What happens when my friend, intentionally or accidentally shares a social location map with my employer or with the public at large? How would my friend know that this is contrary to the way I want my location data used? What sorts of technologies and standards are needed to allow location data to be freely shared while respective users usage limitation requirements?

legal: what sort of limits ought there to be on the use of location data?

can employers require employees to disclose real time location data?

is there any difference between real-time and historical location data traces? (I doubt it)

under what conditions can the government get location data?

There’s clearly a lot to think about with these new services. I hope that we can approach this from the perspective that lots of location data will be flowing around and realize the big challenge is to develop social, technical and legal tools to be sure that it is not misused.

I want to bring some attention to his first point about the technical issues surrounding New Privacy. This is the realm where we play, and this is the realm where we have the most to offer. This is also an area that’s the most contentious and in need of aggressive policies and leadership, because the old investment model that treats silos of data as gold mines has to end.
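One way to picture the technical half of the problem is to have the usage preferences travel with the data itself, so that my friend's software can tell what I would and wouldn't want before passing my location along. The Python below is only a sketch of that idea under my own assumptions: `SharedLocation`, `reshare` and the audience labels are hypothetical, not an existing standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedLocation:
    """Hypothetical location record that carries its owner's sharing preferences."""
    owner: str
    lat: float
    lon: float
    allowed_audiences: frozenset  # audiences the owner is willing to let see this

    def may_be_seen_by(self, audience):
        return audience in self.allowed_audiences


def reshare(item, audiences):
    """Split a proposed audience list into those the owner allows and those refused."""
    allowed = [a for a in audiences if item.may_be_seen_by(a)]
    refused = [a for a in audiences if not item.may_be_seen_by(a)]
    return allowed, refused
```

Nothing in code like this stops a determined friend from ignoring the preferences, which is why the legal and social tools matter just as much.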

I think Tim O’Reilly is really talking about this when he lambasts Google’s OpenSocial, proclaiming, “It’s the data, stupid!” The problem of course is what open data actually means in the context of user control and ownership, in terms of “licensing” and in terms of proliferation. These are not new problems for technologists as permissioning dates back to the earliest operating systems, but the problem becomes infinitely complex now that it’s been unbounded and non-technologists are starting to realize a) how many groups have been collecting data about them and b) how much collusion is going on to analyze said data. (Yeah, those discounts that that Safeway card gets you make a lot more money for Safeway than they save you, you better believe it!)

With Donald Kerr, the principal deputy director of national intelligence, taking an equally pessimistic (or Apocalyptic) attitude about privacy, I think there needs to be a broader, eyes-wide-open look at who has what data about whom and what they’re doing with it, and, perhaps more importantly, at how the people about whom the data is being collected can get in on the game and get access to this data in the same way you’re guaranteed access and the ability to dispute your credit report. The same thing should be true for web services, the government and anyone else who’s been monitoring you, even if you’ve been sharing that information with them willingly. In another post, I talked about the value of this data, calling it “Data Capital”. People need to realize the massive amount of value that their data adds to the bottom line of so many major corporations (not to mention Web 2.0 startups!) and demand ongoing and persistent access to it. Hell, it might even result in better or more accurate data being stored in these mega-databases!

Regardless, when representatives from the government start to say things like:

Those two generations younger than we are have a very different idea of what is essential privacy, what they would wish to protect about their lives and affairs. And so, it’s not for us to inflict one size fits all, said Kerr, 68. Protecting anonymity isn’t a fight that can be won. Anyone that’s typed in their name on Google understands that.

Our job now is to engage in a productive debate, which focuses on privacy as a component of appropriate levels of security and public safety, Kerr said. I think all of us have to really take stock of what we already are willing to give up, in terms of anonymity, but [also] what safeguards we want in place to be sure that giving that doesn’t empty our bank account or do something equally bad elsewhere.

…you know that it’s time we started framing the debate on our own terms… thinking about what this means to the Citizen Centric Web and about how we want to become the gatekeepers for the data that is both rightfully ours and that should willfully be put into the service of our own needs and priorities.

And you wonder why people in America are afraid of the Internet

Ladies and gentlemen, I would like to present to you two exhibits.

Here is Exhibit A from today’s International Herald Tribune:

In contrast (Exhibit B) we have the same exact article, but with a completely different headline:

Now, for the life of me, I can’t figure out how the latter is a more accurate or more appropriate title for the article, which is ostensibly about Google’s acquisition of Jaiku.

But, for some reason, the editor of the NY Times piece decided that it would — what? — sell more papers? — to use a more incendiary and moreover misleading headline for the story.

Here’s why I take issue: I’m quoted in the article. And here’s where the difference is made. This is how the article ends:

“To date, many people still maintain their illusion of privacy,” he said in an e-mail message.

Adapting will take time.

“For iPhone users who use the Google Maps application, it’s already a pain to have to type in your current location,” he said. “‘Why doesn’t my phone just tell Google where I am?’ you invariably ask.”

When the time is right and frustrations like this are unpalatable enough, Mr. Messina said, “Google will have a ready answer to the problem.”

Consider the effect of reading that passage after being led with a headline like “Google’s Purchase of Jaiku Raises New Privacy Issues” versus “Will Google take the mobile world of Jaiku onto the Web?” The former clearly raises the specter of Google-as-Big-Brother while ignoring the fallacy that privacy, as people seem to understand it, continues to exist. Let’s face it: if you’re using a cell phone, the cell phone company knows where you are. It’s just a matter of time before you get an interface to that data and the illusion that somehow you gave Google (or any other third party) access to your whereabouts.

I for one do not understand how this kind of headline elevates or adds to the discourse, or how it helps people to better understand and come to grips with the changing role and utility of their presence online. While I do like the notion that any well-engineered system can preserve one’s privacy while still being effective, I contend that it’s going to take a radical reinterpretation of what we think is and isn’t private to feel secure in who can and can’t see data about us.

So, to put it simply, there are no “new” privacy issues raised by Google’s acquisition of Jaiku; it’s simply the same old ones over and over again that we seem unable to deal with in any kind of open dialogue in the mainstream press.

Data capital, or: data as common tender

Wikipedia states that Legal tender … is payment that, by law, cannot be refused in settlement of a debt denominated in the same currency. Currency, in turn, is a unit of exchange, facilitating the transfer of goods and/or services.

I was asked a question earlier today about the relative value of open services against open data served in open, non-proprietary data formats. It got me thinking whether — in the pursuit of utter openness in web services and portability in stored data — that’s the right question. Are we providing the right incentives for people and companies to go open? Is it self-fulfilling or manifest destiny to arrive at a state of universal identity and service portability leading to unfettered consumer choice? Is this how we achieve VRM nirvana, or is there something missing in our assumptions and current analysis?

Mary Jo Foley touched on this topic today in a post called Are all ‘open’ Web platforms created equal? She asks the question whether Microsoft’s PC-driven worldview can be modernized to compete in the network-centric world of Web 2.0 where no single player dominates but rather is made up of Best of Breed APIs/services from across the Web. The question she alludes to is a poignant one: even if you go open (and Microsoft has, by any estimation), will anyone care? Even if you dress up your data and jump through hoops to please developers, will they actually take advantage of what you have to offer? Or is there something else to the equation that we’re missing? Some underlying truism that is simply refracting falsely in light of the newfound sexiness of “going open”?

We often tell our clients that one of the first things you can do to “open up” is build out an API, support microformats, adopt OpenID and OAuth. But that’s just the start. That’s just good data hygiene. That’s brushing your teeth once a day. That’s making sure your teeth don’t fall out of your head.

There’s a broader method to this madness, but unfortunately, it’s a rare opportunity when we actually get beyond just brushing our teeth to really getting to sink them in, going beyond remedial steps like adding microformats to web pages to crafting just-in-time, distributed open-data-driven web applications that actually do stuff and make things better. But as I said, it’s a rare occasion for us because we’ve all been asking the wrong questions, providing the wrong incentives and designing solutions from the perspective of the silos instead of from the perspective of the people.

Let me make a point here: if your data were legal tender, you could take it anywhere with you and it couldn’t be refused if you offered to pay with it.

Let me break that down a bit. The way things are today, we give away our data freely and frequently, in exchange for the use of certain services. Now, in some cases, like Pandora or Last.fm, the use of the service itself is compelling and worthwhile, providing an equal or greater exchange rate for our behavior or taste data. In many other cases, we sign up for a service and provide basic demographic data without any sense of what we’re going to get in return, often leaving scraps of ourselves to fester all across the internet. Why do we value this data so little? Why do we give it away so freely?

I learned of an interesting concept today while researching legal tender called “Gresham’s Law” and commonly stated as: When there is a legal tender currency, bad money drives good money out of circulation.

Don’t worry, it took me a while to get it too. Nicolas Nelson offered the following clarification: if high quality and low quality are forced to be treated equally, then folks will keep good quality things to themselves and use low quality things to exchange for more good stuff.

Think about this in terms of data: if people are forced (or tricked) into thinking that the data that they enter into web applications is not being valued (or protected) by the sites that collect the data, well, eventually they’ll either stop entering the data (heard of social network fatigue?) or they’ll start filling them with bogus information, leading to “bad data” driving out the “good data” from the system, ultimately leading to a kind of data inflation, where suddenly the problem is no longer getting people to just sign up for your service, but to also provide good data of some value. And this is where data portability (or data as legal tender) starts to become interesting and allows us to start seeing through the distortion of the refraction.

Think: Data as currency. Data to unlock services. Data owned, controlled, exchanged and traded by the creator of said data, instead of by the networks he has joined. For the current glut of web applications to maintain and be sustained, we must move to a system where people are in charge of their data, where they garden and maintain it, and where they are free to deposit and withdraw it from web services like people do money from banks.

If you want to think about what comes next — what the proverbial “Web 3.0” is all about — it’s not just about a bunch of web applications hooked up with protocols like OAuth that speak in microformats and other open data tongue back and forth to each other. That’s the obvious part. The change comes when a person is in control of her data, and when the services that she uses firmly believe that she not only has a right to do as she pleases with her data, but that it is in their best interest to spit her data out in whatever myriad format she demands and to whichever myriad services she wishes.

The “data web” is still a number of years off, but it is rapidly approaching. It does require that the silos popular today open up and transition from repositories to transactional enterprises. Once data becomes a kind of common tender, you no longer need to lock it; in fact, the value comes from its reuse and circulation in commerce.

To some degree, Mint and Wesabe are doing this retroactively for your banking records, allowing you to add “data value” to your monetary transactions. Next up Google and Microsoft will do this for your health records. For a more generic example, Swivel is doing this today for the OECD but has a private edition coming soon. Slife/Slifeshare, i use this and RescueTime do this for your use of desktop apps.

This isn’t just attention data that I’m talking about (though the recent announcements in support of APML are certainly positive). This goes beyond monitoring what you’re doing and how you’re spending your time. I’m talking about access to all the data that it would take to reconstitute your entire digital existence. And then I’m talking about the ability to slice, dice, and splice it however you like, in pursuit of whatever ends you choose. Or choose not to.

I’ll point to a few references that influenced my thinking: Social Capital To Show Its Worth at This Week’s Web 2.0 Summit, What is Web 2.0?, Tangled Up in the Future – Lessig and Lietaer, Intentional Economics Day 1, Day 2, Day 3.

The Krypton of Privacy

Looks like we now know that the white underbelly of the beast lives just down the street — as well as what it looks like:

In San Francisco the “secret room” is Room 641A at 611 Folsom Street, the site of a large SBC phone building, three floors of which are occupied by AT&T. High-speed fiber-optic circuits come in on the 8th floor and run down to the 7th floor where they connect to routers for AT&T’s WorldNet service, part of the latter’s vital “Common Backbone.” In order to snoop on these circuits, a special cabinet was installed and cabled to the “secret room” on the 6th floor to monitor the information going through the circuits. (The location code of the cabinet is 070177.04, which denotes the 7th floor, aisle 177 and bay 04.) The “secret room” itself is roughly 24-by-48 feet, containing perhaps a dozen cabinets including such equipment as Sun servers and two Juniper routers, plus an industrial-size air conditioner.

And hey, the next time they hold a conference on “Intelligence Support Systems for Lawful Interception and Internet Surveillance”, let’s hold a BarCamp and riff on things like:

…lawful intercept of voice over the Internet (VoIP) and real-time Internet surveillance and the need for lawful interception and Internet surveillance
…what real-time Internet surveillance technology solutions are available, what tariffing mechanisms are available to pass costs off to the general public and how investments in Intelligence Support Systems (ISS) can generate a financial return without jeopardizing consumer privacy
…and how there are no lawful intercept or real-time Internet surveillance barriers that can’t be solved with adequate research and development investment and service provider commitments

That’s the spirit! Anything can be accomplished if you put your mind to it. Whether it’s right or wrong! Whohoo! Moral absolution!

Fuckers.