Thoughts on DataPortability

Introduction

Over the last several days I’ve started and abandoned four drafts of this post. Usually it doesn’t take me this long to write out my thoughts, or to go through so many different approaches, but I wanted to express myself as clearly as I could given the amount and overlapping texture of what I wanted to say. I ended up gutting a lot, and tried to focus on some basics, making as few assumptions about the reader (you) as possible.

The reality is that I’m eyeballs-deep in this stuff, and I realized that my earlier drafts included a lot of subtext that just wasn’t helping me get my message across and that really only made sense to other folks similarly in the thick of it.

So I got rid of the subtext and divided this post into four sections, inspired by a conversation I had with Brynn.

I encourage and invite feedback, but I would prefer to discuss the substance of what I’m arguing, rather than focusing on tit-for-tat squabbly disagreements.

  1. What is data portability?
  2. How does DataPortability (DP) relate to OpenID?
  3. Are there risks associated with DataPortability?
  4. What’s good about DataPortability?

What is data portability?

Contrary to what some folks have argued, I think that the semantics and meaning of the phrase “data portability” are important. To me, data portability denotes the act of moving data from one place to another, which implies that the data should be thought of as a physical thing, with physical properties.

Let me draw an analogy here to illustrate the problem with this model.

Take an iPod. With an iPod, you literally copy files from one device to another: for example, from your laptop to your iPod. On the one hand, this is a limitation imposed by a lack of connectivity and by restrictions in copyright law; on the other, it is actually by design. The scenario is manageable enough, but it breaks down if you have dozens of iPods that you want to keep in sync with your music, especially if you don’t typically think to connect your iPod every time you add new music, create new playlists or otherwise change your music library.

Now take an always-connected player like Pandora Mobile, where the model works by federating continuous access from a central source to consuming devices that play back music. Ignoring the restrictions that make it impossible for Pandora to let you listen to whatever you want on demand, the point is that, rather than making numerous copies across many unaffiliated and disconnected devices, Pandora affords a consistent experience and uniform access by streaming live data to any device that is authorized (and online).

The former model (the iPod) is what you might call the “desktop model of data portability”. Certainly you can copy your data and take it with you, but the model doesn’t assume always-on connectivity, which is exactly the situation with online social networks. This offline model works well for physical devices that don’t require an internet connection to function, but it fails for services like Pandora that require connectivity and whose value derives from ready access to current information, streamed and accessible from anywhere (well, except in Canada).

It’s nuance, but it’s critical to conceptualizing the value and import of this shift, and it’s nuance that I think is often left out of explanations of “DataPortability” (whose official definition is “the option to share or move your personal data between trusted applications and vendors” (emphasis added)). In my mind, when the arena of application is the open, always-on, hyper-connected web, constructing best practices around an offline model of data is fraught with fundamental problems and distractions, and is ultimately destined to fail. The phrase is obsolete from the start: it cannot capture in its essence contemporary developments in cloud computing (which consists of follow-your-nose URIs and URLs rather than discrete hard drives), or the move towards push-based subscription models that are real-time and addressable.

So if you ask me what “data portability” is, I’ll concede that it’s a symbol for starting a conversation about what’s wrong with the state of social networks. Beyond that, I think there’s a great danger that, as a result of framing the current opportunity around “data portability”, the story that gets picked up and retold will be about copying data between social networks, rather than the more compelling, more future-facing, and frankly more likely situation of data streaming from trusted brokered sources to downstream authorized consumers. But I guess “copying” and “moving” data are easier to grasp conceptually, so that’s what I think a lot of people will imagine when they hear the phrase. In any case, it gets the conversation started; where it goes from there is anyone’s guess.
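To make the contrast between the two models concrete, here is a minimal Python sketch. Everything in it (the `Broker` class, its `export_copy`, `authorize`, `subscribe` and `update` methods, and the sample profile data) is hypothetical and not taken from any DataPortability specification; it only contrasts a one-off copy, which goes stale, with a subscription that a trusted source pushes to consumers it has authorized.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

# Hypothetical illustration only: not a DataPortability or DiSo API.
Profile = Dict[str, str]


@dataclass
class Broker:
    """Hypothetical trusted source holding the canonical copy of a user's data."""

    data: Profile = field(default_factory=dict)
    authorized: Set[str] = field(default_factory=set)
    subscribers: Dict[str, Callable[[Profile], None]] = field(default_factory=dict)

    def export_copy(self) -> Profile:
        # "iPod" model: a one-off snapshot that never changes afterwards.
        return dict(self.data)

    def authorize(self, consumer: str) -> None:
        self.authorized.add(consumer)

    def subscribe(self, consumer: str, callback: Callable[[Profile], None]) -> bool:
        # "Pandora" model: only authorized consumers may subscribe.
        if consumer not in self.authorized:
            return False
        self.subscribers[consumer] = callback
        callback(dict(self.data))  # deliver the current state immediately
        return True

    def update(self, key: str, value: str) -> None:
        # Change the canonical data once, then push it to every subscriber.
        self.data[key] = value
        for callback in self.subscribers.values():
            callback(dict(self.data))


broker = Broker(data={"name": "Alice", "status": "at home"})
snapshot = broker.export_copy()  # copied "to the iPod"

live_view: Profile = {}
broker.authorize("friend_site")
granted = broker.subscribe("friend_site", live_view.update)
denied = broker.subscribe("unknown_site", lambda profile: None)

broker.update("status", "at work")
print(snapshot["status"])   # at home  (stale copy)
print(live_view["status"])  # at work  (pushed update)
print(granted, denied)      # True False
```

The difference lies in who keeps consumers current: in the copy model that burden falls on the user, while in the brokered model the source does it, but only for consumers it has explicitly authorized.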

How does DataPortability (DP) relate to OpenID?

OpenID, OAuth, microformats, RSS, OPML, RDF, APML and XMPP are all open and non-proprietary technologies (formats and protocols) that grace the DataPortability homepage. How they ended up on the homepage, or what selection criteria were used to pick them, is beyond me (for example, I would have added Atom to the list). So the best way I can describe the relationship between any of these technologies and DataPortability is that, at some point, the powers that be within the group decided to throw its logo on their homepage and add it to their “social software stack”.

To reiterate (and I won’t speak for the OpenID Foundation, since I’m unfamiliar with any conversations it might have had with DP), no one necessarily asked whether it would be okay to put the OAuth or microformats logos on the DP homepage, or to include those technologies in the DP stack. They just did it. It wasn’t as if DP had been around for a while with a mandate to develop best practices for the future of social networks, and groups like the microformats community had petitioned or been nominated for inclusion. They simply were included. As far as I’m aware, there was no process for deciding what was included and what was not.

So while OpenID and the other technologies may be part of the technologies recommended by DP, it should be known that there really is no official relationship between these efforts and DP (though it is true that many members of each group coordinate, meet and discuss related topics, for example, at tomorrow’s Internet Identity Workshop, and at events like the Data Sharing Summit).

Beyond that, it should be noted that OpenID, OAuth, microformats et al have been in development for the last several years, and have been building up momentum and communities all on their own, without and prior to the existence of the DP initiative. In fact, the DP project really only got its start last November with an idea presented by Josh Patterson and Josh Lewis called WRFS, or the “Web Relational File System”. At the time, the WRFS was intended to serve as a “reference design” for describing how data portability should work and this was to serve as the foundation of the DP recommendations.

In January, after ongoing discussions, Josh decided that it would be best to spin WRFS off into its own project and started a separate mailing list, leaving DP to focus exclusively on evangelizing existing technologies and communities and, in the oft-repeated words of Chris Saad, to invent nothing new (a mantra inherited from the OAuth and microformats efforts).

Are there risks associated with DataPortability?

If you accept that DP is primarily a symbol for starting the conversation about transforming social networks from walled gardens into interoperating, seamful web services, then no, not really. If you believe or buy into the hype, or blindly follow the forthcoming “technical specifications”, I see significant risks that need to be addressed.

First, DP does not speak for the community as a whole, for any specific social network (except, perhaps, MySpace), or for any individuals except those who publicly align themselves with the group. On more occasions than I’m comfortable with, I’ve seen or read members of the DP project claim authority far beyond any reasonable mandate, in what read to me like attempts to seize control and influence that are not only unjustified but that shouldn’t be ascribed to any individual or organization. I worry that this hubris (conceivably a result of proximity to certain A-Listers) is leading them to take more credit than they’re due, and that, in consequence, folks who are interested in but new to the core technologies will be led to believe that the DataPortability group is responsible for and in control of those technologies. Furthermore, if people are misled, I have little faith that folks from the DP project will refrain from speaking on behalf of (or pseudo-knowledgeably about) those technologies, leading to confusion and potential damage.

Second, I have a great deal of concern about the experiences and priorities that are shaping the group’s approach to privacy, security, publicity and disclosure. These are concerns I would have with any effort that aims to bridge different social or commercial contexts where norms and expectations are already established, and where there exist few examples (apart from Beacon) of how people actually respond to semi-automatic social network cross-fertilization. It’s not that privacy isn’t a hot topic on the DP mailing lists; it’s that statements like the following, from a leader of the group, reflect fishtailing in the definition of and approach to privacy, which I worry could skid wildly out of control if clarity on how to achieve these dictums isn’t developed very soon:

The thing is that while Privacy is certainly important, in the end these are *social* platforms. By definition they are about sharing. The problem with Facebook Beacon was not that it was sharing, but rather it was sharing the WRONG information in the WRONG way.

Also again, don’t forget, just because data is portable or accessible does NOT mean it is public or ‘open’. This is why I stayed away from the ‘Open Data’ terminology when thinking up DataPortability. Just like a Hard Drive and a PC that runs certain applications, ultimately the applications that USE the data that need to ensure they treat the data with respect – or users will simply stop using them.

[. . .]

You are right that DP should NOT be positioned that Privacy is not important – that is certainly not my intention with my answers. But being important and being a major sticking point is two different things.

Again I tend to think of this as one big Hard Disk. While you provide read/write permissions to folders on a network (for privacy) it is ultimately up to the people and applications you trust to respect your privacy and not just start emailing your word docs to your friends.

So if the second risk is that an unrealistic, naive or incomplete model of privacy [coupled with a lack of effective enforcement mechanisms in the case of fraud or abuse] will be promoted by the DP group, the third risk is that groups or communities roped into the DP initiative may open themselves up to a latent social backlash should something go wrong with specific implementations of DataPortability best practices. Specifically, if the final privacy model demands certain approaches to user data, and companies or organizations go along by adopting the provided “social technology stack” (i.e. libraries offered that implement the DP data model), the technical implementation may be flawless, but if people’s data starts showing up in places they didn’t expect, they may reject the whole notion of “data portability” and retreat to the “safe” walled gardens of today. And because of the emphasis on specific technologies in the DP group’s propaganda, brands like OpenID and OAuth may become associated with negative experiences, the way downloadable .exes in email are today. I don’t think this future is inevitable, but it’s one that the individual groups affected should avoid at all costs, if only because of the significant progress we’ve made to date on our own. It would be a shame if ignorance, or a lack of clear communication about the proper methods of adopting and implementing these technologies, led people to blame the technology itself instead of particular instances of its application.
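The enforcement gap behind this risk can be shown in a few lines. The sketch below models the “one big Hard Disk” analogy quoted above with a hypothetical `Folder` class (an illustration, not any real DP library or data model): the read permission is checked at the source, but once a trusted app holds a copy, nothing in the permission model constrains where that copy goes next.

```python
from typing import Dict, Set


class Folder:
    """Hypothetical folder on the 'one big Hard Disk', with per-app read permissions."""

    def __init__(self, contents: Dict[str, str]) -> None:
        self.contents = contents
        self.readers: Set[str] = set()

    def grant_read(self, app: str) -> None:
        self.readers.add(app)

    def read(self, app: str) -> Dict[str, str]:
        # The permission check happens here, at the source, and only here.
        if app not in self.readers:
            raise PermissionError(f"{app} may not read this folder")
        return dict(self.contents)


photos = Folder({"party.jpg": "shared with friends only"})
photos.grant_read("trusted_app")

# An unauthorized site is refused when it asks the source directly...
try:
    photos.read("unexpected_site")
    blocked = False
except PermissionError:
    blocked = True

# ...but the trusted app can hand its copy to that same site,
# and nothing in this permission model can see or stop it.
copy_held_by_trusted_app = photos.read("trusted_app")
unexpected_site_inbox = dict(copy_held_by_trusted_app)

print(blocked)                             # True
print(unexpected_site_inbox["party.jpg"])  # shared with friends only
```

Access control at the source is necessary, but it says nothing about what happens downstream; that part rests entirely on the trustworthiness of each application and on whatever enforcement exists outside the code.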

What’s good about DataPortability?

I don’t want to be just a negative creep, and I do think there is a silver lining to the DP initiative, which I mentioned earlier: it provides a token phrase that we can throw around to tease out some of the gnarlier issues involved in developing future social applications. It is about having a conversation.

While OpenID and OAuth have actual technology and implementations behind them, they also serve as symbols for having conversations about identity and authorization, respectively. Similarly, microformats help us to think about lightweight semantic markup that we can embed in human-friendly web pages that are also compatible with today’s web browsers, and that additionally make those pages easier for machines to parse. And before these symbols, we had AJAX and Web 2.0, both of which, at their inception, were equally controversial and offensive to folks who knew the details of the underlying technological innovation behind the terms, but who also stood to lose their shamanic positions if simpler language were adopted as the conversations migrated into the mainstream.

Now, is there a risk that we might lose some of the nuance and sophistication with which we data junkies and user-centric identity advocates communicate if we adopt a less precise term to describe the present trend towards interoperable social networks? Absolutely. But this also means that, as the phrase “data portability” makes its way into common conversation, people can begin to think about their social networking activities and what they take for granted (“Wait, you mean that I wouldn’t have to sign up for a new account on my friend’s social network just to send them a photo? Really?”). They can realize not only that things don’t have to be the way they are today, but that there is a better way to design, architect and present social applications, one that gives the enthusiasts and customers of these services greater choice and greater latitude to pick the services that (what else?) serve them best!

So just as Firefox gave rise to a generation of web developers who take web standards much more seriously, and who have in turn recognized and capitalized on the power of having a “rectangle” that actually behaves the way they expect (meaning that it fully complies with the standards as they’ve been defined), I think the next evolution of the social web is going to be one where we take certain things as givens, like identity, portable contact lists, and better and more consistent permissioning systems, and that this will lead to much more interesting, more compelling and perhaps even more lucrative uses of the open social web.

Author: Chris Messina

Inventor of the hashtag. #1 Product Hunter. Techmeme Ride Home podcaster. Ever-curious product designer and technologist. Previously: Google, Uber, Republic, YC W'18.

31 thoughts on “Thoughts on DataPortability”

  1. Fantastic post, Chris. I’m very happy to see somebody elucidate so clearly the current situation in a relatively unbiased fashion.

    However, pointing to those references from DP documents is only going to lead to more cries of: “We’re not up to that phase yet!”

    Also, the blinkered mention of OpenID and OAuth as symbols for conversation caught me wrong. They are, of course, and I understand that you’re simplifying, but they’re not the only symbols that lead to conversations on identity and authorisation.

    Lastly, I agree that the phrase ‘data portability’ serves very well as an equivalent to ‘Ajax’, ‘Web 2.0’ etc.; however, you fail to distinguish the phrase explicitly from the group. I can’t see how we can expect those not actively in the community (and how does one define that?) to understand that ‘data portability’ is not sourced from http://dataportability.org/ in some fashion.

    Then again, if the big players in the DP group continue to deliberately dilute and obscure the message as per MySpace’s recent ‘Data Availability’ announcement unchecked, we might be looking for a new phrase to symbolise the conversation!

  2. Very nice analysis.

    I think that to the unwashed masses, data portability will ultimately be about “never having to type the same crap into the web ever again”.

    Of course, in the meanwhile, we’ll have the just-informed-enough-to-be-dangerous technical crowd to try to keep as unconfused as possible.

  3. This is a really good post, Chris, and earlier in the week I posted some similar sentiments on floe.tv’s development blog,
    http://news.floe.tv/index.php/2008/04/20/where-do-you-think-youre-going/
    basically asserting that I believe that the term “data portability” is more the beginning of a conversation and less of an endpoint. I’m less of a fan of dp “best practice” documents and more of a fan of devs at the tip of the long tail getting together, linking their apps together via open data, and showing the internet how open can be better:

    http://jpatterson.floe.tv/index.php/2008/05/10/stone-soup/

    When we sketched up WRFS I never intended to push it towards some “this is how you do it” type situation; I always thought of it more as a “hey, wanna share data with us? Let’s get together, maybe we can offer more value for less”. I think this needs to be more of an evolutionary, organic process and less of a “dictated from on high” scenario.

    I also concur that there is a perfect-storm scenario where the user could be set up to fail in such a way that they would go screaming back to the walled gardens, which does no one any good, and it’s a scenario best avoided.

    Regardless, I think we are on the cusp of the linked data era, which will bring about the semantic era of the web, and I feel this can bring large “tectonic shifts” in how the web looks and works economically,
    http://jpatterson.floe.tv/index.php/2008/05/10/its-the-end-of-the-world-as-we-know-it/

    keep it coming, and I need more DiSo code to come out, we need more people with XRDS files to use in our app!

    Josh Patterson

  4. Great post, as you know we think quite similarly when it comes to this stuff.

    You have a good point that one of the values of Data Portability is that it already gives people a brand they can start to rally around. The importance of that shouldn’t be underestimated. While you mention AJAX, I don’t think that is anywhere near as common a brand as Web 2.0. The question in my mind then becomes whether Data Portability should be the next brand for where the web is going.

  5. @Lachlan Good points. You’re right about the wiki apologists… I pointed to those pages because I’m waiting for them to be filled, but it really points to the need for a “stub” distinction — as in, this page is not finished — it’s just a placeholder! However, so many pages on the wiki are like this, you wonder when they’ll ever get attended to!

    I brought up OpenID and OAuth as symbols probably because I’m biased towards them. When people ask me what I’m working on, I mention these technologies, which invariably leads to conversations about online identity and safer transmission of data between services. In that way, they serve me as tokens, but they are certainly not the *only* or even the *primary* symbols for talking about identity and related topics!

    In earlier drafts I spent some time talking about camelcase DataPortability™ (the brand) and ‘data portability’ (the kind of data that is moveable). I decided to remove it from this post simply because it harkens back to conversations about BarCamp, trademarks, community marks, centralization and whatnot. I actually had quite a bit decrying the group’s odd penchant to centralize into a formal body already when in most cases that slows things down or leads to the exact politics that the group claims to want to avoid. Well, you can see I could probably write a whole post on that topic, hence I left it out of this one!

    I also had written up something about the MySpace ‘data availability’ thing, which, ironically, is actually pretty close to the model that I see in the future — the model that will simply ape the “portability moment”.

    @Josh Thanks for stopping by. I do think that a lot of your thinking here is applicable, and a lot of the stuff you guys have come up fits this model pretty well.

    I do agree that getting in there and solving problems with real use cases is going to be key, and I appreciate the nudge to get more code into DiSo and to get more sites to adopt DiSo tech! Don’t worry, things’ll be heating up shortly! 😉

    I also appreciate you pointing out that WRFS was never intended to be the final spec for the DP project, but that it sort of (or seemed to) inherit that role. Anyway, I’m glad that you spun it out when you did.

    @David I think that for developers, AJAX is very common! But you’re right, for the wider media and enterprise/business markets, Web 2.0 has really taken hold and vaguely represents “non-1.0 type websites”. I think we’re at a point now where people can tell whether a site is “2.0 or not” just by looking at it. Eventually that might be the case for sites that are designed as services: supporting OpenID, allowing people to pull in their data using OAuth, allowing them to subscribe to their remote hCard-based profiles. Of course, all this happens behind the scenes, but it’s nonetheless worth considering, since the Web 2.0 meme is starting to get a bit old in the Valley (and 3.0 is kind of a frightful, though semi-inevitable, next age).

    @Chris Wow, you’re fast dude. I’ll read up — thanks for taking the time!

  6. Great post, Chris!
    I’ve also been mulling over the practical implications in a short post on my blog: http://www.hereinthehive.com/2008/05/04/identity-and-control/

    Beyond the vast array of issues you’ve mentioned, there’s obviously the question of the integrity of hosted ‘identity profiles’: which companies or hosts would you trust to host your identity and preferences?

    I think if the community can nail this, it really will transform the web. How will social networks differentiate themselves? What will they offer people who can up sticks and walk away?

  7. Hey Chris,

    great post! Many good points I can understand and agree with. My hope, though, is that more conversation is going on between all those groups (I mean constructive conversation, which did not always happen in the past). I personally am more involved in DP, mostly because I found it more approachable than e.g. DiSo (as I am not coding in PHP or for WP, and for now DiSo seems to be more code than documentation. I still think it’s a good and important project, of course).

    In the end I think we all have a similar idea: that data should be more free (then again, I think for sync it would be more useful to have a central store, like your OpenID provider, from which services sync their local copies, but that’s a different discussion).

    Having said this, I find it a bit sad to see some of these little flame wars popping up here and there. Not being in the Valley, and being something of an outsider to all the existing communities (like OpenID, OAuth etc.), I wonder where all this comes from, as to me at least it was always clear that DP does not want to “eat” those standards but to use existing ones where possible.

    As for the process of choosing which ones get included (yes, Atom should be in there), that’s a difficult question, especially as there always seem to be very dogmatic people in each community (like MF vs. FOAF) who say that “we solved it all, anything else is wrong wrong wrong” (that’s at least how it sounds to me, coming into these discussions relatively from the outside). So my proposal would always be to look at what’s already deployed out there and does the job. For profiles, for example, I would then come up with FOAF/vCard and hCard.
    Of course such things will always be debated.

    In the end I think the market will decide. I doubt that we will really end up with some sort of trustmark. Vendors will always try to do something differently, and I think the main job we can really do here is to stir up discussion and to promote the idea. I think both DP and DiSo have really done this and will continue to do so. The process of coming up with a solution will be incremental anyway, bouncing ideas back and forth and improving them.

    So what I would really wish for is more talk between those different projects and communities. It even seems that we all come up with similar ideas and proposals (e.g. regarding service discovery via XRDS, which apparently is in the air), so we can only be arguing about details.

    For me at least it’s not important what group will “win”. For me it’s important to be able to discuss some ideas with people and to have a broad spectrum of libraries (in a broad spectrum of languages). That’s why I started pdataportability which is not so much about DataPortability but about all those standards related to data portability in general (like XRDS, Microformats, FOAF etc.).

    Ok, enough said.. hope it makes sense 🙂

  8. Hi Chris,

    Would love the opportunity to share some details with you on ooTao’s work. We have implemented integrations between 90% of the standards that you list and have learned a great deal in the process. Our ‘DataWeb Server’ is an implementation of, as you say, “data streaming from trusted brokered sources to downstream authorized consumers”. We would love to get more involved in DataPortability.org… How do we do that?

    Great thread so far – I just wish we were all sitting around the Google campfire this evening, holding hands, roasting marshmallows and thinking about our brothers and sisters up in Redmond, our brethren at Facebook (espousing ‘dynamic privacy’), and our cohorts down in LaLa (and SOMA) trying to move that giant called MySpace in a widget direction.

    Oh yah and the rocket scientists at Yahoo.

    This whole movement of data movement needs to be expanded by discussions on interoperability, testing labs need to be set up (to make sure systems are compatible with each other) and maybe one day we’ll be able to start showing vendors and other platforms – why opening up is a good thing.

    See you all on Thursday.

  10. Great post! I think it’s important to emphasize that the term Data Portability is not some technology waiting to be implemented, but rather an idea of how things can be done better than today, and a discussion of how to do it. And it will take some time; I’m not sure whether the building blocks (OpenID, OAuth, Microformats …) have to be established first and then used, or whether Data Portability is the “killer app” that gives these building blocks their breakthrough.
