Digital Identity – Page 3

After 1984

iTunes 8 has added a new feature called “Genius” that harnesses the collective behavior of iTunes Music Store shoppers to generate “perfect” playlists.

Had an interesting email exchange with my mom earlier today about Monica Hesse’s story Bytes of Life. The crux of the story is that more and more people are self-monitoring and collecting data about themselves, in many cases, because, well, it’s gotten so much easier, so, why not?

Well, yes, it is easier, but just because it is easier, doesn’t automatically mean that one should do it, so let’s look at this a little more deeply.

First, my mom asked about the amount of effort involved in tracking all this data:

I still have a hard time even considering all that time and effort spent in detailing every moment of one’s life, and then the other side of it which is that it all has to be read and processed in order to “know oneself”. I think I like the Jon Cabot Zinn philosophy better — just BE in the moment, being mindful of each second doesn’t require one to log or blog it, I don’t think. Just BE in it.

Monica didn’t really touch on too many tools that we use to self-monitor. It’s true that, depending on the kind of data we’re collecting, the effort will vary. But so will the benefits.

If you take a look at MyMileMarker’s iPhone interface, you’ll see how quick and painless it is to record this information. Why bother? Well, for one thing, over time you get to see not only how much fuel you’re consuming, but how much it’s going to cost you to keep running your car in the future:

Without collecting this data, you might guess at your MPG, or take the manufacturer’s rating as given, but when you record what actually is happening, you can prove to yourself whether filling up your tires really does save you money (or the planet).
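
To make the arithmetic concrete, here is a minimal sketch in Python of what a fuel log lets you compute. It is not how MyMileMarker works internally (I don't know that); the `FillUp` record and `fuel_stats` function are names I made up for illustration, and it assumes every entry is a full-tank fill-up.

```python
from dataclasses import dataclass


@dataclass
class FillUp:
    odometer_miles: float    # odometer reading when the tank was filled
    gallons: float           # fuel added to fill the tank
    price_per_gallon: float  # price paid at the pump


def fuel_stats(log):
    """Return (miles_per_gallon, cost_per_mile) for a list of full-tank fill-ups."""
    if len(log) < 2:
        raise ValueError("need at least two fill-ups")
    miles = log[-1].odometer_miles - log[0].odometer_miles
    # The first fill-up replaces fuel burned before the log began,
    # so only the later fill-ups count against the distance covered.
    later = log[1:]
    gallons = sum(f.gallons for f in later)
    cost = sum(f.gallons * f.price_per_gallon for f in later)
    return miles / gallons, cost / miles
```

Run the same calculation before and after topping up your tires and you have an actual comparison instead of a guess.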

On the topic of the environment, recording my trips on Dopplr gives me an actual view of my carbon footprint (pretty damning, indeed):

As my mom pointed out, perhaps having access to this data will encourage me to cut back excess travel — or to consolidate my trips. Ross Mayfield suggests that he could potentially quit smoking if his habit were made more plainly visible to him.

What’s also interesting is how passive monitors, or semi-passive monitoring tools, can also inform, educate or predict — and on this point I’m thinking of Last.fm where of course my music taste is aggregated, or location-based sites like Brightkite, where my locative behavior is tracked (albeit, manually — though Fire Eagle + Spot changes that).

My mom’s other point about the ability to just BE in the moment is also important — because self-tracking should ideally be non-invasive. In other words, it shouldn’t be the tracking that changes your behavior, but your analysis and reflection after the fact.

One of the stronger points I might make about this is that with data, especially when it is collected regularly and when the right indicators are recorded, you can reduce a great deal of the distortion that comes from your self-serving biases. Monica writes:

“We all have the tendency to see our behaviors in a little bit of a halo,” says Jayne Gackenbach, who researches the psychology of the Internet at Grant MacEwan College in Alberta, Canada. It’s why dieters underestimate their food intake, why smokers say they go through fewer cigarettes than they do. “If people can get at some objective criteria, it would be wonderfully informative.” That’s the brilliance, she says, of new technology.

So that’s great and all, but all of this, at least for my mom, raises the spectre of George Orwell’s ubiquitous and all-knowing “Big Brother” from Nineteen Eighty-Four and neo-Taylorism:

I do agree that people lie, or misperceive, and that data is a truer bearer of actualities. I guess I don’t care. Story telling is an art form, too. There’s something sort of 1984ish about all this data collection – – as if the accumulated data could eventually turn us all into robotic creatures too self-programmed to suck the real juice out of life.

I certainly am sympathetic to that view, especially because the characterization of life in 1984 was so compelling and visceral. The problem is that this analogy invariably falls short, especially in other conversations when you’re talking about the likes of Google and other web-based companies.

In 1984, Big Brother symbolized the encroachment of the government on the life of the private citizen. Since the government had the ability to lock you up or take you away based on your behavior, you can imagine that this kind of dystopic vision would resonate in a time when increasingly fewer people probably understand the guts of technology and yet increasingly rely on it, shoveling more and more of their data into online repositories, or having it collected about them as they visit various websites. Never before has the human race had so much data about itself, and yet (likely) so little understanding.

The difference, as I explained to my mom, comes down to access to — and leverage over — the data:

I want to write more about this, but I don’t think 1984 is an apt analogy here. In the book, the government knows everything about the citizenry, and makes decisions using that data, towards maximizing efficiency for some unknown — or spiritually void — end. In this case, we’re flipping 1984 on its head! In this case we’re collecting the data on OURSELVES — empowering ourselves to know more than the credit card companies and banks! It’s certainly a daunting and scary thought to realize how much data OTHER people have about us — but what better way to get a leg up than to start looking at ourselves, and collecting that information for our own benefit?

I used to be pretty skeptical of all this too… but since I’ve seen the tools, and I’ve seen the value of data — I just don’t want other people to profit off of my behaviors… I want to be able to benefit from it as well — in ways that I dictate — on my terms!

In any case, Tim O’Reilly is right: data is the new Intel inside. But shouldn’t we be getting a piece of the action if we’re talking about data about us? Shouldn’t we write the book on what 2014 is going to look like so we can put the tired 1984 analogies to rest for a while and take advantage of what is unfolding today? I’m certainly wary of large corporate behemoths usurping the role the government played in 1984, but frankly, I think we’ve gone beyond that point.

Musings on Chrome, the rebirth of the location bar and privacy in the cloud

Imagine a browser of the web, by the web, and for the web. Not a thick client application that simply opens documents with the http:// protocol instead of file://, but one that runs web applications (efficiently!), that plays the web, that connects people across the boundaries of the silos and gives them local-like access to remote data.

It might not be Chrome, but it’s a damn near approximation, given what people are used to today.

Take a step back. You can see the relics of desktop computing in our applications’ file menus… and we can intuit the assumptions that the original designer must have made about the user, her context and the interaction expectations she brought with her:

This is not a start menu or a Dock. This is a document-driven menubar that’s barely changed since Netscape Communicator.

Indeed, the browser is a funny thing, because it’s really just a wrapper for someone else’s content or someone else’s application. That’s why it’s not about “features”. It’s all about which features, especially for developers.

It’s a hugely powerful place to insert oneself: between a person and the vast expanse that is the Open Web. Better yet: to be the conduit through which anyone projects herself on to the web, or reaches into the digital void to do something.

So if you were going to design a new browser, how would you handle the enormity of that responsibility? How would you seize the monument of that opportunity and create something great?

Well, for starters, you’d probably want to think about that first run experience — what it’s like to get behind the wheel for the very first time with a newly minted driver’s permit — with the daunting realization that you can now go anywhere you please…! Which is of course awesome, until you realize that you have no idea where to go first!

Historically, the solution has been to flip-flop between portals and search boxes, and if we’ve learned anything from Google’s shockingly austere homepage, it comes down to recognizing that the first step of getting somewhere is expressing some notion of where you want to go:

The problem is that the location field has, up until recently, been fairly inert and useless. With Spotlight-influenced interfaces creeping into the browser (like David Watanabe’s recently acquired Inquisitor Safari plugin — now powered by Yahoo! Search BOSS — or the flyout in Flock that was inspired by it) it’s clear that browsers can and should provide more direction and assistance to get people going. Not everyone’s got a penchant for remembering URLs (or RFCs) like Tantek’s.

This kind of predictive interface, however, has only slowly made its way into the location bar, like fish being washed ashore and gradually sprouting legs. Eventually they’ll learn to walk and breathe normally, but until then, things might look a little awkward. But yes, dear reader, things do change.

So you can imagine, having recognized this trend, Google went ahead and combined the search box and the location field in Chrome and is now pushing the location bar as the starting place, as well as where to do your searching:

This change to such a fundamental piece of real estate in the browser has profound consequences on both the typical use of the browser as well as security models that treat the visibility of the URL bar as sacrosanct (read: phishing):

The URL bar is dead! Long live the URL bar!

While cats like us know intuitively how to use the location bar in combination with URLs to get us to where we want to go, that practice is now outmoded. Instead we type anything into the “box” and have some likely chance that we’re going to end up close to something interesting. Feeling lucky?

But there’s something else behind all this that I think is super important to realize… and that’s that our fundamental notions and expectations of privacy on the web have to change or will be changed for us. Either we do without tools that augment our cognitive faculties or we embrace them, and in so doing, shim open a window on our behaviors and our habits so that computers, computing environments and web service agents can become more predictive and responsive to them, and in so doing, serve us better. So it goes.

Underlying these changes are new legal concepts and challenges, spelled out in Google’s updated EULA and Privacy Policy… heretofore places where few feared to go, least of all browser manufacturers:

5. Use of the Services by you

5.1 In order to access certain Services, you may be required to provide information about yourself (such as identification or contact details) as part of the registration process for the Service, or as part of your continued use of the Services. You agree that any registration information you give to Google will always be accurate, correct and up to date.

. . .

12. Software updates

12.1 The Software which you use may automatically download and install updates from time to time from Google. These updates are designed to improve, enhance and further develop the Services and may take the form of bug fixes, enhanced functions, new software modules and completely new versions. You agree to receive such updates (and permit Google to deliver these to you) as part of your use of the Services.

It’s not that any of this is unexpected or Draconian: it is what it is, if it weren’t like this already.

Each of us will eventually need to choose a data broker or two in the future and agree to similar terms and conditions, just like we’ve done with banks and credit card providers; and if we haven’t already, just as we’ve done in embracing webmail.

Hopefully visibility into Chrome’s source code will help keep things honest, and also provide the means to excise those features, or to redirect them to brokers or service providers of our choosing, but it’s inevitable that effective cloud computing will increasingly require more data from and about us than we’ve previously felt comfortable giving. And the crazy thing is that a great number of us (yes, including me!) will give it. Willingly. And eagerly.

But think one more second about the ramifications (see Matt Cutts) of Section 12 up there about Software Updates: by using Chrome, you agree to allow Google to update the browser. That’s it: end of story. You want to turn it off? Disconnect from the web… in the process, rendering Chrome nothing more than, well, chrome (pun intended).

Welcome to cloud computing. The future has arrived and is arriving.

The Social Web TV pilot episode

http://www.viddler.com/player/2cf46be8/

My buddies John McCrea and Joseph Smarr have started up a show called The Social Web and have released the pilot episode, featuring David Recordon on the hubbub between Google and Facebook following last week’s Supernova Conference.

As they point out, things are changing and happening so fast in the industry that a show like this, one that cuts through the FUD and marketing hype, is really necessary. I hope to participate in future episodes — and would love to hear suggestions or recommendations for topics or guests for upcoming episodes.

Here’s the FriendFeed room Dave mentioned.

Announcing Emailtoid: mapping email addresses to OpenIDs

The other night at Beer and Blog in Portland, fellow Vidooper Michael T Richardson announced and launched a new service that I’m both excited and a little apprehensive about.

The service is called Emailtoid, and while I prefer to pronounce it “email-toyed”, others might pronounce it “email two eye-dee”. And depending on your pronunciation, you might realize that this service is about using an email address as an ID — specifically an OpenID.

This is not a new idea, and it’s one that’s been debated and discussed in the OpenID community an awful lot, which culminated in a rough outline of how it might work by Brad Fitzpatrick following the Social Graph FOO Camp this past spring, and that David Fuelling turned into an early draft spec.

Well, we looked at this work and this discussion and felt that sooner or later, in spite of all the benefits of using actual URLs for identity, someone needed to take the lead and actually build out this concept so we have something real to banter about.

The pragmatic reality is that many people are comfortable using email addresses as their identity online for signing up to new services; furthermore, many, many more people have email addresses than have URLs or homepages that they call their own (or can readily identify). And forcing people to learn yet another form of identifier for the web to satisfy the design of a protocol for arguably marginal value with a lesser user experience also doesn’t make sense. Put another way: the limitations of the technology should not be forced on end users, especially when they don’t need to be. And that’s why Emailtoid is a necessary experiment towards advancing identity on the web.

How it works

Emailtoid is a very simple service, and in fact is designed for obsolescence. It’s meant as a fallback for now, enabling relying parties to accept email addresses as identifiers without requiring the generation of a new local password and without requiring the address owner to give up or reveal their existing email credentials (otherwise known as the “password anti-pattern“).

The flow works like this:

1. Users enter either an OpenID or email address into a typical OpenID input field. For the purpose of this flow, we’ll presume an email address is used.
2. The relying party splits email addresses at the ‘@’ symbol into the username and the domain, generating a directed identity request to the email domain. If an XRDS, YADIS or XRDS-Simple document is discovered at the domain, the typical OpenID flow is invoked.
3. If no discovery document is found, the service falls back to Emailtoid (sending a request like http://emailtoid.net/mapper?email=jane@example.com), where users verify that they own the supplied email addresses by providing the one-time access token that Emailtoid mailed to them.
4. At this point, users may optionally associate an existing OpenID with their email address, or use the OpenID auto-generated by Emailtoid. Emailtoid is not intended to serve as a full-featured OpenID provider, and we encourage using an OpenID from a third-party OpenID provider.
5. In the case where users supply and verify their own OpenID, Emailtoid will create a 302 HTTP redirect removing Emailtoid from future interactions completely.

Should an email provider supply a discovery document after an Emailtoid mapping has been made, the new mapping will take precedence.
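
For the relying-party side of that flow, here is a rough sketch in Python of the decision it describes: try discovery at the email domain first, and only fall back to the Emailtoid mapper when nothing is found. This is not Emailtoid's actual code. The `discover` callable is a hypothetical stand-in for whatever XRDS/YADIS discovery library you use, and the `resolve` function and its return values are names I invented for illustration; only the mapper URL comes from the example above.

```python
from urllib.parse import urlencode

EMAILTOID_MAPPER = "http://emailtoid.net/mapper"  # endpoint from the example above


def split_email(address):
    """Split an email address into (username, domain) at the last '@'."""
    username, sep, domain = address.rpartition("@")
    if not sep or not username or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return username, domain


def emailtoid_mapper_url(address):
    # Keep '@' literal so the URL matches the form shown in the flow.
    return EMAILTOID_MAPPER + "?" + urlencode({"email": address}, safe="@")


def resolve(identifier, discover):
    """Decide where a relying party should send the user.

    `discover` is a hypothetical callable: given a domain, it returns the
    OpenID endpoint found via a discovery document, or None if there is none.
    Anything without an '@' is treated as an ordinary OpenID (a simplification).
    """
    if "@" not in identifier:
        return ("openid", identifier)
    _, domain = split_email(identifier)
    endpoint = discover(domain)
    if endpoint is not None:
        return ("openid", endpoint)  # the email provider handles it natively
    return ("emailtoid", emailtoid_mapper_url(identifier))
```

Because discovery is always attempted first, a provider that later publishes a discovery document takes over without the relying party changing anything.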

Opportunities and issues

The drive behind Emailtoid, again, is to reduce the friction of OpenID by reusing familiar identifiers (i.e. email addresses). Clearly the challenges of achieving OpenID adoption are not simply technological, and to a great degree rely on how the user experience needs to become more streamlined and deliver on the promise of greater security and convenience.

Therefore, if a service advertises that they support signing in with an email address, they must keep that promise.

Unfortunately, until all email providers do some kind of local resolution and OpenID authentication, we will need a centralized mapper such as Emailtoid to provide the fallback mapping. And therein lies the rub, defeating some of the distributed design of OpenID.

If anything, Emailtoid is intended to drive forward a conversation about the experience of OpenID, and about how we can make the protocol compatible with, or complementary to, existing and well-known means of identifying oneself on the web. Is it a final solution? Probably not — but it’s up, it’s running, it works and it forces us now to look critically at the question of emails as OpenIDs, now that we can actually experience the flow, and the feeling, of entering an email address into an OpenID box without ever having to enter, or create, another unnecessary password.

Thoughts on dynamic privacy

A highly touted aspect of Facebook Connect is the notion of “dynamic privacy“:

As a user moves around the open Web, their privacy settings will follow, ensuring that users’ information and privacy rules are always up-to-date. For example, if a user changes their profile picture, or removes a friend connection, this will be automatically updated in the external website.

Over the course of the Graphing Social Patterns East conference here in DC, Dave Morin and others from Facebook’s Developer Platform have made many a reference to this scheme but have provided frustratingly scant detail on how it will actually work.

In a conversation with Brian Oberkirch and David Recordon, it dawned on me that the pieces for Dynamic Privacy are already in place and that, to some degree, it seems that it’s really just a matter of figuring out how to effectively enforce policy across distributed systems in order to meet user expectations.

MySpace actually has made similar announcements in their Data Availability approach, and if you read carefully, you can spot the fundamental rift between the OpenSocial and Facebook platforms:

Additionally, rather than updating information across the Web (e.g. default photo, favorite movies or music) for each site where a user spends time, now a user can update their profile in one place and dynamically share that information with the other sites they care about. MySpace will be rolling out a centralized location within the site that allows users to manage how their content and data is made available to third party sites they have chosen to engage with.

Indeed, Recordon wrote about this on O’Reilly Radar last month (emphasis original):

He explained that MySpace said that due to their terms of service the participating sites (e.g. Twitter) would not be allowed to cache or store any of the profile information. In my mind this led to the Data Availability API being structured in one of two ways: 1) on each page load Twitter makes a request to MySpace fetching the protected profile information via OAuth to then display on their site or 2) Twitter includes JavaScript which the browser then uses to fill in the corresponding profile information when it renders the page. Either case is not an example of data portability no matter how you define the term!

Embedding vs sharing

So the major difference here is in the mechanism of data delivery and how the information is “leased” or “tethered” to the original source, such that, as Morin said, “when a user deletes an item on Facebook, it gets deleted everywhere else.”

The approach taken by Google Gadgets, and hence OpenSocial, for the most part, has been to tether data back to the source via embedded iframes. This means that if someone deletes or changes a social object, it will be deleted or changed across OpenSocial containers, though the containers won’t even notice the difference since they never had access to the data to begin with.

The approach that seems likely from Facebook can be intuited by scouring their developer’s terms of service (emphasis added):

You can only cache user information for up to 24 hours to assist with performance.

…

2.A.4) Except as provided in Section 2.A.6 below, you may not continue to use, and must immediately remove from any Facebook Platform Application and any Data Repository in your possession or under your control, any Facebook Properties not explicitly identified as being storable indefinitely in the Facebook Platform Documentation within 24 hours after the time at which you obtained the data, or such other time as Facebook may specify to you from time to time;

2.A.5) You may store and use indefinitely any Facebook Properties that are explicitly identified as being storable indefinitely in the Facebook Platform Documentation; provided, however, that except as provided in Section 2.A.6 below, you may not continue to use, and must immediately remove from any Facebook Platform Application and any Data Repository in your possession or under your control, any such Facebook Properties: (a) if Facebook ceases to explicitly identify the same as being storable indefinitely in the Facebook Platform Documentation; (b) upon notice from Facebook (including if we notify you that a particular Facebook User has requested that their information be made inaccessible to that Facebook Platform Application); or (c) upon any termination of this Agreement or of your use of or participation in Facebook Platform;

2.A.6) You may retain copies of Exportable Facebook Properties for such period of time (if any) as the Applicable Facebook User for such Exportable Facebook Properties may approve, if (and only if) such Applicable Facebook User expressly approves your doing so pursuant to an affirmative “opt-in” after receiving a prominent disclosure of (a) the uses you intend to make of such Exportable Facebook Properties, (b) the duration for which you will retain copies of such Exportable Facebook Properties and (c) any terms and conditions governing your use of such Exportable Facebook Properties (a “Full Disclosure Opt-In”);

2.B.8) Notwithstanding the provisions of Sections 2.B.1, 2.B.2 and 2.B.5 above, if (and only if) the Applicable Facebook User for any Exportable Facebook Properties expressly approves your doing so pursuant to a Full Disclosure Opt-In, you may additionally display, provide, edit, modify, sell, resell, lease, redistribute, license, sublicense or transfer such Exportable Facebook Properties in such manner as, and only to the extent that, such Applicable Facebook User may approve.

This is further expanded in the platform documentation on Storable Information:

Per the Developer Terms of Service, you may not cache any user data for more than 24 hours, with the exception of information that is explicitly “storable indefinitely.” Only the following parameters are storable indefinitely; all other information must be requested from Facebook each time.

…

The storable IDs enable you to keep unique identifiers for Facebook elements that correspond to data gathered by your application. For instance, if you collected information about a user’s musical tastes, you could associate that data with a user’s Facebook uid.

However, note that you cannot store any relations between these IDs, such as whether a user is attending an event. The only exception is the user-to-network relation.

I imagine that Facebook Connect will work by “leasing” or “sharing” information to remote sites and requiring them, through agreement and compliance with its terms, to check in periodically (or to receive directives through a push mechanism) for changes to data, and then to flush caches of stored data every 24 hours or less.

In either model there is still a central provider and store of the data, but the question for implementation really comes down to whether a remote site ever has direct access to the data, and if so, how long it is allowed to store it.
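
To illustrate what the “leased” model might look like on the remote site's side, here is a minimal sketch in Python. It is purely my guess at the shape of the thing, not anything Facebook has published: the `LeasedCache` class and its methods are hypothetical, the 24-hour limit is taken from the terms quoted above, and `revoke` stands in for whatever push directive the provider might send.

```python
import time

DAY_SECONDS = 24 * 60 * 60  # the 24-hour cache limit from the quoted terms


class LeasedCache:
    """Holds provider data only for the length of the lease."""

    def __init__(self, max_age=DAY_SECONDS, clock=time.time):
        self.max_age = max_age
        self.clock = clock  # injectable so the expiry logic can be tested
        self._items = {}    # key -> (value, fetched_at)

    def put(self, key, value):
        self._items[key] = (value, self.clock())

    def get(self, key):
        """Return the cached value, or None if it is missing or the lease ran out."""
        entry = self._items.get(key)
        if entry is None:
            return None
        value, fetched_at = entry
        if self.clock() - fetched_at >= self.max_age:
            del self._items[key]  # lease expired: flush and force a re-fetch
            return None
        return value

    def revoke(self, key):
        """Drop an item immediately, e.g. when the provider says the user deleted it."""
        self._items.pop(key, None)
```

A `None` from `get` means going back to the provider, which is exactly the point where it gets to say no.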

Of note is the OpenSocial RESTful API, which provides a web-friendly mechanism for addressing and defining resources. Recordon pointed out to me that this API affords all the mechanisms necessary to implement the “leased” model of data access (rather than the embedded model), but leaves it up to the OpenSocial applications and containers to set and enforce their own data access policies.

…Which is a world of difference from Facebook’s approach to date, for which there is neither code nor a spec nor an open discussion about how they’re thinking through the thorny issues involved in making decisions around data access, data control, “tethering” and “portability”. While folks like Plaxo and Yahoo are actually shipping code, Facebook is still posturing, telling us to “wait and see”. With something so central and so important, it’s disheartening that Facebook’s “Open” strategy is anything but open, and everything less than transparent.

Inventing contact schemas for fun and profit! (Ugh)

And then there were three.

Today, Yahoo! announced the public availability of their own Address Book API. Though Plaxo and LinkedIn have been using this API behind the scenes for a short while, today marks the first time the API is available for anyone who registers for an App ID to make use of the bi-directional protocol.

The API is shielded behind Yahoo!’s proprietary BBAuth protocol, which obviates the need to request Yahoo! member credentials at the time of import initiation, as seen in this screenshot from LinkedIn (from April):

Now, like Joseph, I applaud the release of this API, as it provides one more means for individuals to have utter control over and access to their friends, colleagues and contacts using a robust protocol.

However, I have to lament yet more needless reinvention of contact schema. Why is this a problem? Well, as I pointed out about Facebook’s approach to developing their own platform methods and formats, having to write and debug against yet another contact schema makes the “tax” of adding support for contact syncing and export increasingly onerous for sites and web services that want to better serve their customers by letting them host and maintain their address book elsewhere.

This isn’t just a problem that I have with Yahoo!. It’s something that I encountered last November with the SREG and proposed Attribute Exchange profile definition. And yet again when Google announced their Contacts API. And then again when Microsoft released theirs! Over and over again we’re seeing better ways of fighting the password anti-pattern flow of inviting friends to new social services, but having to implement support for countless contact schemas. What we need is one common contacts interchange format and I strongly suggest that it inherit from vcard with allowances or extension points for contemporary trends in social networking profile data.

I’ve gone ahead and whipped up a comparison matrix between the primary contact schemas to demonstrate the mess we’re in.

Below, I have a subset of the complete matrix to give you a sense for where we’re at with OpenSocial (né GData), Yahoo Address Book API and Microsoft’s Windows Live Contacts API, and include vcard (RFC 2426) as the cardinal format towards which subsequent schemas should converge:

Field	vcard	OpenSocial 0.8	Windows Live Contacts API	Yahoo Address Book API
UID	uid url	id	cid	cid
Nickname	nickname	nickname	NickName	nickname
Full Name	n or fn	name	NameTitle, FirstName, MiddleName, LastName, Suffix	name
First name	n (given-name)	given_name	FirstName	name (first)
Last name	n (family-name)	family_name	LastName	name (last)
Birthday	bday	date_of_birth	Birthdate	birthday (day, month, year)
Anniversary			Anniversary	anniversary (day, month, year)
Gender	gender	gender	gender
Email	email	email	Email (ID, EmailType, Address, IsIMEnabled, IsDefault)	email
Street	street-address	street-address	StreetLine	street
Postal Code	postal-code	postal-code	PostalCode	zip
City	locality	locality
State	region	region	PrimaryCity	state
Country	country-name	country	CountryRegion	country
Latitude	geo (latitude)	latitude	latitude
Longitude	geo (longitude)	longitude	longitude
Language	N/A			N/A
Phone	tel (type, value)	phone (number, type)	Phone (ID, PhoneType, Number, IsIMEnabled, IsDefault)	phone
Timezone	tz	time_zone	TimeZone	N/A
Photo	photo	thumbnail_url		N/A
Company	org	organization.name	CompanyName	company
Job Title	title, role	organization.title	JobTitle	jobtitle
Biography	note	about_me		notes
URL	url	url	URI (ID, URIType, Name, Address)	link
Category	category, rel-tag	tags	Tag (ID, Name, ContactIDs)
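
To show what that “tax” looks like in practice, here is a small sketch in Python that normalizes records from each schema to vcard field names. The field names are copied from a few of the flat rows in the matrix above; the `TO_VCARD` table, the schema keys and the `to_vcard_fields` function are my own illustration, not part of any of these APIs, and nested fields like names, emails and phones are left out.

```python
# Per-schema field name -> vcard field name, taken from the matrix above.
TO_VCARD = {
    "opensocial": {
        "nickname": "nickname",
        "date_of_birth": "bday",
        "postal-code": "postal-code",
        "time_zone": "tz",
        "about_me": "note",
    },
    "windows_live": {
        "NickName": "nickname",
        "Birthdate": "bday",
        "PostalCode": "postal-code",
        "TimeZone": "tz",
        "CompanyName": "org",
        "JobTitle": "title",
    },
    "yahoo": {
        "nickname": "nickname",
        "zip": "postal-code",
        "company": "org",
        "jobtitle": "title",
        "notes": "note",
        "link": "url",
    },
}


def to_vcard_fields(schema, record):
    """Return (mapped, unmapped): vcard-named fields, plus whatever had no mapping."""
    mapping = TO_VCARD[schema]
    mapped, unmapped = {}, {}
    for key, value in record.items():
        if key in mapping:
            mapped[mapping[key]] = value
        else:
            unmapped[key] = value
    return mapped, unmapped
```

Every new contacts API means another one of these tables to write and debug, which is the argument for converging on vcard in the first place.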

Facebook, the USSR, communism, and train tracks

Low hills closed in on either side as the train eventually crawled on to high, tabletop grasslands creased with snow. Birds flew at window level. I could see lakes of an unreal cobalt blue to the north. The train pulled into a sprawling rail yard: the Kazakh side of the Kazakhstan-China border.

Workers unhitched the cars, lifted them, one by one, ten feet high with giant jacks, and replaced the wide-gauge Russian undercarriages with narrower ones for the Chinese tracks. Russian gauges, still in use throughout the former Soviet Union, are wider than the world standard. The idea was to prevent invaders from entering Russia by train. The changeover took hours.

— Robert D. Kaplan, The Ends of the Earth

I read this passage today while sunning myself at Hope Springs Resort near Palm Springs. Tough life, I know.

The passage above immediately made me think of Facebook, and I had visions of the old Facebook logo with a washed out Stalin face next to the wordmark (I’m a visual person). But the thought came from some specific recent developments, and fit into a broader framework that I talked about loosely with Steve Gillmor on his podcast. I also wrote about it last week, essentially calling for Facebook and Google to come together to co-develop standards for the social web, but, having been reading up on Chinese, Russian, Turkish and Central Asian history, and being a beneficiary of the American enterprise system, I’m coming over to Eran’s and others’ point that 1) it’s too early to standardize and 2) it probably isn’t necessary anyway. Go ahead, let a thousand flowers bloom.

If I’ve learned anything from Spread Firefox, BarCamp, coworking and the like, it’s that propaganda needs to be free to be effective. In other words, you’re not going to convince people of your way of thinking if you lock down what you have, especially if what you have is culture, a mindset or some other philosophical approach that helps people narrow down what constitutes right and wrong.

Look, if Martin Luther had nailed his Ninety-five Theses to the door but had ensconced them in DRM, he would not have been as effective at bringing about the Reformation.

Likewise, the future of the social web will not be built on proprietary, closed-source protocols and standards. Therefore, it should come as no surprise that Google wants OpenSocial to be an “open standard” and Facebook wants to be the openemest of them all!

The problem is not about being open here. Everyone gets that there’s little marginal competitive advantage to keeping your code closed anymore. Keeping your IP cards close to your chest makes you a worse card player, not better. The problem is with adoption, with gaining and maintaining [developer] interest, and with stoking distribution. And that brings me to the fall of Communism and the USSR, back where I started.

I wasn’t alive back when the Cold War was in its heyday. Maybe I missed something, but let’s just go on the assumption that things are better off now. From what I’m reading in Kaplan’s book, I’d say that the Soviets left not just social, but environmental disaster in their wake. The whole region of Central Asia, at least in the late 90s, was fucked. And while there are many causes, more complex than I can probably comprehend, a lot of it seems to have to do with a lack of cultural identity and a lack of individual agency in the areas affected by, or left behind by, Communist rule.

Now, when we talk about social networks, I mean, c’mon, I realize that these things aren’t exactly nations, nation-states or even tribal groups warring for control of natural resources, food, potable water, and so forth. BUT, the members of social networks number in the millions in some cases, and it would be foolish not to appreciate that the borders (the meticulously crafted hardline boundaries between digital nation-states) are going to be redrawn when the battle for cultural dominance between Google (et al) and Facebook is done. It’s not the same caliber of standoff that we saw during the Cold War, but it’s certainly a situation where two sides with very different ideological bents are competing to determine the nature of the future of the [world]. On the one hand, we have a nanny state that thinks it knows best and needs to protect its users from themselves, and on the other, a laissez-faire-trusting band of bros who are looking to the free market to inform the design of the Social Web writ large. On the one hand, there’s uncertainty about how to build a “national identity”-slash-business on top of lots of user data (that, oh yeah, I thought was supposed to be “owned” by the creators), and on the other, a model of the web that embraces all its failings, nuances and spaghetti code, but that, more than likely, will stand the test of time as a durable provider of the kind of liberty and agency and free choice that wins out time and again throughout history.

That Facebook is attempting to open source its platform, to me, sounds like offering the world a different rail gauge specification for building train tracks. It may be better, it may be slicker, but the flip side is that the Russians used the same tactic to try to keep people from having any kind of competitive advantage over their people or influence over how they did business. You can do the math, but look where it got’em.

S’all I’m sayin’.

Machine tagging relationships

I’ve been doing quite a bit of thinking about how to represent relationships in portable contact lists. Many of my concerns stem from two basic problems:

Relationships in one context don’t necessarily translate directly into new contexts. When we talk about making relationships “portable”, we can’t forget that a friend on one system isn’t necessarily the same kind of friend on another system (if at all) even if the other context uses the same label.
The semantics of a relationship should not form the basis for globally setting permissions. That is, just because someone is marked (perhaps accurately) as a family member does not always mean that that individual should be granted elevated permissions just because they’re “family”. While this approach works for Flickr, where how you classify a relationship (Contact, Friend, Family) determines what that contact can (or can’t) see, semantics alone shouldn’t determine how permissions are assigned.

Now, stepping back, it’s worth pointing out that I’m going on a basic presumption here that moving relationships from one site to another is valuable and beneficial. I also presume that the more convenient it is to find or connect with people I already know (or have an established acquaintance with) on a site, the faster I’ll explore and discover that site’s actual features, rather than getting bogged down in finding, inviting and adding friends, which in and of itself has no marginal utility.

Beyond just bringing my friends with me is the opportunity to leverage the categorization I’ve done elsewhere, but that’s where existing formats like XFN and FOAF appear to fall short. On the one hand, we have overlapping terms for relationships that might not mean the same thing in different places, and on the other, we have unique relationship descriptions that might not apply elsewhere (e.g. fellow travelers on Dopplr). This was one of the reasons why I proposed focusing on the “contact” and “me” relationships in XFN (I mean really, what can you actually do if you know that a particular contact is a “muse” or “kin”?). Still, if metadata about a relationship exists, we shouldn’t just discard it, so how then might we express it?

Well, to keep the solution as simple and generalizable as possible, we’d see that the kinds of relationships and the semantics which we use to describe relationships can be reduced to tags. Given a context, it’s fair to infer that other relationships of the same class in the same context are equivalent. So, if I mark two people as “friends” on Flickr, they are equally “Flickr friends”. Likewise on Twitter, all people who I follow are equally “followed”. Now, take the link-rel approach from HTML, and we have a shorthand attribute (“rel”) that we can use to create a machine tag that follows the standard namespace:predicate=value format, like so:


flickr:rel=friend
flickr:rel=family
twitter:rel=followed
dopplr:rel=fellow-traveler
xfn:rel=friend
foaf:rel=knows
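To make the namespace:predicate=value shape concrete, here is a minimal Python sketch of splitting such a tag into its parts. The `MachineTag` type and `parse_machine_tag` function are hypothetical names for illustration, not part of any existing library, and the sketch assumes values never need escaping.

```python
from typing import NamedTuple


class MachineTag(NamedTuple):
    namespace: str  # the context the relationship comes from, e.g. "flickr"
    predicate: str  # "rel" in all of the examples above
    value: str      # the relationship label in that context, e.g. "friend"


def parse_machine_tag(tag: str) -> MachineTag:
    """Split a namespace:predicate=value string into its three parts."""
    namespace, colon, rest = tag.partition(":")
    predicate, equals, value = rest.partition("=")
    # Reject plain tags like "favorite" that lack a namespace or a value.
    if not (colon and equals and namespace and predicate and value):
        raise ValueError(f"not a machine tag: {tag!r}")
    return MachineTag(namespace, predicate, value)


tag = parse_machine_tag("dopplr:rel=fellow-traveler")
print(tag.namespace, tag.predicate, tag.value)
```

Keeping the namespace as a separate field is what preserves the original context: "friend" from Flickr and "friend" from XFN stay distinct values.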

Imagine being able to pass your relationships between sites as a series of machine tagged URLs, where you can now say “I want to share this content with all my [contacts|friends|family members] from [Flickr]” or “Share all my restaurant reviews from this trip with my [fellow travelers] from [Dopplr|TripIt].” By machine tagging relationships, not only do we maintain the fidelity of the relationship with context, but we inherit a means of querying against this dataset in a way that maps to the origin of the relationship.
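A query like “all my friends and family from Flickr” could look something like the Python sketch below. The contact URLs, the in-memory `contacts` mapping and the `contacts_with` helper are all made up for illustration; a real site would presumably run this against its own datastore.

```python
# Hypothetical contact list: profile URL -> set of machine-tagged relationships.
contacts = {
    "http://example.com/alice": {"flickr:rel=friend", "twitter:rel=followed"},
    "http://example.com/bob": {"flickr:rel=family", "dopplr:rel=fellow-traveler"},
    "http://example.com/carol": {"dopplr:rel=fellow-traveler"},
}


def contacts_with(contacts, namespace, values):
    """Return the URLs of contacts tagged with any of `values` in `namespace`."""
    wanted = {f"{namespace}:rel={value}" for value in values}
    # A contact matches if its tag set shares at least one tag with `wanted`.
    return sorted(url for url, tags in contacts.items() if tags & wanted)


# "Share this with all my friends and family from Flickr"
flickr_audience = contacts_with(contacts, "flickr", ["friend", "family"])

# "Share my restaurant reviews with my fellow travelers from Dopplr"
dopplr_audience = contacts_with(contacts, "dopplr", ["fellow-traveler"])
```

Note that the query only selects an audience; whether those people actually get access is still a separate permissions decision.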

Furthermore, this would enable sites to use relationship classification models from other sites. For example, a site like Pownce could use the “Twitter model” of followers and followed; SmugMug could use Flickr’s model of contacts, friends and family; Basecamp could use Plaxo’s model of business, friend and family.

Dumping this data into a JSON-based format like jCard would also be straight-forward:


{
  "uid": "plaxo-12345",
  "fn": "Joseph Smarr",
  "url": [
    { "value": "http://josephsmarr.com", "type": "home" },
    { "value": "http://josephsmarr.com", "type": "blog" }
  ],
  "category": [ 
    { "value": "favorite" },
    { "value": "plaxo employee" }, 
    { "value": "xfn:rel=met" },
    { "value": "xfn:rel=friend" },
    { "value": "xfn:rel=colleague" },
    { "value": "flickr:rel=friend" },
    { "value": "dopplr:rel=fellow-traveler" },
    { "value": "twitter:rel=follower" } 
  ],
  "created": "2008-05-24T12:00:00Z",
  "modified": "2008-05-25T12:34:56Z"
}
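As a rough sketch of what a consuming site might do with such a record, the Python below reads a trimmed copy of it and groups the relationship tags by the site they came from. The `relationships_by_site` helper is a hypothetical name, and treating any category value without a `namespace:rel=` prefix as a plain tag is my assumption, not part of jCard.

```python
import json
from collections import defaultdict

# A trimmed copy of the record, kept as strict JSON so json.loads accepts it.
record = json.loads("""
{
  "fn": "Joseph Smarr",
  "category": [
    { "value": "favorite" },
    { "value": "xfn:rel=friend" },
    { "value": "xfn:rel=colleague" },
    { "value": "flickr:rel=friend" },
    { "value": "twitter:rel=follower" }
  ]
}
""")


def relationships_by_site(record):
    """Group machine-tagged relationship values by their namespace."""
    by_site = defaultdict(list)
    for category in record.get("category", []):
        namespace, colon, rest = category["value"].partition(":")
        predicate, equals, value = rest.partition("=")
        # Plain tags such as "favorite" have no namespace and are skipped.
        if colon and equals and predicate == "rel" and value:
            by_site[namespace].append(value)
    return dict(by_site)


grouped = relationships_by_site(record)
```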

I’m curious to know whether this approach would be useful, or what other possibilities might result from having this kind of data. I like it because it’s simple, it uses a prior convention (most widely supported on Flickr and Upcoming), and it maintains original context and semantics. It also means that, rather than having to list every account for a contact as a serialized list with associated rel-values, we’re only dealing in highly portable tags.

I’m thinking that this would be very useful for DiSo, and when importing friends from remote sites, we’ll be sure to index this kind of information.

I’m joining Vidoop to work on DiSo full time

Well, Twitter, along with Marshall and his post on ReadWriteWeb, beat me to it, but I’m pretty excited to announce that, yes, I am joining Vidoop, along with Will Norris, to work full time on the DiSo (distributed social) Project.

For quite some time I’ve wanted to get the chance to get back to focusing on the work that I started with Flock, and that I’ve continued, more or less, with my involvement in and advocacy of projects like microformats, OpenID and OAuth. These projects don’t accidentally relate to people using technology to behave socially: they exist to make it easier, and better, for people to use the web (and related technologies) to connect with one another safely, confidently, and without the need to sign up with any particular network just to talk to their friends and people that they care about.

The reality is that people have long been able to connect to one another using technology — what was the first telegraph transmission if not the earliest poke heard round the world? The problem that we have today is that, with the proliferation of fairly large, non-interoperable social networks, it’s not as easy as email or telephones have been to connect to people, and so, the next generation of social networks are invariably going to need to make the process of connecting over the divides easier, safer and with less friction if people really are going to, as expected, continue to increase their use of the web for communication and social interaction.

So what is the DiSo Project?

The DiSo Project has humble roots. Basically Steve Ivy and I started hacking on a plugin that I’d written that added hcards to your contact list or blogroll. It was really stupidly simple, but when we combined it with Will Norris’ OpenID plugin, we realized that we were on to something: since contact lists were already represented as URLs, we now had a way to verify whether the person who ostensibly owned one of those URLs was leaving a comment, or signing in, and we could thereby add new features, expose private content or do any number of other interesting social networking-like things!
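The core check is small enough to sketch in a few lines of Python. This is only an illustration of the idea, not the plugins’ actual code: it assumes the OpenID verification has already happened elsewhere and handed us a verified identity URL, and the `normalize` and `is_known_contact` names, along with the very light URL normalization, are my own simplifications.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Lowercase the scheme and host and drop any trailing slash from the path."""
    parts = urlsplit(url.strip())
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}"


def is_known_contact(verified_openid_url, contact_urls):
    """True if an already-verified OpenID URL matches a URL in the contact list."""
    known = {normalize(url) for url in contact_urls}
    return normalize(verified_openid_url) in known


# Hypothetical blogroll of contact URLs, as published via hcards.
blogroll = ["http://josephsmarr.com", "http://example.com/steve/"]

# A visitor who signed in with this OpenID could be shown private content.
allowed = is_known_contact("http://JosephSmarr.com/", blogroll)
```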

This led me to start “sketching” ideas for WordPress plugins that would be useful in a distributed social network, and eventually Steve came up with the name, registered the domain, and we were off!

Since then, Stephen Paul Weber has jumped in and released additional plugins for OAuth, XRDS-Simple, actionstreams and profile import, and this was when the project was just a side project.

What’s this mean?

Working full time on this means that Will and I should be able to make much more progress, much more quickly, and to work with other projects and representatives from efforts like Drupal, BuddyPress and MovableType to get interop happening (eventually) between each project’s implementation.

Will and I will eventually be setting up an office in San Francisco, likely a shared office space (hybrid coworking), so if you’re a small company looking for a space in the city, let’s talk.

Meanwhile, if you want to know more about DiSo in particular, you should probably just check out the interview I did with myself about DiSo to get caught up to speed.

. . .

I’ll probably post more details later on, but for now I’m stoked to have the opportunity to work with a really talented and energized group of folks to work on the social layer of the open web.

Thoughts on DataPortability

Introduction

Over the last several days I’ve started and abandoned four drafts of this post. Usually it doesn’t take me this long to write out my thoughts, or to go through so many different approaches, but I wanted to express myself as clearly as I could given the amount and overlapping texture of what I wanted to say. I ended up gutting a lot, and tried to focus on some basics, making as few assumptions about the reader (you) as possible.

The reality is that I’m eyeballs-deep in this stuff, and realized that in earlier drafts, I had included a lot of subtext that just wasn’t helping me get my message across and that really only made sense to other folks similarly in the thick of it.

So I got rid of the subterfuge and divided this up into four sections, inspired by a conversation I had with Brynn.

I encourage and invite feedback, but I would prefer to discuss the substance of what I’m arguing, rather than focusing on tit-for-tat squabbly disagreements.

What is data portability?
How does DataPortability (DP) relate to OpenID?
Are there risks associated with DataPortability?
What’s good about DataPortability?

What is data portability?

Contrary to what some folks have argued, I think that the semantics and meaning of the phrase “data portability” are important. To me, data portability denotes the act of moving data from one place to another, and implies that the data should, therefore, be thought of like a physical thing, with physical properties.

Let me draw an analogy here to illustrate the problem with this model.

Take an iPod. With an iPod, you literally copy files from one device to another — for example, from your laptop to your iPod. This is, on the one hand, a limitation imposed by a lack of connectivity and restrictions in copyright law, but on the other, is actually by design. This scenario is not altogether unmanageable unless you have dozens of iPods that you want to sync up with your music, especially if you don’t typically think to connect your iPod every time you add new music, create new playlists or otherwise change your music library.

Now take an always-connected player, like Pandora Mobile, where the model works by federating continuous access from a central source to the consuming devices that play back music. Ignoring the DMCA restrictions that make it impossible for Pandora to let you listen to what you want on demand, the point is that, rather than making numerous copies across many unaffiliated and disconnected devices, Pandora affords a consistent experience and uniform access by streaming live data to any device that is authorized (and is online).

The former model (the iPod) is what you might call the “desktop model of data portability”. Certainly you can copy your data and take it with you, but it doesn’t reflect a model where always-on connectivity is assumed, which is the situation with online social networks. The offline model works well for physical devices that don’t require an internet connection to function, but it is a model that fails for services like Pandora, which require connectivity, and whose value derives from ready access to up-to-date information, streamed and accessible from anywhere (well, except in Canada).

It’s nuance, but it’s critical to conceptualizing the value and import of this shift, and it’s nuance which I think is often left out of the explanation of “DataPortability” (whose official definition is the option to share or move your personal data between trusted applications and vendors (emphasis added)). In my mind, when the arena of application is the open, always-on, hyper-connected web, constructing best practices using an offline model of data is fraught with fundamental problems and distractions and is ultimately destined to fail, since the phrase is immediately obsolete, unable to capture in its essence contemporary developments in the cloud concept of computing (which consists of follow-your-nose URIs and URLs rather than discrete hard drives), and in the move towards push-based subscription models that are real-time and addressable.

So if you ask me what “data portability” is, I’ll concede that it’s a symbol for starting a conversation about what’s wrong with the state of social networks. Beyond that, I think there’s a great danger that, as a result of framing the current opportunity around “data portability”, the story that will get picked up and retold will be the one about copying data between social networks, rather than the more compelling, more future-facing, and frankly more likely situation of data streaming from trusted brokered sources to downstream authorized consumers. But, I guess “copying” and “moving” data is easier to grasp conceptually, and so that’s what I think a lot of people will think when they hear the phrase. In any case, it gets the conversation started, and where it goes from there is anyone’s guess.

How does DataPortability (DP) relate to OpenID?

OpenID, along with OAuth, microformats, RSS, OPML, RDF, APML and XMPP, are all open and non-proprietary technologies (formats and protocols) that grace the DataPortability homepage. How they ended up on the homepage, or what selection criteria are used to pick them, is beyond me (for example, I would have added Atom to the list). So the best way that I can describe the relationship between any of these technologies and DataPortability is that, at some point, the powers that be within the group decided to throw a logo on their homepage and add it to their “social software stack”.

To reiterate (and I won’t speak for the OpenID Foundation since I’m unfamiliar with any conversations that they might have had with DP), no one necessarily asked if it would be okay to put the OAuth or microformats logos on the homepage of DP, or to include those technologies in the DP stack. They just did it. It wasn’t like DP had been around for a while with a mandate to develop best practices for the future of social networks, and groups like the microformats community petitioned or were nominated to be included. They simply were. There was no process, as far as I’m aware, as to what was included, and what was not.

So while OpenID and the other technologies may be part of the technologies recommended by DP, it should be known that there really is no official relationship between these efforts and DP (though it is true that many members of each group coordinate, meet and discuss related topics, for example, at tomorrow’s Internet Identity Workshop, and at events like the Data Sharing Summit).

Beyond that, it should be noted that OpenID, OAuth, microformats et al have been in development for the last several years, and have been building up momentum and communities all on their own, without and prior to the existence of the DP initiative. In fact, the DP project really only got its start last November with an idea presented by Josh Patterson and Josh Lewis called WRFS, or the “Web Relational File System”. At the time, the WRFS was intended to serve as a “reference design” for describing how data portability should work and this was to serve as the foundation of the DP recommendations.

In January, after ongoing discussions, Josh decided that it would be best to spin WRFS off into its own project and started a separate mailing list, leaving DP to focus exclusively on evangelizing existing technologies and communities and, in the oft-repeated words of Chris Saad, to invent nothing new (a mantra inherited from the OAuth and microformats efforts).

Are there risks associated with DataPortability?

If you accept that DP is primarily a symbol for starting the conversation about transforming social networks from walled gardens into interoperating, seamful web services, then no, not really. If you believe or buy into the hype, or blindly follow the forthcoming “technical specifications”, I see significant risks that need to be addressed.

First, DP does not speak for the community as a whole, for any specific social network (except, perhaps, MySpace), or for any individuals except those who publicly align themselves with the group. On too many occasions to feel comfortable about, I’ve seen or read members of the DP project claim authority far beyond any reasonable mandate, which to me have read like attempts to seize control and influence that not only aren’t justified, but that shouldn’t be ascribed to any individual or organization. I worry that this hubris (conceivably a result of proximity to certain A-Listers) is leading them to take more credit than they’re due, and in consequence, folks interested in but previously uninitiated with any of the core technologies will be led to believe that the DataPortability group is responsible for and in control of those technologies. Furthermore, if it is the case that people are misled, I have little faith that folks from the DP project will prevent themselves from speaking on behalf of (or pseudo-knowledgeably about) those technologies, leading to confusion and potential damage.

Second, I have a great deal of concern about the experiences and priorities that are playing into the group’s approach to privacy, security, publicity and disclosure. These are concerns that I would have with any effort that aims to bridge different social or commercial contexts where norms and expectations have already been established, and where there exist few examples (apart from Beacon) of how people actually respond to semi-automatic social network cross-fertilization. It’s not that privacy isn’t a hot topic on the DP mailing lists; it’s just that statements like this one, from a leader of the group, reflect fishtailing in the definition of and approach to privacy, which I worry could skid wildly out of control if clarity on how to achieve these dictums isn’t developed very soon:

The thing is that while Privacy is certainly important, in the end these are *social* platforms. By definition they are about sharing. The problem with Facebook Beacon was not that it was sharing, but rather it was sharing the WRONG information in the WRONG way.

Also again, don’t forget, just because data is portable or accessible does NOT mean it is public or ‘open’. This is why I stayed away from the ‘Open Data’ terminology when thinking up DataPortability. Just like a Hard Drive and a PC that runs certain applications, ultimately the applications that USE the data that need to ensure they treat the data with respect – or users will simply stop using them.

[. . .]

You are right that DP should NOT be positioned that Privacy is not important – that is certainly not my intention with my answers. But being important and being a major sticking point is two different things.

Again I tend to think of this as one big Hard Disk. While you provide read/write permissions to folders on a network (for privacy) it is ultimately up to the people and applications you trust to respect your privacy and not just start emailing your word docs to your friends.

So if the second risk is that an unrealistic, naive or incomplete model of privacy [coupled with a lack of effective enforcement mechanisms in the case of fraud or abuse] will be promoted by the DP group, the third risk is that groups or communities that are roped into the DP initiative may open themselves up to a latent social backlash should something go wrong with specific implementations of DataPortability best practices. Specifically, if the final privacy model demands certain approaches to user data, and companies or organizations go along with them by adopting the provided “social technology stack” (i.e. libraries offered that implement the DP data model), the technical implementation may be flawless, but if people’s data starts showing up in places where they didn’t expect it to, they may reject the whole notion of “data portability” and seek to retreat back to the “safe” walled gardens of today. And it may be that, because of the emphasis on specific technologies in the DP group’s propaganda, brands like OpenID and OAuth will become associated with negative experiences, like downloadable .exes in email are today. It’s not a foregone conclusion in my mind that this future is inevitable, but it’s one that the individual groups affected should avoid at all costs, if only because of the significant progress we’ve made to date on our own, and it would be a shame if ignorance or a lack of clear communication about the proper methods of adoption and implementation of these technologies led people to blame the technological means instead of particular instances of their application.

What’s good about DataPortability?

I don’t want to just be a negative creep, so I do think that there is a silver lining to the DP initiative, which I mentioned earlier: it provides a token phrase that we can throw around to tease out some of the more gnarly issues involved in developing future social applications. It is about having a conversation.

While OpenID and OAuth have actual technology and implementations behind them, they also serve as symbols for having conversations about identity and authorization, respectively. Similarly, microformats helps us to think about lightweight semantic markup that we can embed in human-friendly web pages that are also compatible with today’s web browsers, and that additionally make those pages easier for machines to parse. And before these symbols, we had AJAX and Web 2.0, both of which, during their inception, were equally controversial and offensive to the folks who knew the details of the underlying technological innovation behind the terms but who also stood to lose their shamanic positions if simpler language were adopted as the conversations migrated into the mainstream.

Now, is there a risk that we might lose some of the nuance and sophistication with which we data junkies and user-centric identity advocates communicate if we adopt a less precise term to describe the present trends towards interoperable social networks? Absolutely. But this also means that, as the phrase “data portability” makes its way into common conversation, people can begin to think about their social networking activities and what they take for granted (“Wait, you mean that I wouldn’t have to sign up for a new account on my friend’s social network just to send them a photo? Really?”), and to realize that the way things are today not only isn’t the way that they have to be, but that there is a better way for social applications to be designed, architected and presented, one that gives the enthusiasts and customers of these services greater choice and greater latitude to actually pick services that (what else?) serve them best!

So just as Firefox gave rise to a generation of web developers that take web standards much more seriously, and have in turn recognized and capitalized on the power of having a “rectangle” that actually behaves in a way that they expect (meaning that it fully complies with the standards as they’ve been defined), I think the next evolution of the social web is going to be one where we take certain things, like identity, like portable contact lists, like better and more consistent permissioning systems, as givens, and one that, as a result, will lead to much more interesting, more compelling, and perhaps even more lucrative uses of the open social web.