microformats – Factory Joe

Stowe Boyd launches Microsyntax.org

Stowe Boyd launched Microsyntax.org this morning and announced that I will be the first member of his advisory board.

Stowe and I have batted around a number of ideas for making posts on Twitter contain more information than what is superficially presented, and this new effort should create a space in which ideas, research, proposals and experiments can be made and discussed.

Ultimately, my hope is that Microsyntax.org will reach beyond Twitter and provide a forum for thinking through how we encapsulate data in channels that don’t natively support metadata by using conventions that express as much meaning as much as they encode.

Since I originally proposed hashtags in August of 2007, I’ve thought a lot about what these conventions mean, and how wide adoption of something can radically elevate the field of competition.

There is a similar opportunity here, where, if the discourse is developed properly, such conventions can actually enable a greater range of expression over narrow channels, allowing for wider participation in and understanding of conversations.

Take, for instance, Stowe’s “GeoSlash” (as christened by Ross Mayfield) proposal. Whether his syntax is the right one (or even necessary!) isn’t something that can be argued rationally. It’s only something that can be investigated through experimentation and observation. To this point, there has been no central convening context in which such a proposal could be brought up, debated, discussed, considered, tinkered with, improved, championed and evaluated.

As a result, countless proposals have been made for baking more “meta” into Twitter’s data stream, but few have really taken off (compared with the relative success of hashtags and @replies).

While I’m sympathetic to arguments (and pleas!) against adding additional structure or formatting to tweets, I think that the bigger opportunity here extends beyond Twitter (which is primarily a public broadcast channel) to other applications, regardless of whether they use Twitter as the message routing infrastructure or not. Indeed, given my recent (and very positive) experience where @AlaskaAir checked me in to my flight over Twitter, you can imagine an opportunity developing where, say, forward-thinking airlines actually collaborate to develop a syntax for expressing checkin requests via some sort of direct SMS-based channel.

The situation of multiple competing-yet-overlapping SMS syntaxes lead me, somewhat mockingly, to start documenting what I called “picoformats“. If I’ve learned anything from the microformats process, it’s that anyone can invent a schema or a format, but getting adoption is the hard part (and also the most valuable). So, in order to promote adoption, you should always try to model behavior that already exists in the wild, and then work to make the intensions of the behavior more clear, repeatable and memorable.

Most microsyntax efforts fail to follow this process, and as a result, fail in the wild. Efforts that employ the scientific method tend to see more success: hashtags modeled the convention started by IRC channels and Jaiku (Joshua Schachter also used the hash to denote tags in the early days of Delicious); the $ticker convention (from StockTwits) follows how many financial trade publications denote stock symbols. And so on.

So when it comes to proposing new behaviors that don’t yet exist in the wild, I think that the Microsyntax.org project will be an excellent place to convene and host conversations and experiments, many of which will admittedly fail. But at minimum, there will be a record of what’s been tried, what the thinking and goals were, and where, hopefully, some modest successes have been achieved.

I’m looking forward to contributing to this effort and helping to stand up the community infrastructure with Stowe. While I’m not eager to see the Twitter stream polluted with characters intended only for computers, I think that there is still much explored ground in what can be accomplished through modest modifications of the way that we communicate over these kinds of narrow, unidimensional channels.

Google Profiles, namespace lock-in & social search

I’d originally intended to respond to Joshua Schacter’s post about URL shorteners and how they’re merely the tip of the data iceberg, but since I missed that debate, Google has fortuitously plied me with an even better example by releasing custom profile URLs today.

My point is to reiterate one of Tim O’Reilly’s ever-prescient admonishments about Web 2.0: lock-in can be achieved through owning a namespace. In full:

5. Chief among the future sources of lock in and competitive advantage will be data, whether through increasing returns from user-generated data (eBay, Amazon reviews, audioscrobbler info in last.fm, email/IM/phone traffic data as soon as someone who owns a lot of that data figures out that’s how to use it to enable social networking apps, GPS and other location data), through owning a namespace (Gracenote/CDDB, Network Solutions), or through proprietary file formats (Microsoft Office, iTunes). (“Data is the Intel Inside”)

(I’ll note that the process of getting advantage from data isn’t necessary a case of companies being “evil.” It’s a natural outcome of network effects applied to user contribution. Being first or best, you will attract the most users, and if your application truly harnesses network effects to get better the more people use it, you will eventually build barriers to entry based purely on the difficulty of building another such database from the ground up when there’s already so much value somewhere else. (This is why no one has yet succeeded in displacing eBay. Once someone is at critical mass, it’s really hard to get people to try something else, even if the software is better.) The question of “don’t be evil” will come up when it’s clear that someone who has amassed this kind of market position has to decide what to do with it, and whether or not they stay open at that point.)

Consider two things:

Google’s mission is to organize the world’s information and make it universally accessible and useful.
Facebook’s mission is to make it easier for people to share and connect.

Owning the “people” namespace will determine whether people see the web through Google’s technicolor glasses or Facebook’s more nuanced and monochrome blue hues.

Curiously, it has been (correctly) argued that Google “doesn’t get social”, a criticism that I generally support. And yet, with their move to more convenient profile URLs that point to profiles that aggregate content from across the web (beating Facebook to the punch), a bigger (albeit incomplete) picture begins to emerge.

When I blogged that my name is not a URL, I wasn’t so much arguing against vanity or custom profile URLs but instead making the point that such things really should go away over time, from a usability perspective.

Let me put it this way: at one point, if you weren’t in the Yellow Pages, you basically didn’t exist. Now imagine there being several competitors to the Yellow Pages — the Red, Green and Blue Pages — each maintaining overlapping but incomplete listings of people. You’re going to want to use the one that has the most complete, exhaustive and easy-to-use list of names, right? And, I bet beyond that, if one of them was able to make the people that you know and actually care about more accessible to you, you’d pick that one over all the others. And this is where owning — and getting people to “live in” — a namespace begins to reveal its significance.

So, it’s telling thing to look at Google and Facebook’s respective approaches to their people search engines and indexes. Indeed, having a readily accessible index of living persons — structured by their connections to one another — will become a necessary precondition to getting social search right (see Aardvark for a related approach, which connects to the Facebook and IM portions of your social graph to facilitate question answering).

As social search and living through your social graph becomes “the norm” (i.e. with increasing reliance on social filtering), Google and Facebook’s ability to create compelling experiences on top of data about you and who you know will come to define and differentiate them.

To date, Google’s profile search has been rather unloved and passed over, but with the new, more convenient profile URLs and the location of profile search at google.com/profiles, I suspect that Google is finally getting serious about social.

Compare Facebook and Google’s search results for my buddy, Dave Morin:

Facebook logged out:

Facebook logged in:

Google results (there’s no difference between logged in and logged out views):

Notice the difference? See how much better Facebook’s search is because it knows which “Dave Morin” is my friend?

Now, consider the profile result when you click through:

Dave’s Facebook profile (logged out):

(Facebook’s logged in profile view is as you’d expect — a typical Facebook profile with stream and wall.)

Now, here’s the clincher. Take a look at Google’s profile for Dave:

Google is able to provide a much richer and simpler profile, that’s much more accessible (without requiring any kind of sign in) because they’ve radically simplified their privacy model on this page (show what you want, and nothing more). Indeed, Google’s made it easier for people to be open — at least with static information — than Facebook!

So much for Facebook’s claim to openness! 😉

Of course, default Google profiles are pretty sparse, but this is just the beginning. (Bonus: both Facebook and Google public profiles support the hcard microformat!)

And the point is: where will you build your online identity? Under whose namespace do you want to exist? (Personally, I choose my own.)

Clearly the battle for the future of the social web is heating up in subtle but significant ways, and Google’s move today shouldn’t be thought of anything less than the opening salvo in moving the battle back to its turf: search.

Where data goes when it dies and other musings

I’ve been wanting to write about Ma.gnolia’s catastrophic data loss last week ever since it happened, but wasn’t quite sure how I wanted to approach it. Larry (Ma.gnolia founder and the sole person who maintained the site) is a good friend of mine, and Ma.gnolia was one of Citizen Agency’s first clients. It’s been painful to see him struggle through this, both personally and professionally, and it’s about the worst possible [preventable] thing that can happen to a Web 2.0 service.

Still, kept in context, it’s made me reconsider some things about the nature and value of open, networked data.

I. How I Learned to Stop Worrying and Love the Bomb

According to Google’s cache of my profile on Ma.gnolia, I had accrued 5758 bookmarks and 6162 tags since I first started using the service August 08, 2004. That’s a lot of data capital to have instantly wiped out. You might think that I’d be angry, or disappointed. But I’m surprising zen about the whole thing. Even if I never got any of my bookmarks back, I don’t think I’d be that upset, and I’m not sure why.

If Flickr went down, I’d be pretty pissed. But Ma.gnolia for me was primarily a tool for publishing — something that I used to broadcast pointers to things that I took a momentary fancy in. There’s a lot of history in my bookmarks, no doubt. In some ways, it’s a record of all the things that I’ve read that I thought might be worth someone else reading (hence why my bookmarks are public), and clearly is a list of things that have affected and informed my thinking on a broad array of topics.

But, the beauty of bookmarks is that they’re secondary references to other things. The payload is elsewhere and distributed. So in some ways, yeah, I mean, there’s a lot of good data there that’s been lost (at least for the moment). But, the reality is that the legacy of my bookmarks are forever imbued in my brain as changes in how my synapses fire. The things that I can’t remember, well, perhaps they weren’t that important to begin with.

II. Start over; the blank slate.

With the money I won from the Google/O’Reilly Open Source award last summer, I decided I’d break down and by myself a new MacBook Pro. As I was initially setting it up, I figured I’d transfer my previous system setup over from my Time Machine backup and just pick up from where I left off.

I did this, but once I logged in, the new MacBook lost it’s feeling of newness, and I felt encumbered. What amounted to bit-for-bit data portability left me feeling claustrophobic and restricted. I wanted the freedom of a clean system back; somehow buying a new machine wasn’t just about better performance, but about giving myself license to forget and to start over and to make new mistakes.

I wiped the hard drive and reinstalled OS X with the minimum options. I’ve installed about ten apps so far, and I intend to hold off on anything that I don’t feel an absolute need to install, taking a hint from Ethan Kaplan:

III. And the band played on

While I love the form-factor of my MacBook Air (now my previous system), the first generation just isn’t fast enough or beefy enough for the way that I use a Mac. It’s great for email and traveling and it really is the machine that I want to be using — just with better performance (though I hear the new models are much better).

Because the hard drive on the thing is pretty miniscule by today’s standards (80GB), I quickly maxed it out with music, videos, photos and screenshots. I was down to about 6GB of space, and OS X crawls when it can’t cache the shit out of everything so I decided to take aggressive action and deleted my entire 30GB iTunes library.

Command-A. Command-Delete. Empty Trash.

And then it was done.

Now, I still need iTunes for iPhone syncing, but now I have no local music store. With the combination of Spotify, SimplifyMedia and Pandora (using PandoraJam or PandoraBoy), I’ve got a good selection of music wherever I’ve got wifi.

The act of deleting my entire music library (okay fine, I do have a complete backup on my Mac Mini media center) was cathartic. All that data… in an instant, gone. All those ratings, all that metadata, all those play counts revealing my accumulated listening habits. Gone (well, except for my Last.fm’s profile).

Of course, it’s not like I had original, irreplaceable copies of these tracks. There are copies upon copies out there. And knowing this, I intentionally destroyed all this data without really worrying about whether I’d ever be able to re-experience or relive my music again. In fact, I didn’t even give it a thought.

But my system sure seems a bit faster now.

IV. Microformats are the vinyl of the web

The first thing that I thought about when I heard that Ma.gnolia had had “catastrophic data loss” was that Google and Yahoo probably had pretty good caches of the site, especially given its historically high PageRank. The second thing that I thought about was that, since the site was microformatted with XFN, xFolk and other formats, recovering structured data from these caches would likely be most reliable way of externally reconstituting Ma.gnolia, in lieu of other, more conventional data retrieval methods.

Though Larry is still engaged in a full out recovery process, it gave me some sense of pride and optimism that we had had the forethought to mark up Ma.gnolia with microformats. Indeed, this kind of archival purpose was something that Tantek had presaged in 2006:

Microformats from the beginning in my mind are serving two very important purposes.

Microformats provide simple ways of identifying larger chunks of information on the Web for easily and immediately publishing, sharing, moving, aggregating, and republishing.

Microformats are perhaps a step forward in providing building blocks for the longevity of higher fidelity information as well.

In talking with Tantek about this, he pointed out some interesting things about many modern web services, lamenting their apparent lack of concern over longevity. For example, clearly there is a great deal of movement afoot to advance the state of distributed social networking, as evidenced by XML and JSON-based protocols like Portable Contacts and Activity Streams. But these are primarily transaction-based protocols, and archive poorly (another argument for RESTful architectural, certainly).

I would therefore agree with Tantek’s oft-repeated admonishment that services that are serious about their data should always start by marking up their sites with microformats and then add additional APIs to provide functionality (as TripIt did). It’s simply good data hygiene. It’s also about the separation between form and function (or data and interactivity). And with emerging technologies like YQL, people can now build arbitrary mashups from the HTML on your homepage, without even having to know about your custom API.

It also means that, in the event of catastrophe (Ma.gnolia’s case) or dissolution of a service (as in the cases of Pownce, Journalspace or Consumating), there is some hope for data refugees left out in the cold.

When APIs go dark, how do you do a data backup? (Answer: you often can’t.) With public, microformatted content, there will likely be a public archive that can be used to reconstitute at least portions of the service. With dynamic APIs and proprietary data formats, all bets are off.

V. Death and data reincarnation

With both the intentional and unintentional destruction of data recently, it’s given me lots to ponder about in terms of the value, relevance, importance and longevity of data.

I talk about “data capital” like it matters, because I suppose I want it to, and hope that someday it does make a difference just how much of yourself you share with the world, simply because it’s better to share than not to.

And now I’m in this funny situation where, because I did share, and shared openly (specifically on Ma.gnolia), there is the very real possibility of reincarnating my data from the ether of the web. It could just be that all the private data, including messages, private bookmarks and thanks are forever gone, because they were kept private. But those things which were made available to anyone and everyone, through that simple aspect, can be reconstituted by extracting their essence from the caches of the internet’s memory banks.

You think about photographs of people who have died, and of videos and other media. In the past several years we’ve had to start thinking about what happens to social networking profiles on Facebook, MySpace and Twitter of people who are no longer with us. Over time, societies have invented symbols and rituals to commemorate the dead, and often use items imbued with the deceased’s social residue to help them remember and recall and relive.

How do that work when those items are locked away in incompatible and proprietary data stores? How do we cope when technology gets between humans and their humanity?

The web is a fragile place it turns out, in spite of its redundancy and distributed design.

Efforts that threaten to close it up, lock it down or wall it into proprietary gardens are turning the web against us, against history and against civilization and the collective memory. This is perhaps one reason of the primary reasons why the open web is so important to me, and factors in so centrally to my work. As I grow older, perhaps I won’t always have perspective on which things will be the most important to me, but it’s critical that in the future, I don’t inhibit my and my progeny’s ability to access my digital legacy.

Ma.gnolia logo I find it fitting that Ma.gnolia uses an organic symbol as its logo. It has, for all intents and purposes, died.

But there is a silver lining here, and I think Larry intuitively understands: in the Ma.gnolia Open Source (M2) project, he had already sowed the seeds for Ma.gnolia’s rebirth. Though it is lamentable that a such disaster would occur, I believe that creative destruction is absolutely necessary to natural systems, as forest fires are critical to the lifecycle of forests.

I also believe that things happen for a reason and that the soil of this tragedy will lead to a new start and new growth. It’s not accidental that the design of M2 called for a distributed, redundant mesh of independent bookmarking service endpoints. If anything, this situation provides Larry license to start anew, proving the necessity of death, and the wisdom of genetic inheritance and variation.

Inventing contact schemas for fun and profit! (Ugh)

And then there were three.

Today, Yahoo! announced the public availability of their own Address Book API. Though Plaxo and LinkedIn have been using this API behind the scenes for a short while, today marks the first time the API is available for anyone who registers for an App ID to make use of the bi-directional protocol.

The API is shielded behind Yahoo! proprietary BBAuth protocol, which obviates the need to request Yahoo! member credentials at the time of import initiation, as seen in this screenshot from LinkedIn (from April):

Now, like Joseph, I applaud the release of this API, as it provides one more means for individuals to have utter control and access to their friends, colleagues and contacts using a robust protocol.

However, I have to lament yet more needless reinvention of contact schema. Why is this a problem? Well, as I pointed out about Facebook’s approach to developing their own platform methods and formats, having to write and debug against yet another contact schema makes the “tax” of adding support for contact syncing and export increasingly onerous for sites and web services that want to better serve their customers by letting them host and maintain their address book elsewhere.

This isn’t just a problem that I have with Yahoo!. It’s something that I encountered last November with the SREG and proposed Attribute Exchange profile definition. And yet again when Google announced their Contacts API. And then again when Microsoft released theirs! Over and over again we’re seeing better ways of fighting the password anti-pattern flow of inviting friends to new social services, but having to implement support for countless contact schemas. What we need is one common contacts interchange format and I strongly suggest that it inherit from vcard with allowances or extension points for contemporary trends in social networking profile data.

I’ve gone ahead and whipped up a comparison matrix between the primary contact schemas to demonstrate the mess we’re in.

Below, I have a subset of the complete matrix to give you a sense for where we’re at with OpenSocial (né GData), Yahoo Address Book API and Microsoft’s Windows Live Contacts API, and include vcard (RFC 2426) as the cardinal format towards which subsequent schemas should converge:

	vcard	OpenSocial 0.8	Windows Live Contacts API	Yahoo Address Book API
UID	uid url	id	cid	cid
Nickname	nickname	nickname	NickName	nickname
Full Name	n or fn	name	NameTitle, FirstName, MiddleName, LastName, Suffix	name
First name	n (given-name)	given_name	FirstName	name (first)
Last name	n (family-name)	family_name	LastName	name (last)
Birthday	bday	date_of_birth	Birthdate	birthday (day, month, year)
Anniversary			Anniversary	anniversary (day, month, year)
Gender	gender	gender	gender
Email	email	email	Email (ID, EmailType, Address, IsIMEnabled, IsDefault)	email
Street	street-address	street-address	StreetLine	street
Postal Code	postal-code	postal-code	PostalCode	zip
City	locality	locality
State	region	region	PrimaryCity	state
Country	country-name	country	CountryRegion	country
Latitutude	geo (latitude)	latitude	latitude
Longitude	geo (longitude)	longitude	longitude
Language	N/A			N/A
Phone	tel (type, value)	phone (number, type)	Phone (ID, PhoneType, Number, IsIMEnabled, IsDefault)	phone
Timezone	tz	time_zone	TimeZone	N/A
Photo	photo	thumbnail_url		N/A
Company	org	organization.name	CompanyName	company
Job Title	title, role	organization.title	JobTitle	jobtitle
Biography	note	about_me		notes
URL	url	url	URI (ID, URIType, Name, Address)	link
Category	category, rel-tag	tags	Tag (ID, Name, ContactIDs)

Machine tagging relationships

I’ve been doing quite a bit of thinking about how to represent relationships in portable contact lists. Many of my concerns stem from two basic problems:

Relationships in one context don’t necessarily translate directly into new contexts. When we talk about making relationships “portable”, we can’t forget that a friend on one system isn’t necessarily the same kind of friend on another system (if at all) even if the other context uses the same label.
The semantics of a relationship should not form the basis for globally setting permissions. That is, just because someone is marked (perhaps accurately) as a family member does not always mean that that individual should be granted elevated permissions just because they’re “family”. While this approach works for Flickr, where how you classify a relationship (Contact, Friend, Family) determines what that contact can (or can’t) see, semantics alone shouldn’t determine how permissions are assigned.

Now, stepping back, it’s worth pointing out that I’m going on a basic presumption here that moving relationships from one site to another is valuable and beneficial. I also presume that the more convenient it is to find or connect with people who I already know (or have established acquaintance with) on a site will lead me to explore and discover that site’s actual features faster, rather than getting bogged down in finding, inviting and adding friends, which in and of itself has no marginal utility.

Beyond just bringing my friends with me is the opportunity to leverage the categorization I’ve done elsewhere, but that’s where existing formats like XFN and FOAF appear to fall short. On the one hand, we have overlapping terms for relationships that might not mean the same thing in different places, and on the other, we have unique relationship descriptions that might not apply elsewhere (e.g. fellow travelers on Dopplr). This was one of the reasons why I proposed focusing on the “contact” and “me” relationships in XFN (I mean really, what can you actually do if you know that a particular contact is a “muse” or “kin”?). Still, if metadata about a relationship exists, we shouldn’t just discard it, so how then might we express it?

Well, to keep the solution as simple and generalizable as possible, we’d see that the kinds of relationships and the semantics which we use to describe relationships can be reduced to tags. Given a context, it’s fair to infer that other relationships of the same class in the same context are equivalent. So, if I mark two people as “friends” on Flickr, they are equally “Flickr friends”. Likewise on Twitter, all people who I follow are equally “followed”. Now, take the link-rel approach from HTML, and we have a shorthand attribute (“rel”) that we can use to create a machine tag that follows the standard namespace:predicate=value format, like so:


flickr:rel=friend
flickr:rel=family
twitter:rel=followed
dopplr:rel=fellow-traveler
xfn:rel=friend
foaf:rel=knows

Imagine being able to pass your relationships between sites as a series of machine tagged URLs, where you can now say “I want to share this content with all my [contacts|friends|family members] from [Flickr]” or “Share all my restaurant reviews from this trip with my [fellow travelers] from [Dopplr|TripIt].” By machine tagging relationships, not only do we maintain the fidelity of the relationship with context, but we inherit a means of querying against this dataset in a way that maps to the origin of the relationship.

Furthermore, this would enable sites to use relationship classification models from other sites. For example, a site like Pownce could use the “Twitter model” of followers and followed; SmugMug could use Flickr’s model of contacts, friends and family; Basecamp could use Plaxo’s model of business, friend and family.

Dumping this data into a JSON-based format like jCard would also be straight-forward:


{
  "uid": "plaxo-12345",
  "fn": "Joseph Smarr",
  "url": [
    { "value": "http://josephsmarr.com", "type": "home" },
    { "value": "http://josephsmarr.com", "type": "blog" },
  ],
  "category": [ 
    { "value": "favorite" },
    { "value": "plaxo employee" }, 
    { "value": "xfn:rel=met" },
    { "value": "xfn:rel=friend" },
    { "value": "xfn:rel=colleague" },
    { "value": "flickr:rel=friend" },
    { "value": "dopplr:rel=fellow-traveler" },
    { "value": "twitter:rel=follower" } 
  ],
  "created": "2008-05-24T12:00:00Z",
  "modified": "2008-05-25T12:34:56Z"
}

I’m curious to know whether this approach would be useful, or what other possibilities might result from having this kind of data. I like it because it’s simple, it uses a prior convention (most widely supported on Flickr and Upcoming), it maintains original context and semantics. It also means that, rather than having to list every account for a contact as a serialized list with associated rel-values, we’re only dealing in highly portable tags.

I’m thinking that this would be very useful for DiSo, and when importing friends from remote sites, we’ll be sure to index this kind of information.

I’m joining Vidoop to work on DiSo full time

Well, Twitter, along with Marshall and his post on ReadWriteWeb, beat me to it, but I’m pretty excited to announce that, yes, I am joining Vidoop, along with Will Norris, to work full time on the DiSo (distributed social) Project.

For quite some time I’ve wanted to get the chance to get back to focusing on the work that I started with Flock — and that I’ve continued, more or less, with my involvement and advocacy of projects like microformats, OpenID and OAuth. These projects don’t accidentally relate to people using technology to behave socially: they exist to make it easier, and better, for people to use the web (and related technologies) to connect with one another safely, confidently, and without the need to to sign up with any particular network just to talk to their friends and people that they care about.

The reality is that people have long been able to connect to one another using technology — what was the first telegraph transmission if not the earliest poke heard round the world? The problem that we have today is that, with the proliferation of fairly large, non-interoperable social networks, it’s not as easy as email or telephones have been to connect to people, and so, the next generation of social networks are invariably going to need to make the process of connecting over the divides easier, safer and with less friction if people really are going to, as expected, continue to increase their use of the web for communication and social interaction.

So what is the DiSo Project?

The DiSo Project has humble roots. Basically Steve Ivy and I started hacking on a plugin that I’d written that added hcards to your contact list or blogroll. It was really stupidly simple, but when we combined it with Will Norris’ OpenID plugin, we realized that we were on to something — since contact lists were already represented as URLs, we now had a way to verify whether the person who ostensibly owned one of those URLs was leaving a comment, or signing in, and we could thereby add new features, expose private content or any number of other interesting social networking-like thing!

This lead me to start “sketching” ideas for WordPress plugins that would be useful in a distributed social network, and eventually Steve came up with the name, registered the domain, and we were off!

Since then, Stephen Paul Weber has jumped in and released additional plugins for OAuth, XRDS-Simple, actionstreams and profile import, and this was when the project was just a side project.

What’s this mean?

Working full time on this means that Will and I should be able to make much more progress, much more quickly, and to work with other projects and representatives from efforts like Drupal, BuddyPress and MovableType to get interop happening (eventually) between each project’s implementation.

Will and I will eventually be setting up an office in San Francisco, likely a shared office space (hybrid coworking), so if you’re a small company looking for a space in the city, let’s talk.

Meanwhile, if you want to know more about DiSo in particular, you should probably just check out the interview I did with myself about DiSo to get caught up to speed.

. . .

I’ll probably post more details later on, but for now I’m stoked to have the opportunity to work with a really talented and energized group of folks to work on the social layer of the open web.

Thoughts on DataPortability

Introduction

Over the last several days I’ve started and abandoned four drafts of this post. Usually it doesn’t take me this long to write out my thoughts, or to go through so many different approaches, but I wanted to express myself as clearly as I could given the amount and overlapping texture of what I wanted to say. I ended up gutting a lot, and tried to focus on some basics, making as few assumptions about the reader (you) as possible.

The reality is that I’m eyeballs-deep in this stuff, and realized that in earlier drafts, I had included a lot of subtext that just wasn’t helping me get my message across and that really only made sense to other folks similarly in the thick of it.

So I got rid of the subterfuge and divided this up into four sections, inspired by a conversation I had with Brynn.

I encourage and invite feedback, but I would prefer to discuss the substance of what I’m arguing, rather than focusing on tit-for-tat squabbly disagreements.

What is data portability?
How does DataPortability (DP) relate to OpenID?
Are there risks associated with DataPortability?
What’s good about DataPortability?

What is data portability?

Contrary to what some folks have argued, I think that the semantics and meaning of the phrase “data portability” are important. To me data portability denotes the act of moving data from one place to another, and that the data should, therefore, be thought of like a physical thing, with physical properties.

Let me draw an analogy here to illustrate the problem with this model.

Take an iPod. With an iPod, you literally copy files from one device to another — for example, from your laptop to your iPod. This is, on the one hand, a limitation imposed by a lack of connectivity and restrictions in copyright law, but on the other, is actually by design. This scenario is not altogether unmanageable unless you have dozens of iPods that you want to sync up with your music, especially if you don’t typically think to connect your iPod every time you add new music, create new playlists or otherwise change your music library.

Now take an always-connected player, like Pandora Mobile, where the model works by federating continuous access from a central source — to consuming devices that play back music. Ignoring the DMCA restrictions that make it impossible for Pandora to let you listen to what you want on demand, the point is that, rather than making numerous copies across many unaffiliated and disconnected devices, Pandora affords a consistent experience and uniform access by streaming live data to any device that is authorized (and is online).

The former model (the iPod) is what you might call the “desktop model of data portability”. Certainly you can copy your data and take it with you, but it doesn’t reflect a model where always-on connectivity is assumed, which is the situation with online social networks. The offline model works well for physical devices that don’t require an internet connection to function — but it is a model that fails for services like Pandora, that requires connectivity, and whose value derives from ready access to up-to-date and current information, streamed and accessible from anywhere (well, except in Canada).

It’s nuance, but it’s critical to conceptualizing the value and import of this shift, and it’s nuance which I think is often left out of the explanation of “DataPortability” (whose official definition is the option to share or move your personal data between trusted applications and vendors (emphasis added)). In my mind, when the arena of application is the open, always-on, hyper-connected web, constructing best practices using an offline model of data is fraught with fundamental problems and distractions and is ultimately destined to fail, since the phrase is immediately obsolete, unable to capture in its essence contemporary developments in the cloud concept of computing (which consists of follow-your-nose URIs and URLs rather than discreet harddrives), and in the move towards push-based subscription models that are real-time and addressable.

So if you ask me what is “data portability”, I’ll concede that it’s a symbol for starting a conversation about what’s wrong with the state of social networks. Beyond that, I think there’s a great danger that, as a result of framing the current opportunity around “data portability”, the story that will get picked up and retold will be the about copying data between social networks, rather than the more compelling, more future-facing, and frankly more likely situation of data streaming from trusted brokered sources to downstream authorized consumers. But, I guess “copying” and “moving” data is easier to grasp conceptually, and so that’s what I think a lot of people will think when they hear the phrase. In any case, it gets the conversation started, and from there, where it goes, is anyone’s guess.

How does DataPortability (DP) relate to OpenID?

OpenID, along with OAuth, microformats, RSS, OPML, RDF, APML and XMPP are all open and non-proprietary technologies — formats and protocols — that grace the DataPortability homepage. How they ended up on the homepage, or what selection criteria is used to pick them, is beyond me (for example, I would have added ATOM to the list). So the best way that I can describe the relationship between any of these technologies and DataPortability is that, at some point, the powers that be within the group decided to throw a logo on their homepage and add it to their “social software stack”.

To reiterate (and I won’t speak for the OpenID Foundation since I’m unfamiliar with any conversations that they might have had with DP), no one necessarily asked if it would be okay to put the OAuth or microformats logos on the homepage of DP, or to include those technologies in the DP stack. They just did it. It wasn’t like DP had been around for awhile with a mandate to develop best practices for the future of social networks, and groups like the microformats community petitioned or was nominated to be included. They simply were. There was no process, as far as I’m aware, as to what was included, and what was not.

So while OpenID and the other technologies may be part of the technologies recommended by DP, it should be known that there really is no official relationship between these efforts and DP (though it is true that many members of each group coordinate, meet and discuss related topics, for example, at tomorrow’s Internet Identity Workshop, and at events like the Data Sharing Summit).

Beyond that, it should be noted that OpenID, OAuth, microformats et al have been in development for the last several years, and have been building up momentum and communities all on their own, without and prior to the existence of the DP initiative. In fact, the DP project really only got its start last November with an idea presented by Josh Patterson and Josh Lewis called WRFS, or the “Web Relational File System”. At the time, the WRFS was intended to serve as a “reference design” for describing how data portability should work and this was to serve as the foundation of the DP recommendations.

In January, after ongoing discussions, Josh decided that it would be best to spin WRFS off into its own project and started a separate mailing list, leaving DP to focus exclusively on evangelizing existing technologies and communities and, in the oft-repeated words of Chris Saad, to invent nothing new (a mantra inherited from the OAuth and microformats efforts).

Are there risks associated with DataPortability?

If you accept that DP is primarily a symbol for starting the conversation about transforming social networks from walled gardens into interoperating, seamful web services, then no, not really. If you believe or buy into the hype, or blindly follow the forthcoming “technical specifications“, I see significant risks that need to addressed.

First, DP does not speak for the community as a whole, for any specific social network (except, perhaps, MySpace), or for any individuals except those who publicly align themselves with the group. On too many occasions to feel comfortable about, I’ve seen or read members of the DP project claim authority far beyond any reasonable mandate, which to me have read like attempts to seize control and influence that not only isn’t justified, but that shouldn’t be ascribed to any individual or organization. I worry that this hubris (conceivably a result of proximity to certain A-Listers) is leading them to take more credit than they’re due, and in consequence, folks interested but previously uninitiated with any of the core technologies will be lead to believe that the DataPortability group is responsible and in control of those technologies. Furthermore, if it is the case that people are mislead, I have little faith that folks from the DP project will prevent themselves from speaking on behalf of (or pseudo-knowledgeably about) those technologies, leading to confusion and potential damage.

Second, I have a great deal of concern about the experiences and priorities that are playing into the group’s approach to privacy, security, publicity and disclosure. These are concerns that I would have with any effort that aims to bridge different social or commercial contexts where norms and expectations have already been established, and where there exists few examples (apart from Beacon) of how people actually respond to semi-automatic social network cross-fertilization. Not that privacy isn’t a hot topic on the DP mailing lists, it’s just that statements like this one reflects fishtailing in the definition and approach to privacy from a leader of the group, and that I worry could skid wildly out of control if clarity on how to achieve these dictims isn’t developed very soon:

The thing is that while Privacy is certainly important, in the end these are *social* platforms. By definition they are about sharing. The problem with Facebook Beacon was not that it was sharing, but rather it was sharing the WRONG information in the WRONG way.

Also again, don’t forget, just because data is portable or accessible does NOT mean it is public or ‘open’. This is why I stayed away from the ‘Open Data’ terminology when thinking up DataPortability. Just like a Hard Drive and a PC that runs certain applications, ultimately the applications that USE the data that need to ensure they treat the data with respect – or users will simply stop using them.

[. . .]

You are right that DP should NOT be positioned that Privacy is not important – that is certainly not my intention with my answers. But being important and being a major sticking point is two different things.

Again I tend to think of this as one big Hard Disk. While you provide read/write permissions to folders on a network (for privacy) it is ultimately up to the people and applications you trust to respect your privacy and not just start emailing your word docs to your friends.

So if the second risk is that an unrealistic, naive or incomplete model of privacy [coupled with a lack of effective enforcement mechanisms in the case of fraud or abuse] will be promoted by the DP group, the third risk is that groups or communities that are roped into the DP initiative may open themselves up to a latent social backlash should something go wrong with specific implementations of DataPortability best practices. Specifically, if the final privacy model demands certain approaches to user data, and companies or organizations go along with them by adopting the provided “social technology stack” (i.e. libraries offered that implement the DP data model), the technical implementation may be flawless, but if people’s data starts showing up in places where they didn’t expect it to, they may reject the whole notion of “data portability” and seek to retreat back to the days of “safe” walled gardens of today. And it may be that, because of the emphasis on specific technologies in the DP group’s propaganda, that brands like OpenID and OAuth will become associated with negative experiences, like downloadable .exes in email are today. It’s not a foregone conclusion in my mind that this future is inevitable, but it’s one that the individual groups affected should avoid at all costs, if only because of the significant progress we’ve made to date on our own, and it would be a shame if ignorance or lack of clear communication about the proper methods of adoption and implementation of these technologies lead people to blame the technology means instead of particular instances of its application.

What’s good about DataPortability?

I don’t want to just be a negative creep, so I do think that there is a silver lining to the DP initiative, which I mentioned earlier: it provides a token phrase that we can throw around to tease out some of the more gnarly issues involved in developing future social applications. It is about having a conversation.

While OpenID and OAuth have actual technology and implementations behind them, they also serve as symbols for having conversations about identity and authorization, respectively. Similarly, microformats helps us to think about lightweight semantic markup that we can embed in human-friendly web pages that are also compatible with today’s web browsers, and that additionally make those pages easier for machines to parse. And before these symbols, we had AJAX and Web 2.0, both of which, during their inception, were equally controversial and offensive to the folks who knew the details of the underlying technological innovation behind the terms but who also stood to lose their shamanic positions if simpler language were adopted as the conversations migrated into the mainstream.

Now, is there a risk that we might lose some of the nuance and sophistication with which we data junkies and user-centric identity advocates communicate if we adopt a less precise term to describe the present trends towards interoperable social networks? Absolutely. But this also means that, as the phrase “data portability” makes its way into common conversation, people can begin to think about their social networking activities and what they take for granted (“Wait, you mean that I wouldn’t have to sign up for a new account on my friend’s social network just to send them a photo? Really?”), and to realize that the way things are today not only aren’t the way that they have to be, but that there is a better way for social applications to be designed, architected and presented, that give the enthusiasts and customers of these services greater choice and greater latitude to actually pick services that — what else? — serve them best!

So just as Firefox gave rise to a generation of web developers that take web standards much more seriously, and have in turn recognized and capitalized on the power of having a “rectangle” that actually behaves in a way that they expect (meaning that it fully complies with the standards as they’ve been defined), I think the next evolution of the social web is going to be one where we take certain things, like identity, like portable contact lists, like better and more consistent permissioning systems as givens, and as a result, will lead to much more interesting, more compelling, and, perhaps even more lucrative, uses of the open social web.

Relationships are complicated

I’ve noticed a few interesting responses to my post on simplifying XFN. While my intended audiences were primarily fellow microformat enthusiasts and “lower case semantic web” types, there seems to be a larger conversation underway that I’d missed — one that both Adam “Everyware” Greenfield and Tim Berners-Lee have commented on.

In a treatise against XFN (and similarly reductive expressions of human relationships) from December of last year, Greenfield said a number of profound things:

…one of my primary concerns has always been that we not accede to the heedless restructuring of everyday human relations on inappropriate and clumsy models derived from technical systems – and yet, that’s a precise definition of social networking as currently instantiated.
All social-networking systems constrain, by design and intention, any expression of the full band of human relationship types to a very few crude options — and those static!
…it’s impossible to use XFN to model anything that even remotely resembles an organic human community. I passionately believe that this reductive stance is not merely wrong, but profoundly wrong, in that it deliberately aims to bleed away all the nuance, complication and complexity that makes any real relationship what it is.
I believe that technically-mediated social networking at any level beyond very simple, local applications is fundamentally, and probably persistently, a bad idea. From where I stand, the only sane response is to keep our conceptions of friendship and affinity from being polluted by technical metaphors and constraints to begin with.

Whew! Strong stuff, but useful, challenging and insightful.

Meanwhile, TBL defended a semi-autistic perspective in describing the future of the Semantic Web (yes, the uppercase version):

At the moment, people are very excited about all these connections being made between people — for obvious reasons, because people are important — but I think after a while people will realise that there are many other things you can connect to via the web.

While my sympathies actually lie with Greenfield (especially after a weekend getting my mom setup on Facebook so she could send me photos without clogging my inbox with 80MB emails… a deficiency in the design of the technology, not my mother mind you!), I also see the promise of a more self-aware, self-descriptive web. But, realistically, that web is a long way off, and more likely, that web is still going to need human intervention to make it work — at least for humans to benefit from it (oh sure, just get rid of the humans and the network will be just perfect — like planes without passengers, right?).

But in the meantime, there is a social web that needs to be improved, and that can be improved, in fairly simple and straight-forward ways, that will make it easier for regular folks who don’t (and shouldn’t have to) care about “data portability” and “password anti-patterns” and “portable contact lists” to benefit from the fact that the family and friends they care about are increasingly accessible online, and actually want to hear from them!

Even though Justin Smith takes another reductive look at the features Facebook is implementing, claiming that it wants to “own communications with your friends“, the reality is, people actually want to communicate with each other online! Therefore it follows that, if you’re a place where people connect and re-connect with one another, it’s not all that surprising that a site like Facebook would invest in and make improvements to facilitate interaction and communication between their members!

But let’s back up a minute.

If we take for granted that people do want to connect and to communicate on social networks (they seem to do it a lot, so much to that one might could even argue that people enjoy doing it!), what role should so-called “portable contact lists” play in this situation? I buy Greenfield’s assertion that attempts by technologists to reduce human relationships to a predefined schema (based on prior behavior or not) is a failing proposition, but that seems to ignore the opportunity presented by the fact that people are having to maintain many several lists of their friends in many different places, for no other reason than an omission from the design of the social internetwork.

Put another way, it’s not good enough to simply dismiss the trend of social networking because our primitive technological expressions don’t reflect the complexity of real human relationships, or because humans are just one of kind of “object” to be “semantified” in TBL’s “Giant Global Graph“… instead, people are connecting today, and they’re wanting to connect to people outside of their chosen “home” network and frankly the experience sucks and it’s confusing. It’s not good enough to get all prissy about it; the reality is that there are solutions out there today, and there are people working on these things, and we need smart people like Greenfield and Berners-Lee to see that solutions that enable the humanist web (however semantic it needs to be) are being prioritized and built… and that we [need] not accede to the heedless restructuring of everyday human relations on inappropriate and clumsy models derived from technical systems.

I can say that, from what I’ve observed so far, these are things that computers can do for us, to make the social computing experience more humane, should we establish simple and straightforward means to express a basic list of contacts between contexts:

help us find and connect to people that we’ve already indicated that we know
introduce us to people who we might know, or based on social proximity, should know (with no obligation to make friends, of course!)
help us from accidently bumping into people we’d rather not interact with (see block-list portability)
helping us to segment our friendships in ways that make sense to us (rather than the semi-arbitrary ways that social networks define)
helping us to confidently share things with just the people with whom we intend to share

There may be others here, but off the top of my head, I think satisfying these basic tasks is a good start for any social network that thinks allowing you to connect and interact with people who you might know, but who may not have already signed up for the service, is useful.

I should make one last point: when thinking about importing contacts from one context to another, I do not think that it should be an unthinking act. That is, just because it’s merely data being copied between servers, the reality is that those bits represent things much more sacred and complicated than any computer might ever be programmed to imagine. Therefore, just because we can facilitate and lower the friction of “bringing your friends with you” from one place to another doesn’t mean that it should be an automatic process and that all your friends in one place should be made to be your friends in the new place.

And regardless of how often good ol’ Mark Zuckerberg claims that the end game is to make communications more efficient, when it comes to relationships, every connection transposed from one context to another should have to be reconsidered (hmm, a great argument for tagging of contacts…)! We can and should not make assumptions about the nature of people’s relationships, no matter what kind of semantics they’ve used to describe them in a given context. Human relationships are simply far too complicated to be left up to assumptions and inferences made by technologists whose affinity oftentimes lies closer to the data than to the makers of the data.

Portable contact lists and the case against XFN

XFN I suppose it might come as a surprise that I’ve decided to question, if not reject, XFN as the format for expressing portable friends or contact lists. I’m not throwing out the baby in the bathwater here, but rather focusing on the problem that needs to be solved and choosing to redouble my efforts on an elegant solution that builds on existing work and implementations.

My thinking on this crystalized yesterday during the Building Portable Social Networks panel that I shared with Jeremy Keith, Leslie Chicoine, Joseph Smarr and David Recordon. I further defined my realization last night on Twitter and when Anders Conbere pinged me about a post he’d written more or less on the subject, I knew that I was on to something.

The idea itself is pretty simple, but insomuch as it reduces both complexity and helps narrow the scope of evangelism work needed to push for further adoption, I think the change is a necessary one.

→ Quite simply, contact list portability can be achieved with only rel-contact and rel-me. All the rest is gravy.

Here’s the deal: as it is, we have a pretty nasty anti-pattern that a number of us have been railing against for some time (and, as it turns out, with good friggin’ reason). As I pointed out on the panel yesterday, people shouldn’t be penalized for the fact that the technology allows them to be promiscuous with their account credentials; after all, their desire to connect with people that they know is a valid one and has been shown to increase engagement on social sites. The problem is that, heretofore, importing your list of contacts from various webmail address books required you to provide your account credentials to an untrusted third party. On top of that, your contact list is delivered as email addresses, which I call “resource deficient” (what else can you do with an email address but send messages to it or use it as a key to identify someone? URLs are much richer).

The whole mechanism for bringing with your friends to new social sites is broken.

Enter microformats and XFN

The solution we’ve been harping on for the last couple years is a web-friendly solution for marking up existing and (predominantly) public lists of friends, using 18 pre-defined rel values. WordPress supports XFN natively and is one of the primary reasons we started with WordPress as the foundation of the DiSo Project:

Reading up on the background of XFN, you realize that one of the primary goals of XFN was simplicity. Simplicity is relative however, and you have to remember that XFN’s simplicity was in contrast to FOAF, a much denser and complex format based on RDF.

Given all the values (that is, the existing XFN terms) and the generally semantic specificity of XFN, I decided to contrast the adoption of XFN by publishers and by consumers with the competing (and more ubiquitous) solution for contact list portability (i.e. email address import).

If you use Google’s new Social Graph API and actually go looking for XFN data (for example, on Twitter or Flickr or others), you’ll find that, by and large, the majority of XFN links on the web are using either rel-contact or rel-me.

If you’re lucky, you might find some rel-friends in there, but after rel-me and rel-contact, the use of the other 16 terms falls off considerably. Compound that fact with the minor semantic distinction between “contacts” and “friends” on different sites (sites like Dopplr dispense altogether with these terms, opting for “fellow travelers”) and you quickly begin to wonder if the “semantic richness” of XFN is really just “semantic deadweight”.

And, in terms of evangelism and potential adoption, this is critical. If 16 of the 18 XFN terms are just cruft, how can we maintain our credibility, especially when arguing against the email import approach, in which there are little to no semantic descriptors at the time of import (instead, you basically get a dumb list of email addresses — with no clues whatsoever as to which addresses are “sweethearts”, “crushes”, “kin” or the like). It’s not that XFN in and of itself is bad, it’s that, when compared with the reigning tactic of email import, we look as complicated and convoluted as FOAF did. The reality is, even if it’s “heinous” to data purists or pragmatists, email import works today, and what works, wins.

Defining Contact List Portability

The more I talk to Leslie (of Satisfaction), the more sensitive I become to the language that we use when we talk about the technologies that we work on. I mean, what the fuck is an “XFN”? Even “social network portability” probably causes rational people to break out in hives when they hear the phrase (not like we’ve hit mainstream or anything). I mean, from a usability perspective, the words we use to describe this stuff is about as usable as Drupal was five years ago (zing!). I can only imagine that when we technologists open our mouths, this is what goes through most people’s heads:

SO, I’m not advocating ditching XFN altogether; on the contrary, compared with FOAF, I think we’ve achieved a great deal of mindshare, at least in gaining the support of technologists who work on fairly large social sites (though that’s apparently being disputed). The next stage of the process should be to simplify, and to focus on what people are already doing and on what’s working. If we simply want to defeat the email import approach (which I think is a good idea, albeit with the caveat that we still need a notification mechanism — perhaps something easily ignorable like Facebook-app invites?), then I think we need to consolidate our efforts on rel-contact and rel-me and let people discover (and optionally implement) the remaining 16 values if they’re bored. Or have free time. As far as I’m concerned, they offer little to no actual utility when it comes to contact list portability.

So to the definition of contact list portability, I would suggest that it’s the ability to take a list of identifiers (read: URLs, formerly email addresses) that represent people that you know and connect with them in a new context (bonus points if by “taking” you read that as “subscribing” (but not “syncing”)).

This is consistent with Joseph’s Practical Vision for Friends-List Portability. It also importantly ignores the non-overlapping problems of groupings/relationship semantics and permissioning (things which should not be conflated!).

What’s next

Kveton agrees with me; Recordon dissents, wanting more extensibility.

I get Dave’s point, but before we worry about extensibility, we have to look at what minimal bits of XFN are being picked up. By only specifying that an outgoing link is either a “contact” or “another link of mine”, we greatly reduce the cognitive tax of grokking the problem that XFN set out to solve and minimize the implementation tax of rolling out the necessary logic and template changes. Ultimately, it also simplifies the dataset, and pushes the semantics of relationships deeper into applications where I’d argue they belong (again, looking at the Dopplr model as well as Pownce (friends, fans, fan of) and Twitter (following, followers). While the other 16 XFN values are certainly not off limits, their marginal value is negligible compared with the cost of explaining why anyone should care of about them (let along understand them — i.e. “muse”??). And, compared with emails for identifiers, URLs are definitely the future.

So, with that, I’m no longer going to both with advocating for the complete adoption of XFN. Instead, I’m going to advocate for supporting Contact List Portability by implementing rel-me and rel-contact (a “subset” of XFN). And that’s it.

This won’t solve the problems that Anders is talking about, but I think it’s radical simplification that’s been long overdue in the effort towards social network portability.

The Existential DiSo Interview

The Existential DiSo Interview from Chris Messina on Vimeo.

Here’s what I asked myself:

how are you?

we’re going to talk about diso today? is that right?

what is diso?

you say it’s a social network, so how would it work with wordpress?

how is this different from myspace or facebook?

so who’s involved in this project?

so what comes next?

how is this different than opensocial?

what’s going to be the big win for diso?

so do you see this model applying in any other domain on the web?

what kind of support do you need?

are you talking to any of the bigger social networks? like facebook or myspace?

so who cares?

how will you draw customers away from myspace or facebook?

any last thoughts?