Google Profiles, namespace lock-in & social search

I’d originally intended to respond to Joshua Schachter’s post about URL shorteners and how they’re merely the tip of the data iceberg, but since I missed that debate, Google has fortuitously plied me with an even better example by releasing custom profile URLs today.

My point is to reiterate one of Tim O’Reilly’s ever-prescient admonishments about Web 2.0: lock-in can be achieved through owning a namespace. In full:

5. Chief among the future sources of lock in and competitive advantage will be data, whether through increasing returns from user-generated data (eBay, Amazon reviews, audioscrobbler info in last.fm, email/IM/phone traffic data as soon as someone who owns a lot of that data figures out that’s how to use it to enable social networking apps, GPS and other location data), through owning a namespace (Gracenote/CDDB, Network Solutions), or through proprietary file formats (Microsoft Office, iTunes). (“Data is the Intel Inside”)

(I’ll note that the process of getting advantage from data isn’t necessary a case of companies being “evil.” It’s a natural outcome of network effects applied to user contribution. Being first or best, you will attract the most users, and if your application truly harnesses network effects to get better the more people use it, you will eventually build barriers to entry based purely on the difficulty of building another such database from the ground up when there’s already so much value somewhere else. (This is why no one has yet succeeded in displacing eBay. Once someone is at critical mass, it’s really hard to get people to try something else, even if the software is better.) The question of “don’t be evil” will come up when it’s clear that someone who has amassed this kind of market position has to decide what to do with it, and whether or not they stay open at that point.)

Consider two things:

Owning the “people” namespace will determine whether people see the web through Google’s technicolor glasses or Facebook’s more nuanced and monochrome blue hues.

Curiously, it has been (correctly) argued that Google “doesn’t get social”, a criticism that I generally support. And yet, with their move to more convenient profile URLs that point to profiles that aggregate content from across the web (beating Facebook to the punch), a bigger (albeit incomplete) picture begins to emerge.

When I blogged that my name is not a URL, I wasn’t so much arguing against vanity or custom profile URLs but instead making the point that such things really should go away over time, from a usability perspective.

Let me put it this way: at one point, if you weren’t in the Yellow Pages, you basically didn’t exist. Now imagine there being several competitors to the Yellow Pages — the Red, Green and Blue Pages — each maintaining overlapping but incomplete listings of people. You’re going to want to use the one that has the most complete, exhaustive and easy-to-use list of names, right? And beyond that, I bet that if one of them made the people you know and actually care about more accessible to you, you’d pick that one over all the others. And this is where owning — and getting people to “live in” — a namespace begins to reveal its significance.

Google Profile Search

So, it’s telling to look at Google and Facebook’s respective approaches to their people search engines and indexes. Indeed, having a readily accessible index of living persons — structured by their connections to one another — will become a necessary precondition to getting social search right (see Aardvark for a related approach, which connects to the Facebook and IM portions of your social graph to facilitate question answering).

As social search and living through your social graph becomes “the norm” (i.e. with increasing reliance on social filtering), Google and Facebook’s ability to create compelling experiences on top of data about you and who you know will come to define and differentiate them.

To date, Google’s profile search has been rather unloved and passed over, but with the new, more convenient profile URLs and the location of profile search at google.com/profiles, I suspect that Google is finally getting serious about social.

Compare Facebook and Google’s search results for my buddy, Dave Morin:

Facebook logged out:

Search Names: dave morin | Facebook

Facebook logged in:

Facebook | Search: dave morin

Google results (there’s no difference between logged in and logged out views):

Dave Morin - Google Profile Search

Notice the difference? See how much better Facebook’s search is because it knows which “Dave Morin” is my friend?

Now, consider the profile result when you click through:

Dave’s Facebook profile (logged out):

Dave Morin - San Francisco, CA | Facebook

(Facebook’s logged in profile view is as you’d expect — a typical Facebook profile with stream and wall.)

Now, here’s the clincher. Take a look at Google’s profile for Dave:

Dave Morin - Google Profile

Google is able to provide a much richer, simpler, and more accessible profile (no sign-in required) because they’ve radically simplified their privacy model on this page (show what you want, and nothing more). Indeed, Google’s made it easier for people to be open — at least with static information — than Facebook!

So much for Facebook’s claim to openness! 😉

Of course, default Google profiles are pretty sparse, but this is just the beginning. (Bonus: both Facebook and Google public profiles support the microformat!)

And the point is: where will you build your online identity? Under whose namespace do you want to exist? (Personally, I choose my own.)

Clearly the battle for the future of the social web is heating up in subtle but significant ways, and Google’s move today shouldn’t be thought of as anything less than the opening salvo in moving the battle back to its turf: search.

Portable Profiles & Preferences on the Citizen-Centric Web

Loyalty Cards by Joe Loong

Let me state the problem plainly: in order to provide better service, it helps to know more about your customer, so that you can more effectively anticipate and meet her needs.

But, pray tell, how do you learn about or solicit such information over the course of your first interaction? Moreover, how do you go about learning as much as you can, as quickly as you can, without making the request itself burdensome and off-putting?

Well, as obvious as it seems, the answer is to let her tell you.

The less obvious thing is how.

And that’s where user-centric (or citizen-centric) technologies offer the most promise.

It’s like this:

  • If you let someone use an account or ID that they already use regularly elsewhere, you will save them the hassle of having to create yet another account that works solely with your service;
  • following on that, an account that is reusable is more valuable, and its value can be further increased by attaching certain types of profile attributes to it that are commonly requested;
  • the more common it becomes to reuse an account, the more people will expect this convenience during new sign up experiences, ideally to the point of knowing how to ask for support for their preferred sign-in mechanism from the services that they use;
  • presuming that service providers’ desire for profile information and preferences will not decrease, it will become an added byproduct of user-centric authentication to be able to import such data from identity providers as it is available;
  • as customers realize the convenience of portable profile and preference data, savvy identity providers will make it easier to store and express a wider array of this data, and will subsequently work with relying parties to develop interoperable sign up flows and on ramps (see Google and Plaxo).
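To make the import step in the list above concrete, here is a minimal Python sketch of a relying party pre-filling its sign-up form from whatever profile attributes an identity provider returns. The field names (`fullname`, `email`, `postcode`, `language`) and the `prefill_signup` helper are hypothetical; real protocols such as OpenID Attribute Exchange define their own attribute identifiers and transport, which this sketch does not model.

```python
from dataclasses import dataclass, field

# Hypothetical fields a relying party might ask for at sign-up.
REQUESTED_FIELDS = ("fullname", "email", "postcode", "language")


@dataclass
class SignupForm:
    values: dict = field(default_factory=dict)   # pre-filled from the identity provider
    missing: list = field(default_factory=list)  # still has to be asked of the user


def prefill_signup(provider_attrs: dict, requested=REQUESTED_FIELDS) -> SignupForm:
    """Copy whatever the identity provider shared; ask the user only for the rest."""
    form = SignupForm()
    for name in requested:
        value = provider_attrs.get(name)
        if value:  # absent or empty values are treated as "not shared"
            form.values[name] = value
        else:
            form.missing.append(name)
    return form
```

Anything the provider didn't share lands in `missing`, so the form only asks for those fields; the better maintained the portable profile, the shorter that list gets.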

For this to work, the individual must be motivated to manage her profile information and preferences, which shouldn’t be hard as her data becomes increasingly reusable (sort once, reuse everywhere). Additionally, organizing, maintaining, and accruing this information becomes less onerous when it’s all in one place (or conveniently accessible through one central customer-picked source), as opposed to sharded across many accounts and unaffiliated services.

You can get similar functionality with form-filling software like 1Password, except that in the model I’m describing, the data travels with you — beyond the browser and off the desktop — to wherever you need it, because it is stored in the cloud.

As it becomes easier to store and share this information, I think more people will do this as a happenstance of using more social software — and will become acclimated to providing their friends and service providers with varying degrees of access to increasing amounts of personally descriptive data.

Companies that jump on this and make it easier for people to manage their profile and preference data will benefit — having access to more accurate, timely, and better-maintained information, leading to more personalized user experiences and accelerating the path to satisfaction.

Companies that do get this right will benefit from what is emerging as a new social contract. As a citizen of the web, if you let me manage my relationship with you, and make it easy for me to do so, giving me the choice of how and where I store my profile and preference data, I’ll be more likely, more willing, and more able to share it with you, in an ongoing fashion, increasingly as you use it to improve my experiences with you.

My name is not a URL

Twitter / Mark Zuckerberg: Also just created a public ...

Arrington has a post that claims that Facebook is getting wise to something MySpace has known from the start – users love vanity URLs.

I don’t buy it. In fact, I’m pretty sure that the omission of vanity URLs on Facebook is an intentional design decision from the beginning, and one that I’ve learned to appreciate over time.

From what I’ve gathered, it was co-founder Dustin Moskovitz’s stubbornness that kept Facebook from allowing the use of pseudonymous usernames common on previous-generation social networks like AOL. Considering that Mark Zuckerberg’s plan is to build an online version of the relationships we have in real life, it only makes sense that we should, therefore, call our friends by their IRL names — not the ones left over or suggested by a computer.

But there’s actually something deeper going on here — something that I talked about at DrupalCon — because there are at least two good uses for letting people set their own vanity URLs — three if your service somehow surfaces usernames as an interface handle:

  1. Uniqueness and remembering
  2. Search engine optimization
  3. Facilitating member-to-member communication (as in the case of Twitter’s @replies)

For my own sake, I’ve lately begun decreasing the distance between my real identity and my online persona, switching from @factoryjoe to @chrismessina on Twitter. While there are plenty of folks who know me by my digital moniker, there are far more who don’t and shouldn’t need to in order to interact with me.

When considering SEO, it’s quite obvious that Google has already picked up on the correlation:

chris messina - Google Search

Ironically, in Dustin’s case (intentionally or not) he is not an authority for his own name on Google (despite the uniqueness of his name). Instead, semi-nefarious sites like Spock use SEO to get prominent placement for Dustin’s name (whether he likes it or not):

Dustin Moskovitz - Google Search

Finally, in cases like Twitter, IM or IRC, nicknames or handles are used explicitly to refer to other people on the system, even if (or especially if!) real identities are never revealed. While this separation can afford a number of perceived benefits, long-term it’s hard to quantify the net value of pseudonymity when most assholes on the web seem to act out most aggressively when shrouding their real names.

By shunning vanity URLs for its members, Facebook has achieved three things:

  1. Establishes a new baseline for transparent online identity
  2. Avoids the naming collision problem by scoping relationships within a person’s [reciprocal] social graph
  3. Upgrades expectations for human interaction on social websites

Because everyone on Facebook has to use their real name (and Facebook will root out and disable accounts with pseudonyms), there’s a higher degree of accountability: legitimate users are forced to reveal who they are offline. No more “funnybunny345” or “daveman692” creeping around and leaving harassing wall posts on your profile; you know exactly who left the comment because their name is attached to their account.

Go through the comments on TechCrunch and compare those left by Facebook users with those left by everyone else. In my brief analysis, Facebook commenters tend to take their commenting more seriously. It’s not a guarantee, but there is definitely a correlation between durable identity and higher quality participation.

Now, one might point out that, without unique usernames, you’d end up with a bunch of name collisions — and you’d be right. However, combining search-by-email with profile photos largely eliminates this problem, and since Facebook requires bidirectional friendship confirmation, it’s going to be hard to get the wrong “Mike Smith” showing up in your social graph. So instead of futzing with (and probably forgetting) what strange username your friend uses, you can just search by (concept!) their real name using Facebook’s type-ahead find. And with autocompletion, you’ll never spell it wrong (of course Gmail has had this for ages as well).
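A minimal sketch of the idea in the paragraph above: a type-ahead lookup that matches on real names and resolves collisions by ranking people already in your social graph first. The `Person` record, the in-memory directory, and the friends-first ordering are illustrative assumptions, not a description of how Facebook’s type-ahead is actually implemented.

```python
from typing import NamedTuple


class Person(NamedTuple):
    uid: int
    name: str


def typeahead(prefix: str, directory: list[Person], friends: set[int],
              limit: int = 5) -> list[Person]:
    """Match a typed prefix against real names, listing friends before strangers."""
    p = prefix.casefold().strip()
    if not p:
        return []
    matches = [
        person for person in directory
        # Match the start of the full name ("mike s") or of any single name part ("smi").
        if person.name.casefold().startswith(p)
        or any(token.startswith(p) for token in person.name.casefold().split())
    ]
    # False sorts before True, so friends come first; sorted() is stable,
    # so equally ranked people keep their directory order.
    matches = sorted(matches, key=lambda person: person.uid not in friends)
    return matches[:limit]
```

With two people named Mike Smith in the directory and only the second in your friend list, `typeahead("mike s", directory, friends={2})` puts your friend first, which is the scoping-by-social-graph effect described above.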

Let me make a logical leap and point out here that this is the new namespace — the human-friendly namespace — that Tim O’Reilly observed emerging when he defined Web 2.0, pointing out that a future source of lock-in would be “owning a namespace”. This is why location-based services are so hot. This is also why it matters who gets out in front first by developing a database of places named by humans — rather than by their official names. When it comes to search, search will get better when you can bound it — to the confluence of your known world and the known/colloquial world of your social graph.

When I was in San Diego a couple weeks back, it dawned on me that if I searched for “Joe’s Crab Shack”, no search engine on earth would be able to give me a satisfying result… unless it knew where I was. Or where I had been. Or, where my friends had been. This is where social search and computer-augmented social search becomes powerful (see Aardvark). Not just that, but this is where owning a database of given names tied to real things becomes hugely powerful (see Foursquare). This is where social objects with human-given names become the spimatic web.
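The same bounding can be sketched for places. Assuming each candidate carries coordinates and a hypothetical count of visits by people in your social graph, the function below keeps only same-named places within a radius of where you are, then ranks them by friends’ visits and distance. The 50 km radius, the `Place` fields, and the ranking order are illustrative choices; distance uses the standard haversine great-circle formula.

```python
import math
from typing import NamedTuple

EARTH_RADIUS_KM = 6371.0


class Place(NamedTuple):
    name: str
    lat: float
    lon: float
    friend_checkins: int  # hypothetical: visits by people in your social graph


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounded_search(query: str, places: list[Place], here: tuple[float, float],
                   radius_km: float = 50.0) -> list[Place]:
    """Same-named places near `here`, most visited by friends first, then nearest."""
    q = query.casefold()
    hits = []
    for place in places:
        if place.name.casefold() != q:
            continue
        d = haversine_km(here[0], here[1], place.lat, place.lon)
        if d <= radius_km:  # bound the search to where you actually are
            hits.append((-place.friend_checkins, d, place))
    hits.sort(key=lambda h: (h[0], h[1]))  # sort on the numeric keys only
    return [place for _, _, place in hits]
```

Searching for “Joe’s Crab Shack” from San Diego would drop a same-named restaurant in New York and surface the local one your friends have been to first.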

So, as this plays out, success will find the designer who most nearly replicates the world offline online. Consider:

Twitter / Rear Adm. Monteiro: @mat and I are in the back ...

vs:

Facebook | @replies

and:

iChat

vs.

Facebook Chat

Ignoring content, it seems to me that the latter examples are much easier to grok without knowing anything about Facebook or Twitter — and are much closer approximations of real life.

Moreover, in EventBox, there is evidence that we truly are in a transitional period, where a large number of people still identify themselves or know their friends by usernames, but an increasing number of newcomers are more comfortable using real names (click to enlarge):

Eventbox Preferences

We’re only going to see more of this kind of thing, where the data-driven design approach will give way to a more humane aesthetic overall. It begins by calling people by the names we humans prefer to — and will always — use. And I think Facebook got it right by leaving out the vanity URLs.

Generation Open

I spent the weekend in DC at TransparencyCamp, an event modeled after BarCamp focused on government transparency and open access to sources of federal data (largely through APIs and web services). Down the street, a social-media savvy conference called PowerShift convened over 12,000 of the nation’s youth to march on Congress to have their concerns about the environment heard. They were largely brought together on social networks.

Last week, after an imbroglio about a change to their terms of service, Facebook published two plain-language documents setting the course for “governing Facebook in an Open and Transparent way“: a Statement of Rights and Responsibilities coupled with a list of ten guiding principles.

The week before last, the Association for Computing Machinery (ACM) released a set of recommendations for open government that, among other things, called for government data to be available in formats that promote reuse and are available via public APIs.

WTF is going on?

Clearly something has happened since I worked on the Spread Firefox project in 2004 — a time when Mozilla was an easily dismissed outpost for “modern communists” (since meritocracy and sharing equals Communism, apparently).

Seemingly, the culture of “open” has infused even the most conservative and blood-thirsty organizations with companies falling over each other to claim the mantle of being the most open of them all.

So we won, right?

I wouldn’t say that. In fact, I think it’s now when the hard work begins.

. . .

The people within Facebook not only believe in what they’re doing but are on the leading edge of Generation Open. It’s not merely an age thing; it’s a mindset thing. It’s about having all your references come from the land of the internet rather than TV and becoming accustomed to — and taking for granted — bilateral communications in place of unidirectional broadcast forms. Where authority figures used to be able to get away with telling you not to talk back, Generation Open just turns to Twitter and lets the whole world know what they think.

But it’s not just that the means of publishing have been democratized and the new medium is being mastered; change is flowing from the events that have shaped my generation’s understanding of economics, identity, and freedom.

Maybe it started with Pearl Jam (it did for me!). Or perhaps witnessing AOL incinerate Netscape, only to see a vast network emerge to champion the rise of Firefox from its ashes. Maybe being bombarded by stinking piles of Flash and Real Player one too many times led to a realization that, “yeah, those advertisers ain’t so cool. They’re fuckin’ up my web!” Of course watching Google become a residue on the web itself, imbuing its colorful primaries on HTTP, as a lichen seduces a redwood, becoming inseparable from the host, also suggests a more organic approach to business as usual.

Talking to people who hack on Drupal or Mozilla, I’m not surprised when they presume openness as a matter of course. They thrive on the work of those who have come before and in turn, pay it forward. Why wouldn’t their work be open?

Talking to people at Facebook (in light of the arc of their brief history) you might not expect openness to come culturally. Similarly, talking to Microsoft you could presume the same. In the latter case, you’d be right; in the former, I’m not so sure.

See, the people who populate Facebook are largely from Generation Open. They grew up in an era where open source wasn’t just a foregone conclusion, but it was central to how many of them learned to code. It wasn’t in computer science classes at top universities — those folks ended up at Arthur Andersen, Accenture or Oracle (and probably became equally boring). Instead, the hobbyist kids cut their teeth writing WordPress plugins, Firefox extensions, or Greasemonkey scripts. They found success because of openness.

That Zuckerberg et al. talk about making the web a more “open and social place” where it’s easy to “share and connect” is no surprise: it’s the open, social nature of the web that has brought them such success, and will be the domain in which they achieve their magnum opus. They are the original progeny of the open web, and its natural heirs.

. . .

Obama is running smack against the legacy of the baby boomers — the generation whose parents defeated the Nazis. More relevant is that those parents fought the Nazis. Their children, in turn, inherited a visceral fear of machinery, in large part thanks to IBM’s contributions to the near-extermination of an entire race of people. If you want to know why privacy is important — look to the power of aggregate knowledge in the hands of xenophobes 70 years ago.

But who was alive 70 years ago? Better: who was six years old and terribly impressionable fifty years ago? Our parents, that’s who.

And it’s no wonder that the Facebook newsfeed (now stream) and Twitter make these folks uneasy. The potential for abuse is so great and our generation — our open, open generation — is so beautifully naive.

. . .

We are the generation that will meet Al Qaeda not “head on”, but by the length of each of its tentacles. Unlike our parents’ enemies, ours are not centralized supernations anymore. Our enemies act like malware, infecting people’s brains, and thus behave like a decentralized zombie-bot horde that cannot be stopped unless you shift the environment or shut off the grid.

We are also the generation that watched our government fail to protect the victims of Katrina — before, during and after the event. The emperor’s safety net — sworn nemesis of fiscal conservatives — turned out not to exist despite all their persistent whining. Stranded, hundreds took to their roofs while helicopters hovered overhead, broadcasting FEMA’s failure on the nightly news. While Old Media gawked, the open source community solved problems, delivering the Katrina PeopleFinder database, meticulously culled from public records and disparate resources that, at the time, lacked usable APIs.

But that wasn’t the first time “privacy” worked against us. On September 11, 2001 we flooded the cell networks, just wanting to know whether our friends and family were safe. The network, controlled by a few megacorporations, failed under the weight of our anxiety and calls; those supposed consumer protections designed to keep us safe… didn’t, turning technology and secrecy against us.

. . .

Back to this weekend in DC.

Put TransparencyCamp in context — and think about all the abuses that have been perpetrated by humans against humans throughout time — and you have to stop and wonder: “Geez, what on earth will make this generation any different from the ones that have come before? Once Zuckerberg assembles a mass of personally identifying information on his peers on an order of magnitude never achieved since humans started counting time, will he do what everyone in his position has done before?”

Oddly enough, the answer is probably not. The reason is the web. Even weirder is that Facebook, as I write this, seems to be taking steps to embrace the web, seeking to become a part of it — rather than competing against it. It seems, at least in my interactions with folks at Facebook, that a good portion of them genuinely want to work with the web as it is today, as they recognize the power that they themselves have derived from it. As they benefitted from it, they shall benefit it in turn.

Seems counterproductive to all those MBAs who study Microsoft as the masterstroke of the 21st century, but to the citizens of the web — we get it.

What Facebook is attempting — like the Obama administration in parallel — is nothing short of a revolution; you simply can’t evolve out of a culture of fear and paranoia that was passed down to us. You have to disrupt the ecosystem, and create a new equilibrium.

If we are Generation Open, then we are the optimistic generation. A generation like ours only comes around every several generations, with the resurgence of pure human spirit coupled with the resplendent realization of intent.

There are, however, still plenty who reject this attitude and approach, suffering from the combined malaise of “proprietariness”, “materialism”, and “consumerism”.

But — I shit you not — as the world turns, things are changing. Sharing and giving away all that you can are the best defenses against fear, obsolescence, growing old, and, even, wrinkles. It isn’t always easy, but it’s how we outlive the shackles of biology and transcend the physicality of gravity.

To transcend is to become transparent, clear, open.

RIP @factoryjoe

Twitter / Mr Messina: Oh, and in case you missed ...

Sometime last week, after two Manhattans, I decided to change my Twitter username from @factoryjoe to @chrismessina. In the scheme of things, not a big deal (yeah, okay, so I broke a couple thousand hyperlinks…). And yet, I can’t help but feel like I’ve shed a skin or changed identities… at least to a specific audience.

I started using Twitter in 2006 as “factoryjoe”. Of course, this is the nick that I use everywhere — from Flickr to my personal homepage — so that choice was obvious. I essentially own factoryjoe on the web — people even occasionally call me “Joe” when we meet, such is their familiarity with my online persona. But that’s not my actual name.

When I talk in front of people and I introduce myself as “Chris Messina”, the disconnect between my real name and my online persona becomes distracting. And, over time, my motivations for having a separate online identity have waned.

But first, I suppose, I should provide some background.

Where did “factoryjoe” come from?

Every so often I’m asked where “factoryjoe” came from: “Kind of like ‘Joe the Plumber?’” “Kind of,” I say. “But not really.”

Growing up, I drew comic books for fun. In fact, for most of my formative years, it seemed pretty clear that I’d pursue a career in art. I worked in pastels, watercolor, pen and ink; I preferred pen and ink above all the others though, taking lessons from Rob Liefeld, Todd McFarlane, Jim Lee and others as Image Comics came on to the scene. It was a fond dream of mine to someday pen my own sequential art.

1984 Poster

In high school, I read Nineteen Eighty-Four and became enamored with the character of Winston Smith, Orwell’s “everyman” character. In Winston Smith, I found a confederate, struggling to assert his individual humanity against the massive, dehumanizing forces of groupthink and oligarchy. Similarly, I identified with Vonnegut’s Harrison Bergeron and his struggle against homogeneity and mediocrity. The contours of “factoryjoe” began to emerge against the backdrop of the metropolitan “FactoryCity”, where industrialism was proven a sham and one’s conspicuous pursuit of passion ruled over the shallow pursuit of material consumption.

Factory City

Factory Joe was the anonymous shell in which I could plant my aspirations and designs for the future. He served as a metaphorical vessel through which I could mold a broader narrative.

So… changing your Twitter username?

In every superhero’s journey, there comes a time when the mask grows bigger than its owner. Is it the mask that provides the wearer with his power, or is it something integral to the individual?

I once believed that I needed to have a deep separation between myself and my online persona — that they should be distinct; that I should distrust the web. Over time I’ve realized a great deal of power by closing the gap between who I am offline and who I am online. I suppose this is the power of transparency, developed through consistency and demonstrated integrity.

@factoryjoe was, therefore, my first go at creating an online identity for myself. A kind of “home away from home” that I could experiment with before this whole social web thing caught on.

As it happened, this was fine when I had a small group of friends who used similar aliases for themselves, but more recently — inspired by Facebook’s allergy to pseudonyms and non-human friendly usernames — it seems that not only is owning your own identity in vogue, but using your real name is an act of assertiveness, inventiveness or establishment. Heck, if you’re willing to share your real name with 150+ million compatriots on Facebook, is there really that much to be gained from obfuscating your actual name on the open web anymore (that’s rhetorical)?

So, back to Future of Web Apps… following my workshop with Dave, I took a step back to think about how it must appear for me to be working on the social web and identity technologies while maintaining this dichotomy between my offline and online personas — in name only. C’mon, when people have feedback and I’m talking on stage — who do I want them addressing? — my assumed identity … or me? The friction that I invented is just no longer necessary.

So factoryjoe isn’t going away — not entirely at least. It’s a useful vessel to inhabit and I’ll continue to do so. But on Twitter, Facebook, and on my homepage, I’ll use my real name. There is simply no longer a good reason to differentiate between who I am online, and who I am off, if ever there was.

. . .

Postscript: I’m now @chrismessina on Twitter. If we were friends before — no need to make any changes — Twitter took care of that already. @factoryjoe‘s been retired, but now that I got it back from Recordon (he was just jealous, since he has the worst username ever), who knows, maybe he’ll return someday. We’ll see!

BBC Digital Planet podcast featuring OpenID

Update: The BBC has posted a write-up of the report called Easy login plans gather pace.

Digital Planet album artwork

I was interviewed by Gareth Mitchell last week about OpenID for the BBC’s Digital Planet podcast.

Our conversation lasted about 10 minutes — of which only about two minutes survived (mirrored here as they currently do not keep an archive of previous episodes).

It was a familiar conversation for me, since the primary concerns Gareth expressed had to do with privacy, identity and the notion that “someone else” could “own” another’s identity on the web. His premise sounded familiar: “Won’t OpenID make my identity more hackable?”

The answer, of course, isn’t that straightforward, and depends on a lot of mitigating factors. However, the fundamental takeaway is that OpenID really is no more insecure than email, and even then, provides a future-facing design that leads to many kinds of protection that email, in practice, does not.

. . .

I’ve also noticed over the past several years that Europeans harbor much greater sensitivities to privacy issues while Americans tend to concentrate on matters concerning “property” (physical, personal and intellectual). This is evidenced by yesterday’s blowup around Facebook’s changes to their Terms of Service. On the one hand, there’s this weird American outcry against Facebook owning your data (in common, at least) forever. From the European side, it seems like the concern is centered more around what the changes mean to one’s privacy, rather than whether Facebook can perpetually “make money” off your stuff.

I bring this up because it’s immensely relevant with regard to the conversation I had with Gareth (given that he’s based in the UK).

With the current case, I’m sympathetic to Facebook, because I know that this will be the year that people have their “mindframes” bent around new conceptions of personal privacy and control and ownership of data. I believe (as Facebook purports to) that people’s desire to share will overcome their desire for control over their personal data, and that they will gradually realize that sharing will require letting go. It is this reality — the reality of networked data in the cloud — that necessitated Facebook’s change to their terms of service — not some nefarious desire to steal your first born (or your data).

In other words, the conditions and kind of thinking that led to the backlash against Plaxo known as Scoblegate will cease to exist in the future. Facebook’s change is merely a recognition of this new environment.

It remains unclear to me whether the pundits in this space realize that this shift will occur, and will occur naturally (as it has already begun — consider the integration of Facebook and Flickr in iPhoto ’09), or whether they just want to scream and holler when they notice something that seems amiss.

. . .

Last December, I spent time talking to Boaz Sender of HTML Times at length about several of these topics (including discussing the intellectual property issues surrounding many of the technologies that are helping to ensure that the web remain an open playing field) in an interview about Identity in the Network. In juxtaposition to my interview with the BBC, I think this interview gets into some of the deeper issues at work here that must also be considered when it comes to the future of online identity, privacy and data control and (co)-ownership.

Where data goes when it dies and other musings

I’ve been wanting to write about Ma.gnolia’s catastrophic data loss last week ever since it happened, but wasn’t quite sure how I wanted to approach it. Larry (Ma.gnolia founder and the sole person who maintained the site) is a good friend of mine, and Ma.gnolia was one of Citizen Agency’s first clients. It’s been painful to see him struggle through this, both personally and professionally, and it’s about the worst possible [preventable] thing that can happen to a Web 2.0 service.

Still, kept in context, it’s made me reconsider some things about the nature and value of open, networked data.

I. How I Learned to Stop Worrying and Love the Bomb

According to Google’s cache of my profile on Ma.gnolia, I had accrued 5758 bookmarks and 6162 tags since I first started using the service on August 8, 2004. That’s a lot of data capital to have instantly wiped out. You might think that I’d be angry, or disappointed. But I’m surprisingly zen about the whole thing. Even if I never got any of my bookmarks back, I don’t think I’d be that upset, and I’m not sure why.

If Flickr went down, I’d be pretty pissed. But Ma.gnolia for me was primarily a tool for publishing: something that I used to broadcast pointers to things that I took a momentary fancy to. There’s a lot of history in my bookmarks, no doubt. In some ways, it’s a record of all the things that I’ve read that I thought might be worth someone else reading (hence why my bookmarks are public), and clearly is a list of things that have affected and informed my thinking on a broad array of topics.

But the beauty of bookmarks is that they’re secondary references to other things. The payload is elsewhere and distributed. So in some ways, yeah, I mean, there’s a lot of good data there that’s been lost (at least for the moment). But the reality is that the legacy of my bookmarks is forever imbued in my brain as changes in how my synapses fire. The things that I can’t remember, well, perhaps they weren’t that important to begin with.

II. Start over; the blank slate.

Leopard Blank Slate

With the money I won from the Google/O’Reilly Open Source award last summer, I decided I’d break down and buy myself a new MacBook Pro. As I was initially setting it up, I figured I’d transfer my previous system setup over from my Time Machine backup and just pick up from where I left off.

I did this, but once I logged in, the new MacBook lost its feeling of newness, and I felt encumbered. What amounted to bit-for-bit data portability left me feeling claustrophobic and restricted. I wanted the freedom of a clean system back; somehow buying a new machine wasn’t just about better performance, but about giving myself license to forget and to start over and to make new mistakes.

I wiped the hard drive and reinstalled OS X with the minimum options. I’ve installed about ten apps so far, and I intend to hold off on anything that I don’t feel an absolute need to install, taking a hint from Ethan Kaplan:

Twitter / Ethan Kaplan: @factoryjoe only install a ...

III. And the band played on

While I love the form-factor of my MacBook Air (now my previous system), the first generation just isn’t fast enough or beefy enough for the way that I use a Mac. It’s great for email and traveling and it really is the machine that I want to be using — just with better performance (though I hear the new models are much better).

Because the hard drive on the thing is pretty minuscule by today’s standards (80GB), I quickly maxed it out with music, videos, photos and screenshots. I was down to about 6GB of free space, and OS X crawls when it can’t cache the shit out of everything, so I decided to take aggressive action and delete my entire 30GB iTunes library.

Command-A. Command-Delete. Empty Trash.

And then it was done.

I still need iTunes for iPhone syncing, but now I have no local music store. With the combination of Spotify, SimplifyMedia and Pandora (using PandoraJam or PandoraBoy), I’ve got a good selection of music wherever I’ve got wifi.

The act of deleting my entire music library (okay fine, I do have a complete backup on my Mac Mini media center) was cathartic. All that data… in an instant, gone. All those ratings, all that metadata, all those play counts revealing my accumulated listening habits. Gone (well, except for my Last.fm profile).

Of course, it’s not like I had original, irreplaceable copies of these tracks. There are copies upon copies out there. And knowing this, I intentionally destroyed all this data without really worrying about whether I’d ever be able to re-experience or relive my music again. In fact, I didn’t even give it a thought.

But my system sure seems a bit faster now.

IV. Microformats are the vinyl of the web

Vinyl is 4 Ever by Bruce Berrien

The first thing that I thought about when I heard that Ma.gnolia had had “catastrophic data loss” was that Google and Yahoo probably had pretty good caches of the site, especially given its historically high PageRank. The second thing that I thought about was that, since the site was microformatted with XFN, xFolk and other formats, recovering structured data from these caches would likely be the most reliable way of externally reconstituting Ma.gnolia, in lieu of other, more conventional data retrieval methods.
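To make that concrete, here’s a minimal sketch of what pulling bookmarks back out of a cached, xFolk-marked-up page might look like in Python, using only the standard library’s html.parser. It assumes the xFolk class names (xfolkentry for each entry, taggedlink for the bookmarked link, description for the note, rel="tag" for tags) and reasonably well-formed markup. It is not Ma.gnolia’s actual recovery code, and fetching pages from a search engine cache is left out entirely.

```python
from html.parser import HTMLParser

# Elements with no closing tag; they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class XFolkParser(HTMLParser):
    """Collect xFolk bookmark entries from a saved HTML page (sketch)."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._entry = None      # bookmark dict currently being filled
        self._depth = 0         # element depth inside the current xfolkentry
        self._field = None      # "title", "tag" or "description" while capturing text
        self._field_depth = 0   # depth of the element that opened the field

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        rels = (a.get("rel") or "").split()
        if self._entry is None:
            if "xfolkentry" in classes:
                self._entry = {"url": None, "title": "", "tags": [], "description": ""}
                self._depth = 1
            return
        self._depth += 1
        if self._field is not None:
            return  # nested markup inside a field we are already capturing
        if "taggedlink" in classes:
            self._entry["url"] = a.get("href")
            self._field = "title"
        elif "tag" in rels:
            self._entry["tags"].append("")
            self._field = "tag"
        elif "description" in classes:
            self._field = "description"
        if self._field is not None:
            self._field_depth = self._depth

    def handle_data(self, data):
        if self._entry is None or self._field is None:
            return
        if self._field == "tag":
            self._entry["tags"][-1] += data
        else:
            self._entry[self._field] += data

    def handle_endtag(self, tag):
        if self._entry is None or tag in VOID:
            return
        if self._field is not None and self._depth == self._field_depth:
            self._field = None  # the element that opened the field just closed
        self._depth -= 1
        if self._depth == 0:  # the xfolkentry element itself closed
            e = self._entry
            e["title"] = e["title"].strip()
            e["description"] = e["description"].strip()
            e["tags"] = [t.strip() for t in e["tags"] if t.strip()]
            self.entries.append(e)
            self._entry = None


def extract_bookmarks(html):
    """Return a list of {url, title, tags, description} dicts."""
    parser = XFolkParser()
    parser.feed(html)
    parser.close()
    return parser.entries
```

Run over every cached page of a profile, something like this would yield plain URL, title, tag and description records that could be re-imported into another bookmarking service.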

Though Larry is still engaged in a full-out recovery process, it gave me some sense of pride and optimism that we had had the forethought to mark up Ma.gnolia with microformats. Indeed, this kind of archival purpose was something that Tantek had presaged in 2006:

Microformats from the beginning in my mind are serving two very important purposes.

  1. Microformats provide simple ways of identifying larger chunks of information on the Web for easily and immediately publishing, sharing, moving, aggregating, and republishing.
  2. Microformats are perhaps a step forward in providing building blocks for the longevity of higher fidelity information as well.

In talking with Tantek about this, he pointed out some interesting things about many modern web services, lamenting their apparent lack of concern over longevity. For example, clearly there is a great deal of movement afoot to advance the state of distributed social networking, as evidenced by XML and JSON-based protocols like Portable Contacts and Activity Streams. But these are primarily transaction-based protocols, and they archive poorly (another argument for RESTful architecture, certainly).

I would therefore agree with Tantek’s oft-repeated admonishment that services that are serious about their data should always start by marking up their sites with microformats and then add additional APIs to provide functionality (as TripIt did). It’s simply good data hygiene. It’s also about the separation between form and function (or data and interactivity). And with emerging technologies like , people can now build arbitrary mashups from the HTML on your homepage, without even having to know about your custom API.
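As an illustration of building on the HTML rather than a custom API, here’s a hedged sketch that reads XFN relationships straight off a homepage with Python’s html.parser. The value set is the one defined in XFN 1.1; the example URLs are hypothetical, and a real consumer would also need to fetch pages and resolve relative links.

```python
from html.parser import HTMLParser

# Relationship values defined by XFN 1.1.
XFN_VALUES = {
    "contact", "acquaintance", "friend", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
}


class XFNParser(HTMLParser):
    """Collect links whose rel attribute carries XFN values."""

    def __init__(self):
        super().__init__()
        self.links = {}  # href -> set of XFN relationship values

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        a = dict(attrs)
        href = a.get("href")
        rels = {r.lower() for r in (a.get("rel") or "").split()}
        xfn = rels & XFN_VALUES  # ignore non-XFN rels such as "nofollow"
        if href and xfn:
            self.links.setdefault(href, set()).update(xfn)


def social_graph(html):
    """Map each outbound URL to the XFN relationships asserted for it."""
    parser = XFNParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical homepage snippet.
    page = ('<a href="http://example.com/alice" rel="friend met">Alice</a>'
            '<link rel="me" href="http://example.org/profile"/>')
    for url, rels in social_graph(page).items():
        print(url, sorted(rels))
```

Nothing here depends on the publishing site offering an API; the relationships are already sitting in the markup.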

It also means that, in the event of catastrophe (as in Ma.gnolia’s case) or dissolution of a service (as in the cases of Pownce, Journalspace or Consumating), there is some hope for data refugees left out in the cold.

When APIs go dark, how do you do a data backup? (Answer: you often can’t.) With public, microformatted content, there will likely be a public archive that can be used to reconstitute at least portions of the service. With dynamic APIs and proprietary data formats, all bets are off.
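Since any single cache is likely to hold only part of a profile, reconstituting a service probably means stitching several partial snapshots together. A minimal sketch, assuming each recovered bookmark is a Python dict with hypothetical url, title, description and tags keys, and a simple first-non-empty-value-wins policy for conflicting fields:

```python
def merge_recovered(*snapshots):
    """Merge bookmark lists recovered from several caches, keyed on URL."""
    merged = {}  # dicts keep insertion order, so first-seen URLs come first
    for snapshot in snapshots:
        for bm in snapshot:
            url = bm.get("url")
            if not url:
                continue  # nothing to key on; skip the fragment
            rec = merged.setdefault(
                url, {"url": url, "title": "", "description": "", "tags": []})
            # Fill text fields only if no earlier snapshot supplied them.
            for field in ("title", "description"):
                if not rec[field] and bm.get(field):
                    rec[field] = bm[field]
            # Union of tags, preserving the order they were first seen.
            for tag in bm.get("tags", []):
                if tag not in rec["tags"]:
                    rec["tags"].append(tag)
    return list(merged.values())
```

Keying on the URL works because a bookmark is ultimately a pointer; a real recovery would likely also want to keep crawl dates and prefer the freshest snapshot when fields disagree.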

V. Death and data reincarnation

With both the intentional and unintentional destruction of data recently, I’ve had a lot to ponder in terms of the value, relevance, importance and longevity of data.

I talk about “data capital” like it matters, because I suppose I want it to, and hope that someday it does make a difference just how much of yourself you share with the world, simply because it’s better to share than not to.

And now I’m in this funny situation where, because I did share, and shared openly (specifically on Ma.gnolia), there is the very real possibility of reincarnating my data from the ether of the web. It could just be that all the private data, including messages, private bookmarks and thanks, is forever gone, because it was kept private. But those things that were made available to anyone and everyone can, for that simple reason, be reconstituted by extracting their essence from the caches of the internet’s memory banks.

You think about photographs of people who have died, and of videos and other media. In the past several years we’ve had to start thinking about what happens to social networking profiles on Facebook, MySpace and Twitter of people who are no longer with us. Over time, societies have invented symbols and rituals to commemorate the dead, and often use items imbued with the deceased’s social residue to help them remember and recall and relive.

How does that work when those items are locked away in incompatible and proprietary data stores? How do we cope when technology gets between humans and their humanity?

The web, it turns out, is a fragile place, in spite of its redundancy and distributed design.

Efforts that threaten to close it up, lock it down or wall it into proprietary gardens are turning the web against us, against history and against civilization and the collective memory. This is perhaps one of the primary reasons why the open web is so important to me, and figures so centrally in my work. As I grow older, perhaps I won’t always have perspective on which things will be the most important to me, but it’s critical that I don’t inhibit my own or my progeny’s ability to access my digital legacy in the future.

I find it fitting that Ma.gnolia uses an organic symbol as its logo. It has, for all intents and purposes, died.

But there is a silver lining here, and I think Larry intuitively understands it: in the Ma.gnolia Open Source (M2) project, he had already sown the seeds for Ma.gnolia’s rebirth. Though it is lamentable that such a disaster would occur, I believe that creative destruction is absolutely necessary to natural systems, just as forest fires are critical to the lifecycle of forests.

I also believe that things happen for a reason and that the soil of this tragedy will lead to a new start and new growth. It’s not accidental that the design of M2 called for a distributed, redundant mesh of independent bookmarking service endpoints. If anything, this situation provides Larry license to start anew, proving the necessity of death, and the wisdom of genetic inheritance and variation.