Google Profiles, namespace lock-in & social search

I’d originally intended to respond to Joshua Schachter’s post about URL shorteners and how they’re merely the tip of the data iceberg, but since I missed that debate, Google has fortuitously plied me with an even better example by releasing custom profile URLs today.

My point is to reiterate one of Tim O’Reilly’s ever-prescient admonishments about Web 2.0: lock-in can be achieved through owning a namespace. In full:

5. Chief among the future sources of lock in and competitive advantage will be data, whether through increasing returns from user-generated data (eBay, Amazon reviews, audioscrobbler info in last.fm, email/IM/phone traffic data as soon as someone who owns a lot of that data figures out that’s how to use it to enable social networking apps, GPS and other location data), through owning a namespace (Gracenote/CDDB, Network Solutions), or through proprietary file formats (Microsoft Office, iTunes). (“Data is the Intel Inside”)

(I’ll note that the process of getting advantage from data isn’t necessary a case of companies being “evil.” It’s a natural outcome of network effects applied to user contribution. Being first or best, you will attract the most users, and if your application truly harnesses network effects to get better the more people use it, you will eventually build barriers to entry based purely on the difficulty of building another such database from the ground up when there’s already so much value somewhere else. (This is why no one has yet succeeded in displacing eBay. Once someone is at critical mass, it’s really hard to get people to try something else, even if the software is better.) The question of “don’t be evil” will come up when it’s clear that someone who has amassed this kind of market position has to decide what to do with it, and whether or not they stay open at that point.)

Consider two things:

Owning the “people” namespace will determine whether people see the web through Google’s technicolor glasses or Facebook’s more nuanced and monochrome blue hues.

Curiously, it has been (correctly) argued that Google “doesn’t get social”, a criticism that I generally support. And yet, with their move to more convenient profile URLs that point to profiles that aggregate content from across the web (beating Facebook to the punch), a bigger (albeit incomplete) picture begins to emerge.

When I blogged that my name is not a URL, I wasn’t so much arguing against vanity or custom profile URLs but instead making the point that such things really should go away over time, from a usability perspective.

Let me put it this way: at one point, if you weren’t in the Yellow Pages, you basically didn’t exist. Now imagine several competitors to the Yellow Pages (the Red, Green and Blue Pages), each maintaining overlapping but incomplete listings of people. You’re going to want to use the one with the most complete, exhaustive and easy-to-use list of names, right? Beyond that, I bet that if one of them made the people you know and actually care about more accessible to you, you’d pick it over all the others. This is where owning (and getting people to “live in”) a namespace begins to reveal its significance.

Google Profile Search

So it’s telling to look at Google’s and Facebook’s respective approaches to their people search engines and indexes. Indeed, having a readily accessible index of living persons — structured by their connections to one another — will become a necessary precondition to getting social search right (see Aardvark for a related approach, which connects to the Facebook and IM portions of your social graph to facilitate question answering).

As social search and living through your social graph becomes “the norm” (i.e. with increasing reliance on social filtering), Google and Facebook’s ability to create compelling experiences on top of data about you and who you know will come to define and differentiate them.

To date, Google’s profile search has been rather unloved and passed over, but with the new, more convenient profile URLs and the location of profile search at google.com/profiles, I suspect that Google is finally getting serious about social.

Compare Facebook and Google’s search results for my buddy, Dave Morin:

Facebook logged out:

Search Names: dave morin | Facebook

Facebook logged in:

Facebook | Search: dave morin

Google results (there’s no difference between logged in and logged out views):

Dave Morin - Google Profile Search

Notice the difference? See how much better Facebook’s search is because it knows which “Dave Morin” is my friend?

Now, consider the profile result when you click through:

Dave’s Facebook profile (logged out):

Dave Morin - San Francisco, CA | Facebook

(Facebook’s logged in profile view is as you’d expect — a typical Facebook profile with stream and wall.)

Now, here’s the clincher. Take a look at Google’s profile for Dave:

Dave Morin - Google Profile

Google is able to provide a much richer and simpler profile that’s much more accessible (without requiring any kind of sign-in) because they’ve radically simplified their privacy model on this page (show what you want, and nothing more). Indeed, Google’s made it easier for people to be open — at least with static information — than Facebook has!

So much for Facebook’s claim to openness! 😉

Of course, default Google profiles are pretty sparse, but this is just the beginning. (Bonus: both Facebook and Google public profiles support the microformat!)

And the point is: where will you build your online identity? Under whose namespace do you want to exist? (Personally, I choose my own.)

Clearly the battle for the future of the social web is heating up in subtle but significant ways, and Google’s move today should be thought of as nothing less than the opening salvo in moving the battle back to its turf: search.

Portable Profiles & Preferences on the Citizen-Centric Web

Loyalty Cards by Joe Loong

Let me state the problem plainly: in order to provide better service, it helps to know more about your customer, so that you can more effectively anticipate and meet her needs.

But, pray tell, how do you learn about or solicit such information over the course of your first interaction? Moreover, how do you go about learning as much as you can, as quickly as you can, without making the request itself burdensome and off-putting?

Well, as obvious as it seems, the answer is to let her tell you.

The less obvious thing is how.

And that’s where user-centric (or citizen-centric) technologies offer the most promise.

It’s like this:

  • If you let someone use an account or ID that they already use regularly elsewhere, you will save them the hassle of having to create yet another account that works solely with your service;
  • following on that, an account that is reusable is more valuable, and its value can be further increased by attaching certain types of profile attributes to it that are commonly requested;
  • the more common it becomes to reuse an account, the more people will expect this convenience during new sign up experiences, ideally to the point of knowing how to ask for support for their preferred sign-in mechanism from the services that they use;
  • presuming that service providers’ desire for profile information and preferences will not decrease, it will become an added byproduct of user-centric authentication to be able to import such data from identity providers as it is available;
  • as customers realize the convenience of portable profile and preference data, savvy identity providers will make it easier to store and express a wider array of this data, and will subsequently work with relying parties to develop interoperable sign-up flows and on-ramps (see Google and Plaxo).
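The post doesn’t name a protocol for importing profile data from an identity provider. As one concrete illustration, here is a minimal sketch assuming OpenID 2.0 with the Attribute Exchange (AX) 1.0 extension, one way a relying party could ask a provider for commonly requested attributes during sign-up. The provider endpoint, return URL and realm are placeholders, `build_auth_request` and `prefill_from_response` are hypothetical helpers, and the sketch skips discovery and signature verification, which a real relying party must perform before trusting any returned value.

```python
from urllib.parse import urlencode

AX_NS = "http://openid.net/srv/ax/1.0"

# Commonly requested profile attributes, keyed by a local alias.
# Type URIs are from the axschema.org attribute set.
PROFILE_ATTRIBUTES = {
    "email": "http://axschema.org/contact/email",
    "first": "http://axschema.org/namePerson/first",
    "last": "http://axschema.org/namePerson/last",
    "country": "http://axschema.org/contact/country/home",
    "language": "http://axschema.org/pref/language",
}


def build_auth_request(op_endpoint, return_to, realm, required=("email",)):
    """Build an OpenID 2.0 checkid_setup URL that also asks for profile data via AX."""
    params = {
        "openid.ns": "http://specs.openid.net/auth/2.0",
        "openid.mode": "checkid_setup",
        # Let the user choose which identifier to use at the provider.
        "openid.claimed_id": "http://specs.openid.net/auth/2.0/identifier_select",
        "openid.identity": "http://specs.openid.net/auth/2.0/identifier_select",
        "openid.return_to": return_to,
        "openid.realm": realm,
        "openid.ns.ax": AX_NS,
        "openid.ax.mode": "fetch_request",
    }
    for alias, type_uri in PROFILE_ATTRIBUTES.items():
        params["openid.ax.type." + alias] = type_uri
    # Everything not strictly required is requested only "if available".
    optional = [a for a in PROFILE_ATTRIBUTES if a not in required]
    params["openid.ax.required"] = ",".join(required)
    if optional:
        params["openid.ax.if_available"] = ",".join(optional)
    return op_endpoint + "?" + urlencode(params)


def prefill_from_response(response):
    """Map AX values from an (already verified) assertion onto sign-up form fields."""
    # The provider may use any alias for the AX namespace, so look it up.
    ax_alias = next(
        (k[len("openid.ns."):] for k, v in response.items()
         if k.startswith("openid.ns.") and v == AX_NS),
        None,
    )
    if ax_alias is None:
        return {}
    prefix = "openid." + ax_alias + "."
    type_to_field = {uri: alias for alias, uri in PROFILE_ATTRIBUTES.items()}
    form = {}
    for key, type_uri in response.items():
        if key.startswith(prefix + "type."):
            attr_alias = key[len(prefix + "type."):]
            value = response.get(prefix + "value." + attr_alias)
            field = type_to_field.get(type_uri)
            if field and value:
                form[field] = value
    return form
```

Marking only email as required and the rest as `if_available` leaves the user free to withhold attributes, in keeping with the idea that she decides what to share.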

For this to work, the individual must be motivated to manage her profile information and preferences, which shouldn’t be hard as her data becomes increasingly reusable (sort once, reuse everywhere). Additionally, organizing, maintaining, and accruing this information becomes less onerous when it’s all in one place (or conveniently accessible through one central customer-picked source), as opposed to sharded across many accounts and unaffiliated services.

You can get similar functionality with form-filling software like 1Password, except that in the model I’m describing, the data travels with you (beyond the browser and off the desktop) to wherever you need it, because it is stored in the cloud.

As it becomes easier to store and share this information, I think more people will do this as a happenstance of using more social software, and will become acclimated to providing their friends and service providers with varying degrees of access to increasing amounts of personally descriptive data.

Companies that jump on this and make it easier for people to manage their profile and preference data will benefit — having access to more accurate, timely, and better-maintained information, leading to more personalized user experiences and accelerating the path to satisfaction.

Companies that do get this right will benefit from what is emerging as a new social contract. As a citizen of the web, if you let me manage my relationship with you, and make it easy for me to do so, giving me the choice of how and where I store my profile and preference data, I’ll be more likely, more willing, and more able to share it with you, in an ongoing fashion, increasingly as you use it to improve my experiences with you.

My name is not a URL

Twitter / Mark Zuckerberg: Also just created a public ...

Arrington has a post that claims that Facebook is getting wise to something MySpace has known from the start – users love vanity URLs.

I don’t buy it. In fact, I’m pretty sure that the omission of vanity URLs on Facebook is an intentional design decision from the beginning, and one that I’ve learned to appreciate over time.

From what I’ve gathered, it was co-founder Dustin Moskovitz’s stubbornness that kept Facebook from allowing the use of pseudonymous usernames common on previous-generation social networks like AOL. Considering that Mark Zuckerberg’s plan is to build an online version of the relationships we have in real life, it only makes sense that we should call our friends by their IRL names — not the ones left over or suggested by a computer.

But there’s actually something deeper going on here — something that I talked about at DrupalCon — because there are at least two good uses for letting people set their own vanity URLs — three if your service somehow surfaces usernames as an interface handle:

  1. Uniqueness and remembering
  2. Search engine optimization
  3. Facilitating member-to-member communication (as in the case of Twitter’s @replies)

For my own sake, I’ve lately begun decreasing the distance between my real identity and my online persona, switching from @factoryjoe to @chrismessina on Twitter. While there are plenty of folks who know me by my digital moniker, there are far more who don’t and shouldn’t need to in order to interact with me.

When considering SEO, it’s quite obvious that Google has already picked up on the correlation:

chris messina - Google Search

Ironically, in Dustin’s case (intentionally or not) he is not an authority for his own name on Google (despite the uniqueness of his name). Instead, semi-nefarious sites like Spock use SEO to get prominent placement for Dustin’s name (whether he likes it or not):

Dustin Moskovitz - Google Search

Finally, in cases like Twitter, IM or IRC, nicknames or handles are used explicitly to refer to other people on the system, even if (or especially if!) real identities are never revealed. While this separation can afford a number of perceived benefits, long-term it’s hard to quantify the net value of pseudonymity when most assholes on the web seem to act out most aggressively when shrouding their real names.

By shunning vanity URLs for its members, Facebook has:

  1. Established a new baseline for transparent online identity
  2. Avoided the naming collision problem by scoping relationships within a person’s [reciprocal] social graph
  3. Upgraded expectations for human interaction on social websites

Because everyone on Facebook has to use their real name (and Facebook will root out and disable accounts with pseudonyms), there’s a higher degree of accountability: legitimate users are forced to reveal who they are offline. No more “funnybunny345” or “daveman692” creeping around and leaving harassing wall posts on your profile; you know exactly who left the comment because their name is attached to their account.

Go through the comments on TechCrunch and compare those left by Facebook users with those left by everyone else. In my brief analysis, Facebook commenters tend to take their commenting more seriously. It’s not a guarantee, but there is definitely a correlation between durable identity and higher quality participation.

Now, one might point out that, without unique usernames, you’d end up with a bunch of name collisions — and you’d be right. However, combining search-by-email with profile photos largely eliminates this problem, and since Facebook requires bidirectional friendship confirmation, it’s going to be hard to get the wrong “Mike Smith” showing up in your social graph. So instead of futzing with (and probably forgetting) what strange username your friend uses, you can just search by (concept!) their real name using Facebook’s type-ahead find. And with autocompletion, you’ll never spell it wrong (of course Gmail has had this for ages as well).
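To make the scoping idea concrete, here is a minimal sketch of type-ahead people search that ranks matches by how many hops away they are in the searcher’s friend graph, so the “Dave Morin” you’re actually friends with comes first. It’s a hypothetical illustration, not how Facebook implements its search; the in-memory `names` and `graph` dictionaries, the helper names and the three-hop cutoff are all assumptions.

```python
from collections import deque


def graph_distances(graph, start, max_depth=3):
    """Breadth-first search: number of hops from `start` to everyone within max_depth."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if dist[person] == max_depth:
            continue  # don't expand beyond the cutoff
        for friend in graph.get(person, ()):
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    return dist


def typeahead(query, searcher, names, graph, limit=5):
    """Return ids of people whose full name, or any word in it, starts with `query`,
    closest in the searcher's social graph first, then alphabetically by name."""
    q = query.strip().lower()
    dist = graph_distances(graph, searcher)
    unreachable = float("inf")  # strangers sort after everyone in the graph
    matches = []
    for person, name in names.items():
        if person == searcher:
            continue
        lowered = name.lower()
        if lowered.startswith(q) or any(w.startswith(q) for w in lowered.split()):
            matches.append((dist.get(person, unreachable), name, person))
    matches.sort()
    return [person for _, _, person in matches[:limit]]
```

Two people with identical names stay distinct because they carry different ids; the graph distance, not the name, decides which one surfaces first.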

Let me make a logical leap and point out that this is the new namespace — the human-friendly namespace — that Tim O’Reilly observed emerging when he defined Web 2.0, pointing out that a future source of lock-in would be “owning a namespace”. This is why location-based services are so hot. This is also why it matters who gets out in front first by developing a database of places named by humans — rather than by their official names. Search itself will get better when you can bound it — to the confluence of your known world and the known/colloquial world of your social graph.

When I was in San Diego a couple weeks back, it dawned on me that if I searched for “Joe’s Crab Shack”, no search engine on earth would be able to give me a satisfying result… unless it knew where I was. Or where I had been. Or, where my friends had been. This is where social search and computer-augmented social search become powerful (see Aardvark). Not just that, but this is where owning a database of given names tied to real things becomes hugely powerful (see Foursquare). This is where social objects with human-given names become the spimatic web.
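Bounding a place search by where you are and where your friends have been can be sketched in a few lines. This is a toy illustration under loose assumptions: places carry a latitude and longitude, matches are ranked by great-circle (haversine) distance from the searcher, and each friend check-in is treated as if the place were a fixed number of kilometers closer. The `Place` type, `rank_places` and the `friend_weight_km` parameter are hypothetical, and the weighting is arbitrary.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes Place hashable, so it can key a dict
class Place:
    name: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def rank_places(query, places, here, friend_checkins, friend_weight_km=5.0):
    """Return places whose name contains `query`, best match first.

    Score = distance from `here` minus friend_weight_km per friend check-in
    (an arbitrary, illustrative weighting). Lower scores rank higher.
    """
    q = query.lower()
    scored = []
    for place in places:
        if q not in place.name.lower():
            continue  # name doesn't match the query at all
        km = haversine_km(here[0], here[1], place.lat, place.lon)
        score = km - friend_weight_km * friend_checkins.get(place, 0)
        scored.append((score, place))
    scored.sort(key=lambda pair: pair[0])
    return [place for _, place in scored]
```

Without a location or any friend data, every “Joe’s Crab Shack” looks equally plausible; the searcher’s position and the friends’ check-ins are what break the tie.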

So, as this plays out, success will find the designer who most nearly replicates the world offline online. Consider:

Twitter / Rear Adm. Monteiro: @mat and I are in the back ...

vs:

Facebook | @replies

and:

iChat

vs.

Facebook Chat

Ignoring content, it seems to me that the latter examples are much easier to grok without knowing anything about Facebook or Twitter — and are much closer approximations of real life.

Moreover, in EventBox, there is evidence that we truly are in a transitional period, where a large number of people still identify themselves or know their friends by usernames, but an increasing number of newcomers are more comfortable using real names:

Eventbox Preferences

We’re only going to see more of this kind of thing, where the data-driven design approach will give way to a more humane aesthetic overall. It begins by calling people by the names we humans prefer to — and will always — use. And I think Facebook got it right by leaving out the vanity URLs.

Generation Open

I spent the weekend in DC at TransparencyCamp, an event modeled after BarCamp focused on government transparency and open access to sources of federal data (largely through APIs and web services). Down the street, a social-media savvy conference called PowerShift convened over 12,000 of the nation’s youth to march on Congress to have their concerns about the environment heard. They were largely brought together on social networks.

Last week, after an imbroglio about a change to their terms of service, Facebook published two plain-language documents setting the course for “governing Facebook in an Open and Transparent way”: a Statement of Rights and Responsibilities coupled with a list of ten guiding principles.

The week before last, the Association for Computing Machinery (ACM) released a set of recommendations for open government that, among other things, called for government data to be available in formats that promote reuse and are available via public APIs.

WTF is going on?

Clearly something has happened since I worked on the Spread Firefox project in 2004 — a time when Mozilla was an easily dismissed outpost for “modern communists” (since meritocracy and sharing equals Communism, apparently).

Seemingly, the culture of “open” has infused even the most conservative and blood-thirsty organizations, with companies falling over each other to claim the mantle of being the most open of them all.

So we won, right?

I wouldn’t say that. In fact, I think it’s now when the hard work begins.

. . .

The people within Facebook not only believe in what they’re doing but are on the leading edge of Generation Open. It’s not merely an age thing; it’s a mindset thing. It’s about having all your references come from the land of the internet rather than TV and becoming accustomed to — and taking for granted — bilateral communications in place of unidirectional broadcast forms. Where authority figures used to be able to get away with telling you not to talk back, Generation Open just turns to Twitter and lets the whole world know what they think.

But it’s not just that the means of publishing have been democratized and the new medium is being mastered; change is flowing from the events that have shaped my generation’s understanding of economics, identity, and freedom.

Maybe it started with Pearl Jam (it did for me!). Or perhaps witnessing AOL incinerate Netscape, only to see a vast network emerge to champion the rise of Firefox from its ashes. Maybe being bombarded by stinking piles of Flash and RealPlayer one too many times led to a realization that, “yeah, those advertisers ain’t so cool. They’re fuckin’ up my web!” Of course watching Google become a residue on the web itself, imbuing HTTP with its colorful primaries, as a lichen seduces a redwood, becoming inseparable from the host, also suggests a more organic approach to business as usual.

Talking to people who hack on Drupal or Mozilla, I’m not surprised when they presume openness as a matter of course. They thrive on the work of those who have come before and in turn, pay it forward. Why wouldn’t their work be open?

Talking to people at Facebook (in light of the arc of their brief history) you might not expect openness to come culturally. Similarly, talking to Microsoft you could presume the same. In the latter case, you’d be right; in the former, I’m not so sure.

See, the people who populate Facebook are largely from Generation Open. They grew up in an era where open source wasn’t just a foregone conclusion; it was central to how many of them learned to code. It wasn’t in computer science classes at top universities — those folks ended up at Arthur Andersen, Accenture or Oracle (and probably became equally boring). Instead, the hobbyist kids cut their teeth writing WordPress plugins, Firefox extensions, or Greasemonkey scripts. They found success because of openness.

That Zuckerberg et al talk about making the web a more “open and social place” where it’s easy to “share and connect” is no surprise: it’s the open, social nature of the web that has brought them such success, and will be the domain in which they achieve their magnum opus. They are the original progeny of the open web, and its natural heirs.

. . .

Obama is running smack against the legacy of the baby boomers — the generation whose parents defeated the Nazis. More relevant is that the boomers’ parents fought the Nazis. Their children, in turn, inherited a visceral fear of machinery, in large part thanks to IBM’s contributions to the near-extermination of an entire race of people. If you want to know why privacy is important — look to the power of aggregate knowledge in the hands of xenophobes 70 years ago.

But who was alive 70 years ago? Better: who was six years old and terribly impressionable fifty years ago? Our parents, that’s who.

And it’s no wonder that the Facebook newsfeed (now stream) and Twitter make these folks uneasy. The potential for abuse is so great and our generation — our open, open generation — is so beautifully naive.

. . .

We are the generation that will meet Al Qaeda not “head on”, but by the length of each of its tentacles. Unlike our parents’ enemies, ours are not centralized supernations anymore. Our enemies act like malware, infecting people’s brains, and thus behave like a decentralized zombie-bot horde that cannot be stopped unless you shift the environment or shut off the grid.

We are also the generation that watched our government fail to protect the victims of Katrina — before, during and after the event. The emperor’s safety net — sworn nemesis of fiscal conservatives — turned out not to exist despite all their persistent whining. Stranded, hundreds took to their roofs while helicopters hovered overhead, broadcasting FEMA’s failure on the nightly news. While Old Media gawked, the open source community solved problems, delivering the Katrina PeopleFinder database, meticulously culled from public records and disparate resources that, at the time, lacked usable APIs.

But that wasn’t the first time “privacy” worked against us. On September 11, 2001 we flooded the cell networks, just wanting to know whether our friends and family were safe. The network, controlled by a few megacorporations, failed under the weight of our anxiety and calls; those supposed consumer protections designed to keep us safe… didn’t, turning technology and secrecy against us.

. . .

Back to this weekend in DC.

Put TransparencyCamp in context, think about all the abuses that humans have perpetrated against humans throughout time, and you have to stop and wonder: “Geez, what on earth will make this generation any different from the ones that have come before? Once Zuckerberg assembles a mass of personally identifying information on his peers on an order of magnitude never achieved since humans started counting time, will he do what everyone in his position has done before?”

Oddly enough, the answer is probably not. The reason is the web. Even weirder is that Facebook, as I write this, seems to be taking steps to embrace the web, seeking to become a part of it — rather than competing against it. It seems, at least in my interactions with folks at Facebook, that a good portion of them genuinely want to work with the web as it is today, as they recognize the power that they themselves have derived from it. As they benefitted from it, they shall benefit it in turn.

It seems counterproductive to all those MBAs who study Microsoft as the masterstroke of the 21st century, but we, the citizens of the web, get it.

What Facebook is attempting — like the Obama administration in parallel — is nothing short of a revolution; you simply can’t evolve out of a culture of fear and paranoia that was passed down to us. You have to disrupt the ecosystem, and create a new equilibrium.

If we are Generation Open, then we are the optimistic generation. Ours only comes around every several generations with the resurgence of pure human spirit coupled with the resplendent realization of intent.

There are, however, still plenty who reject this attitude and approach, suffering from the combined malaise of “proprietariness”, “materialism”, and “consumerism”.

But — I shit you not — as the world turns, things are changing. Sharing and giving away all that you can are the best defenses against fear, obsolescence, growing old, and, even, wrinkles. It isn’t always easy, but it’s how we outlive the shackles of biology and transcend the physicality of gravity.

To transcend is to become transparent, clear, open.

RIP @factoryjoe

Twitter / Mr Messina: Oh, and in case you missed ...

Sometime last week, after two Manhattans, I decided to change my Twitter username from @factoryjoe to @chrismessina. In the scheme of things, not a big deal (yeah, okay, so I broke a couple thousand hyperlinks…). And yet, I can’t help but feel like I’ve shed a skin or changed identities… at least to a specific audience.

I started using Twitter in 2006 as “factoryjoe”. Of course, this is the nick that I use everywhere — from Flickr to my personal homepage — so that choice was obvious. I essentially own factoryjoe on the web — people even occasionally call me “Joe” when we meet, such is their familiarity with my online persona. But that’s not my actual name.

When I talk in front of people and I introduce myself as “Chris Messina”, the disconnect between my real name and my online persona becomes distracting. And, over time, my motivations for having a separate online identity have waned.

But first, I suppose, I should provide some background.

Where did “factoryjoe” come from?

Every so often I’m asked where “factoryjoe” came from: “Kind of like ‘Joe the Plumber?’” “Kind of,” I say. “But not really.”

Growing up, I drew comic books for fun. In fact, for most of my formative years, it seemed pretty clear that I’d pursue a career in art. I worked in pastels, watercolor, pen and ink; I preferred pen and ink above all the others though, taking lessons from Rob Liefeld, Todd McFarlane, Jim Lee and others as Image Comics came onto the scene. It was a fond dream of mine to someday pen my own sequential art.
1984 Poster

In high school, I read Nineteen Eighty-Four and became enamored with the character of Winston Smith, Orwell’s “everyman” character. In Winston Smith, I found a confederate, struggling to assert his individual humanity against the massive, dehumanizing forces of groupthink and oligarchy. Similarly, I identified with Vonnegut’s Harrison Bergeron and his struggle against homogeneity and mediocrity. The contours of “factoryjoe” began to emerge against the backdrop of the metropolitan “FactoryCity”, where industrialism was proven a sham and one’s conspicuous pursuit of passion ruled over the shallow pursuit of material consumption.

Factory City

Factory Joe was the anonymous shell in which I could plant my aspirations and designs for the future. He served as a metaphorical vessel through which I could mold a broader narrative.

So… changing your Twitter username?

In every superhero’s journey, there comes a time when the mask grows bigger than its owner. Is it the mask that provides the wearer with his power, or is it something integral to the individual?

I once believed that I needed to have a deep separation between myself and my online persona — that they should be distinct; that I should distrust the web. Over time I’ve realized a great deal of power by closing the gap between who I am offline and who I am online. I suppose this is the power of transparency, developed through consistency and demonstrated integrity.

@factoryjoe was, therefore, my first go at creating an online identity for myself. A kind of “home away from home” that I could experiment with before this whole social web thing caught on.

As it happened, this was fine when I had a small group of friends who used similar aliases for themselves, but more recently — inspired by Facebook’s allergy to pseudonyms and non-human friendly usernames — it seems that not only owning your own identity is in vogue, but using your real name is an act of assertiveness, inventiveness or establishment. Heck, if you’re willing to share your real name with 150+ million compatriots on Facebook, is there really that much to be gained from obfuscating your actual name on the open web anymore (that’s rhetorical)?

So, back to Future of Web Apps… following my workshop with Dave, I took a step back to think about how it must appear for me to be working on the social web and identity technologies while maintaining this dichotomy between my offline and online personas — in name only. C’mon, when people have feedback and I’m talking on stage — who do I want them addressing? — my assumed identity … or me? The friction that I invented is just no longer necessary.

So factoryjoe isn’t going away — not entirely at least. It’s a useful vessel to inhabit and I’ll continue to do so. But on Twitter, Facebook, and on my homepage, I’ll use my real name. There is simply no longer a good reason to differentiate between who I am online, and who I am off, if ever there was.

. . .

Postscript: I’m now @chrismessina on Twitter. If we were friends before — no need to make any changes — Twitter took care of that already. @factoryjoe’s been retired, but now that I got it back from Recordon (he was just jealous, since he has the worst username ever), who knows, maybe he’ll return someday. We’ll see!

BBC Digital Planet podcast featuring OpenID

Update: The BBC has posted a write-up of the report called Easy login plans gather pace.

Digital Planet album artwork

I was interviewed by Gareth Mitchell last week about OpenID for the BBC’s Digital Planet podcast.

Our conversation lasted about 10 minutes — of which only about two minutes survived (mirrored here as they currently do not keep an archive of previous episodes).

It was a familiar conversation for me, since the primary concerns Gareth expressed had to do with privacy, identity and the notion that “someone else” could “own” another’s identity on the web. His premise sounded familiar: “Won’t OpenID make my identity more hackable?”

The answer, of course, isn’t that straightforward, and depends on a lot of mitigating factors. However, the fundamental takeaway is that OpenID really is no more insecure than email, and even then, provides a future-facing design that leads to many kinds of protection that email, in practice, does not.

. . .

I’ve also noticed over the past several years that Europeans harbor much greater sensitivities to privacy issues while Americans tend to concentrate on matters concerning “property” (physical, personal and intellectual). This is evidenced by yesterday’s blowup around Facebook’s changes to their Terms of Service. On the one hand, there’s this weird American outcry against Facebook owning your data (in common, at least) forever. From the European side, it seems like the concern is centered more around what the changes mean to one’s privacy, rather than whether Facebook can perpetually “make money” off your stuff.

I bring this up because it’s immensely relevant to the conversation I had with Gareth (given that he’s based in the UK).

With the current case, I’m sympathetic to Facebook, because I know that this will be the year that people have their “mindframes” bent around new conceptions of personal privacy and control and ownership of data. I believe (as Facebook purports to) that people’s desire to share will overcome their desire for control over their personal data, and that they will gradually realize that sharing will require letting go. It is this reality — the reality of networked data in the cloud — that necessitated Facebook’s change to their terms of service — not some nefarious desire to steal your first born (or your data).

In other words, the conditions and kind of thinking that led to the backlash against Plaxo known as Scoblegate will cease to exist in the future. Facebook’s change is merely a recognition of this new environment.

It remains unclear to me whether the pundits in this space realize that this shift will occur, and will occur naturally (as it has already begun — consider the integration of Facebook and Flickr in iPhoto ’09), or whether they just want to scream and holler when they notice something that seems astray.

. . .

Last December, I spent time talking to Boaz Sender of HTML Times at length about several of these topics (including discussing the intellectual property issues surrounding many of the technologies that are helping to ensure that the web remain an open playing field) in an interview about Identity in the Network. In juxtaposition to my interview with the BBC, I think this interview gets into some of the deeper issues at work here that must also be considered when it comes to the future of online identity, privacy and data control and (co)-ownership.

Where data goes when it dies and other musings

I’ve been wanting to write about Ma.gnolia’s catastrophic data loss last week ever since it happened, but wasn’t quite sure how I wanted to approach it. Larry (Ma.gnolia founder and the sole person who maintained the site) is a good friend of mine, and Ma.gnolia was one of Citizen Agency’s first clients. It’s been painful to see him struggle through this, both personally and professionally, and it’s about the worst possible [preventable] thing that can happen to a Web 2.0 service.

Still, kept in context, it’s made me reconsider some things about the nature and value of open, networked data.

I. How I Learned to Stop Worrying and Love the Bomb

According to Google’s cache of my profile on Ma.gnolia, I had accrued 5758 bookmarks and 6162 tags since I first started using the service on August 8, 2004. That’s a lot of data capital to have instantly wiped out. You might think that I’d be angry, or disappointed. But I’m surprisingly zen about the whole thing. Even if I never got any of my bookmarks back, I don’t think I’d be that upset, and I’m not sure why.

If Flickr went down, I’d be pretty pissed. But Ma.gnolia for me was primarily a tool for publishing, something that I used to broadcast pointers to things that I took a momentary fancy to. There’s a lot of history in my bookmarks, no doubt. In some ways, it’s a record of all the things that I’ve read that I thought might be worth someone else reading (which is why my bookmarks are public), and it’s clearly a list of things that have affected and informed my thinking on a broad array of topics.

But the beauty of bookmarks is that they’re secondary references to other things. The payload is elsewhere and distributed. So in some ways, yeah, I mean, there’s a lot of good data there that’s been lost (at least for the moment). But the reality is that the legacy of my bookmarks is forever imbued in my brain as changes in how my synapses fire. The things that I can’t remember, well, perhaps they weren’t that important to begin with.

II. Start over; the blank slate.


With the money I won from the Google/O’Reilly Open Source award last summer, I decided I’d break down and buy myself a new MacBook Pro. As I was initially setting it up, I figured I’d transfer my previous system setup over from my Time Machine backup and just pick up where I left off.

I did this, but once I logged in, the new MacBook lost its feeling of newness, and I felt encumbered. What amounted to bit-for-bit data portability left me feeling claustrophobic and restricted. I wanted the freedom of a clean system back; somehow buying a new machine wasn’t just about better performance, but about giving myself license to forget, to start over and to make new mistakes.

I wiped the hard drive and reinstalled OS X with the minimum options. I’ve installed about ten apps so far, and I intend to hold off on anything that I don’t feel an absolute need to install, taking a hint from Ethan Kaplan:

Twitter / Ethan Kaplan: @factoryjoe only install a ...

III. And the band played on

While I love the form-factor of my MacBook Air (now my previous system), the first generation just isn’t fast enough or beefy enough for the way that I use a Mac. It’s great for email and traveling and it really is the machine that I want to be using — just with better performance (though I hear the new models are much better).

Because the hard drive on the thing is pretty minuscule by today’s standards (80GB), I quickly maxed it out with music, videos, photos and screenshots. I was down to about 6GB of space, and OS X crawls when it can’t cache the shit out of everything, so I decided to take aggressive action and delete my entire 30GB iTunes library.

Command-A. Command-Delete. Empty Trash.

And then it was done.

Now, I still need iTunes for iPhone syncing, but I no longer have a local music collection. With the combination of Spotify, SimplifyMedia and Pandora (using PandoraJam or PandoraBoy), I’ve got a good selection of music wherever I’ve got wifi.

The act of deleting my entire music library (okay fine, I do have a complete backup on my Mac Mini media center) was cathartic. All that data… in an instant, gone. All those ratings, all that metadata, all those play counts revealing my accumulated listening habits. Gone (well, except for my Last.fm profile).

Of course, it’s not like I had original, irreplaceable copies of these tracks. There are copies upon copies out there. And knowing this, I intentionally destroyed all this data without really worrying about whether I’d ever be able to re-experience or relive my music again. In fact, I didn’t even give it a thought.

But my system sure seems a bit faster now.

IV. Microformats are the vinyl of the web

Vinyl is 4 Ever by Bruce Berrien

The first thing that I thought about when I heard that Ma.gnolia had had “catastrophic data loss” was that Google and Yahoo probably had pretty good caches of the site, especially given its historically high PageRank. The second thing that I thought about was that, since the site was microformatted with XFN, xFolk and other formats, recovering structured data from these caches would likely be the most reliable way of externally reconstituting Ma.gnolia, in lieu of other, more conventional data retrieval methods.
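To make the idea concrete, here is a minimal sketch of what pulling structured data back out of a cached, microformatted page might look like. It assumes the common xFolk class names (`xfolkentry`, `taggedlink`, `description`), rel-tag links (where the tag is the last path segment of the `href`) and XFN `rel` values; the class and function names (`MicroformatScraper`, `scrape`) are hypothetical, and fetching the cached pages from Google or Yahoo in the first place is not shown.

```python
from html.parser import HTMLParser
from urllib.parse import unquote

# Elements that never have a closing tag; they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# XFN 1.1 relationship values.
XFN_VALUES = {"contact", "acquaintance", "friend", "met", "co-worker",
              "colleague", "co-resident", "neighbor", "child", "parent",
              "sibling", "spouse", "kin", "muse", "crush", "date",
              "sweetheart", "me"}


class MicroformatScraper(HTMLParser):
    """Hypothetical helper: collect xFolk bookmarks and XFN links from HTML."""

    def __init__(self):
        super().__init__()
        self.entries = []      # one dict per xfolkentry
        self.xfn = []          # (href, [xfn values]) pairs
        self._stack = []       # (tag, role) for currently open elements
        self._current = None   # the xfolkentry being built, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        rels = (attrs.get("rel") or "").lower().split()

        # XFN may appear on any <a> or <link>, inside an entry or not.
        xfn = [r for r in rels if r in XFN_VALUES]
        if tag in ("a", "link") and xfn:
            self.xfn.append((attrs.get("href"), xfn))

        if tag in VOID_TAGS:
            return

        role = None
        if "xfolkentry" in classes:
            self._current = {"url": None, "title": "", "description": "", "tags": []}
            role = "entry"
        elif self._current is not None:
            if tag == "a" and "taggedlink" in classes:
                self._current["url"] = attrs.get("href")
                role = "title"
            elif "description" in classes:
                role = "description"
            elif tag == "a" and "tag" in rels and attrs.get("href"):
                # rel-tag: the tag is the last path segment of the href.
                last = attrs["href"].rstrip("/").rsplit("/", 1)[-1]
                self._current["tags"].append(unquote(last))
        self._stack.append((tag, role))

    def handle_endtag(self, tag):
        # Ignore void tags and stray closing tags with no matching opener.
        if tag in VOID_TAGS or tag not in (t for t, _ in self._stack):
            return
        # Pop up to the matching opener, tolerating unclosed children.
        while self._stack:
            open_tag, role = self._stack.pop()
            if role == "entry":
                self._finish_entry()
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self._current is None:
            return
        # Credit text to the innermost element that is collecting text.
        for _, role in reversed(self._stack):
            if role in ("title", "description"):
                self._current[role] += data
                break

    def _finish_entry(self):
        if self._current is not None:
            for key in ("title", "description"):
                self._current[key] = " ".join(self._current[key].split())
            self.entries.append(self._current)
            self._current = None


def scrape(html):
    """Return (xfolk_entries, xfn_links) found in an HTML string."""
    parser = MicroformatScraper()
    parser.feed(html)
    parser.close()
    parser._finish_entry()  # flush an entry left open at end of input
    return parser.entries, parser.xfn
```

The point of the sketch is that nothing here depends on a private API: as long as the public HTML survives somewhere, the agreed-upon class and `rel` names are enough to recover bookmarks, tags and relationships.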

Though Larry is still engaged in a full-out recovery process, it gave me some sense of pride and optimism that we had had the forethought to mark up Ma.gnolia with microformats. Indeed, this kind of archival purpose was something that Tantek had presaged in 2006:

Microformats from the beginning in my mind are serving two very important purposes.

  1. Microformats provide simple ways of identifying larger chunks of information on the Web for easily and immediately publishing, sharing, moving, aggregating, and republishing.
  2. Microformats are perhaps a step forward in providing building blocks for the longevity of higher fidelity information as well.

In talking with Tantek about this, he pointed out some interesting things about many modern web services, lamenting their apparent lack of concern over longevity. For example, there is clearly a great deal of movement afoot to advance the state of distributed social networking, as evidenced by XML and JSON-based protocols like Portable Contacts and Activity Streams. But these are primarily transaction-based protocols, and they archive poorly (another argument for RESTful architecture, certainly).

I would therefore agree with Tantek’s oft-repeated admonishment that services that are serious about their data should always start by marking up their sites with microformats and then add additional APIs to provide functionality (as TripIt did). It’s simply good data hygiene. It’s also about the separation between form and function (or data and interactivity). And with emerging technologies like , people can now build arbitrary mashups from the HTML on your homepage, without even having to know about your custom API.

It also means that, in the event of catastrophe (as in Ma.gnolia’s case) or dissolution of a service (as in the cases of Pownce, Journalspace or Consumating), there is some hope for data refugees left out in the cold.

When APIs go dark, how do you do a data backup? (Answer: you often can’t.) With public, microformatted content, there will likely be a public archive that can be used to reconstitute at least portions of the service. With dynamic APIs and proprietary data formats, all bets are off.

V. Death and data reincarnation

With both the intentional and unintentional destruction of data recently, I’ve had lots to ponder in terms of the value, relevance, importance and longevity of data.

I talk about “data capital” like it matters, because I suppose I want it to, and hope that someday it does make a difference just how much of yourself you share with the world, simply because it’s better to share than not to.

And now I’m in this funny situation where, because I did share, and shared openly (specifically on Ma.gnolia), there is the very real possibility of reincarnating my data from the ether of the web. It could just be that all the private data, including messages, private bookmarks and thanks are forever gone, because they were kept private. But those things which were made available to anyone and everyone, through that simple aspect, can be reconstituted by extracting their essence from the caches of the internet’s memory banks.

You think about photographs of people who have died, and of videos and other media. In the past several years we’ve had to start thinking about what happens to social networking profiles on Facebook, MySpace and Twitter of people who are no longer with us. Over time, societies have invented symbols and rituals to commemorate the dead, and often use items imbued with the deceased’s social residue to help them remember and recall and relive.

How does that work when those items are locked away in incompatible and proprietary data stores? How do we cope when technology gets between humans and their humanity?

The web, it turns out, is a fragile place, in spite of its redundancy and distributed design.

Efforts that threaten to close it up, lock it down or wall it into proprietary gardens are turning the web against us, against history and against civilization and the collective memory. This is perhaps one of the primary reasons why the open web is so important to me, and why it factors so centrally into my work. As I grow older, perhaps I won’t always have perspective on which things will be the most important to me, but it’s critical that, in the future, I don’t inhibit my own or my progeny’s ability to access my digital legacy.

I find it fitting that Ma.gnolia uses an organic symbol as its logo. It has, for all intents and purposes, died.

But there is a silver lining here, and I think Larry intuitively understands it: in the Ma.gnolia Open Source (M2) project, he had already sown the seeds for Ma.gnolia’s rebirth. Though it is lamentable that such a disaster would occur, I believe that creative destruction is absolutely necessary to natural systems, just as forest fires are critical to the lifecycle of forests.

I also believe that things happen for a reason and that the soil of this tragedy will lead to a new start and new growth. It’s not accidental that the design of M2 called for a distributed, redundant mesh of independent bookmarking service endpoints. If anything, this situation provides Larry license to start anew, proving the necessity of death, and the wisdom of genetic inheritance and variation.

TheSocialWeb.tv #25: “An ‘Open’ Letter to the Obama Administration”

http://www.viddler.com/player/95214990/

Last Friday, Joseph, John and I recorded episode #25 of TheSocialWeb.tv.

Besides shout-outs to 97bottles.com and Janrain for their stats on third-party account login usage, we discussed how the Obama administration might better make use of or leverage elements of the Open Stack, specifically OpenID.

Why YouTube should support Creative Commons now


I was in Miami last week to meet with my fellow screeners from the Knight News Challenge, and while there, Jay Dedman, Ryanne Hodson (two vlogger friends whom I met through coworking) and I started talking about content licensing, specifically as it relates to President-Elect Barack Obama’s weekly address, which, if things go according to plan, will continue to be broadcast on YouTube.

The question came up: what license should Barack Obama use for his content? This, in turn, revealed a more fundamental question: why doesn’t YouTube let you pick a license for the work that you upload (and must, given the terms of the site, own the rights to in the first place)? And if this omission isn’t intentional (that is, no one decided against such a feature, it just hasn’t bubbled up in the priority queue yet), then what can be done to facilitate the adoption of Creative Commons on the site?

To date, few video sharing sites, save Blip.tv and Flickr (even if the latter only deals in “long photos”), have actually embraced Creative Commons to any appreciable degree. Ironically, of all sites, YouTube seems the most likely candidate to adopt Creative Commons, given its rampant remix and republish culture (a culture which continues to vex major movie studios and other fastidious copyright owners).

One might make the argument that, considering the history of illegally shared copyrighted material on YouTube, enabling Creative Commons would simply lead to people mislicensing work that they don’t own… but I think that’s a strawman argument that falls down in practice for a number of reasons:

  • First of all, all sites that enable the use of CC licenses offer the scheme as opt-in, defaulting to the traditional all rights reserved use of copyright. Enabling the choice of Creative Commons wouldn’t necessarily affect this default.
  • Second, unauthorized sharing of content or digital media under any license is still illegal, whether the relicensed work is licensed under Creative Commons or copyright.
  • Third, YouTube, and any other media sharing site, bears some responsibility for the content published on their site, and, regardless of license, reserves the right to remove any material that fails to comply completely with its Terms of Service.
  • Fourth, the choice of a Creative Commons license is usually a deliberate act (going back to my first point) intended to convey an intention. The value of this intention — specifically, to enable the lawful reuse and republishing of content or media by others without prior per-instance consent — is a net positive to the health of a social ecosystem insomuch as this choice enables a specific form of freedom: that is, the freedom to give away one’s work under certain, less-restrictive stipulations than the law allows, to aid in establishing a positive culture of sharing and creativity (as we’ve seen on , SoundCloud and CC Mixter).

Preventing people from choosing a more liberal license conceivably restricts expression, insomuch as it keeps an “efficient, content-enriching value chain” from forming within a legal framework. Put another way, because all material on YouTube is currently licensed under the most restrictive regime, every reuse of a portion of media must be licensed on a per-instance basis, considerably impeding the legal reuse of other people’s work.

. . .

Now, I want to point out something interesting here, related both to this moment in time and to government ownership of media. A recently released report from the GAO on Energy Efficiency carried with it the following statement on copyright:

This is a work of the U.S. government and is not subject to copyright protection in the United States. The published product may be reproduced and distributed in its entirety without further permission from GAO. However, because this work may contain copyrighted images or other material, permission from the copyright holder may be necessary if you wish to reproduce this material separately.

Though it can’t simply put this work into the public domain because of the potential copyrighted materials embedded therein, this statement is about as close as you can get for an assembled work produced by the government.

Now consider that Obama’s weekly “radio address” is self-contained media, not contingent upon the use or reuse of any other copyrighted work. It bears considering what license (if any) should apply (keeping in mind that the government is funded by tax-payer dollars). If not the public domain, under what license should Obama’s weekly addresses be shared? Certainly not all rights reserved! — unfortunately, YouTube offers no other option and thus, regardless of what Obama or the Change.gov folks would prefer, they’re stuck with a single, monolithic licensing scheme.

Interestingly, Google, YouTube’s owner, has supported Creative Commons in the past, notably with their collaboration with Radiohead on the House of Cards open source initiative and with the licensing of the Summer of Code documentation (Yahoo has a similar project with Flickr’s hosting of the Library of Congress’ photo archive under a liberal license).

I think that it’s critical for YouTube to adopt the Creative Commons licensing scheme now, as Barack Obama begins to use the site for his weekly address, because of the powerful signal it would send, in the context of what I imagine will be a steady increase in the use and importance of social media and web video by government agencies.

Don Norman recently wrote an essay on the importance of social signifiers, and I think it underscores my point as to why this issue is pressing now. In contrast to the popular concept of “affordances” in design and design thinking, Norman writes:

A “signifier” is some sort of indicator, some signal in the physical or social world that can be interpreted meaningfully. Signifiers signify critical information, even if the signifier itself is an accidental byproduct of the world. Social signifiers are those that are relevant to social usages. Some social indicators simply are the unintended but informative result of the behavior of others.

. . .

I call any physically perceivable cue a signifier, whether it is incidental or deliberate. A social signifier is one that is either created or interpreted by people or society, signifying social activity or appropriate social behavior.

The “appropriate social behavior”, or behavior that I think Obama should model in his weekly podcasts, is that of open and free licensing, introducing the world of YouTube viewers to an alternative form of licensing that would enable them to better understand and signal to others their intent and desire to share, and to have their creative works reused, without the need to ask for permission first.

For Obama media to be offered under a CC license (with the license embedded in the media itself) would signal his seriousness about embracing openness, transparency and the nature of discourse on the web. It would also signify a shift towards the type of collaboration typified by Web 2.0 social sites, enabling a modern dialectic relationship between the citizenry and its government.

I believe that now is the time for this change to happen, and for YouTube to prioritize the choice of Creative Commons licensing for the entire YouTube community.

On invite-only betas

Fred Wilson wrote about the value of blogging and building social capital, as demonstrated by the hundred requests for invites he received on his post about his recent investment, Boxee, an invite-only service.

Now, while I find the behavior of public invite-requesting curious, I understand it.

I also think there’s another side to this equation that I’d like to point out, being one of the fortunate early adopters who happens to get invited to a lot of early alphas and betas… and that’s understanding the relationship between the creator of the beta and the testers. Or, to put it another way, requesting an invite to a service for one’s own benefit is one thing; understanding that an invite is a privilege given in exchange for feedback and suggestions provided is another. And the secret to getting early access to beta programs is, perhaps obviously, to be a good beta tester.

There are any number of ways to demonstrate that you’re worthy of an invite to an invite-only alpha or beta program. One problem is that a lot of beta feedback is submitted privately, outside of public forums. Whenever I can, I attempt to use more public forums, both for my own recollection and for the benefit of other testers, developers and later users.

In other cases, I’ll use Flickr or Twitter, leading to interesting phenomena, similar to what Fred describes.

In particular, I’ve been alpha testing a music player called Spotify for some time. It’s an incredible service that recently opened up with three levels of service, though it’s sadly not yet available in the US owing to licensing issues. Now, the only way to get an account with the service is to request an invitation.

It just so happens that I screenshotted an element of the new interface, uploaded it to Flickr and titled the photo “Spotify Invites“. That photo is now the second result for that phrase on Google and people have noticed, quickly exhausting my supply of invites.

The problem with this scenario, and with Fred’s, is that many folks seem eager to get access solely for their own benefit, without thought to the quid pro quo that makes beta programs successful (and that ultimately benefits both the developer and subsequent users!). And I think it’s worth pointing out that beta programs aren’t just freebie giveaways: the gate is there for a reason!

I wrote this post in 2005, back when Gmail was an invite-only service (!!) and I was thinking about the relationships we were attempting to cultivate with the Flock alpha tester program:

So what of all these invite-only (or formally invite-only) services where you have to know someone on the inside to get a golden ticket? Does it artificially increase desire? Does it help services grow organically and cut down on trolls and spam, creating more value for invitees? Does it create more investment from the user community and perhaps establish even minor connections between invitor and invitee? Or does it create a false hierarchy around an inner circle of well-connected geeks?

Who knows?

What I do know is that it’s a curious trend and happening rather profusely across the web. Good or bad? I can’t quite say — except that in the case of Flock, we’re using the invite system to start out slowly on purpose. We want to not only be able to scale up organically, but we also want to cultivate relationships with our brave early adopters so that we can build the best experience possible over time. And to that end — we want to make sure that when we do launch publicly, we’ve hammered out all the glaring issues — as well as minor ones — so that sum total Flock makes you more productive, more explorative, and more voraciously social on the web. So for now, Flock will remain available to few kindred souls with enough courage to shove through our bugs and dodge the sharp edges. In the meantime, do add yourself to our invite lottery so that your name will be there when the next round of invites go out.

Not much has changed in terms of the structure of invite-only betas (even though the tools for managing them have improved), but I think something of the intimacy and purpose of these programs has been lost as more of the mainstream has gotten used to handing over just an email address for access to such initiatives.

As Fred points out, there’s value in building up social capital so that you can help stoke interest in new projects and draw the interest of potentially valuable contributors and testers, but it’s just as important to highlight the value of diligent and hard-working testers who have an interest in improving products and becoming partners in the potential success of such projects. I think there’s potential for mutually reinforcing, ongoing relationships in the execution of a productive beta program, and those longer-term relationships should not be overlooked.

. . .

To that end, I’m looking for some highly motivated and qualified testers for LittleSnapper, Real Mac Software’s new webpage screenshot utility. Be one of the first ten to leave a comment with your proper email address and a description of how you approach beta testing, and I’ll send you info on where you can sign up. As I’m eager to see LittleSnapper mature, I won’t settle for just anyone; prove to me that you’d add value to the alpha tester program! 😉