What really happened at Ma.gnolia and lessons learned

(Embedded video: Citizen Garden episode 11 on Vimeo, clip 3205188.)

Larry (@lhalff) and I have been recording a podcast for the past year called Citizen Garden that covers various topics related to the web, technology, and social networking.

Well, given Ma.gnolia’s recent catastrophe, we decided that episode 11 would be dedicated to exactly what went down and why, and what lessons Larry has learned that others should heed in order to avoid facing a similar crisis.

I think the basic take-away is that, four years ago, when Larry started Ma.gnolia, your IT options were pretty much to use commodity shared hosting or to do it yourself. If you used Ruby on Rails — in which Ma.gnolia is written — your options were even more limited. And so Larry chose to do it himself.

Today, with services like Amazon S3 & EC2, Joyent Accelerators and Google App Engine, reliable, scalable hosting is no longer as much of a problem, as these services have risen to meet the needs of applications like Ma.gnolia. But Larry didn’t take complete advantage of these services, and the burden of taking care of over half a terabyte of data eventually caught up with him.

All is not lost necessarily, and Larry hopes that Ma.gnolia will someday return, perhaps as an invite-only service to start, in order to give him time to earn back people’s trust and scale the service slowly. I’m also confident that he’s decided to completely outsource his IT, taking the lessons from this current situation deeply to heart.

This episode is also downloadable as an MP3.

Where data goes when it dies and other musings

I’ve been wanting to write about Ma.gnolia’s catastrophic data loss last week ever since it happened, but wasn’t quite sure how I wanted to approach it. Larry (Ma.gnolia founder and the sole person who maintained the site) is a good friend of mine, and Ma.gnolia was one of Citizen Agency’s first clients. It’s been painful to see him struggle through this, both personally and professionally, and it’s about the worst possible [preventable] thing that can happen to a Web 2.0 service.

Still, kept in context, it’s made me reconsider some things about the nature and value of open, networked data.

I. How I Learned to Stop Worrying and Love the Bomb

According to Google’s cache of my profile on Ma.gnolia, I had accrued 5758 bookmarks and 6162 tags since I first started using the service on August 8, 2004. That’s a lot of data capital to have instantly wiped out. You might think that I’d be angry, or disappointed. But I’m surprisingly zen about the whole thing. Even if I never got any of my bookmarks back, I don’t think I’d be that upset, and I’m not sure why.

If Flickr went down, I’d be pretty pissed. But Ma.gnolia for me was primarily a tool for publishing — something that I used to broadcast pointers to things that I took a momentary fancy to. There’s a lot of history in my bookmarks, no doubt. In some ways, it’s a record of all the things that I’ve read that I thought might be worth someone else reading (hence why my bookmarks are public), and clearly it’s a list of things that have affected and informed my thinking on a broad array of topics.

But, the beauty of bookmarks is that they’re secondary references to other things. The payload is elsewhere and distributed. So in some ways, yeah, there’s a lot of good data there that’s been lost (at least for the moment). But the reality is that the legacy of my bookmarks is forever imbued in my brain as changes in how my synapses fire. The things that I can’t remember, well, perhaps they weren’t that important to begin with.

II. Start over; the blank slate.

Leopard Blank Slate

With the money I won from the Google/O’Reilly Open Source award last summer, I decided I’d break down and buy myself a new MacBook Pro. As I was initially setting it up, I figured I’d transfer my previous system setup over from my Time Machine backup and just pick up from where I left off.

I did this, but once I logged in, the new MacBook lost its feeling of newness, and I felt encumbered. What amounted to bit-for-bit data portability left me feeling claustrophobic and restricted. I wanted the freedom of a clean system back; somehow buying a new machine wasn’t just about better performance, but about giving myself license to forget and to start over and to make new mistakes.

I wiped the hard drive and reinstalled OS X with the minimum options. I’ve installed about ten apps so far, and I intend to hold off on anything that I don’t feel an absolute need to install, taking a hint from Ethan Kaplan:

Twitter / Ethan Kaplan: @factoryjoe only install a ...

III. And the band played on

While I love the form-factor of my MacBook Air (now my previous system), the first generation just isn’t fast enough or beefy enough for the way that I use a Mac. It’s great for email and traveling and it really is the machine that I want to be using — just with better performance (though I hear the new models are much better).

Because the hard drive on the thing is pretty minuscule by today’s standards (80GB), I quickly maxed it out with music, videos, photos and screenshots. I was down to about 6GB of space, and OS X crawls when it can’t cache the shit out of everything, so I decided to take aggressive action and deleted my entire 30GB iTunes library.

Command-A. Command-Delete. Empty Trash.

And then it was done.

Now, I still need iTunes for iPhone syncing, but I no longer have a local music library. With the combination of Spotify, SimplifyMedia and Pandora (using PandoraJam or PandoraBoy), I’ve got a good selection of music wherever I’ve got wifi.

The act of deleting my entire music library (okay fine, I do have a complete backup on my Mac Mini media center) was cathartic. All that data… in an instant, gone. All those ratings, all that metadata, all those play counts revealing my accumulated listening habits. Gone (well, except for my Last.fm profile).

Of course, it’s not like I had original, irreplaceable copies of these tracks. There are copies upon copies out there. And knowing this, I intentionally destroyed all this data without really worrying about whether I’d ever be able to re-experience or relive my music again. In fact, I didn’t even give it a thought.

But my system sure seems a bit faster now.

IV. Microformats are the vinyl of the web

Vinyl is 4 Ever by Bruce Berrien

The first thing that I thought about when I heard that Ma.gnolia had had “catastrophic data loss” was that Google and Yahoo probably had pretty good caches of the site, especially given its historically high PageRank. The second thing that I thought about was that, since the site was marked up with XFN, xFolk and other microformats, recovering structured data from these caches would likely be the most reliable way of externally reconstituting Ma.gnolia, in lieu of other, more conventional data retrieval methods.

Though Larry is still engaged in a full-on recovery process, it gave me some sense of pride and optimism that we had had the forethought to mark up Ma.gnolia with microformats. Indeed, this kind of archival purpose was something that Tantek had presaged in 2006:

Microformats from the beginning in my mind are serving two very important purposes.

  1. Microformats provide simple ways of identifying larger chunks of information on the Web for easily and immediately publishing, sharing, moving, aggregating, and republishing.
  2. Microformats are perhaps a step forward in providing building blocks for the longevity of higher fidelity information as well.

In talking with Tantek about this, he pointed out some interesting things about many modern web services, lamenting their apparent lack of concern over longevity. For example, there is clearly a great deal of movement afoot to advance the state of distributed social networking, as evidenced by XML- and JSON-based protocols like Portable Contacts and Activity Streams. But these are primarily transaction-based protocols, and they archive poorly (another argument for RESTful architecture, certainly).

I would therefore agree with Tantek’s oft-repeated admonishment that services that are serious about their data should always start by marking up their sites with microformats and then add APIs to provide functionality (as TripIt did). It’s simply good data hygiene. It’s also about the separation between form and function (or data and interactivity). And with emerging technologies, people can now build arbitrary mashups from the HTML on your homepage, without even having to know about your custom API.

It also means that, in the event of catastrophe (Ma.gnolia’s case) or dissolution of a service (as in the cases of Pownce, Journalspace or Consumating), there is some hope for data refugees left out in the cold.

When APIs go dark, how do you do a data backup? (Answer: you often can’t.) With public, microformatted content, there will likely be a public archive that can be used to reconstitute at least portions of the service. With dynamic APIs and proprietary data formats, all bets are off.
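To make that concrete, here’s a rough sketch (in Python, using BeautifulSoup) of what such a reconstitution might look like: pulling bookmarks back out of a cached HTML page. It assumes xFolk-style class names (xfolkentry, taggedlink) and rel="tag" links; a real recovery script would have to match Ma.gnolia’s actual markup and walk every cached page.

```python
# Sketch: recover xFolk-marked bookmarks from a cached HTML page.
# Class names are assumptions based on the xFolk draft, not Ma.gnolia's exact markup.
from bs4 import BeautifulSoup

def extract_bookmarks(cached_html):
    soup = BeautifulSoup(cached_html, "html.parser")
    bookmarks = []
    for entry in soup.find_all(class_="xfolkentry"):
        link = entry.find("a", class_="taggedlink")
        if link is None:
            continue
        # rel-tag links carry the user's tags for this bookmark
        tags = [a.get_text(strip=True)
                for a in entry.find_all("a")
                if "tag" in (a.get("rel") or [])]
        bookmarks.append({
            "url": link.get("href"),
            "title": link.get_text(strip=True),
            "tags": tags,
        })
    return bookmarks
```

Nothing fancy, and that’s the point: because the structure lives in the public HTML itself, anyone with a copy of the page can get the data back out.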

V. Death and data reincarnation

The recent destruction of data, both intentional and unintentional, has given me a lot to ponder about the value, relevance, importance and longevity of data.

I talk about “data capital” like it matters, because I suppose I want it to, and hope that someday it does make a difference just how much of yourself you share with the world, simply because it’s better to share than not to.

And now I’m in this funny situation where, because I did share, and shared openly (specifically on Ma.gnolia), there is the very real possibility of reincarnating my data from the ether of the web. It could just be that all the private data, including messages, private bookmarks and thanks, is forever gone, because it was kept private. But those things which were made available to anyone and everyone can, through that simple fact, be reconstituted by extracting their essence from the caches of the internet’s memory banks.

You think about photographs of people who have died, and of videos and other media. In the past several years we’ve had to start thinking about what happens to social networking profiles on Facebook, MySpace and Twitter of people who are no longer with us. Over time, societies have invented symbols and rituals to commemorate the dead, and often use items imbued with the deceased’s social residue to help them remember and recall and relive.

How does that work when those items are locked away in incompatible and proprietary data stores? How do we cope when technology gets between humans and their humanity?

The web, it turns out, is a fragile place, in spite of its redundancy and distributed design.

Efforts that threaten to close it up, lock it down or wall it into proprietary gardens are turning the web against us, against history and against civilization and the collective memory. This is perhaps one of the primary reasons why the open web is so important to me, and why it factors so centrally into my work. As I grow older, perhaps I won’t always have perspective on which things will be the most important to me, but it’s critical that, in the future, I don’t inhibit my own or my progeny’s ability to access my digital legacy.

I find it fitting that Ma.gnolia uses an organic symbol as its logo. It has, for all intents and purposes, died.

But there is a silver lining here, and I think Larry intuitively understands it: in the Ma.gnolia Open Source (M2) project, he had already sown the seeds for Ma.gnolia’s rebirth. Though it is lamentable that such a disaster would occur, I believe that creative destruction is absolutely necessary to natural systems, just as forest fires are critical to the lifecycle of forests.

I also believe that things happen for a reason and that the soil of this tragedy will lead to a new start and new growth. It’s not accidental that the design of M2 called for a distributed, redundant mesh of independent bookmarking service endpoints. If anything, this situation provides Larry license to start anew, proving the necessity of death, and the wisdom of genetic inheritance and variation.

The OpenID mobile experience

Two days ago, Ma.gnolia launched their mobile version, and it’s pretty awesome (disclosure: Ma.gnolia is a former client and current friend/partner of Citizen Agency).

In the course of development, Larry asked me what I thought he should do about adding OpenID sign-in to the mobile version. He was reluctant to do so because, he reasoned, the experience of logging in sucks, not just because of the OpenID round-trip dance, but because most identity providers don’t actually offer a mobile-friendly interface.

Indeed, if you take a look at the flow from the Ma.gnolia mobile UI to my OpenID provider (using the iPhone simulator app), you can see that it does suck.

Mobile Ma.gnolia / iPhoney OpenID Verification

I strongly encouraged Larry to go ahead and add OpenID even if the flow isn’t ideal. As it is, you can sign up for Ma.gnolia with only an OpenID (without needing to create yet another username and password), so without offering this login option, the mobile site would be off-limits to folks in that situation.
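For what it’s worth, the “round-trip dance” itself is nothing more than a browser redirect out to the provider and back. Here’s a minimal sketch of building that redirect in Python; the endpoint and URLs are hypothetical, and discovery and response verification are left out entirely:

```python
# Sketch: build the OpenID 2.0 checkid_setup redirect that sends the user
# out to their provider. All URLs below are hypothetical examples.
from urllib.parse import urlencode

def openid_checkid_setup_url(op_endpoint, claimed_id, return_to, realm):
    params = {
        "openid.ns": "http://specs.openid.net/auth/2.0",
        "openid.mode": "checkid_setup",
        "openid.claimed_id": claimed_id,
        "openid.identity": claimed_id,
        "openid.return_to": return_to,   # where the provider sends the browser back
        "openid.realm": realm,
    }
    return f"{op_endpoint}?{urlencode(params)}"

# e.g. redirect a mobile visitor out to their provider and back to the mobile site
print(openid_checkid_setup_url(
    "https://openid.example.org/server",   # discovered OP endpoint (hypothetical)
    "https://example.org/~user",           # the user's claimed identifier
    "http://m.example.com/openid/return",
    "http://m.example.com/",
))
```

The protocol itself isn’t the hard part; the pain is what the provider renders on the other end of that redirect when the screen is 320 pixels wide.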

So there’s clearly an opportunity here, and I’m hoping that out of OpenIDDevCamp today, we can start to develop some best practices and interface guidelines for OpenID providers for the mobile flow (not to mention more generally).

If you’ve seen a good example of an OpenID (or roundtrip authentication flow) for mobile, leave a comment here and let me know. It’s hard to get screenshots of this stuff, so any pointers would be appreciated!

Announcing OAuth 1.0 Public Draft 1

Well, it’s been a long time coming, and if you’ve been following my Twitters at all, you’ll know that I’ve been working on an open authorization protocol called OAuth for the past few months. Today we released the first Public Draft for review.

The idea started as a humble effort to accomplish two goals: first, to enable Ma.gnolia members who created their accounts with OpenIDs (and therefore don’t have traditional usernames and passwords) to be able to use Dashboard Widgets; and second, to enable Twitter to adopt OpenID, since its current API requires a username and password to authorize access to protected status feeds.

In any case, both of these use cases were part of the same problem: the lack of a uniform and open protocol for what’s called “delegated authorization”. Another useful metaphor that I’ve come to like is the one John Panzer (and Eran Hammer-Lahav before him) used, that of a valet key:

OAuth is like a valet key for all your web services. A valet key lets you give a valet the ability to park your car, but not the ability to get into the trunk or drive more than 2 miles or limit the RPMs on your high end German automobile. In the same way, an OAuth key lets you give a web agent the ability to check your web mail but NOT the ability to pretend to be you and send mail to everybody in your address book.

Arguably the value of OAuth as a technological innovation goes beyond that. After all, anyone can implement their own valet key system that works in their own universe of vehicles. The harder part is actually the social and political work of getting everyone to buy in and follow the same design pattern, leading to interoperability between systems.

In fact that’s where we were before OAuth: Google had AuthSub, AOL had OpenAuth (OAuth’s former name, by the way), Yahoo had BBAuth and Flickr had FlickrAuth (not to mention Facebook Auth and Windows Live ID Web Authentication). Which meant that if you were an independent developer (like Matt Biddulph from Dopplr), you had to pick which auth system you wanted to support; unless you had money and time coming out of your armpits, you couldn’t code against all of them.

Of course, that’s not reality. And no one has the time or energy to maintain support for every protocol, so instead, most people take the easy way out and just ask for the veritable keys to all the different services you use:

ShareThis | Import your addresses...

Now, don’t get me wrong, this gets the job done. And it works. But it’s a really really really bad idea.

Not only are people being trained to think that it’s okay to fill in any form that looks like a Gmail login box on any old website (trusted or not), but it creates an untenable situation where, as a member of these various services, you have no way to revoke the access you’ve given away without changing your password — which in effect will disable every one of these sites that’s storing your credentials — forcing you to revisit every one of them and share your new username and password. What a crappy experience!

Fortunately, Flickr got it right a long time ago and set the bar for user experience. In their model, you can try out a bunch of tools that help you upload photos to the service, or use off-site mashups that do cool things with your photos, all without giving away your most valuable credentials: your username and password!

Instead, when you sign in to your account, Flickr will assign special keys called “tokens” to each application that wants to access your account. Flickr then lets you configure how much access you want to grant to each app and lets you revoke that access at any time. No changing your password, no running around to have to re-authenticate all the apps that you still want to use if you want to disable one of them.
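As an illustration of that model (this is not Flickr’s actual API, just a sketch of the idea), here’s what per-application tokens look like: each app gets its own token with its own permission level, and revoking one app never touches your password or any other app’s access.

```python
# Illustrative sketch of per-application access tokens with revocation.
# Not any real service's API; names and permission levels are made up.
import secrets

PERMISSIONS = ("read", "write", "delete")

class AccessTokens:
    def __init__(self):
        self._tokens = {}  # token -> {"app": str, "permission": str}

    def grant(self, app_name, permission="read"):
        token = secrets.token_hex(16)
        self._tokens[token] = {"app": app_name, "permission": permission}
        return token

    def revoke(self, token):
        self._tokens.pop(token, None)

    def allows(self, token, needed):
        grant = self._tokens.get(token)
        if grant is None:
            return False
        return PERMISSIONS.index(grant["permission"]) >= PERMISSIONS.index(needed)

# A user authorizes an uploader with write access, then later revokes it
tokens = AccessTokens()
uploader = tokens.grant("Desktop Uploader", "write")
assert tokens.allows(uploader, "read")
tokens.revoke(uploader)
assert not tokens.allows(uploader, "read")
```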

OAuth takes that approach one step further, extracting the best practices from the popular auth systems I mentioned above and turning them into one elegant, unified authorization protocol that anyone can implement. And because it’s an open standard that we hope many people will adopt in place of their own proprietary systems, it should be a no-brainer for developers to use and support, resulting in fewer sites that, with a straight face, continue to ask you for your username and password. (Oh, and yes, it is compatible with OpenID, with Google Accounts, with Yahoo Accounts and any other sign-in system — OAuth doesn’t dictate how you sign in, only how you delegate access.)
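For the curious, the heart of the draft is request signing: instead of a password, each call carries a consumer key, a token and an HMAC-SHA1 signature over the request. Here’s a simplified sketch of building that signature; it skips edge cases like duplicate parameters and port normalization, and the keys and URL shown are made-up examples, not real credentials.

```python
# Sketch: HMAC-SHA1 request signing in the style of the OAuth 1.0 draft.
import base64
import hashlib
import hmac
import time
import uuid
from urllib.parse import quote

def percent_encode(value):
    # quote() already leaves RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) alone
    return quote(str(value), safe="")

def hmac_sha1_signature(method, url, params, consumer_secret, token_secret=""):
    # 1. Normalize the parameters: encode each name/value, sort, join as k=v&k=v
    pairs = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    # 2. Signature base string: METHOD & encoded URL & encoded parameter string
    base_string = "&".join([method.upper(), percent_encode(url), percent_encode(normalized)])
    # 3. Signing key: consumer secret & token secret (token secret is empty before one is issued)
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Hypothetical credentials, purely for illustration
oauth_params = {
    "oauth_consumer_key": "my-app-key",
    "oauth_token": "user-access-token",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": str(int(time.time())),
    "oauth_nonce": uuid.uuid4().hex,
    "oauth_version": "1.0",
}
oauth_params["oauth_signature"] = hmac_sha1_signature(
    "GET", "http://example.com/bookmarks", oauth_params,
    "my-app-secret", "user-token-secret",
)
```

The upshot is that the service can verify who is asking and on whose behalf without the user’s password ever leaving the user’s hands.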

Even though we’re only releasing the first public draft today, we already have pledges from Ma.gnolia, Twitter, Pownce, Jaiku, Dopplr and others that they intend to implement the protocol.

If you want to get involved, join our mailing list and take a look at the OAuth libraries under development for PHP, Ruby, Python, C# and others. We plan to formally release the final version of the OAuth Protocol v1.0 on Oct 1, so watch this space for more news until then.