After 1984

iTunes Genius

iTunes 8 has added a new feature called “Genius” that harnesses the collective behavior of iTunes Music Store shoppers to generate “perfect” playlists.

Had an interesting email exchange with my mom earlier today about Monica Hesse’s story Bytes of Life. The crux of the story is that more and more people are self-monitoring and collecting data about themselves, in many cases, because, well, it’s gotten so much easier, so, why not?

Well, yes, it is easier, but just because something is easier doesn’t automatically mean one should do it, so let’s look at this a little more deeply.

First, my mom asked about the amount of effort involved in tracking all this data:

I still have a hard time even considering all that time and effort spent in detailing every moment of one’s life, and then the other side of it which is that it all has to be read and processed in order to “know oneself”. I think I like the Jon Kabat-Zinn philosophy better — just BE in the moment, being mindful of each second doesn’t require one to log or blog it, I don’t think. Just BE in it.

Monica didn’t really touch on too many tools that we use to self-monitor. It’s true that, depending on the kind of data we’re collecting, the effort will vary. But so will the benefits.

If you take a look at MyMileMarker’s iPhone interface, you’ll see how quick and painless it is to record this information. Why bother? Well, for one thing, over time you get to see not only how much fuel you’re consuming, but how much it’s going to cost you to keep running your car in the future:

View my Honda Civic - My Mile Marker

Without collecting this data, you might guess at your MPG, or take the manufacturer’s rating as given, but when you record what actually is happening, you can prove to yourself whether filling up your tires really does save you money (or the planet).
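
To make that concrete, here’s a rough sketch (in Python) of the arithmetic a tool like MyMileMarker performs behind the scenes; the fill-up numbers and field names are invented for illustration, not taken from the actual app.

```python
# Rough sketch of the arithmetic a mileage tracker performs. The fill-ups and
# prices below are invented for illustration.
fill_ups = [
    {"odometer": 48_210, "gallons": 10.4, "price_per_gallon": 4.09},
    {"odometer": 48_522, "gallons": 10.1, "price_per_gallon": 4.15},
    {"odometer": 48_841, "gallons": 10.6, "price_per_gallon": 4.22},
]

miles = fill_ups[-1]["odometer"] - fill_ups[0]["odometer"]
gallons = sum(f["gallons"] for f in fill_ups[1:])  # fuel burned over those miles
mpg = miles / gallons
avg_price = sum(f["price_per_gallon"] for f in fill_ups) / len(fill_ups)

# Project what the next 12,000 miles will cost at the *measured* efficiency,
# rather than the manufacturer's rating.
projected_cost = 12_000 / mpg * avg_price
print(f"{mpg:.1f} MPG, roughly ${projected_cost:,.0f} for the next 12,000 miles")
```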

On the topic of the environment, recording my trips on Dopplr gives me an actual view of my carbon footprint (pretty damning, indeed):

DOPPLR Carbon

As my mom pointed out, perhaps having access to this data will encourage me to cut back excess travel — or to consolidate my trips. Ross Mayfield suggests that he could potentially quit smoking if his habit were made more plainly visible to him.

What’s also interesting is how passive monitors, or semi-passive monitoring tools, can also inform, educate or predict — and on this point I’m thinking of Last.fm where of course my music taste is aggregated, or location-based sites like Brightkite, where my locative behavior is tracked (albeit, manually — though Fire Eagle + Spot changes that).

My mom’s other point about the ability to just BE in the moment is also important — because self-tracking should ideally be non-invasive. In other words, it shouldn’t be the tracking that changes your behavior, but your analysis and reflection after the fact.

One of the stronger points I might make about this is that data, especially when collected regularly and when the right indicators are recorded, can strip away a great deal of the distortion introduced by your self-serving biases. Monica writes:

“We all have the tendency to see our behaviors in a little bit of a halo,” says Jayne Gackenbach, who researches the psychology of the Internet at Grant MacEwan College in Alberta, Canada. It’s why dieters underestimate their food intake, why smokers say they go through fewer cigarettes than they do. “If people can get at some objective criteria, it would be wonderfully informative.” That’s the brilliance, she says, of new technology.

So that’s great and all, but all of this, at least for my mom, raises the spectre of George Orwell’s ubiquitous and all-knowing “Big Brother” from Nineteen Eighty-Four and neo-Taylorism:

I do agree that people lie, or misperceive, and that data is a truer bearer of actualities. I guess I don’t care. Storytelling is an art form, too. There’s something sort of 1984ish about all this data collection — as if the accumulated data could eventually turn us all into robotic creatures too self-programmed to suck the real juice out of life.

I certainly am sympathetic to that view, especially because the characterization of life in 1984 was so compelling and visceral. The problem is that this analogy invariably falls short, especially in other conversations when you’re talking about the likes of Google and other web-based companies.

In 1984, Big Brother symbolized the encroachment of the government on the life of the private citizen. Since the government had the ability to lock you up or take you away based on your behavior, you can imagine that this kind of dystopic vision would resonate in a time when increasingly fewer people probably understand the guts of technology and yet increasingly rely on it, shoveling more and more of their data into online repositories, or having it collected about them as they visit various websites. Never before has the human race had so much data about itself, and yet (likely) so little understanding.

The difference, as I explained to my mom, comes down to access to — and leverage over — the data:

I want to write more about this, but I don’t think 1984 is an apt analogy here. In the book, the government knows everything about the citizenry, and makes decisions using that data, towards maximizing efficiency for some unknown — or spiritually void — end. In this case, we’re flipping 1984 on its head! In this case we’re collecting the data on OURSELVES — empowering ourselves to know more than the credit card companies and banks! It’s certainly a daunting and scary thought to realize how much data OTHER people have about us — but what better way to get a leg up than to start looking at ourselves, and collecting that information for our own benefit?

I used to be pretty skeptical of all this too… but since I’ve seen the tools, and I’ve seen the value of data — I just don’t want other people to profit off of my behaviors… I want to be able to benefit from it as well — in ways that I dictate — on my terms!

In any case, Tim O’Reilly is right: data is the new Intel Inside. But shouldn’t we be getting a piece of the action if we’re talking about data about us? Shouldn’t we write the book on what 2014 is going to look like so we can put the tired 1984 analogies to rest for awhile and take advantage of what is unfolding today? I’m certainly wary of large corporate behemoths usurping the role the government played in 1984, but frankly, I think we’ve gone beyond that point.

Musings on Chrome, the rebirth of the location bar and privacy in the cloud

Imagine a browser of the web, by the web, and for the web. Not a thick client application that simply opens documents over http:// instead of file://, but one that runs web applications (efficiently!), that plays the web, that connects people across the boundaries of the silos and gives them local-like access to remote data.

It might not be Chrome, but it’s a damn near approximation, given how people use the web today.

Take a step back. You can see the relics of desktop computing in our applications’ file menus… and we can intuit the assumptions that the original designer must have made about the user, her context and the interaction expectations she brought with her:

Firefox Menubar

This is not a start menu or a Dock. This is a document-driven menubar that’s barely changed since Netscape Communicator.

Indeed, the browser is a funny thing, because it’s really just a wrapper for someone else’s content or someone else’s application. That’s why it’s not about “features”. It’s all about which features, especially for developers.

It’s a hugely powerful place to insert oneself: between a person and the vast expanse that is the Open Web. Better yet: to be the conduit through which anyone projects herself on to the web, or reaches into the digital void to do something.

So if you were going to design a new browser, how would you handle the enormity of that responsibility? How would you seize the monument of that opportunity and create something great?

Well, for starters, you’d probably want to think about that first run experience — what it’s like to get behind the wheel for the very first time with a newly minted driver’s permit — with the daunting realization that you can now go anywhere you please…! Which is of course awesome, until you realize that you have no idea where to go first!

Historically, the solution has been to flip-flop between portals and search boxes, and if we’ve learned anything from Google’s shockingly austere homepage, it comes down to recognizing that the first step of getting somewhere is expressing some notion of where you want to go:

Camino. Start

The problem is that the location field has, up until recently, been fairly inert and useless. With Spotlight-influenced interfaces creeping into the browser (like David Watanabe’s recently acquired Inquisitor Safari plugin — now powered by Yahoo! Search BOSS — or the flyout in Flock that was inspired by it) it’s clear that browsers can and should provide more direction and assistance to get people going. Not everyone’s got a penchant for remembering URLs (or RFCs) like Tantek’s.

This kind of predictive interface, however, has only slowly made its way into the location bar, like fish being washed ashore and gradually sprouting legs. Eventually they’ll learn to walk and breathe normally, but until then, things might look a little awkward. But yes, dear reader, things do change.

So you can imagine that, having recognized this trend, Google went ahead and combined the search box and the location field in Chrome, and is now pushing the location bar as the starting place, as well as the place to do your searching:

Chrome Start

This change to such a fundamental piece of real estate in the browser has profound consequences, both for typical browser use and for security models that treat the visibility of the URL bar as sacrosanct (read: phishing):

Omnibox

The URL bar is dead! Long live the URL bar!

While cats like us know intuitively how to use the location bar in combination with URLs to get us where we want to go, that practice is now outmoded. Instead we type anything into the “box” and have some likely chance that we’re going to end up close to something interesting. Feeling lucky?
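
As a toy illustration of the idea, here’s a sketch of what an omnibox-style field does conceptually: match the typed text against history, and fall back to a search query when nothing matches. The scoring, history entries and search URL are invented for illustration; this is not Chrome’s actual algorithm.

```python
# Toy sketch of omnibox-style behavior: match what was typed against browsing
# history, otherwise treat it as a search. Scores, history and the search URL
# are invented for illustration; this is not Chrome's actual algorithm.
from urllib.parse import quote_plus

history = {
    # url: visit count, standing in for a real relevance score
    "https://factoryjoe.com/blog/": 42,
    "https://www.flickr.com/photos/factoryjoe/": 17,
    "https://twitter.com/home": 88,
}

def suggest(typed: str, limit: int = 3) -> list:
    matches = sorted(
        (url for url in history if typed.lower() in url.lower()),
        key=lambda url: history[url],
        reverse=True,
    )
    if matches:
        return matches[:limit]
    # Nothing in history looks right: fall back to a search query.
    return ["https://www.google.com/search?q=" + quote_plus(typed)]

print(suggest("flickr"))      # a history hit
print(suggest("weather sf"))  # falls back to search
```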

But there’s something else behind all this that I think is super important to realize… and that’s that our fundamental notions and expectations of privacy on the web have to change, or will be changed for us. Either we do without tools that augment our cognitive faculties, or we embrace them and, in so doing, shim open a window on our behaviors and our habits so that computers, computing environments and web service agents can become more predictive and responsive to them, and thereby serve us better. So it goes.

Underlying these changes are new legal concepts and challenges, spelled out in Google’s updated EULA and Privacy Policy… heretofore places where few feared to go, least of all browser manufacturers:

5. Use of the Services by you

5.1 In order to access certain Services, you may be required to provide information about yourself (such as identification or contact details) as part of the registration process for the Service, or as part of your continued use of the Services. You agree that any registration information you give to Google will always be accurate, correct and up to date.

. . .

12. Software updates

12.1 The Software which you use may automatically download and install updates from time to time from Google. These updates are designed to improve, enhance and further develop the Services and may take the form of bug fixes, enhanced functions, new software modules and completely new versions. You agree to receive such updates (and permit Google to deliver these to you) as part of your use of the Services.

It’s not that any of this is unexpected or Draconian: it is what it is, if it weren’t like this already.

Each of us will eventually need to choose a data broker or two and agree to similar terms and conditions, just as we’ve done with banks and credit card providers; and if we haven’t already, just as we’ve done in embracing webmail.

Hopefully visibility into Chrome’s source code will help keep things honest, and also provide the means to excise those features, or to redirect them to brokers or service providers of our choosing, but it’s inevitable that effective cloud computing will increasingly require more data from and about us than we’ve previously felt comfortable giving. And the crazy thing is that a great number of us (yes, including me!) will give it. Willingly. And eagerly.

But think one more second about the ramifications (see Matt Cutts) of Section 12 up there about Software Updates: by using Chrome, you agree to allow Google to update the browser. That’s it: end of story. You want to turn it off? Disconnect from the web… in the process, rendering Chrome nothing more than, well, chrome (pun intended).

Welcome to cloud computing. The future has arrived and is arriving.

OAuth for the iPhone: Pownce.app

Pownce OAuth flow Step 1

If you’re one of the lucky folks that’s been able to upgrade your iPhone (and activate it) to the 2.0 firmware, I encourage you to give the Pownce application a try, if only to see a real world example of OAuth in action (that link will open in iTunes).

Here’s how it goes in pictures:

Pownce OAuth flow Step 1 Pownce OAuth flow Step 2 Pownce OAuth flow Step 3 Pownce OAuth flow Step 4/Final

And the actual flow:

  1. Launch the Pownce app. You’ll be prompted to log in at Pownce.com.
  2. Pownce.app launches Pownce.com via an initial OAuth request; here you sign in to your Pownce account using your username and password (if Pownce supported OpenID, you could sign in with OpenID as well).
  3. Once successfully signed in to your account, you can grant the Pownce iPhone app permission to access your account.
  4. Click Okay, which is basically a pownce:// protocol link that fires Pownce.app back up to complete the transaction (a rough code sketch of this dance follows).
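
For the curious, here’s a rough sketch of the consumer side of that OAuth 1.0a dance, written in Python with the requests_oauthlib library. The endpoint URLs, keys and callback handling are placeholders for illustration, not Pownce’s real values.

```python
# Sketch of the consumer half of an OAuth 1.0a flow like the one described
# above. Endpoints, keys and the callback scheme are placeholders; a real
# client would use whatever the service actually publishes.
from requests_oauthlib import OAuth1Session

CONSUMER_KEY = "pownce-iphone-app"       # hypothetical
CONSUMER_SECRET = "not-a-real-secret"    # hypothetical

oauth = OAuth1Session(
    CONSUMER_KEY,
    client_secret=CONSUMER_SECRET,
    # The custom-scheme link that relaunches the app once the user clicks Okay.
    callback_uri="pownce://oauth-complete",
)

# Step 2: obtain a request token, then send the person to the site to sign in.
oauth.fetch_request_token("https://example.com/oauth/request_token")
print("Open in Safari:", oauth.authorization_url("https://example.com/oauth/authorize"))

# Steps 3-4: after the person grants access, the pownce:// link hands a
# verifier back to the app, which trades it for an access token.
oauth.fetch_access_token(
    "https://example.com/oauth/access_token",
    verifier="verifier-from-callback",
)
```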

There are three important aspects of this:

  • First, you’re not entering your username and password into the Pownce application — you’re only entering it into the website. This might not seem like a great distinction, but if a non-Pownce developed iPhone application wanted to access or post to your Pownce account, this flow could be reused, and you’d never need to expose your credentials to that third party app;
  • Second, it creates room for the adoption of OpenID — or some other single sign-on solution — to be implemented at Pownce later on, since OAuth doesn’t specify how you do authentication.
  • Third, if the iPhone is lost or stolen, the owner of the phone could visit Pownce.com and disable access to their account via the Pownce iPhone app — and not need to change their password and disrupt all the other services or applications that might already have been granted access.

Personally, as I’ve fired up an increasing number of native apps on the iPhone 2.0 software, I’ve been increasingly frustrated and annoyed at how many of them want my username and password, and how few of them support this kind of delegated authorization flow.

If you consider that there are already a few Twitter-based applications available, and none of them support OAuth (Twitter has yet to implement OAuth), in order to even test these apps out, you have to give away your credentials over and over again. Worse, you have no guarantee that a third party will destroy your credentials once you’ve handed them over, even if you uninstall the application.

These are a few reasons to consider OAuth for iPhone application development and authorization. Better yet, Jon Crosby’s Objective-C library can even give you a head start!

Hat tip to Colin Devroe for the suggestion. Cross-posted to the OAuth blog.

Feature request: OAuth in WordPress

Twitter / photomatt: @factoryjoe I would like OA...

In the past couple days, there’s been a bit of a dust-up about some changes coming in WordPress 2.6 — namely, disabling the Atom and XML-RPC APIs by default.

The argument is that this will make WordPress more secure out of the box — but at what cost? And is there a better solution to this problem than disabling features and functionality (even if only a small subset of users currently makes use of these APIs), if the change ends up being short-sighted?

This topic hit the wp-xmlrpc mailing list, where the conversation quickly devolved into squabbling about SSL and other security-related topics.

Allan Odgaard (creator of TextMate, as far as I can tell!) even proposed inventing another authorization protocol.

Sigh.

There are a number of reasons why WordPress should adopt OAuth — and not just because we’re going to require it for DiSo.

Heck, Stephen Paul Weber already got OAuth + AtomPub working for WordPress and has completed a basic OAuth plugin. The pieces are nearly in place, not to mention the fact that OAuth will pretty much be essential if WordPress is going to adopt OpenID at some point down the road. It’s also going to be quite useful if folks want to post from, say, a Google Gadget or OpenSocial application (or similar) to a WordPress blog if the XML-RPC APIs are going to be off by default (given Google’s wholesale embrace of OAuth).

Now, fortunately, folks within Automattic are supportive of OAuth, including Matt and Lloyd.

There are plenty of benefits to going down this path, not to mention the ability to scope third party applications to certain permissions — like letting Facebook see your private posts but not edit or create new ones — or authorizing desktop applications to post new entries or upload photos or videos without having to remember your username and password (instead you’d type in your blog address, and it would discover the authorization endpoints using XRDS-Simple; Eran has more on discovery in Magic, People vs. Machines).
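
To make the idea concrete, here’s a hypothetical sketch of what posting an entry over AtomPub with an OAuth-signed request might look like, assuming endpoints along the lines of what Stephen Paul Weber’s plugin exposes; the URL, keys and tokens are placeholders, and the scoping described is aspirational.

```python
# Hypothetical sketch: publishing a post over AtomPub with an OAuth 1.0a-signed
# request instead of a username and password. The endpoint, keys and tokens
# below are placeholders, not real values.
from requests_oauthlib import OAuth1Session

blog = OAuth1Session(
    "desktop-app-key", client_secret="desktop-app-secret",
    resource_owner_key="access-token", resource_owner_secret="token-secret",
)

entry = """<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Posted without a password</title>
  <content type="html">Signed with OAuth, scoped to creating posts only.</content>
</entry>"""

resp = blog.post(
    "https://example.com/blog/wp-app.php/posts",  # placeholder AtomPub endpoint
    data=entry.encode("utf-8"),
    headers={"Content-Type": "application/atom+xml;type=entry"},
)
print(resp.status_code)  # expect 201 Created on success
```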

Anyway, WordPress and OAuth are natural complements, and with popular support and momentum behind the protocol, it’s tragic to see needless reinvention when so many modern applications have the same problem of delegated authorization.

I see this as a tremendous opportunity for both WordPress and OAuth, and I’m looking forward to discussing it — at least as a consideration for WordPress 2.7 — at tonight’s meetup, for which I’m now late! Doh!

The Social Web TV pilot episode

http://www.viddler.com/player/2cf46be8/

My buddies John McCrea and Joseph Smarr have started up a show called The Social Web and have released the pilot episode, featuring David Recordon on the hubbub between Google and Facebook following last week’s Supernova Conference.

As they point out, things are changing and happening so fast in the industry that a show like this, one that cuts through the FUD and marketing hype, is really necessary. I hope to participate in future episodes — and would love to hear suggestions or recommendations for topics or guests for upcoming episodes.

Here’s the FriendFeed room Dave mentioned.

Announcing Emailtoid: mapping email addresses to OpenIDs

The other night at Beer and Blog in Portland, fellow Vidooper Michael T Richardson announced and launched a new service that I’m both excited and a little apprehensive about.

The service is called Emailtoid, and while I prefer to pronounce it “email-toyed”, others might pronounce it “email two eye-dee”. And depending on your pronunciation, you might realize that this service is about using an email address as an ID — specifically, an OpenID.

This is not a new idea, and it’s one that’s been debated and discussed in the OpenID community an awful lot, culminating in a rough outline of how it might work by Brad Fitzpatrick following the Social Graph FOO Camp this past spring, which David Fuelling turned into an early draft spec.

Well, we looked at this work and this discussion and felt that, sooner or later, in spite of all the benefits of using actual URLs for identity, someone needed to take the lead and actually build out this concept so we have something real to banter about.

The pragmatic reality is that many people are comfortable using email addresses as their identity online when signing up for new services; furthermore, many, many more people have email addresses than have URLs or homepages that they call their own (or can readily identify). And forcing people to learn yet another form of identifier for the web, simply to satisfy the design of a protocol, for arguably marginal value and with a lesser user experience, doesn’t make sense. Put another way: the limitations of the technology should not be forced on end users, especially when they don’t need to be. And that’s why Emailtoid is a necessary experiment towards advancing identity on the web.

How it works

Emailtoid is a very simple service, and in fact is designed for obsolescence. It’s meant as a fallback for now, enabling relying parties to accept email addresses as identifiers without requiring the generation of a new local password and without requiring the address owner to give up or reveal their existing email credentials (otherwise known as the “password anti-pattern”).

Enter your email - Emailtoid

The flow works like this:

  1. Users enter either an OpenID or email address into a typical OpenID input field. For the purpose of this flow, we’ll presume an email address is used.
  2. The relying party splits email addresses at the ‘@’ symbol into the username and the domain, generating a directed identity request to the email domain. If an XRDS, YADIS or XRDS-Simple document is discovered at the domain, the typical OpenID flow is invoked.
  3. If no discovery document is found, the service falls back to Emailtoid (sending a request like http://emailtoid.net/mapper?email=jane@example.com), where users verify that they own the supplied email addresses by providing their one-time access token that Emailtoid mailed to them.
  4. At this point, users may optionally associate an existing OpenID with their email address, or use the OpenID auto-generated by Emailtoid. Emailtoid is not intended to serve as a full-featured OpenID provider, and we encourage using an OpenID from a third-party OpenID provider.
  5. In the case where users supply and verify their own OpenID, Emailtoid will create a 302 HTTP redirect removing Emailtoid from future interactions completely.

Should an email provider supply a discovery document after an Emailtoid mapping has been made, the new mapping will take precedence.
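
Here’s a simplified sketch of the relying-party side of that fallback; real OpenID discovery is more involved than this single XRDS probe, but the shape of the logic is the same.

```python
# Simplified sketch of the relying-party fallback: try discovery against the
# email's domain first, and only hand off to Emailtoid if nothing turns up.
# Real OpenID discovery involves more than this single XRDS probe.
from urllib.parse import urlencode
import requests

def openid_endpoint_for_email(email: str) -> str:
    domain = email.split("@", 1)[1]

    # Step 2: look for a discovery document at the email domain.
    try:
        resp = requests.get(
            f"https://{domain}/",
            headers={"Accept": "application/xrds+xml"},
            timeout=5,
        )
        if resp.ok and "xrds" in resp.headers.get("Content-Type", "").lower():
            return f"https://{domain}/"  # proceed with the normal OpenID flow
    except requests.RequestException:
        pass

    # Step 3: no discovery document found, so fall back to the Emailtoid mapper.
    return "http://emailtoid.net/mapper?" + urlencode({"email": email})

print(openid_endpoint_for_email("jane@example.com"))
```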

Opportunities and issues

The drive behind Emailtoid, again, is to reduce the friction of OpenID by reusing familiar identifiers (i.e. email addresses). Clearly the challenges of achieving OpenID adoption are not simply technological; to a great degree they hinge on the user experience becoming more streamlined and delivering on the promise of greater security and convenience.

Therefore, if a service advertises that they support signing in with an email address, they must keep that promise.

Unfortunately, until all email providers do some kind of local resolution and OpenID authentication, we will need a centralized mapper such as Emailtoid to provide the fallback mapping. And therein lies the rub, defeating some of the distributed design of OpenID.

If anything, Emailtoid is intended to drive forward a conversation about the experience of OpenID, and about how we can make the protocol compatible with, or complementary to, existing and well-known means of identifying oneself on the web. Is it a final solution? Probably not — but it’s up, it’s running, it works and it forces us now to look critically at the question of emails as OpenIDs, now that we can actually experience the flow, and the feeling, of entering an email address into an OpenID box without ever having to enter, or create, another unnecessary password.

A conversation about social network interop and activity stream relevance

Brian Oberkirch captured some video today of a conversation between him, David Recordon and myself at GSP East about social network interop, among other things.

Hot on the heels of my last post, this conversation is rather timely!

Adding richness to activity streams

This is a post I’ve wanted to do for a while but simply haven’t gotten around to. Following my panel with Dave Recordon (Six Apart), Dave Morin (Facebook), Adam Nash (LinkedIn), Kevin Chou (Watercooler, Inc) and Sean Ammirati (ReadWriteWeb) on Social Networks and the NEED for FEEDs, it only seems appropriate that I would finally get this out.

The basic premise is this: lifestreams, alternatively known as “activity streams”, are great for discovering and exploring social media, as well as for keeping up to date with friends (witness the main feature of Facebook and the rise of FriendFeed). I suggest that, with a little effort on the publishing side, activity streams could become much more valuable by being easier for web services to consume and interpret, and by providing better filtering and weighting of shared activities, making it easier for people to get at relevant information from the people they care about, as it happens.

By marking up social activities and social objects, delivered in standard feeds with microformats, I think we enable anyone to run a FriendFeed-like service that innovates and offers value based on how well it understands what’s going on and what’s relevant, rather than on its compatibility with any and every service.

Contemporary example activities

Here are the kinds of activities that I’m talking about (note that some services expand these with thumbnail previews):

  • Eddie updated his resume at LinkedIn.
  • Chris listened to “I Will Possess Your Heart” by Death Cab for Cutie on Pandora.
  • Brynn favorited a photo on Flickr.
  • Dave posted a message to Twitter via SMS.
  • Gary poked Kastner.
  • Leah bought The Matrix at Amazon.com.

Prior art

Both OpenSocial and Facebook provide APIs for creating new activities that will show up in someone’s activity stream or newsfeed.

Movable Type and the DiSo Project both have Action Stream plugins. And there are countless related efforts. Clearly there’s existing behavior out there… but how should we go about improving it, when the primary requirement is little more than a title for an action, with little, if any, guidance on how to provide more detail about a given activity?

Components of an activity

Not surprisingly, a lot of activities provide what all good news stories provide: the who, what, when, where and sometimes, how.

Let’s take a look at an example, with these components called out:

e.g. Chris started listening to a station on Pandora 3 hours ago.

  • actor/subject (noun/pronoun)
  • action (verb)
  • social object (noun)
  • where (place)
  • when (time)
  • (how the object was created)
  • (expanded view of object)

Now, I’ll grant that not all activities follow this exact format, but the majority seem to.
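
To make the components above concrete, here’s a sketch of an activity as a simple data structure; the field names are made up for illustration, since the actual vocabulary is still being worked out (e.g. on the DiSo wiki).

```python
# Sketch of an activity broken into the components listed above. The field
# names here are invented for illustration; the real vocabulary is still
# being hashed out.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Activity:
    actor: str                  # who (noun/pronoun)
    verb: str                   # action
    object: str                 # social object (noun)
    object_url: Optional[str]   # expanded view of the object
    where: Optional[str]        # place
    when: datetime              # time
    via: Optional[str] = None   # how the object was created (web, SMS, API...)

listening = Activity(
    actor="Chris",
    verb="started listening to",
    object="a station",
    object_url="http://www.pandora.com/",
    where=None,
    when=datetime(2008, 6, 20, 15, 0),
    via="Pandora",
)
print(f"{listening.actor} {listening.verb} {listening.object} on {listening.via}")
```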

I should point out one alternative: collective actions.

e.g. Chris and Dave Morin are now friends.

…but these might be better created as a post-processing step once we add the semantic salt to the original updates. Maybe.

Class actions

One of the assumptions I’m making is that there is some regularity and uniformity in activity streams. Moreover, there have emerged some basic classes of actions that appear routinely and that could be easily expressed with additional semantics.

To that end, I’ve started compiling such activities on the DiSo wiki.

Once we have settled on the base set of classes, we can start to develop common classnames and presentation templates. To start, we have: changed status or presence, posted messages or media, rated and favorited, friended/defriended, interacted with someone (i.e. “poking”), bookmarked, and consumed something (attended…, watched…, listened to…).

Combining activities with bundling

The concept of bundling is already present in OpenSocial and works for combining multiple activities of the same kind into a group:

FriendFeed Activity Bundling

This can also be used to bundle different kinds of activities for a single actor:

e.g. Chris watched The Matrix, uploaded five photos, attended an event and became friends with Dave.

From a technical perspective, bundling provides a mechanism for batching service-to-service operations, as defined in PaceBatch.

Bundling is also useful for presenting paged or “continued…” activities, as Facebook and FriendFeed do.
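
As a quick sketch, bundling on the consuming side can be as simple as grouping consecutive activities by actor and action; the sample data here is invented.

```python
# Sketch of bundling on the consuming side: collapse consecutive activities of
# the same kind by the same actor into one grouped entry. Sample data invented.
from itertools import groupby

activities = [
    ("Chris", "uploaded a photo", "sunset.jpg"),
    ("Chris", "uploaded a photo", "bridge.jpg"),
    ("Chris", "uploaded a photo", "cat.jpg"),
    ("Chris", "became friends with", "Dave"),
]

for (actor, verb), group in groupby(activities, key=lambda a: (a[0], a[1])):
    objects = [obj for _, _, obj in group]
    if len(objects) == 1:
        print(f"{actor} {verb} {objects[0]}")
    else:
        # e.g. "Chris uploaded a photo ×3 (sunset.jpg, bridge.jpg, cat.jpg)"
        print(f"{actor} {verb} ×{len(objects)} ({', '.join(objects)})")
```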

Advanced uses

I’d like to describe two advanced uses that inherit from my initial proposal for Twitter Hashtags: filtering and creating a distributed track-like service.

In the DiSo model, we use (or will use) AtomPub (and someday XMPP) to push new activities to people who have decided to follow different people. Because the model is push-based, activities are delivered as they happen, to anyone who has subscribed to receive them. On the receiving end, this means that we can filter based on any number of criteria, such as actor, activity type, content of the activity (as in keywords or tags), age of the action, location, or how an activity was created (was this message auto-generated from Brightkite or sent in by SMS?), or any combination thereof.

This is useful if you want to follow certain activities of your friends more closely than others, or if you only care about, say, the screenshots I upload to Flickr but not the stuff I tweet about.
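
Here’s a sketch of what that receive-side filtering might look like in practice; the field names, sample activities and routing are all invented for illustration.

```python
# Sketch of receive-side filtering on a push-based stream: each subscriber
# keeps its own predicates and only routes matching activities onward.
# Field names and sample data are invented for illustration.
activities = [
    {"actor": "Chris", "verb": "uploaded", "object": "screenshot.png",
     "source": "Flickr", "tags": ["screenshot"]},
    {"actor": "Chris", "verb": "posted", "object": "lunch update",
     "source": "Twitter", "tags": []},
    {"actor": "Mom", "verb": "checked in", "object": "the airport",
     "source": "Brightkite", "tags": ["travel"]},
]

my_filters = [
    # Only the screenshots Chris uploads to Flickr, not the stuff he tweets about.
    lambda a: a["actor"] == "Chris" and a["source"] == "Flickr" and "screenshot" in a["tags"],
    # Elevate Mom's Brightkite check-ins (route these straight to the messaging hub).
    lambda a: a["actor"] == "Mom" and a["source"] == "Brightkite",
]

for activity in activities:
    if any(matches(activity) for matches in my_filters):
        print("route to messaging hub:", activity["actor"], activity["verb"], activity["object"])
```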

Tracking can work in two ways. In the first, your own self-hosted service knows how to elevate certain types of received activities, which are then passed to your messaging hub and routed appropriately… for example, when Mom checks in using Brightkite at the airport (or within some distance radius).

In the second, individuals could choose to publish their activities to some third-party aggregator (like Summize) that does the tracking for them, pushing back activities it discovers that match criteria you set, and then forwarding those activities to your messaging hub.

It might not have the legs that a centralized service like Twitter has, especially to start, but if Technorati were looking for a new raison d’être, this might be it.

This is a 30,000-foot view

I was scant on code in this post, but given how long it was already, I’d rather just start throwing it into the output of the activity streams being generated from the Action Streams plugins and see how live code holds up in the wild.

I also don’t want to confuse too many implementation details with the broader concept and need, which again is to make activity streams richer by standardizing on some specific semantics based on actual trends.

I’d love feedback, more pointers to prior art, or alternative suggestions for how any of the above could be technically achieved using open technologies.

Thoughts on dynamic privacy

A highly touted aspect of Facebook Connect is the notion of “dynamic privacy”:

As a user moves around the open Web, their privacy settings will follow, ensuring that users’ information and privacy rules are always up-to-date. For example, if a user changes their profile picture, or removes a friend connection, this will be automatically updated in the external website.

Over the course of the Graphing Social Patterns East conference here in DC, Dave Morin and others from Facebook’s Developer Platform have made many a reference to this scheme but have provided frustratingly scant detail on how it will actually work.

Friend Connect - Disable by Facebook

In a conversation with Brian Oberkirch and David Recordon, it dawned on me that the pieces for Dynamic Privacy are already in place and that, to some degree, it seems that it’s really just a matter of figuring out how to effectively enforce policy across distributed systems in order to meet user expectations.

MySpace actually has made similar announcements in their Data Availability approach, and if you read carefully, you can spot the fundamental rift between the OpenSocial and Facebook platforms:

Additionally, rather than updating information across the Web (e.g. default photo, favorite movies or music) for each site where a user spends time, now a user can update their profile in one place and dynamically share that information with the other sites they care about. MySpace will be rolling out a centralized location within the site that allows users to manage how their content and data is made available to third party sites they have chosen to engage with.

Indeed, Recordon wrote about this on O’Reilly Radar last month (emphasis original):

He explained that MySpace said that due to their terms of service the participating sites (e.g. Twitter) would not be allowed to cache or store any of the profile information. In my mind this led to the Data Availability API being structured in one of two ways: 1) on each page load Twitter makes a request to MySpace fetching the protected profile information via OAuth to then display on their site or 2) Twitter includes JavaScript which the browser then uses to fill in the corresponding profile information when it renders the page. Either case is not an example of data portability no matter how you define the term!

Embedding vs sharing

So the major difference here is in the mechanism of data delivery and how the information is “leased” or “tethered” to the original source, such that, as Morin said, “when a user deletes an item on Facebook, it gets deleted everywhere else.”

The approach taken by Google Gadgets, and hence OpenSocial, for the most part, has been to tether data back to the source via embedded iframes. This means that if someone deletes or changes a social object, it will be deleted or changed across OpenSocial containers, though they won’t even notice the difference since they never had access to the data to begin with.

The approach that seems likely from Facebook can be intuited by scouring their developer’s terms of service (emphasis added):

You can only cache user information for up to 24 hours to assist with performance.

2.A.4) Except as provided in Section 2.A.6 below, you may not continue to use, and must immediately remove from any Facebook Platform Application and any Data Repository in your possession or under your control, any Facebook Properties not explicitly identified as being storable indefinitely in the Facebook Platform Documentation within 24 hours after the time at which you obtained the data, or such other time as Facebook may specify to you from time to time;

2.A.5) You may store and use indefinitely any Facebook Properties that are explicitly identified as being storable indefinitely in the Facebook Platform Documentation; provided, however, that except as provided in Section 2.A.6 below, you may not continue to use, and must immediately remove from any Facebook Platform Application and any Data Repository in your possession or under your control, any such Facebook Properties: (a) if Facebook ceases to explicitly identify the same as being storable indefinitely in the Facebook Platform Documentation; (b) upon notice from Facebook (including if we notify you that a particular Facebook User has requested that their information be made inaccessible to that Facebook Platform Application); or (c) upon any termination of this Agreement or of your use of or participation in Facebook Platform;

2.A.6) You may retain copies of Exportable Facebook Properties for such period of time (if any) as the Applicable Facebook User for such Exportable Facebook Properties may approve, if (and only if) such Applicable Facebook User expressly approves your doing so pursuant to an affirmative “opt-in” after receiving a prominent disclosure of (a) the uses you intend to make of such Exportable Facebook Properties, (b) the duration for which you will retain copies of such Exportable Facebook Properties and (c) any terms and conditions governing your use of such Exportable Facebook Properties (a “Full Disclosure Opt-In”);

2.B.8) Notwithstanding the provisions of Sections 2.B.1, 2.B.2 and 2.B.5 above, if (and only if) the Applicable Facebook User for any Exportable Facebook Properties expressly approves your doing so pursuant to a Full Disclosure Opt-In, you may additionally display, provide, edit, modify, sell, resell, lease, redistribute, license, sublicense or transfer such Exportable Facebook Properties in such manner as, and only to the extent that, such Applicable Facebook User may approve.

This is further expanded in the platform documentation on Storable Information:

Per the Developer Terms of Service, you may not cache any user data for more than 24 hours, with the exception of information that is explicitly “storable indefinitely.” Only the following parameters are storable indefinitely; all other information must be requested from Facebook each time.

The storable IDs enable you to keep unique identifiers for Facebook elements that correspond to data gathered by your application. For instance, if you collected information about a user’s musical tastes, you could associate that data with a user’s Facebook uid.

However, note that you cannot store any relations between these IDs, such as whether a user is attending an event. The only exception is the user-to-network relation.

I imagine that Facebook Connect will work by “leasing” or “sharing” information to remote sites and require, through agreement and compliance with their terms, to check in periodically (or to receive directives through a push mechanism) for changes to data, and then to flush caches of stored data every 24 hours or less.
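
Here’s a sketch of what that leasing contract might look like from the remote site’s side: a cache whose entries expire after 24 hours and can be purged early by a push directive. This is my reading of the terms quoted above, not Facebook’s published design.

```python
# Sketch of a "leased" cache on the remote site's side: entries expire after
# 24 hours, and a push directive from the provider can purge them earlier.
# This is an interpretation of the terms quoted above, not Facebook's design.
import time

LEASE_SECONDS = 24 * 60 * 60

class LeasedCache:
    def __init__(self):
        self._store = {}  # uid -> (fetched_at, profile data)

    def put(self, uid, profile):
        self._store[uid] = (time.time(), profile)

    def get(self, uid):
        entry = self._store.get(uid)
        if entry is None:
            return None                   # never cached: re-fetch from the provider
        fetched_at, profile = entry
        if time.time() - fetched_at > LEASE_SECONDS:
            del self._store[uid]          # lease expired: flush and re-fetch
            return None
        return profile

    def revoke(self, uid):
        # e.g. the provider notifies us that a user has made their information
        # inaccessible to this application; delete immediately.
        self._store.pop(uid, None)
```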

In either model there is still a central provider and store of the data, but the question for implementation really comes down to whether a remote site ever has direct access to the data, and if so, how long it is allowed to store it.

Of note is the OpenSocial RESTful API, which provides a web-friendly mechanism for addressing and defining resources. Recordon pointed out to me that this API affords all the mechanisms necessary to implement the “leased” model of data access (rather than the embedded model), but leaves it up to the OpenSocial applications and containers to set and enforce their own data access policies.

…Which is a world of difference from Facebook’s approach to date, for which there is neither code nor a spec nor an open discussion about how they’re thinking through the tenuous issues imbued in making decisions around data access, data control, “tethering” and “portability”. While folks like Plaxo and Yahoo are actually shipping code, Facebook is still posturing, assuring us to “wait and see”. With something so central and so important, it’s disheartening that Facebook’s “Open” strategy is anything but open, and everything less than transparent.

Inventing contact schemas for fun and profit! (Ugh)

And then there were three.

Today, Yahoo! announced the public availability of their own Address Book API. Though Plaxo and LinkedIn have been using this API behind the scenes for a short while, today marks the first time the API is available for anyone who registers for an App ID to make use of the bi-directional protocol.

The API is shielded behind Yahoo!’s proprietary BBAuth protocol, which obviates the need to request Yahoo! member credentials at the time of import initiation, as seen in this screenshot from LinkedIn (from April):

LinkedIn: Expand your network

Now, like Joseph, I applaud the release of this API, as it provides one more means for individuals to have utter control and access to their friends, colleagues and contacts using a robust protocol.

However, I have to lament yet more needless reinvention of contact schema. Why is this a problem? Well, as I pointed out about Facebook’s approach to developing their own platform methods and formats, having to write and debug against yet another contact schema makes the “tax” of adding support for contact syncing and export increasingly onerous for sites and web services that want to better serve their customers by letting them host and maintain their address book elsewhere.

This isn’t just a problem that I have with Yahoo!. It’s something that I encountered last November with the SREG and proposed Attribute Exchange profile definition. And yet again when Google announced their Contacts API. And then again when Microsoft released theirs! Over and over again we’re seeing better ways of fighting the password anti-pattern in the invite-your-friends flows of new social services, but each one requires implementing support for yet another contact schema. What we need is one common contacts interchange format, and I strongly suggest that it inherit from vcard, with allowances or extension points for contemporary trends in social networking profile data.
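
To illustrate the tax, here’s a sketch of the kind of per-provider mapping every consuming site ends up writing, normalizing a few fields from the schemas in the matrix below into vcard-style names (simplified, and deliberately incomplete).

```python
# Sketch of the per-provider mapping "tax": normalize a handful of fields from
# each schema down to vcard-style names. Simplified from the matrix below and
# deliberately incomplete.
VCARD_FIELD_MAP = {
    "opensocial": {"nickname": "nickname", "given_name": "given-name",
                   "family_name": "family-name", "date_of_birth": "bday", "email": "email"},
    "yahoo": {"nickname": "nickname", "name.first": "given-name",
              "name.last": "family-name", "birthday": "bday", "email": "email"},
    "windows_live": {"NickName": "nickname", "FirstName": "given-name",
                     "LastName": "family-name", "Birthdate": "bday", "Email.Address": "email"},
}

def to_vcardish(provider: str, contact: dict) -> dict:
    """Flatten a provider-specific contact into vcard-style keys."""
    out = {}
    for src, dest in VCARD_FIELD_MAP[provider].items():
        value = contact
        for part in src.split("."):  # allow nested fields like "name.first"
            value = value.get(part) if isinstance(value, dict) else None
        if value is not None:
            out[dest] = value
    return out

print(to_vcardish("yahoo", {"nickname": "cm", "name": {"first": "Chris", "last": "Messina"}}))
```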

I’ve gone ahead and whipped up a comparison matrix between the primary contact schemas to demonstrate the mess we’re in.

Below, I have a subset of the complete matrix to give you a sense for where we’re at with OpenSocial (né GData), Yahoo Address Book API and Microsoft’s Windows Live Contacts API, and include vcard (RFC 2426) as the cardinal format towards which subsequent schemas should converge:

Field | vcard | OpenSocial 0.8 | Windows Live Contacts API | Yahoo Address Book API
UID | uid | url, id | cid | cid
Nickname | nickname | nickname | NickName | nickname
Full Name | n or fn | name | NameTitle, FirstName, MiddleName, LastName, Suffix | name
First name | n (given-name) | given_name | FirstName | name (first)
Last name | n (family-name) | family_name | LastName | name (last)
Birthday | bday | date_of_birth | Birthdate | birthday (day, month, year)
Anniversary | | | Anniversary | anniversary (day, month, year)
Gender | | gender | gender | gender
Email | email | email | Email (ID, EmailType, Address, IsIMEnabled, IsDefault) | email
Street | street-address | street-address | StreetLine | street
Postal Code | postal-code | postal-code | PostalCode | zip
City | locality | locality | |
State | region | region | PrimaryCity | state
Country | country-name | country | CountryRegion | country
Latitude | geo (latitude) | latitude | | latitude
Longitude | geo (longitude) | longitude | | longitude
Language | | N/A | N/A |
Phone | tel (type, value) | phone (number, type) | Phone (ID, PhoneType, Number, IsIMEnabled, IsDefault) | phone
Timezone | tz | time_zone | TimeZone | N/A
Photo | photo | thumbnail_url | | N/A
Company | org | organization.name | CompanyName | company
Job Title | title, role | organization.title | JobTitle | jobtitle
Biography | note | about_me | | notes
URL | url | url | URI (ID, URIType, Name, Address) | link
Category | category, rel-tag | tags | Tag (ID, Name, ContactIDs) |