It’s high time we moved to URL-based identifiers

Ugh, I had promised not to read TechMeme anymore, and I’ve actually kept to my promise since then… until today. And as soon as I finish this post, I’m back on the wagon, but for now, it’s useful to point to the ongoing Scoble debacle for context and for backstory.

In a nutshell, Robert Scoble has friends on Facebook. These friends all have contact information and for whatever reason, he wants to dump that data into Outlook, his address book of choice. The problem is that Facebook makes it nearly impossible to do this in an automated fashion because, as a technical barrier, email addresses are provided as opaque images, not as easily-parseable text. So Scoble worked with the heretofore “trustworthy” Plaxo crew (way to blow it guys! Joseph, how could you?!) to write a scraper that would OCR the email addresses out of the images and dump them into his address book. Well, this got him banned from the service.

The controversy seems to be over whether Scoble had the right to extract his friends’ email addresses from Facebook. Compounding the matter is the fact that these email addresses were not ones that Robert had contributed himself to Facebook, but that his contacts had provided. Allen Stern summed up the issue pretty well: My Social Network Data Is Not Yours To Steal or Borrow. And as Dare pointed out, Scoble was wrong, Facebook was right.

Okay, that’s all well and fine.

You’ll note that this is the same fundamental design flaw of FOAF, the RDF format for storing contact information that preceded the purposely distinct microformats:

The bigger issue impeding Plaxo’s public support of FOAF (and presumably the main issue that similar services are also mulling) is privacy: FOAF files make all information public and accessible by all, including the contents of the user’s address book (via foaf:knows).

Now, the concern today and the concern back in 2004 was the exposure of identifiers (email addresses) that can also be used to contact someone! By conflating contact information with unique identifiers, service providers got themselves in the untenable situation of not being able to share the list of identifiers externally or publicly without also revealing a mechanism that could be easily abused or spammed.

I won’t go into the benefits of using email for identifiers, though they do exist, but I do want to put forth a proposal that’s both a long time in coming and long overdue, and frankly Kevin Marks and Scott Kveton have said it just as well as I could: URLs are people too. Kevin writes:

The underlying thing that is wrong with an email address is that its affordance is backwards — it enables people who have it to send things to you, but there’s no reliable way to know that a message is from you. Conversely, URLs have the opposite default affordance — people can go look at them and see what you have said about yourself, and computers can go and visit them and discover other ways to interact with what you have published, or ask you permission for more.

This is clearly the design advantage of OpenID. And it’s also clearly the direction that we need to go in for building out distributed social networking applications. It’s also why OAuth is important to the mix, so that when you arrive at a public URL identifier-slash-OpenID, you can ask for access to certain things (like sending the person a message), and the owner of that identifier can decide whether to grant you that privilege or not. It no longer matters if the Scobles of the world leak my URL-based identifiers: they’re useless without the specific permissions that I grant on a per-instance basis.

As well, I can give services permission to share the URL-based identifiers of my friends (on a per-instance basis) without the threat of betraying their confidence since their public URLs don’t reveal their sensitive contact information (unless they choose to publish it themselves or provide access to it). This allows me the dual benefit of being able to show up at any random web service and find my friends while not sharing information they haven’t given me permission to pass on to untrusted third parties.
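To make the idea concrete, here is a minimal sketch of per-instance permissions hanging off a public URL. It is not OpenID or OAuth, just a toy model of the behavior described above; the Identity class, its method names, the scope strings, and the example.org URLs are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Identity:
    """Hypothetical model: a public URL identifier with private contact data."""
    url: str
    public_profile: dict
    private_contact: dict
    # Each grant is a (requester_url, scope) pair the owner has approved.
    grants: set = field(default_factory=set)

    def grant(self, requester_url: str, scope: str) -> None:
        """The owner approves one requester for one scope."""
        self.grants.add((requester_url, scope))

    def revoke(self, requester_url: str, scope: str) -> None:
        """The owner withdraws an earlier approval (no error if absent)."""
        self.grants.discard((requester_url, scope))

    def request(self, requester_url: str, scope: str):
        """Return the private value only if this exact grant exists."""
        if (requester_url, scope) not in self.grants:
            return None
        return self.private_contact.get(scope)


# Placeholder data: anyone may learn the URL and the public profile.
chris = Identity(
    url="https://example.org/chris",
    public_profile={"fn": "Chris"},
    private_contact={"email": "chris@example.org"},
)

service = "https://service.example.org/"
print(chris.request(service, "email"))  # None: knowing the URL is not enough
chris.grant(service, "email")
print(chris.request(service, "email"))  # chris@example.org
```

The point of the sketch is that a leaked URL carries no access by itself; only the (requester, scope) pairs the owner has approved unlock anything, and they can be revoked later.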

So screen scrape factoryjoe.com all you want. I even have a starter hcard waiting for you, with all the contact information I care to publicly expose. Anything more than that? Well, you’re going to have to ask more politely to get it. You’ve got my URL, now, tell me, what else do you really need?

Author: Chris Messina

Inventor of the hashtag. #1 Product Hunter. Techmeme Ride Home podcaster. Ever-curious product designer and technologist. Previously: Google, Uber, Republic, YC W'18.

37 thoughts on “It’s high time we moved to URL-based identifiers”

  1. Top post. But, URLs and emails are leased – you never own them. Like the data issue, it’s not possessed, but representational, and thus, data belongs to the system, never the users.

    Date of Birth, phone numbers, age: they are all signifiers, not your ‘personal’ owned signs.

  2. I’m no coder – 😦 – but this makes perfect sense to me, Chris. I’m sure a lot of folks do a human-mediated version of this already: you’re not sure you should give Friend A’s e-mail to Acquaintance B, so you tell B to check out A’s site or blog. It’s helpful to the acquaintance, but it puts the control in your friend’s hands. But your suggestion is far better, because it would do this across the board and as a matter of routine protocol.

  3. I have to admit I don’t entirely understand what’s so morally objectionable about scraping and OCRing the e-mail addresses using software (aside from violating Facebook’s terms of service, but the fact that Facebook themselves scrape Gmail etc makes the ToS more than a little hypocritical). Scoble could have gathered that data himself; he would have had to click through to 5,000 pages and transcribe the address from the image himself, but he had permission to access the data. The argument that “I never gave Scoble permission to export MY e-mail address” doesn’t hold for me – when you friended Scoble, you gave him permission to view your e-mail address. If he automates that process how does that change the fact that you gave him that permission?

  4. I can appreciate the point that people need to be afforded some control over their data, such as contact data, but I cannot agree that it is a design flaw of FOAF that the social network and contact details are encoded in the same vocabulary. I see vocabularies for data and control of that data as two orthogonal issues. The problem is not that you mix vocabularies; there should be a different mechanism for declaring policy about who is allowed to use those data.

    There has been a lot of work on that issue, e.g. http://www.policyawareweb.org/ but again, the problem is that we in the Semantic Web community love to create prototypes, but are not sufficiently concerned about the applications that people will actually use.

    So, the problem has a good and implemented solution, it just isn’t available on your desktop. 🙂

  5. You have it wrong. FOAF can be used to provide security (using OpenID for example), because it uses URLs as identifiers and the web service can return appropriate representations depending on the person viewing the page, if they log in with OpenID. See some of what I write in
    http://blogs.sun.com/bblfish/entry/my_bloomin_friends

  6. Maybe it’s mostly the fact that people feel that their email addresses are somehow protected in that Facebook Walled Garden. I actually wrote today about some of these questions on my blog. What would have happened if FB itself had been more open from the beginning with its own export function?

    But it’s good this discussion is coming up now as it helps to define what is questionable and what isn’t. And what controls we might need.

    As for the FOAF vs. Microformats debate, I think I agree with the comment on the DiSo group, as I think it’s simply a different way of formatting it. I could also imagine different FOAF files for different users to access depending on their permission settings (I also don’t really get the problems people have with FOAF; to me it seems somewhat clearer than Microformats, which to me always look like sort of a hack as they repurpose styling information).

  7. The comments about FOAF in this post are misguided, and users on #swig have responded here – http://tinyurl.com/2f8ra5

    RDF doesn’t introduce any new privacy/security/trust issues, nor does not-RDF remove any. It’s all just representations in resources available over HTTP.

    I have been tempted to make a FOAF file that would be available behind an auth-wall using OpenID/OAuth etc.

  8. Chris-my response to you about using URLs vs. emails for identifiers is basically the same as my response to Brian Oberkirch who said we shouldn’t ask users for their raw credentials: I totally agree that’s where we want to go, I’m trying to help us get there, but we’re not there yet and I believe the best way to get there is to demonstrate value today rather than abstaining until the future arrives.

    The fact is: today email addresses are by far the dominant identifiers people use to “find their friends” on social sites. That’s why Facebook and many other sites let you import your e-mail address book to look up people you know, and that’s why people want to get those identifiers back out (in addition to the hopefully obvious point that I want to be able to email my friends using my email application!). So, as we discussed at IIW, we can start to show the benefits of an open social web with friends-list portability today by letting users take their email-based friends-lists and look up their friends on other sites, and as more people start using OpenID and keeping profile URLs for their friends, we can transition over to a URL-based lookup, which is ultimately a much better solution as you correctly point out. But like I said to Brian, I don’t think we’ll get there faster by denying users access to data they can already get themselves with higher friction, we’ll just delay showing the true value of an open social web.

  9. I think Simon is technically right, but I also think there’s a sort of fuzzy fallacy in saying that things that were technically possible before are therefore OK when automated. Sometimes what appears to be only a quantitative gain (such as with regard to efficiency) is effectively a qualitative gain, changing the fundamental experience for the people involved.

  10. I have a FOAF file and it uses a SHA-1 hash to obscure the email addresses of my friends. The hash can still be used as a unique identifier. The social network providers I use know my friends’ email addresses already; my friends provided them when they joined. FOAF takes a little more work to figure out than microformats, but the ability to expand your vocabularies beyond the original FOAF vocabulary makes FOAF/RDF a wonderful outlet for publishing information about yourself and your friends. I use my www home page as my OpenID but delegate it to my OpenID provider. I have set up auto-discovery of my FOAF file on my www home page. That seems to me to be the way to go for getting my information out there in a guarded but still public way. My OpenID provider allows people to send me emails without knowing my email address. This seems like a perfect world to me.

  11. Hi Chris, nice one again! I think technically you are right, but from a user perspective a url isn’t always practical. Not everyone has a url, I’ve actually asked my boss (Dutch Telecom Operator) to provide every Dutch person born with a free e-mail and url, but so far no luck 😉
    The underlying issue is not really about who owns the data, who cares? It is about letting the user become free again, allowing him to export things anywhere, and at the same time providing the user with the responsibility to protect his personal data to a level he is comfortable with. I don’t mind people scraping data off of Facebook, as long as I get to decide which data I provide is scraped. So, better, transparent privacy controls (defaults to protecting the user), and breaking down the walled gardens (which is a business model issue really). This is my first wish for 2008, I have a few more, read them at my blog if you are interested.

  12. Chris is right, switching to URL based identifiers is essential to making progress here.

    A few notes regarding some of the comments.

    Publishing sha1 hashed email addresses is still a problem for both unexpected identity consolidation, and search based privacy attacks. See the microformats page on identity consolidation for more on this.

    The linked #swig response misses the point as well, which is that formats dictate and influence policy. It doesn’t matter what a format *can* be used for, if its very design lends it to abuse (e.g. conflating the publishing of a person’s profile information with all their friends’ profile information).

    This is why it is more important to keep formats and protocols, small, modular and relatively orthogonal, rather than trying to cram everything into a yet another side-file, especially one that unnecessarily reinvents vocabulary rather than reusing highly interoperable vocabulary (e.g. vCard).

  13. Chris, thank you for making stark the distinction between the email and url approaches. Also, I swung by to ask if you might help us get a wider online participation in our Minciu Sodas response to Kenya which has quickly set up a fantastic system whereby we’re sending money by Western Union to calm places in Kenya where people can assuredly pick up cash (in many places they couldn’t for several days) and then purchasing prepaid phone cards, loading up their phones and then sending airtime minutes to people in isolated places (this is a cool thing about the Kenya mobile phone system), where they can barter them for food, medicine, transport. Furthermore, we’re encouraging such outreach to benevolent people in opposing tribes and focusing on the participants who agree that we publicly post their telephones and help us share news and reach out further. I’ll be improving our web system regarding that and any coders who might help would be much appreciated, see http://www.worknets.org/wiki.cgi?KenyansToCall

    Is telephone like email or like url? Maybe the fact that there’s a cost to the call and it’s paid by the caller is enough to make it more like a url? But there’s an interesting blur here.

  14. On Simon Willison’s argument that “if it can be done by hand, so why not with software”:
    apart from Christian Crumlish’s point, there’s the fact that the people who trusted Scoble with their data _in the context of_ Facebook would not necessarily have done so _in the context of_ Plaxo.
    In fact, they might not trust Plaxo with their data at all (or vice versa, this is not a judgement on Plaxo versus Facebook).

    This has been brilliantly formulated by Dare Obasanjo in a follow up post:
    http://www.25hoursaday.com/weblog/2008/01/07/BreakingTheSocialContractMyDataIsNotYourData.aspx

  15. A question:

    (your quote):
    “so that when you arrive at a public URL identifier-slash-OpenID, you can ask for access to certain things (like sending the person a message), and the owner of that identifier can decide whether to grant you that privilege or not”

    Johannes Ernst’s LID has messaging capability – I’m wondering why that never took off?

  16. @Pascal Van Hecke: Dunno… Timing? Marketing? Interface? Lack of developer interest? My guess is that it was just too soon for its own good, as many breakthru technologies often are.

  17. I was thinking a bit more about the issue of URL-based identifiers but I am really not sure that this is ready for mainstream. Everybody on the net probably has an email address and if you sign up on any recent new web service you probably have to give your email address sort of as identification (to which some confirmation link is sent).

    To change this is IMHO quite a lot of work. Maybe not so much for us geeks but for mainstream internet. My parents do not have URLs or wouldn’t actually know which one their identification URL is and explaining that to them will be hard. So I really think that trying to somehow e.g. map emails to URIs or that email DNS proposal over at hueniverse might be things to consider.

    Of course there is the spam problem and you probably need to have your identifier somewhat public so it makes sense (as I had problems the other day when playing around with some xfn parser/fetcher of mine to actually map my friends profiles together as it wasn’t clear which profile belonged to which identity) and I honestly don’t have a solution for this. But still I see problems of mass adoptions if we don’t go where the people are.

    Maybe those identifiers should simply look like email addresses but never be able to receive mail but this defeats my point probably 😉

  18. You can basically use whatever kind of identifier you want to enable someone to access an account attached to that verified identifier… whether you display it or not is up to the relationship between the site and their member.

    So you could use credit card numbers, social security number, phone numbers, email addresses, URLs… doesn’t really matter… as long as someone (ideally ONE person) can prove that it’s them coming back to the site.

    What you call them or how you publicly refer to them on the site (alias, username, full name, etc) is beyond the scope of OpenID.

  19. This makes a lot of sense. There have to be many people that go through the process manually, which I agree has to stop. I’ll keep thinking about this.
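
A note on the hashed email addresses raised in comments 10 and 12: the sketch below shows how FOAF's mbox_sha1sum value is computed (SHA-1 over the "mailto:" URI) and one way, assumed here purely for illustration, that such a hash could be matched back to an address by someone holding a list of candidate emails. The recover_addresses helper and the example.com addresses are hypothetical.

```python
import hashlib


def mbox_sha1sum(email: str) -> str:
    """SHA-1 hex digest of the mailto: URI, as FOAF's mbox_sha1sum defines it."""
    return hashlib.sha1(f"mailto:{email}".encode("utf-8")).hexdigest()


def recover_addresses(published_hashes, candidate_emails):
    """Hypothetical helper: match published hashes against guessed addresses.

    Returns a dict mapping each matched hash to the address that produced it.
    """
    lookup = {mbox_sha1sum(email): email for email in candidate_emails}
    return {h: lookup[h] for h in published_hashes if h in lookup}


# Placeholder data: one hash published in a profile, a small list of guesses.
published = [mbox_sha1sum("alice@example.com")]
guesses = ["bob@example.com", "alice@example.com"]
matches = recover_addresses(published, guesses)
print(list(matches.values()))  # ['alice@example.com']
```

Because the same address always produces the same hash, the value also works as a stable key for linking profiles across sites, which is the identity consolidation concern; a URL identifier with no address behind it gives a guesser nothing to match against.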
