Larry (@lhalff) and I have been recording a podcast for the past year called Citizen Garden that covers various topics related to the web, technology, and social networking.
Well, given Ma.gnolia’s recent catastrophe, we decided that episode 11 would be dedicated to exactly what went down and why, and what lessons Larry has learned that others should heed in order to avoid facing a similar crisis.
I think the basic take-away is that, four years ago, when Larry started Ma.gnolia, your IT options were pretty much to use commodity shared hosting or to do it yourself. If you used Ruby on Rails — in which Ma.gnolia is written — your options were even more limited. And so Larry chose to do it himself.
Today, with services like Amazon S3 & EC2, Joyent Accelerators and Google App Engine, reliable, scalable hosting is no longer as much of a problem, as these services have risen to meet the needs of applications like Ma.gnolia. But these are services that Larry did not take complete advantage of, and the burden of taking care of over half a terabyte of data eventually caught up with him.
All is not lost necessarily, and Larry hopes that Ma.gnolia will someday return, perhaps as an invite-only service to start, in order to give him time to earn back people’s trust and scale the service slowly. I’m also confident that he’s decided to completely outsource his IT, taking the lessons from this current situation deeply to heart.
This episode is also downloadable as an MP3.
74 Comments
On just 4 Mac minis, amazing!
Make sure you employ a pessimist.
Wow. I like his honesty.
But then I use Delicious.
Hi guys – thanks for the discussion. Can you recommend outsourcing options for start-ups?
What should go out, what shouldn’t?
Cheers,
-Robin.
I think it’d be interesting to have a site–kind of like http://highscalability.com but focused on the smaller guys–where startups or web app creators could disclose the infrastructure they use to power their sites.
There could even be a sort of rating system to help potential users determine the reliability or disaster preparedness of sites before “investing” in them.
It might even serve as an incentive for some startups to get their stuff together so that they can “win over” new users.
Just an idea.
Great podcast, and I really appreciate Larry’s honesty and humility. I agree that it’s really amazing and telling that such great services can be built with (relatively) inexpensive off the shelf hardware. Good luck with Ma.gnolia 2.0.
I just wanted to take a moment to say thank you for doing this. I really miss ma.gnolia and would love for it to come back.
Great job putting this together.
This is a fabulous discussion, I really appreciate your candor and tips. I’m reminded of a case I read about the creator of MultiMaps, a UK start-up similar to Mapquest. He had the server in his bedroom. I’m so glad I have Amazon available to me nowadays.
Thanks Chris and Larry for this post – very interesting. What the interview seems to say is that the hardware and disks were working fine and this corruption is a problem with MySQL 5. Can you say more about this? Larry says he made a ‘huge mistake’ but he also says he doesn’t know what went wrong – he’s waiting on recovery experts.
Using cloud services is a great idea but self-hosting has its advantages. Storage is cheap and having it local will always be faster than on the other end of the Internet. Using four mac-minis as the webservers is awesome since pre-corruption, the site was running fine.
Also, what’s the 2^8th doing on the wall?
Delicious isn’t the heavyweight because of Yahoo’s involvement, it’s there because it has legacy, credibility and stability. Those of us who used it when it was del.icio.us trusted it then and trust it now. I did use ma.gnolia but it never had ‘stickability’ for me.
If nothing else this frank and illuminating interview reveals the ‘smoke and mirrors’ nature of emerging social technology.
Better late than never. Thanks Larry for granting us insight into the loss of our bookmarks. It’s good to hear you take responsibility and provide details. I’m really looking forward to using ma.gnolia again!
Outsourcing is about the worst mistake you can make. Your best bet is to hire a unix geek, one who built a home network incorporating as many enterprise features as they could “for fun.” Even a few years ago when Ma.gnolia started there were VPSes and managed hosting with automated backups. A good unix geek will know about those and be able to find one at a reasonable price.
Outsourcing means you pay more every time you need something different. Having your own unix geek on staff means you have someone who has a vested interest in keeping things going at lower cost, who can identify when your current resources are sufficient to do that new thing you want to do, and most importantly can identify when you can turn things costing you money off.
This is excellent! Thank you for sharing, lessons learned are one of the rare gems that are not shared enough or listened to.
There are also great nuggets of information in this on a broad array of subjects.
I am missing Ma.gnolia, but I like the idea of an iTunes Genius for bookmarks to aggregate link lists (similar to DevonThink, but in my context) and other new related materials I have not seen.
I am with Don Park. Very curious as to what happened since you hinted in the video that it wasn’t hardware and it was due to some data corruption.
Larry, could you at least let us know if you felt it was due to some MySQL specific issues?
the burden of half a terabyte of data.
i understand that burden. i have half a terabyte of data in my macbook although soon i’ll nearly double that with a second drive to replace my optical drive. i use time machine to back it up onto 1.5 terabyte drives.
i am a professional, but if all my data was lost, it’d just be my ass, few people would scream at me. most everything important is somewhere out there. obviously, if i was taking care of other people’s data, i’d take pains to ensure it was safe.
i don’t get it. half a terabyte of data. no backups.
my own personal site uses a mysql database over 130mb in size. it is backed up every day.
i guess half a terabyte of data is just so much bigger… :p
come back magnolia!
I think it is inescapable – there was a level of negligence in developers rushing an application to production without any IT engineering expertise involved. IT engineers, the guy recommending the on-staff Unix geek, the guy talking about backing up his 130 MB MySQL DB daily, can simply see through this. Those of us who do this professionally for enterprise-class organizations know there are no excuses for failure to protect data – there is generally simply job termination. I suppose, in a bizarre sense, Larry terminated himself.
This is not “griefing” or being “unduly negative” – it is how IT engineering *operates*. There are certain skillsets you need to bring to the table when you’re dealing hands-on with production environments. This is why developers should stick to the scratchbox machines and not have production access.
I can see a challenge here – and it is complex. Who regulates who, how, and when a developer with a good idea makes it available to the public, and what is that developer’s liability? We don’t want to stifle innovation, but there needs to be responsibility and accountability as well. Outsourcing sucks, as an IT professional, I can imagine why you don’t want to outsource to the server version of a “puppy mill” where a bunch of nameless IT engineers babysit your server with no real involvement in what it is or what it does. But as a startup, you can’t afford a guy like me to be on staff, can you?
A good place to start, though, is knowing your limitations, and I believe Larry didn’t know his until TOO late. He might have wanted to partner up with an IT professional. Ultimately, that is the problem. Ma.gnolia was running in production but was only 75% thought out. This illustrates the basic failure of the cloud, and why fat, local computing will remain the most viable, frequent model of computing as we move forward.
Very informative and knowledgeable interview, and good tips for newborn companies. I forwarded this interview to all my friends and my Orkut list.
Regards
Nilma Azeem
I just wanted to take a moment to say thank you for doing this.
The short version: the file system got corrupted. The backup was just a file-sync over a firewire network to another machine. Meaning the bad data was backed up and presumably overwriting the older, good data. They had a RAID but the problem was a software filesystem so the errors just got stored.
He seems to understand how terrible of a design decision he made in regards to the back-up system, and he appears physically affected when having to admit, publicly, the details of the infrastructure (or lack thereof) that caused this.
This comment was originally posted on Hacker News
A one-liner to add insult to injury:
sed 's/rsync/rdiff-backup/g' <bin/my-backup.sh >bin/my-real-backup.sh
This comment was originally posted on Hacker News
Discussion starting over at earlier submission: http://news.ycombinator.com/item?id=483320
This comment was originally posted on Hacker News
After watching the vid I have to take that back. Apparently there was a SQL database involved…
But rdiff is highly recommended nonetheless.
This comment was originally posted on Hacker News
If you have any kind of staging/testing server I’d highly recommend using your production backups to populate that on a regular basis. That way you test your new code releases with real data, and you know that your backups work.
This comment was originally posted on Hacker News
If you were to start a web application from scratch, how would you deal with important database backups?
This comment was originally posted on Hacker News
Anyone know how rdiff-backup compares to duplicity?
I know duplicity has an option to turn off encryption, if one wants to remove that overhead…
This comment was originally posted on Hacker News
http://forums.theplanet.com/index.php?showtopic=91115&vi…
This looks like a simple and safe way of handling repeated MySQL backups.
This comment was originally posted on Hacker News
This is what I do. Using Amazon’s EC2+EBS makes it dead simple. Every time we test a release I set up a staging env (from scratch) and restore the latest EBS snapshot. That way I’m testing both the database backup and that I’ve kept all of the server images up to date as well. It doesn’t take much time either (about 30 minutes) and is done weekly.
This comment was originally posted on Hacker News
My recommendation for basic backup needs: rsnapshot. I back up our public server to our internal network as well as my desktop machine to an encrypted portable drive using it: http://www.rsnapshot.org/
It’s probably similar to rdiff-backup, which I haven’t used. If you’re fine with daily or hourly backups and don’t have too much data (<100 GB), rsnapshot together with regular SQL dumps works fine.
This comment was originally posted on Hacker News
Yeah, I use a similar approach. However, for ma.gnolia we are talking about a database reaching half a terabyte (unless I misunderstood), so my question was more like: what do you people consider the best way to approach database backup so that it’s sustainable, scalable and the most disaster proof? Got any testimonials (whether personal or of public companies), or case studies?
This comment was originally posted on Hacker News
Quick cherry bomb to lob into this conversation: populating insecure test servers with sensitive production data is a classic web app company security failure. It probably doesn’t matter for you, but be cognizant of it.
This comment was originally posted on Hacker News
rdiff-backup keeps a live version of the filesystem available, in addition to backups. This means a full restore is just a `cp -a` operation.
FWIW, I’ve used both, and I like the opacity of the duplicity backups, since I store them on S3. If you are syncing to a nearby disk, though, then you might like rdiff-backup better.
This comment was originally posted on Hacker News
He had RAID and was doing filesystem-level backup, i.e. copying over the entire MySQL DB file. When filesystem-level corruption occurred, the backup script overwrote a good (perhaps 1 day old) backup file with a corrupted file, so his backup was worthless.
The first thing that comes to mind is that he could have used application-level backup, i.e. through MySQL. The script would have noticed that the DB is corrupted because reads (SELECT) would have failed, and the backup script would have stopped and sent him an email to restore the good backup file.
If he used a cloud service like Amazon SimpleDB, he wouldn’t have to worry about filesystem-level corruption, because that’s abstracted away by Amazon. (And it’s replicated.)
This is still not enough though. What if the site gets hacked and the hacker issues DELETE statements? Then all your data is deleted, and even if you have application-level backup, it will succeed (it will read the empty DB), thus overwriting your old backup.
I guess the conclusion is to keep around several copies of the data, and have sanity-checks in place to avoid overwriting good backups. In his case it was hard (given it’s a homegrown application) to keep around many copies, because his DB was 500G in size.
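The conclusion above (several generations, plus a sanity check before anything old is thrown away) can be sketched roughly as follows. This is only an illustration under assumptions: `rotate_backup`, its parameters and the `verify` callback are hypothetical names, and the check you would run against a real MySQL dump is left to the caller.

```python
import shutil
import time
from pathlib import Path
from typing import Callable


def rotate_backup(source: Path, backup_dir: Path,
                  verify: Callable[[Path], bool], keep: int = 7) -> Path:
    """Copy `source` into `backup_dir` as a new generation.

    The new copy is checked with `verify` before any older generation is
    pruned. If the check fails, the copy is discarded and every existing
    backup is left untouched.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Nanosecond timestamps have a fixed width, so names sort chronologically.
    candidate = backup_dir / f"{source.name}.{time.time_ns()}.tmp"
    shutil.copy2(source, candidate)

    if not verify(candidate):
        candidate.unlink()
        raise RuntimeError(f"backup of {source} failed its sanity check")

    final = candidate.with_suffix(".bak")
    candidate.rename(final)

    # Only now, with a verified new generation in place, prune the oldest.
    generations = sorted(backup_dir.glob(f"{source.name}.*.bak"))
    for old in generations[:-keep]:
        old.unlink()
    return final
```

A `verify` function might check that the dump is non-empty, that it is not drastically smaller than the previous generation, or that it restores cleanly into a scratch database. The point is only that a bad copy never displaces a good one.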
This comment was originally posted on Hacker News
I think there is an implicit assumption here that if you copy over a live MySQL DB file (I assume that’s what he was doing), you get a consistent (aCid) view of your DB. This is a false assumption.
If this is what he was doing, then possibly the older ‘good’ backups weren’t any good either.
This comment was originally posted on Hacker News
A simple tip for those that run any kind of database:
Be sure to replicate them in master-slave (or master-master). And base your backups on taking a slave down for backups.
Hot backups only work for very small databases – even those that are based on LVM snapshots, tarsnap, innodb hotbackups etc. With big databases, you will be most likely IO bound and a backup will take your site down.
If you have lots of load and lots of data then re-creating a slave will require lots of downtime. For Plurk.com we have had a 4 hour downtime due to re-creating a slave, so be sure to run a master-slave setup and have fresh slaves replicated at all times (we have learned this the hard way).
This comment was originally posted on Hacker News
Assuming your dataset or shards are reasonably small (<100GB):
1. Set up a replication slave on another server.
2. Periodically lock the slave while you back it up.
3. Copy files to 2 different secure locations.
This gives you consistent backups that do not interrupt production, as well as a hot copy in case of some catastrophe.
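Step 3 is the easiest one to get subtly wrong, since a copy that silently fails is no copy at all. A minimal sketch, assuming the backup is a single file and both destinations are reachable as directories; the function names are made up for illustration:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large dumps need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_to_locations(backup: Path, destinations: list[Path]) -> list[Path]:
    """Copy `backup` into every destination and verify each copy by checksum."""
    expected = sha256_of(backup)
    copies = []
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / backup.name
        shutil.copy2(backup, target)
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

This only proves the copies match the source. Whether the source itself is a consistent dump is a separate question, which is what steps 1 and 2 are for.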
This comment was originally posted on Hacker News
I wonder how they managed to get to half a terabyte. Delicious’s was smaller even for millions of users.
This comment was originally posted on Hacker News
Yeah indeed… .5TB is huge for bookmarks (title, url, tags, description). I have never had the chance to build anything this big but if you imagine the amount of text you could fit in 500GB, it makes you wonder.
I gave a look at your comments and found this one which replies to my question perfectly (I didn’t follow that thread, discovered it right now): http://news.ycombinator.com/item?id=459000 — thank you for sharing your experience!
This comment was originally posted on Hacker News
I agree. One of our big financial clients has an automated tool to scrub such data, but then they have social security numbers as well as lots of other juicy financial data. So they’re worried about all sorts of stuff that most of us never ponder as a business risk.
One of the sanitizing steps is to replace all passwords with a set value, such as six or seven of a letter (like "A") or a number (eg, "111111"). Another sanitizing step is to scramble names and addresses. Usually the first letter gets preserved, and the rest gets replaced with a hash (say, MD5 it, and then base64 it and truncate it to length, that way it preserves max lengths and typical size of words).
example: John Doe, 1313 Mockingbird Lane might get munged into
Jiqw Dyh, 1313 Masdfasdfas Lfds
We just have username/password/address/phone, so all we do is set all passwords to a default value (all emails, if any, get set to mine), and munge up telephone numbers. Later this year I’ll cobble up a better sanitizer. Our parent company has to worry about GLBA compliance, but our little apps don’t "collect" enough information to worry about GLBA at this time.
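The name-and-address scrambling described above might look something like this. It is a sketch of the general idea (first letter kept, remainder derived from MD5 then base64, truncated to the original length), not the client’s actual tool; the function names are invented, and dropping non-letters and lowercasing the base64 output is an added assumption so the result still looks like a word.

```python
import base64
import hashlib
import re


def scramble_word(word: str) -> str:
    """Keep the first letter; replace the rest with hash-derived letters."""
    if len(word) < 2:
        return word
    letters = ""
    seed = word.encode("utf-8")
    # Re-hash until there are enough letters to cover long words.
    while len(letters) < len(word) - 1:
        seed = hashlib.md5(seed).digest()
        encoded = base64.b64encode(seed).decode("ascii")
        letters += "".join(ch for ch in encoded if ch.isalpha()).lower()
    return word[0] + letters[: len(word) - 1]


def scramble_text(text: str) -> str:
    """Scramble every alphabetic run; digits and punctuation pass through."""
    return re.sub(r"[A-Za-z]+", lambda match: scramble_word(match.group()), text)


print(scramble_text("John Doe, 1313 Mockingbird Lane"))
```

Because the output is deterministic, the same name always maps to the same scrambled value, which keeps joins between tables working on the test server. It also means short, common names could be guessed back by brute force, so this is obfuscation for test data rather than real anonymization.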
This comment was originally posted on Hacker News
> replace all passwords with a set value
They’re storing passwords directly and not hashes? Wish I could ask you which company so I could avoid them…
This comment was originally posted on Hacker News
According to the write-up on Wired (http://blog.wired.com/business/2009/01/magnolia-suffer.html), Ma.gnolia also took a snapshot of the page being bookmarked. This may account for the size of the database.
This comment was originally posted on Hacker News
Downvote all you want but I would seriously like to know.
http://www.matasano.com/log/958/enough-with-the-rainbow-tabl…
This comment was originally posted on Hacker News
I guess I wasn’t clear enough. The "user" tables get updated so that the password used to log in will be "111111" or such, and that means all the salts and hashes will be the same value.
This comment was originally posted on Hacker News
Who said they’re storing passwords in plain text? They could be paranoid enough to remove hashed passwords. If you know what makes the hash you can reproduce it. If it’s a database with financial information, I can see crackers devoting time to do this, making their rainbow tables or whatever to guess the passwords.
This comment was originally posted on Hacker News
Oh OK, sorry for the tangent.
This comment was originally posted on Hacker News
I don’t understand this obsession about only storing hashes, as if that’s the primary critical issue with site security. There are plenty of reasons to store the plaintext, and in a well secured database I really don’t think it is much of an issue. Or as I heard someone say once, "If you can break into my database, and show me how, I will quite literally give you a million dollars".
Off the top of my head, here are a couple of very good reasons to store plaintext:
- password recoverability: if the user knows they can recover the password, they’re more likely to use a more complex one
- flexibility with authentication: to use something like HTTP Digest Auth, you need the plaintext to be able to hash it with a one-time nonce
And like many will no doubt point out, hashing it isn’t all THAT secure anyway. If it’s not a very strong hash, or there’s enough information to reset it somehow, they can get what they want anyway. Not to mention that if your database has been cracked they probably have everything they want anyway – why even bother logging in?
I just don’t get it. Sure, defence in depth is the best strategy and everyone should practise it wherever possible. But whether the password is stored hashed or not is not the lynchpin security issue many make it out to be, IMO.
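On the Digest Auth point, it may help to see what the server actually needs. As specified in RFC 2617, the response is built from HA1 = MD5(username:realm:password), so a server can keep HA1 rather than the password itself. The sketch below covers only the variant without the optional qop directive, and the helper names are made up for illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def stored_ha1(username: str, realm: str, password: str) -> str:
    """The value a server could store in place of the plaintext password."""
    return md5_hex(f"{username}:{realm}:{password}")


def digest_response(ha1: str, method: str, uri: str, nonce: str) -> str:
    """RFC 2617 response without qop: MD5(HA1:nonce:HA2)."""
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


# The client derives HA1 from the typed password; the server uses its stored copy.
server_side = stored_ha1("alice", "api.example", "correct horse")
client_side = stored_ha1("alice", "api.example", "correct horse")
assert digest_response(client_side, "GET", "/bookmarks", "abc123") == \
       digest_response(server_side, "GET", "/bookmarks", "abc123")
```

The caveat is that a stolen HA1 is as good as the password for that realm, so it still needs protecting. What it does avoid is exposing a password the user may have reused on other sites.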
This comment was originally posted on Hacker News
"obsession about only storing hashes, as if that’s the primary critical issue with site security"
Can you point me to something I said that implies this is an obsession or that this is what I think the primary critical issue is with site security?
"password recoverability: if the user knows they can recover the password, they’re more likely to use a more complex one"
Why would I as a user care at all if I could retrieve the actual value of a complex password — and why would knowing I could recover it make me then choose a more complex one?
(The user should be given an option of resetting the password via a link sent by email. Sending passwords themselves over email is a great way to have it revealed for someone else to use later.)
"to use something like HTTP Digest Auth"
Good thing no one needs this mediocre authentication method if SSL is available.
The majority of people use the same passwords at different sites. So even if someone’s cracked your database, it’s still a good idea. Storing passwords in plaintext is a non-neighborly thing to do.
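For readers wondering what "storing a hash" amounts to in practice, here is a minimal sketch using a salted, deliberately slow key-derivation function from the Python standard library. The iteration count and function names are illustrative assumptions, not a recommendation for any particular site.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # illustrative work factor; tune for your own hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); store both, never the password itself."""
    salt = secrets.token_bytes(16)  # a fresh random salt per user
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived


def check_password(password: str, salt: bytes, derived: bytes) -> bool:
    """Recompute the key from the login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, derived)
```

The per-user salt means two users with the same password get different stored values, which is what defeats precomputed rainbow tables.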
This comment was originally posted on Hacker News
A great lesson for every SaaS service, and how a little bit of transparency goes a long way: http://www.transparentuptime.com/2009/02/magnolia-downtime-s…
This comment was originally posted on Hacker News
"Can you point me to something I said that implies this is an obsession or that this is what I think the primary critical issue is with site security?"
You asked for the financial institution’s name so you could avoid them, based solely on the password storage issue. That counts as obsession to me. Oh and I forgot to write it before, but financial institutions often need to store in plaintext anyway, for telephone authentication.
"Why would I as a user care at all if I could retrieve the actual value of a complex password — and why would knowing I could recover it make me then choose a more complex one?"
If people know they have to remember it, they tend to choose simpler passwords, or they write it down. If you tell users to set a hard password, and they can recover it later if necessary, they would hopefully tend to use better ones. I can’t really back that up with a study, though, so it could just be my experience.
"The user should be given an option of resetting the password via a link sent by email. Sending passwords themselves over the email is a great way to have it revealed for someone else to use later."
This is veering off topic, but you either trust the email or you don’t. What, pray tell, is the difference between sending the password and sending a link to reset the password, if an attacker has access to the victim’s email?
"Good thing no one needs this mediocre authentication method if SSL is available."
Yeah, pity SSL is not an authentication method. You did know that, right?
Digest authentication is heavily used in APIs and other non-browser applications, where you need some authentication but the tunnel is not necessary and you don’t want to maintain heavy sessions. SSL, apart from NOT being an authentication method, is anyway slow and heavy and requires proper certs, so is mainly used only for user-facing web sites. Not to mention intranets, devices, etc.
Anyway, even if HTTP Digest Auth were in fact rare, trying to wave it away with "good thing no-one needs it" is ridiculous. I, personally, need it, and am very far from alone.
I’d like to mention that I do agree in principle, and am playing devil’s advocate to some degree. My point is that password hashing is not a panacea, it is often not even possible, and I would certainly not avoid a site just because they store in plaintext if I otherwise had a good impression of their security practises.
This comment was originally posted on Hacker News
"That counts as obsession to me."
It’s a red flag, not an obsession…
"financial institutions often need to store in plaintext anyway, for telephone authentication."
Mine doesn’t. And yes, if they did, I would not be their customer. Just because I may not know exactly what happens behind the scenes somewhere doesn’t mean I can’t react to the red flags I can see.
"If you tell users to set a hard password, and they can recover it later if necessary, they would hopefully tend to use better ones"
How is that any different than if the user can reset the password?
"What, pray tell, is the difference between sending the password and sending a link to reset the password, if an attacker has access to the victim’s email?"
There is a big difference. Anyone who has access to the text of the mail at any point in time now has your password. It’s about mitigating the risks of the crappy vetting channel (email) with a time limited method (a reset URL).
"Yeah, pity SSL is not an authentication method. You did know that, right?"
For password based things, I am referring to the channel used to avoid the well known problems with digest access authentication such as man in the middle attacks.
Besides what I was referring to: used with non-anonymous X509 client certs, yes SSL is in fact used for authentication. Entire infrastructures are built on it. All of the clusters I have access to only let me in by virtue of X509 client certificates over SSL.
""good thing no-one needs it" is ridiculous. I, personally, need it, and am very far from alone."
I said good thing no one needs it if SSL is available not that no one needs it…
I use it myself in software we release that runs behind a firewall, I’m well aware it’s cheaper.
"I would certainly not avoid a site just because they store in plaintext"
I admit it’s a little on the reactionary side for me to say that; it was a quick, snarky comment.
But I don’t take back that it’s a red flag.
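The "time limited method (a reset URL)" mentioned above could be sketched like this: a token that names the user, carries an expiry time, and is signed so it cannot be forged or extended. Everything here is an assumption for illustration, including the key, the one-hour lifetime and the function names.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-random-server-side-secret"  # placeholder value
TOKEN_LIFETIME = 3600  # seconds; an arbitrary one-hour window


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def make_reset_token(user_id: str, now: Optional[float] = None) -> str:
    """Build 'user:expiry:signature' to embed in the emailed reset link."""
    issued = time.time() if now is None else now
    payload = f"{user_id}:{int(issued + TOKEN_LIFETIME)}"
    return f"{payload}:{_sign(payload)}"


def check_reset_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the token is authentic and unexpired, else None."""
    try:
        user_id, expires, signature = token.rsplit(":", 2)
        expiry = int(expires)
    except ValueError:
        return None
    expected = _sign(f"{user_id}:{expires}")
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
        return None
    current = time.time() if now is None else now
    return user_id if current < expiry else None
```

Anyone who reads the mail a day later finds a dead link instead of a working password. A real system would also want the token to be single-use, which this sketch does not attempt.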
This comment was originally posted on Hacker News
Because you are going to lose your entire database, and everyone’s password along with it, to the first SQL Injection vulnerability you miss in your application.
This comment was originally posted on Hacker News
Two things.
First, the finserv organizations we’ve worked with tend not to store plaintext passwords.
Secondly, the difference between sending the password and the reset link in email is that the former compromises every other app the user uses.
This comment was originally posted on Hacker News
Fair enough. I think we agree anyway, I’m just being difficult : )
"Mine doesn’t. And yes, if they did, I would not be their customer."
Are you sure about that? However would you know? And how would they do telephone banking?
I wouldn’t expect a bank to store plaintext either, I’d expect them to encrypt it and handle decryption at the terminal (smart card, usually). But that’s a whole different kettle of fish.
"Anyone who has access to the text of the mail at any point in time now has your password."
Yeah, there is no way I want my passwords going through email either. That argument was a bit flaky.
"avoid the well known problems with digest access authentication such as man in the middle attacks"
Your point is valid, but I wanted to respond by saying we’re talking mainly about large-scale DB theft, 99 times out of 100 done by an insider. You seem to have experience inside a large organisation so you will know that SSL terminates at the load balancer, a password form will pass into the server from the balancer in plaintext. If there’s an attacker on the inside, he can sniff that to his heart’s content. Digest is actually more secure in this setting.
Toss up between more security on the user’s LAN/WLAN (SSL) and more security inside the DC (Digest).. OK, this is a bit whimsical.
"All of the clusters I have access to only work by virtue of X509 client certificates over SSL"
Me too, actually. But, sadly, that’s not appropriate for the public at large.
Anyway, I agree it’s a red flag, just trying to make a point that it’s not as black and white as it seemed you were suggesting.
This comment was originally posted on Hacker News
"First, the finserv organizations we’ve worked with tend not to store plaintext passwords."
I am talking about retail banking, specifically those with telephone service. If they don’t store in a recoverable form (encrypted or plaintext) then I would love to know how the telephone operators verify passwords.
"Secondly, the difference between sending the password and the reset link in email is that the former compromises every other app the user uses."
Sorry, I don’t understand the meaning of this sentence.
Anyway, I wasn’t really serious with the "password recovery by email" argument, I was just trying to come up with a list of reasons an org might want to store plaintext, but that was probably a pretty flaky one. Any site that sent me my password in plaintext via unencrypted mail would lose me as a customer pretty damn quickly, too.
This comment was originally posted on Hacker News
I was referring to a careful implementation by competent people. An organisation opting to store in plaintext would have to have special precautions so that could never happen.
The implementation I have knowledge of is separate from the "main" DB, accessible only over an internal REST interface and heavily secured. There is no way a simple attack or compromise of a web server could make it spit out the kind of "jackpot!" list you’re talking about. The infrastructure is layered like a frickin’ matryoshka doll and frankly you would have more luck just robbing a bank.
So it is possible to do well, with due care and attention and a competent paranoid admin. Not advisable or desirable for a "normal" system, I agree completely.
This comment was originally posted on Hacker News
Seconded, both have their place.
I have tried pretty much all of them, incl. snapBack, dirvish and various homegrown scripts building on top of rsync, rdup, rcs and so on.
rdiff and duplicity are the most mature of the pack which shows mostly in their handling of corner cases (connection loss during backup, resume of a partial/failed backup, disk full during backup, handling of really large trees) but also in overall convenience and robustness (legibility of on-disk format, configuration sanity, tools to find a specific revision of a file, flexibility in retention/purge intervals etc.).
I generally recommend rdiff as the default tool for backups to a remote spinning disk. duplicity, as parent suggests, is good when you need your archive to be a single large file which helps with handling in some situations.
There is also dar worth mentioning which is less useful for incremental stuff but can add redundancy to archives which is good for archiving to unreliable/decaying media (DVD, Tape). Be aware though that older versions had problems with large archives, use a recent version.
And last but not least, if you have a tape library then Bacula is a mighty tool. Easier to use and pretty much on par in terms of features compared to the commercial offerings and the residents like Amanda.
We generally deploy a single backup server here with lots of disks that pulls snapshots from everywhere via rdiff and either mirrors the local repository to a remote location or feeds the precious data to tape via Bacula.
This comment was originally posted on Hacker News
Well then, in your parallel universe, by all means store plaintext passwords. I’ll go on busting up the real apps, and recommending they don’t tempt fate by storing passwords.
This comment was originally posted on Hacker News
Done quite a bit of retail banking work. Stored plaintext passwords? A doc-able finding.
This comment was originally posted on Hacker News
>> An organisation opting to store in plaintext would have to have special precautions so that could never happen.
What everyone is trying to say is there are no foolproof measures for securing data. Everyone who thinks their method is safe becomes the case study for the next generation.
Also, social engineering and disgruntled employees trump internal software architecture every time.
This comment was originally posted on Hacker News
Exactly right, and I’m trying to say that too!
My point is that although password hashing is a very wise practise, there are situations in which the plaintext is necessary, and with careful design a plaintext password store can be made no weaker in security than the rest of the system.
This seems to me to be common sense and I have no idea why it’s so controversial.
This comment was originally posted on Hacker News
I recommend that too but sometimes it is unavoidable. When, not if, but when you come across that necessity in your encounters with "real apps", I hope you rightfully feel like a bit of a douche for writing the above.
This comment was originally posted on Hacker News
I meant encrypted for banking, of course. The key point being that the passwords are readable. Two-way, vs the one-way hash discussed before. Maybe I didn’t explain myself properly.
Web site passwords might be one-way hashed, I don’t know, but telephone banking passwords must be displayed on screen for the operator to read.
This comment was originally posted on Hacker News
Because I don’t want the developers of your site to be able to find out the password I also use on other sites.
This comment was originally posted on Hacker News
So you trust the developers of a web site’s word that they hashed your password, because you don’t trust them to not look at it? You trust someone’s word that they’re not doing something you don’t trust them not to do?
Right. Anyway, for future reference, just remember that if you send your password to someone, and they want to look at it for some reason, they can, regardless of their (claimed) database/authentication design.
This comment was originally posted on Hacker News
Any company that stores data should have a SAS 70.
Anyone storing data they consider crucial on a hosted service should require a SAS 70.
Was Halff irresponsible? Not really. A poor planner? Yes, absolutely. But not irresponsible.
Users need to be more responsible with knowing WHAT is being done with their data.
This comment was originally posted on Epicenter (http://blog.wired.com/business/)
There are a few issues.
A number of hacks involved downloading the database of some websites. Usually this involves defeating the operating system that runs the website (or a server in the same datacenter) and FTPing the database away. Now the thieves have every username and every password, and can log in and abuse the system as needed.
People tend to use the same username/password combination everywhere they go. In the early days of Everquest, there were a number of websites set up just to harvest usernames and passwords, and about 5% of the EQ userbase used the same username/pwd as their game accounts. Steal one forum database (which is probably why there are so many efforts to crack vBulletin and phpBB), or set up your own honeypot, and harvest a number of game accounts to loot and plunder. This applies to other games as well, not just evercrack.
The reason for the love of hashing is that it is one-way. There isn’t any feasible method for reversing it. While things like rainbow tables can make crackers’ lives easier, it still puts the burden on them.
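To make the one-way point concrete, here is a minimal sketch in Python of salted password hashing. It is an illustration only, not how any site mentioned in this thread works: PBKDF2 is just one standard-library choice, the iteration count is an assumed value, and the names `hash_password` and `verify_password` are made up for the example. The per-user random salt is what blunts precomputed rainbow tables, since the same password hashes differently for every account.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor for illustration; tune it for your own hardware.
ITERATIONS = 200_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Only these two values are stored, never the password."""
    if salt is None:
        # A fresh random salt per user defeats precomputed (rainbow) tables.
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

At login the site re-hashes whatever the user typed and compares digests, so a stolen database yields salts and digests that each have to be brute-forced separately rather than a ready-made password list.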
Backup tapes get lost. Crooked employees have been known to sell access, account info, or whole dumps of DBs. External opponents aren’t the only threats; you have malevolent and stupid internal threats to deal with as well.
http://www.csoonline.com/article/print/480589
http://www.csoonline.com/article/print/479038
This comment was originally posted on Hacker News
These are all good points and I agree with you completely.
I think I haven’t explained myself well. Firstly I made the mistake of saying "plaintext" when what I really meant was "recoverable", i.e. encrypted but not hashed. I would never suggest that passwords be stored in plaintext, protected only by operating system and DB passwords. I didn’t make that clear and I think I’ve deservedly gotten some heat for it.
All I am trying to say is that with enough effort, data can be secure, or secure enough. In my job I store customers’ credit card information. This must be in a recoverable form. If it leaks I am dead and probably so is the company. Nothing is perfect but I have gone to a lot of trouble and I have reasonable confidence in my efforts.
Same goes for server private keys, financial records, etc. All of it is "ring 0" secure data and extraordinary efforts are made to keep it that way.
I do not actually store user passwords currently, but I know people who do. I have similar confidence in their precautions and skill. Obviously it can be done badly, just like hashed passwords can be implemented badly. But if done properly, I stand by my assertions that storing user passwords in a recoverable format can be no greater a risk than any other part of the system, and no easier or more likely an attack.
This comment was originally posted on Hacker News
In one of the 5 largest retail banking operations in the US? Use of HTTP Auth — digest or otherwise — at all — a doc-able finding.
This comment was originally posted on Hacker News
Ma.gnolia has always been really slow. But its community-oriented feeling was enough to make it worth using. And the fact that it’s free, along with the fact that bookmarks aren’t really vital assets, means that I don’t feel particularly angry about what happened. What Larry says about it being such a small operation is right: I knew it was smaller than delicious/Yahoo, but hadn’t imagined quite how small it was.
I’d like to hear more about the qualitative research tools Larry used to work on.
This comment was originally posted on Epicenter (http://blog.wired.com/business/)
I applaud Mr. Halff’s response to a situation most would not take so rationally and productively. His words are wise ones: there are some elements of the digital age that are simply not taken seriously enough by business or the general public. The information/articles on this subject provided by sites like justaskgemalto.com are close to a necessity for the average digital user.
This comment was originally posted on Epicenter (http://blog.wired.com/business/)
Excellent interview! Great lessons for every company and startup out there. I have been recommending that my startup clients check out this video, and I am now also recommending that my readers check it out and learn from the mistakes of ma.gnolia: http://tpgblog.com/2009/02/19/weekend-magnolia/
Jeremy Horn
The Product Guy
http://tpgblog.com
This comment was originally posted on CenterNetworks (http://www.centernetworks.com/)
"…while armchair backup enthusiasts have been quick to call Ma.gnolia irresponsible and worse, the truth is that any number of popular web services could at any moment experience something similar."
Oh, then it’s ok, everybody’s doing it!
Seriously, Ma.gnolia.com was operated irresponsibly, and no amount of openness will make it ok. And if other services are doing (or not doing) the same, they are irresponsible as well.
Reciva’s been off the air all day today ("Could not connect to the database server"). If they are open about it, will that make it ok? See sqlanywhere.blogspot.com/2009/02/time-for-web-20-database-dead-pool.html
This comment was originally posted on Epicenter (http://blog.wired.com/business/)