There's been a big shift in how people use the web that caught up with Healthcare.gov and sister sites yesterday. You can build the most beautiful and "scalable" website for web visits, make it open source, put the code up on GitHub, talk about how innovative it is, then watch it crumble under the server strain of people trying to actually do something through your site.
Healthcare.gov's real challenge wasn't to build an alternative to a commercial CMS (content management system); it was to build an application that can handle event-oriented human behavior. For that you need the best systems engineering, not "10,000 authenticated users through GitHub" for your content delivery, as one of the Healthcare.gov contractors highlighted in this Atlantic profile of the project by Alex Howard.
Before the application process bogged down yesterday, Healthcare.gov got lots of nice gov tech insider buzz for its open source nature. But the project still had contractors on board, and based on how the service behaved on opening day of the Affordable Care Act, it could have stood a lot more testing of what people actually wanted to do with it. Kind of like Mitt Romney's Orca system on election day last November.
The Healthcare.gov site loaded fine, but trying to apply through it was kinda like buying first-time Comic-Con badges online.
Open source has changed the technology landscape for the better, underpinning many of our favorite startups. However, simply invoking it like a protection spell is no replacement for the architectural skill and planning required to pull off the systems needed for a successful Healthcare.gov launch. Health and Human Services, which managed the project, needed a little more "Puppet vs. Salt" and a little less "open" in its vernacular.
Adapting to a web where people are participants, not viewers, is the lesson we're all learning. Web infrastructure needs to support people, not publishing.
Reacting to Healthcare.gov's failures under heavy load by pointing out that other services fail too won't get us anywhere - the discussion has to be about building infrastructure that's designed for peak interactivity and not for views.
Choice quotes from the Atlantic profile:
Bryan Sivak, CTO at Health and Human Services: "Instead of [running] farms of application servers to handle massive load, you're basically slimming down to two. ... The way it's being built matters."
Dave Cole from HHS contractor Development Seed: "You're just talking about content. There just needs to be one server. We're going to have two, with one for backup. That's a deduction of 30 servers."
Maybe there was a lot more infrastructure work going on behind the scenes, but the project leads' obsessive focus on the content framework is telling.
Healthcare.gov's scaling challenge was never about delivering content like a really popular website; it was the peak activity challenge that Twitter faces on a regular basis. Taking interaction-based scaling challenges seriously is why Twitter is stable now and wasn't in 2009 - those are the issues HHS should have been talking about.
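To put rough numbers on that difference, here is a back-of-the-envelope sketch using Little's Law (requests in flight = arrival rate times time per request). Every figure in it is hypothetical: the arrival rate, the 10 ms page view, and the 2 second application step are my assumptions for illustration, not anything reported by HHS or the Atlantic piece, and `in_flight` is just a helper name I made up.

```python
# Back-of-the-envelope capacity sketch using Little's Law:
#   requests in flight = arrival rate (per second) * time per request (seconds)
# Every number here is hypothetical, chosen only to illustrate the gap
# between serving views and serving interactions.


def in_flight(arrivals_per_sec: float, seconds_per_request: float) -> float:
    """Average number of requests being worked on at the same moment."""
    return arrivals_per_sec * seconds_per_request


# Hypothetical peak: 1,000 requests arriving every second.
PEAK_ARRIVALS = 1000.0

# Assumed: a cached brochure page is served in about 10 ms.
page_views = in_flight(PEAK_ARRIVALS, 0.010)

# Assumed: an application step that waits on databases and outside
# services takes about 2 seconds.
applications = in_flight(PEAK_ARRIVALS, 2.0)

print(f"page views in flight:   {page_views:.0f}")
print(f"applications in flight: {applications:.0f}")
print(f"ratio: {applications / page_views:.0f}x")
```

Under these made-up numbers, the same traffic keeps roughly 200 times more work open at once when people are applying than when they are just reading. That is the sense in which a couple of content servers can look fine while the application behind them falls over.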
A few updates after a bit of Twitter fun on these issues today:
Not faulting Alex's reporting in any way here - I believe if the HHS team was really focused on the infrastructure for supporting a signup rush at the time of the Atlantic article, that dedication would have shown up in the story. The omission of that kind of discussion is what stands out (read the article - the project team seems to have an almost flippant approach to back-end server architecture). I also googled around looking for commentary on that front from earlier in the life of the project.
I didn't do a detailed investigation; this is an opinion blog piece, not investigative journalism. As I said above, it's quite possible there was more going on - but the fact that the site had so much persistent trouble as an actual application (while it functioned fine as what we call in the biz a "brochure site") means whatever was done fell dangerously short.
Finally, if an important initiative like Healthcare.gov is going to get 2.8 million views in a day, I want everyone who wants to apply through that site to do so smoothly. My ding on "open government buzzwords" is that it's really easy to do "innovative" things with government technology and get headlines, without actually delivering for constituents.
Another update from Twitter conversation:
Alex speculates the devs and designers who built the content framework aren't to blame here.
Fair enough. I think it's fairly clear from the above that I blamed HHS and a culture of treating web properties as publishing applications rather than designing them for interaction. It's really time to stop talking about a "front-end" and a "back-end" for any kind of website. If it doesn't scale for interaction, it doesn't scale. Twitter's infrastructure challenge isn't displaying millions of tweets; it's keeping all of them threaded in real-time.
Open source content frameworks are nice (hey, Twitter released Bootstrap!), but HHS separated that issue from the kind of services needed to effectively scale the application process. It's like building a really shiny muscle car and then giving it a weak 2-liter engine. Fully integrated applications with content delivery and scalable interaction design are really, really hard. And that's where buzzwords fall short.
Sept. 7 update:
On Saturday, I wrote about these issues on GovFresh in "The openwashing of Healthcare.gov," and cited a Reuters article that pinned the project on CGI Inc., a giant federal contractor.
Today, the Wall Street Journal quoted an HHS spokeswoman and IT experts regarding flaws in the system. The article mentions CGI and also says Experian had a contract around identity verification. Based on the analyses I've read, it seems like there could be timeouts or critical delays between security question submittal and verification, which would indicate architecture issues again, not an Experian issue per se.