Healthcare.gov: a failure for open gov buzzwords

There's been a big shift in how people use the web that caught up with Healthcare.gov and sister sites yesterday. You can build the most beautiful and "scalable" website for web visits, make it open source, put the code up on GitHub, talk about how innovative it is, then watch it crumble under the server strain of people trying to actually do something through your site.

Healthcare.gov's real challenge wasn't to build an alternative to a commercial CMS (content management system), it was to build an application that can handle event-oriented human behavior - for that you need the best systems engineering, not "10,000 authenticated users through GitHub" for your content delivery, as one of the Healthcare.gov contractors highlighted in this Atlantic profile of the project by Alex Howard.

Before the application process bogged down yesterday, Healthcare.gov got lots of nice gov tech insider buzz for its open source nature. But the project still had contractors on board, and based on how the service behaved on opening day of the Affordable Care Act, it could have stood a lot more testing of what people actually wanted to do with it. Kind of like Mitt Romney's Orca system on election day last November.

The Healthcare.gov site loaded fine, but trying to apply through it was kinda like buying first-time Comic-Con badges online.

Open source has changed the technology landscape for the better, underpinning many of our favorite startups. However, simply invoking it like a protection spell is no replacement for the architectural skill and planning required to pull off the systems needed for a successful Healthcare.gov launch. Health and Human Services, which managed the project, needed a little more "Puppet vs. Salt" and a little less "open" in its vernacular. 

Adapting to a web where people are participants, not viewers, is the lesson we're all learning. Web infrastructure needs to support people, not publishing.

The reaction to Healthcare.gov's failures under heavy load won't help if the discussion is about how other services fail too - it has to be about building infrastructure that's designed for peak interactivity, not for views.

Choice quotes from the Atlantic profile:

Bryan Sivak, CTO at Health and Human Services: "Instead of [running] farms of application servers to handle massive load, you're basically slimming down to two. ... The way it's being built matters."

Dave Cole from HHS contractor Development Seed: "You're just talking about content. There just needs to be one server. We're going to have two, with one for backup. That's a deduction of 30 servers."

Maybe there was a lot more infrastructure work going on behind the scenes, but the project leads' obsessive focus on the content framework is telling. 

Healthcare.gov's scaling challenge was never about delivering content like a really popular website; it was the peak activity challenge that Twitter faces on a regular basis. Taking interaction-based scaling challenges seriously is why Twitter is stable now and wasn't in 2009 - those are the issues HHS should have been talking about.

___

Few updates after a bit of Twitter fun on these issues today:

Not faulting Alex's reporting in any way here - I believe that if the HHS team had really been focused on the infrastructure for supporting a signup rush at the time of the Atlantic article, that dedication would have shown up in the story. The omission of that kind of discussion is telling (read the article - the project team seems to have an almost flippant approach to back-end server architecture). I also googled around looking for commentary on that front from earlier in the life of the project.

I didn't do a detailed investigation; this is an opinion blog piece, not investigative journalism. As I said above, it's quite possible there was more going on - but the fact that the site had so much persistent trouble as an actual application (while it functioned fine as what we call in the biz a "brochure site") means whatever was done fell dangerously short.

Second, Matthew Hancock points out that the signup form at launch was loading 92 separate resources, including 56 separate JavaScript files. Whether you're using a CDN or not, that's just bad. "The site basically DDOSed itself," in Hancock's opinion.
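
To put rough numbers on Hancock's point, here is a back-of-envelope sketch in Python. Only the 92 and 56 come from his count; the figure of 100,000 form loads and the idea of bundling every script into a single file are assumptions for illustration, and the sketch ignores caching entirely.

```python
# Back-of-envelope sketch. Only 92 and 56 come from Hancock's count;
# everything else here is an assumption for illustration.
RESOURCES_PER_FORM_LOAD = 92   # separate resources on the signup form
JS_FILES_PER_FORM_LOAD = 56    # how many of those were JavaScript files


def requests_generated(form_loads: int, resources_per_load: int) -> int:
    """Total HTTP requests if every form load fetches every resource uncached."""
    return form_loads * resources_per_load


def resources_after_bundling(total: int, js_files: int, bundles: int = 1) -> int:
    """Resources per load if the JavaScript files were combined into `bundles` files."""
    return total - js_files + bundles


# Assumed traffic figure, not a reported one.
assumed_form_loads = 100_000

as_launched = requests_generated(assumed_form_loads, RESOURCES_PER_FORM_LOAD)
bundled = requests_generated(
    assumed_form_loads,
    resources_after_bundling(RESOURCES_PER_FORM_LOAD, JS_FILES_PER_FORM_LOAD),
)

print(f"as launched: {as_launched:,} requests")
print(f"with one JS bundle: {bundled:,} requests")
```

Under those assumptions, the same number of applicants generates well under half the requests once the scripts are bundled, before a single application server does any real work.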

Finally, if an important initiative like Healthcare.gov is going to get 2.8 million views in a day, I want everyone who wants to apply through that site to do so smoothly. My ding on "open government buzzwords" is that it's really easy to do "innovative" things with government technology and get headlines, without actually delivering for constituents.

___

Another update from Twitter conversation:

Alex speculates the devs and designers who built the content framework aren't to blame here.

Fair enough. I think it's fairly clear from the above that I blamed HHS and a culture that treats web properties as publishing applications instead of designing them for interaction. It's really time to stop talking about a "front-end" and a "back-end" for any kind of website. If it doesn't scale for interaction, it doesn't scale. Twitter's infrastructure challenge isn't displaying millions of tweets, it's keeping all of them threaded in real-time.

Open source content frameworks are nice (hey, Twitter released Bootstrap!), but HHS separated that issue from the kind of services needed to effectively scale the application process. It's like building a really shiny muscle car and then giving it a weak 2-liter engine. Fully integrated applications with content delivery and scalable interaction design are really, really hard. And that's where buzzwords fall short.

___

Oct. 7 update:

On Saturday, I wrote about these issues on GovFresh in "The openwashing of Healthcare.gov," citing a Reuters article that tied the project to CGI Inc., a giant federal contractor.

Today, the Wall Street Journal quoted an HHS spokeswoman and IT experts regarding flaws in the system. The article mentions CGI and also says Experian had a contract around identity verification. Based on the analyses I've read, it seems like there could be timeouts or critical delays between security question submission and verification, which would again indicate architecture issues, not an Experian issue per se.

Showing 6 reactions

  • commented 2013-10-03 00:55:43 -0400
    Established vendors also sometimes know what they’re doing (bringing that up because Sivak dinged Percussion, and those guys have built some MONSTER web properties).

    Open source is driving a lot of innovation, but not in the “we crowdsourced the code” way, more in the “we use Postgres and shared libraries” way. IMO, open source isn’t the answer in itself to government technology improvements – the biggest wins will come from procurement reform to bring in more competition-driven advancements in tech infrastructure.
  • commented 2013-10-03 00:17:52 -0400
    The answer is soft launch and testing. I worked with and have crazy respect for Sivak, but as now GSA chief Tangherlini used to say – success without data is just emotion. And Bryan led with emotion in this case.
  • commented 2013-10-03 00:12:35 -0400
    Adriel is right about something else, @mike: if people — users, consumers, citizens — aren’t successfully completing applications and registering, the website is failing to serve its purpose. Full, end stop.

    Whether the issue is insufficient servers, network infrastructure or bad code (or all three; it’s still unclear and officials aren’t commenting: http://www.washingtonpost.com/national/health-science/obamacare-site-goes-live-with-some-glitches/2013/10/01/380a4300-2a9d-11e3-8ade-a1f23cda135e_story.html ), if that continues, the site and those entrusted with delivering it will have failed in their mission. We’ll learn more in the days and weeks ahead.
  • commented 2013-10-03 00:04:18 -0400
    Enjoyed the sparring on Twitter tonight. Good conversation. But Adriel is right – while the process and approach are to be lauded – getting the basic user experience and basic user requirements right was the most important goal here. Whether it was GitHub or they faxed and mailed each other code via USPS – they had to ensure the use cases were met. It’s not quite a case of shiny object syndrome here … but it’s close. Functionality and usability on something this high-profile and important will always trump innovative processes and tech.
  • commented 2013-10-02 21:33:40 -0400
    Alex, thanks for adding your thoughts. My point in citing the original article about the website is that the beautiful, responsive website isn’t the part of Healthcare.gov that most needed to work. Sivak and team got all kinds of kudos for it, then the part that needed to work yesterday didn’t.

    I 100 percent understand they are two different pieces of the infrastructure puzzle. And I also don’t care if a government website scales if all it does is look pretty.

    The one thing that galls me most in the “open government” space is folks taking bows without doing the real work. It happens far too much. In fact, my point in running Gov 2.0 Radio weekly for a couple years was to highlight the folks doing the work who don’t get the headlines.

    Often these kinds of systems fail fairly quietly – in this case, decidedly not so. As a gov tech junkie, I had an itch here, scratched it, enough said.
  • commented 2013-10-02 21:22:18 -0400
    Hey there, Adriel. Glad to see you adding your voice to the conversation around one of the most significant website launches in history. You mentioned on Twitter that people never comment on blogs any more, so I’d like to chime in.

    Given that so much of this seems to be you keying off of the feature I wrote for the Atlantic, I wanted to clarify a couple of issues for people who are new to this subject. I’m grateful for your generosity in not faulting my reporting.

    The website described in the piece worked at launch and afterwards, as planned. From what I can tell, it’s the new backend — and, potentially, the network infrastructure that supports it, from switches to throughput — that’s at issue. The site that launched in June scaled fine all summer.

    The code that went live on October 1st is the software application that handles creating new users, posing security questions, calculating subsidies and otherwise getting people registered for health insurance. It has not, as has been widely reported, worked smoothly. http://www.nytimes.com/2013/10/02/us/health-insurance-marketplaces-open.html?hp&_r=0&pagewanted=all

    One of the fascinating aspects of this story is we have users in some 34 states testing out Healthcare.gov, along with an increasing number of people using exchanges the states built. Lots of error messages and problems to watch and collect.

    Regardless, “open gov buzzwords” aren’t at issue for any failures here. From what I can tell, the contractors responsible for backend code are, along with the government officials who sent out the RFP and managed the project, including testing for usability, scalability, bugs and security. (Merici Vinton disagrees with that assessment; more of that story will emerge over time.)

    For now, it looks like you’ve criticized the open source approach to building the front-end of the site without fact checking its relationship to the registration engine and have taken quotes from Cole and Sivak out of context.

    Everyone interviewed in the piece was talking about the front end of the site and the content delivery system behind it, not the subsequent application built to guide people through registration.

    I can see how you and others might misunderstand that point, in retrospect, and I will do what I can to report out what happened as I have time and capacity to do so this month.

    Cheers,
    Alex
