Briefing Doc

This is the briefing document I've provided for prospective lead legislative sponsors (see questions and comments below):

Along with a number of other open government advocates, I've launched a campaign to put a definition of "open data online" into California and San Francisco law. The problem is that documents and data published online often cannot be accessed or used in a meaningful way because they cannot be searched, indexed by Google, or combined with other documents for analysis. I want to tackle this not by mandating that certain documents and data be published online, but simply by creating a reference standard. That way, when new mandates pass, or when new documents are published online as a matter of course under existing law or regular business, they will be in accessible formats.

This makes things easier for people who use screen readers, for developers who want to use public data to build applications, and for transparency advocates, and it is simply good policy. Publishing data in formats that can't be searched, compared to other documents, or reused in a meaningful way is as useless as keeping it tucked away in an obscure file cabinet. Publishing in accessible formats online is as simple as educating employees in how to properly save and store documents for online publication using the same software they already have on their computers. In an ironic demonstration of the current problem, San Francisco's current open data law was published by the Board of Supervisors as an unsearchable PDF.

Proposal: San Francisco/California Open Data Standard

Draft Text: Henceforth, any documents or data published online by the State of California/City and County of San Francisco and its employees, departments and agencies must be published in a structured format that can be retrieved, downloaded, indexed, sorted, searched, and reused by commonly used Web search applications and commonly used software.
Background: By adopting this standard, California/San Francisco would further cement its position as a global leader in open government and accessibility. The standard is derived from model open government legislation proposed by the global CityCamp movement (http://opengovernmentinitiative.org/directive/v1/). Much of the existing open data legislation from around the world lacks a simple, clear standard definition such as this one (http://wiki.civiccommons.org/Open_Data_Policy). Creating this standard would be the foundation for ensuring that future laws governing the publication of State/CCSF documents are meaningful. See also background on open data standards around the world: http://wiki.civiccommons.org/Open_Standards_Policy
Associated costs: None, with the possibility of savings. This standards legislation would not create a new mandate for publication; rather, it would give clear guidance on how data is to be published: using commonly accessible formats, without requiring a specific format that could be made obsolete by technological developments. Passage of this law would reduce the burden of reformatting documents to comply with records requests, since documents published under this standard would already be easily accessible. It would also open government data to innovators from around the world who want to build useful applications using public data.
Early support: Since we publicly launched a campaign to enshrine this standard in law in SF and California on Nov. 16, 2011, we have seen significant support across social media channels, and endorsements from open government leaders in San Francisco, California and around the world, including:
  • Javier Muniz, CTO and co-founder, Granicus (based in SoMa and one of the greatest open gov tech company success stories in the U.S.)
  • Steve Ressler, founder, GovLoop
  • Rep. Jason Murphey, Chairman of the House Government Modernization Committee, Oklahoma
  • Scott Primeau, OpenColorado
  • Luke Frewell, founder and publisher, GovFresh
  • and many more who can be viewed online: http://www.wiredtoshare.com/structured_open_data_campaign
The legislative proposal is also supported by CityCampSF, Gov 2.0 Radio, GovFresh and the SF Tech Dems.

  • Here are some additional comments. I’ll leave the discussion of what is/should be open to others.

    • Consider changing title to: Proposal: San Francisco/California Guidelines for Publishing Open Data or something similar.

    • Change “commonly used Web search applications and commonly used software” to “Web search applications and software that have achieved dominance in the marketplace and are publicly recognized as widely used.”
    • I interpret the term “Open Data Standard” in the context of this proposal to mean any data or documents that are, can, may, or should be available to the public without restriction.

    The term “Open Data Standard” does not have a single definition. Commonly agreed definitions include:

    • Published without restriction; the standard is available at no charge or a charge that is reasonable in cost and can be reasonably administered by parties in the implicated industry.
    • Made freely available for adoption by the industry
    • Controlled by a non-profit open industry organization with a well-defined inclusive process for evolution of the standard.
  • Great feedback, Sylvia! Working now on breaking up the transparency and open data pieces for San Francisco based on ongoing discussions about where the problems and needs are. Will be publishing more of that thinking soon.
  • While implementation of this definition is outside of the scope of this effort, it should not be completely ignored. There will be an increased cost to filter data and documents that are published and distributed electronically versus those that are just posted online. Personally I don’t think the definition of “open data” needs to be restricted to data and documents that are available on the web.

    My suggestions:
    • Change “open data online” to “open data exchanged electronically”
    • Change “published online” to “distributed electronically”
    • Change “new documents are published online” to “new documents are distributed electronically”
    • Change “education employees” to “educating employees”

    “Open data standard” has two very different semantic meanings. I will cover this in a later post.
  • Kristy, I’m quick to say preserve library hours over paying for new systems if the existing ones haven’t reached the end of their life cycle. What kind of language would address these kinds of cost concerns without gutting the proposal? Something along the lines of “whenever publishing systems support” and “new systems should support publishing documents and data in structured formats”? Government employees generally have training budgets; what resources exist for inexpensive training on how to publish in structured formats?
  • I’d love to support this proposal, but I have one hang-up – the assertion that there would be no associated costs. I’ve worked in local government for over 10 years and I can tell you that many, many cities use proprietary agenda management systems. These backend systems facilitate the publishing of agendas and staff reports from multiple departments and agencies. Many of these systems publish the final documents as image files, not a searchable or machine-readable document.

    These are the systems that many governments have in place, and there would be costs associated with replacing this enterprise software. This doesn’t mean that governments are against replacing these systems, just that there would be costs involved.

    In cases where there are no proprietary software hang-ups, there is also the issue of “simply educating employees in how to properly save and store documents for online publication.” Training in government is never a simple task, especially for large governments with decentralized web management systems. Even the tiny City of Reno has 80 staff posting to the web. If an internal expert is not on staff, a funding source for a training consultant would need to be identified.

    I think the heart of the proposal is in the right place, just be cautious of encouraging an unfunded mandate.
  • The W3C Working Draft (8 September 2009) “Publishing Open Government Data” is also helpful. Long-term open data legislation with language more meaningful than “should” means institutionalizing a standard that is not tied to a contemporary format (XML, RDF). CSV is also a structured data format.

    Here are the guidelines from the W3C – http://www.w3.org/TR/2009/WD-gov-data-20090908/ – including:

    “Make the data both human- and machine-readable:
    enrich your existing (X)HTML resources with semantics, metadata, and identifiers;
    encode the data using open and industry standards – especially XML – or create your own standards based on your vocabulary;
    make your data human-readable by either converting to (X)HTML, or by using real-time transformations through CSS or XSLT. Remember to follow accessibility requirements;
    use permanent patterned and/or discoverable “Cool URIs”;
    allow for electronic citations in the form of standardized (anchor/id links or XLINKs/XPointers) hyperlinks.”
  • Let me step back a bit. When I re-read the essential goal: legal definition of open data, my first suggestion is get a lawyer. I reached out to a few to provide comment.

    Since IANAL, I can only guess that the law requires: a logical test, an example that passes the test, an example that fails the test. Open data is… Data that embody openness are… Data that do not embody openness are…

    Is that specifically what you are after?

    If so, does that not boil down to defining the word “open?”

    Under the Open Data Campaign tab you assert that we are hobbled by lack of a legal definition. Do legislators in California understand the assertion, i.e., what “hobbled” means? I’m not sure if this site is meant to stand alone or if it accompanies other material.

    Background leads with “cementing leadership” for CA & SF. That seems vain and not at all tied to any practical motivation for undertaking the advocacy. You have a lot more background from which to draw for this to be the lead statement.

    Sidebar: you mentioned screen readers, above – are you aware of http://www.section508.gov/ ?
  • Change “commonly used Web search applications and commonly used software” to “on the Web.” Consider using the phrase “URL-addressable, human and machine readable, well-formed text.” A definition of well-formed may be needed. I didn’t find one in a first search. Cite examples like thomas.gov (http://thomas.loc.gov/home/gpoxmlc112/h3261_ih.xml). Perhaps prohibit formats that are designed to explicitly inhibit sharing, such as those that require licenses to view and those that block copy & paste.
  • Adriel, great initiative. Please keep us informed since other countries and cities would be interested too.
  • There are two aspects to this proposed legislation. The first is the definition, which costs nothing beyond the cost of creating and approving the proposal. The second is implementation. If the definition requires open public standards, there will be implementation costs, along with long-term cost savings. It would be a gross misstatement to claim there are no implementation costs. If the definition only requires proprietary de facto standards such as Word, Google Docs, and similar products, implementation costs may be small. I personally am not in favor of that second option.

    Cost savings are just one reason to implement machine-readable formats using data standards, and it doesn’t take a rocket scientist to outline the savings. As a taxpayer, it is disturbing to me that an initiative with long-term savings would be DOA if costs are assigned to it from the beginning.
  • Initial feedback from legislative staff is that the State Appropriations Committee could assign a cost to this initiative, effectively killing it. Any supporting documentation and evidence showing no cost or cost saving associated with this initiative would be helpful.

    Also, they are asking for further definition of “commonly used Web search applications and commonly used software.”

    Any help from open data advocates responding to these two concerns is appreciated! Thanks.