Government Data Catalog Guidelines

From Sunlabs wiki


The goal is to come up with a data exchange format for data catalogs. As we prepare the National Data Catalog for release and begin to write importers, it's clear we need a common model, much as Google provided one for transit data with GTFS. Let's create a basic file format for data catalogs, agree to use it so that our platforms are interoperable, and get government to use it, too.

Reading Material

Clay's Blog Post

Blog Post by Luigi and David

W3C "Publishing Open Government Data"

Civic OpenMedia

Suggested Solutions

From comments on Clay's blog post:

XRI

OpenSearch

OpenGIS Catalogue

From comments on Luigi and David's blog post:

RDF

Examples of Metadata

data.dc.gov OPML file

Preliminary Vocabulary

Element Name         Description
Title                Title of data source.
Description          Short description of data source.
URI                  Universally unique identifier. Could be a permanent, unique URL that contains this metadata.
Type                 Is the data source a dataset, API, or online database?
Downloads            File format/URL pairs that point to data files.
Created              Creation date of the data source.
Released             Release date of the data source.
Last Updated         Update date of the data source.
Update Frequency     How often the data is updated.
Creator              Entity (agency, department, or organization) that created the data.
Publisher            Entity that published the data.
Maintainer           Entity that maintains the data.
Jurisdiction         Political jurisdiction of the data.
Geographic Coverage  Physical area the data applies to.
Time Period          The time period the data refers to.
Grouping             Can the data be grouped with a larger set of similar data? Recommended for data sets scoped to a time period or jurisdiction.
Keywords             Applicable tags and phrases.
License              The license under which the data set is released.
Documentation        Any documentation, such as a data dictionary, or a reference (URL) to that documentation.

The vocabulary can then be implemented in a number of formats:

  • XML
  • JSON
  • CSV
  • RDF
  • OPML
  • Microformat (embedded on the webpage)
  • Atom (for updates)
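The formats differ mostly in how they carry multi-valued elements such as Keywords and Downloads. As a rough sketch of the flattest case, CSV, the Python below writes one row per data source. The column names, the "; " separator, and the "format=url" encoding of download pairs are all assumptions made for illustration; the vocabulary above does not define any of them.

```python
import csv
import io

# Hypothetical column names derived from the vocabulary's element names.
COLUMNS = ["title", "description", "uri", "type", "downloads", "keywords", "license"]


def entries_to_csv(entries):
    """Serialize catalog entries (plain dicts) to CSV text, one row per data source."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        row = {col: entry.get(col, "") for col in COLUMNS}
        # Flatten multi-valued elements into one cell; the separators are an assumption.
        row["keywords"] = "; ".join(entry.get("keywords", []))
        row["downloads"] = "; ".join(
            f"{fmt}={url}" for fmt, url in entry.get("downloads", {}).items()
        )
        writer.writerow(row)
    return buffer.getvalue()


# Invented example record; example.org is a reserved placeholder domain.
example = {
    "title": "Street Tree Inventory",
    "description": "Location and species of street trees.",
    "uri": "http://example.org/catalog/street-trees",
    "type": "dataset",
    "downloads": {"csv": "http://example.org/data/street-trees.csv"},
    "keywords": ["trees", "environment"],
}
print(entries_to_csv([example]))
```

Nested formats such as XML and JSON can keep these elements structured, so the flattening step only matters for CSV.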

Representations

XML
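No XML schema has been agreed on yet. As one possible starting point, the sketch below builds an XML record with Python's standard library. The element names (source, downloads, download, keywords, keyword) and the format attribute are hypothetical, chosen only to mirror the preliminary vocabulary.

```python
import xml.etree.ElementTree as ET


def entry_to_xml(entry):
    """Render one catalog entry (a plain dict) as an XML string.

    All element and attribute names here are illustrative assumptions.
    """
    root = ET.Element("source")
    for name, value in entry.items():
        if name == "downloads":
            # File format/URL pairs become <download format="...">URL</download>.
            parent = ET.SubElement(root, "downloads")
            for fmt, url in value.items():
                child = ET.SubElement(parent, "download", attrib={"format": fmt})
                child.text = url
        elif name == "keywords":
            parent = ET.SubElement(root, "keywords")
            for keyword in value:
                ET.SubElement(parent, "keyword").text = keyword
        else:
            # Single-valued elements map to one child element each.
            ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Invented example record.
print(entry_to_xml({
    "title": "Street Tree Inventory",
    "uri": "http://example.org/catalog/street-trees",
    "type": "dataset",
    "downloads": {"csv": "http://example.org/data/street-trees.csv"},
    "keywords": ["trees", "environment"],
}))
```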

JSON
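A JSON representation could map each vocabulary element to one key. The sketch below is only a suggestion: the key names are hypothetical, and treating title, description, uri, and type as required is an assumption, since the vocabulary does not yet say which elements are mandatory.

```python
import json

# Assumption: the vocabulary does not mark any element as required;
# these four are treated as mandatory here purely for illustration.
REQUIRED = ("title", "description", "uri", "type")


def entry_to_json(entry):
    """Check for the assumed required elements, then serialize the entry to JSON."""
    missing = [name for name in REQUIRED if not entry.get(name)]
    if missing:
        raise ValueError("missing elements: " + ", ".join(missing))
    return json.dumps(entry, indent=2, sort_keys=True)


# Invented example record; dates are written as ISO 8601 strings (also an assumption).
print(entry_to_json({
    "title": "Street Tree Inventory",
    "description": "Location and species of street trees.",
    "uri": "http://example.org/catalog/street-trees",
    "type": "dataset",
    "downloads": {"csv": "http://example.org/data/street-trees.csv"},
    "released": "2010-01-15",
    "keywords": ["trees", "environment"],
}))
```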

RDF

Some work has already been done on government data catalogs and RDF. In particular, there is the dcat vocabulary (http://vocab.deri.ie/dcat). Richard Cyganiak, the author of the dcat vocabulary, recently made this presentation. (Source)

Also, data.gov.uk has been pushing RDF and Linked Data, as have, by extension, the folks behind the Comprehensive Knowledge Archive Network (CKAN).
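To give a feel for how a few vocabulary elements might map onto dcat, here is a minimal sketch that emits Turtle by hand, without an RDF library. It assumes the namespace of the later W3C DCAT recommendation (http://www.w3.org/ns/dcat#) together with Dublin Core terms; the mapping of Title, Description, and Keywords to dct:title, dct:description, and dcat:keyword is one plausible choice, not something this page has settled, and the function names are hypothetical.

```python
# Assumed namespaces: W3C DCAT and Dublin Core terms.
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)


def literal(text):
    """Quote a string as a Turtle literal, escaping backslashes, quotes, and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def entry_to_turtle(entry):
    """Describe one catalog entry (a plain dict) as a dcat:Dataset in Turtle."""
    lines = [f"<{entry['uri']}> a dcat:Dataset ;"]
    lines.append(f"    dct:title {literal(entry['title'])} ;")
    lines.append(f"    dct:description {literal(entry['description'])} ;")
    for keyword in entry.get("keywords", []):
        lines.append(f"    dcat:keyword {literal(keyword)} ;")
    # The last statement ends with "." instead of ";".
    lines[-1] = lines[-1][:-1] + "."
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


# Invented example record.
print(entry_to_turtle({
    "uri": "http://example.org/catalog/street-trees",
    "title": "Street Tree Inventory",
    "description": "Location and species of street trees.",
    "keywords": ["trees", "environment"],
}))
```

A real implementation would more likely use an RDF library and cover the remaining elements; the point here is only that the preliminary vocabulary and dcat overlap enough for a mechanical mapping.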

Microformat

Context / Philosophy

There are several groups thinking about interoperability of data catalogs. At Sunlight Labs, our experience has shown:

  • Lightweight technologies are better than heavyweight technologies -- we try to make it easy for open source developers to get involved.
  • Building web platforms and applications is faster than shifting public policy -- developers can quickly build prototypes and experiment. Some succeed, some fail. But this is much faster than waiting for governments to lead with policy first.
  • We are excited to include RDF support in the future, but we don't want to exclusively use RDF -- we love the principles of interoperable data, but we have seen that the Ruby developer community gravitates towards simpler technologies such as JSON.