Government Data Catalog Guidelines
From Sunlabs wiki
The goal is to come up with a data exchange format for data catalogs. As we prepare the National Data Catalog for release and begin to write importers, it's clear we need a shared model, as Google did with GTFS. Let's create a basic file format for data catalogs, agree to use it so that our platforms will be interoperable, and get government to use it, too.
Reading Material
W3C "Publishing Open Government Data"
Suggested Solutions
From comments on Clay's blog post:
From comments on Luigi and David's blog post:
Examples of Metadata
Preliminary Vocabulary
| Element Name | Description |
| --- | --- |
| Title | Title of data source. |
| Description | Short description of data source. |
| URI | Universally unique identifier. Could be a permanent, unique URL that contains this metadata. |
| Type | Is the data source a dataset, API, or online database? |
| Downloads | File format/URL pairs that point to data files. |
| Created | Creation date of the data source. |
| Released | Release date of the data source. |
| Last Updated | Update date of the data source. |
| Update Frequency | How often the data is updated. |
| Creator | Entity (agency, department, or organization) that created the data. |
| Publisher | Entity that published the data. |
| Maintainer | Entity that maintains the data. |
| Jurisdiction | Political jurisdiction of the data. |
| Geographic Coverage | Physical area the data applies to. |
| Time Period | The time period the data refers to. |
| Grouping | Can the data be grouped with a larger set of similar data? Recommended for data sets scoped to a time period or jurisdiction. |
| Keywords | Applicable tags and phrases. |
| License | The license under which the data set is released. |
| Documentation | Any documentation, such as a data dictionary, or a reference (URL) to that documentation. |
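As a rough illustration of how the vocabulary might be used in code, the sketch below maps each element name from the table to a key and checks a record against it. The snake_case keys, the `check_record` helper, and the sample values are assumptions made for this example; the guidelines do not define them.

```python
# Element names copied from the table above. The snake_case keys are an
# assumption made for this sketch, not part of the guidelines.
VOCABULARY = {
    "title": "Title",
    "description": "Description",
    "uri": "URI",
    "type": "Type",
    "downloads": "Downloads",
    "created": "Created",
    "released": "Released",
    "last_updated": "Last Updated",
    "update_frequency": "Update Frequency",
    "creator": "Creator",
    "publisher": "Publisher",
    "maintainer": "Maintainer",
    "jurisdiction": "Jurisdiction",
    "geographic_coverage": "Geographic Coverage",
    "time_period": "Time Period",
    "grouping": "Grouping",
    "keywords": "Keywords",
    "license": "License",
    "documentation": "Documentation",
}

# The table names three kinds of data source for the Type element.
SOURCE_TYPES = {"dataset", "api", "online database"}


def check_record(record):
    """Return a list of problems found in a catalog record (empty if none)."""
    problems = []
    for key in record:
        if key not in VOCABULARY:
            problems.append(f"unknown element: {key}")
    source_type = record.get("type")
    if source_type is not None and source_type.lower() not in SOURCE_TYPES:
        problems.append(f"unexpected type: {source_type}")
    return problems


# Made-up sample record; the URLs are placeholders.
example = {
    "title": "Example Budget Data",
    "uri": "http://example.gov/catalog/budget",
    "type": "dataset",
    "downloads": [{"format": "csv", "url": "http://example.gov/budget.csv"}],
    "keywords": ["budget", "spending"],
}
print(check_record(example))  # prints []
```

The check treats every element as optional, since the table does not say which elements are required.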
The vocabulary can then be implemented in a number of formats:
- XML
- JSON
- CSV
- RDF
- OPML
- Microformat (embedded on the webpage)
- Atom (for updates)
Representations
XML
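No XML schema has been agreed yet. As a starting point for discussion, here is a minimal sketch of how one record could be serialized with Python's standard library. The element names (`source`, `downloads`, `download`, `keywords`, `keyword`) and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Serialize one catalog record (a dict) to an XML string."""
    root = ET.Element("source")  # hypothetical root element name
    for name, value in record.items():
        if name == "downloads":
            # Downloads are file format/URL pairs.
            parent = ET.SubElement(root, "downloads")
            for fmt, url in value:
                item = ET.SubElement(parent, "download", {"format": fmt})
                item.text = url
        elif name == "keywords":
            parent = ET.SubElement(root, "keywords")
            for word in value:
                ET.SubElement(parent, "keyword").text = word
        else:
            # Every other element is written as simple text.
            ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Made-up sample record; the URL is a placeholder.
sample = {
    "title": "Example Budget Data",
    "type": "dataset",
    "downloads": [("csv", "http://example.gov/budget.csv")],
    "keywords": ["budget", "spending"],
}
print(record_to_xml(sample))
```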
JSON
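A JSON representation could be little more than the vocabulary written as object keys. The sketch below wraps a list of records in a top-level `catalog` key and round-trips it through Python's `json` module. The key names, the wrapper key, and the sample values are assumptions for illustration, not an agreed format.

```python
import json


def catalog_to_json(records):
    """Serialize a list of catalog records to a JSON document."""
    # "catalog" is a hypothetical wrapper key chosen for this sketch.
    return json.dumps({"catalog": records}, indent=2, sort_keys=True)


def catalog_from_json(text):
    """Read the list of records back out of a JSON document."""
    return json.loads(text)["catalog"]


# Made-up sample records; the URLs are placeholders.
records = [
    {
        "title": "Example Budget Data",
        "uri": "http://example.gov/catalog/budget",
        "type": "dataset",
        "downloads": [{"format": "csv", "url": "http://example.gov/budget.csv"}],
        "update_frequency": "yearly",
        "keywords": ["budget", "spending"],
    }
]

document = catalog_to_json(records)
print(document)
```

An importer on another platform would only need `catalog_from_json` and agreement on the key names to read the same file.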
RDF
Some work has already been done with government data catalogs and RDF. In particular, there is the dcat vocabulary (http://vocab.deri.ie/dcat). Richard Cyganiak, the author of the dcat vocabulary, recently gave a presentation about it.
Also, data.gov.uk has been pushing RDF and Linked Data, as have, by extension, the folks behind the Comprehensive Knowledge Archive Network (CKAN).
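To give a feel for what an RDF representation might look like, the sketch below writes a record as Turtle by hand, using `dcat:Dataset` and `dcat:keyword` together with Dublin Core `dct:title` and `dct:description`. The mapping from the preliminary vocabulary to these terms is an assumption made for this example, as is the choice of the W3C `http://www.w3.org/ns/dcat#` namespace. A real implementation would more likely use an RDF library.

```python
DCAT = "http://www.w3.org/ns/dcat#"   # namespace assumed for this sketch
DCT = "http://purl.org/dc/terms/"     # Dublin Core terms


def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def record_to_turtle(record):
    """Write one catalog record as a Turtle description of a dcat:Dataset."""
    pairs = [
        ("a", "dcat:Dataset"),
        ("dct:title", turtle_literal(record["title"])),
        ("dct:description", turtle_literal(record["description"])),
    ]
    for word in record.get("keywords", []):
        pairs.append(("dcat:keyword", turtle_literal(word)))
    # Predicate/object pairs for one subject are separated by ";".
    body = " ;\n".join(f"    {pred} {obj}" for pred, obj in pairs)
    return (
        f"@prefix dcat: <{DCAT}> .\n"
        f"@prefix dct: <{DCT}> .\n\n"
        f"<{record['uri']}>\n{body} .\n"
    )


# Made-up sample record; the URI is a placeholder.
sample = {
    "uri": "http://example.gov/catalog/budget",
    "title": "Example Budget Data",
    "description": "Yearly budget figures.",
    "keywords": ["budget", "spending"],
}
print(record_to_turtle(sample))
```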
Microformat
Context / Philosophy
There are several groups thinking about interoperability of data catalogs. At Sunlight Labs, our experience has shown:
- Lightweight technologies are better than heavyweight technologies -- we try to make it easy for open source developers to get involved.
- Building web platforms and applications is faster than shifting public policy -- developers can quickly build prototypes and experiment. Some succeed, some fail. But this is much faster than waiting for governments to lead with policy first.
- We are excited to include RDF support in the future, but we don't want to exclusively use RDF -- we love the principles of interoperable data, but we have seen that the Ruby developer community gravitates towards simpler technologies such as JSON.
