Make data discoverable

Open data is nothing without users. You need to make sure that people can find the source material. This section covers different approaches.

The most important thing is to provide a neutral space which can overcome both inter-agency politics and future budget cycles. Jurisdictional borders, whether sectoral or geographical, can make cooperation difficult. However, there are significant benefits in joining forces. The easier it is for outsiders to discover data, the faster new and useful tools will be built.

Existing tools

There are a number of tools live on the web that are specifically designed to make data more discoverable.

The most prominent is CKAN.net. CKAN stands for the Comprehensive Knowledge Archive Network, and is a catalogue of datasets from around the world. The site makes it very easy for developers to find the material that they’re seeking.
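
As a rough illustration of how a developer might search such a catalogue from a script, the sketch below assumes the catalogue runs the CKAN software and exposes its Action API (version 3) with the package_search action. The base address is a hypothetical placeholder, and the function names are invented for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical catalogue address; substitute the CKAN site you actually use.
BASE_URL = "https://catalogue.example.org"


def build_search_url(query, rows=5, base_url=BASE_URL):
    """Return a package_search URL for a CKAN Action API (version 3) site."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def dataset_titles(response):
    """Pull dataset titles out of a decoded package_search response."""
    if not response.get("success"):
        return []
    datasets = response["result"]["results"]
    # Fall back to the machine-readable name when a dataset has no title.
    return [d.get("title") or d.get("name", "") for d in datasets]


def search(query, rows=5, base_url=BASE_URL):
    """Query the catalogue over HTTP and return matching dataset titles."""
    with urlopen(build_search_url(query, rows, base_url)) as reply:
        return dataset_titles(json.load(reply))
```

Calling search("river levels") against a real catalogue address would return the titles of the first few matching datasets. Only build_search_url and dataset_titles can be exercised without network access.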

In addition, there are dozens of specialist catalogues for different sectors and places. Many scientific communities have created a catalogue system for their fields, as data are often required for publication.

For government

The practice that has emerged is for a lead agency to create a catalogue for the government’s data. When establishing a catalogue, try to create a structure which allows many departments to keep their own information current easily.

Resist the urge to build the software to support the catalogue from scratch. There are free and open source software solutions which many dozens of governments have already adopted. Investing in another platform will be a waste of resources.

There are a few things that most open data catalogues miss. Your programme could consider the following:

  • Providing an avenue to allow the private and community sectors to add their data. It may be worthwhile to think of the catalogue as the region’s catalogue, rather than the regional government’s.
  • Facilitating improvement of the data by allowing derivatives of datasets to be catalogued. For example, someone may geocode addresses and may wish to share those results with everybody. If you only allow single versions of datasets, these improvements remain hidden.
  • Being tolerant of your data appearing elsewhere. That is, content is likely to be duplicated to communities of interest. If you have river level monitoring data available, then your data may appear in a catalogue for hydrologists.
  • Ensuring that access is equitable. Do not create a privileged level of access for officials or tenured researchers. This will cause resentment and ultimately undermine the goals that you are seeking to achieve.
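
The point about cataloguing derivatives can be sketched as a small data model. This is only an illustration under assumed names (Entry, Catalogue and derived_from are invented here and do not come from any particular catalogue software): each entry may point to the dataset it was derived from, so improved versions stay visible alongside the original, and publishers outside government can add entries too.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    name: str
    publisher: str
    derived_from: Optional[str] = None  # name of the parent dataset, if any


class Catalogue:
    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def add(self, entry: Entry) -> None:
        """Register a dataset; a derivative must name an existing parent."""
        parent = entry.derived_from
        if parent is not None and parent not in self.entries:
            raise KeyError(f"unknown parent dataset: {parent}")
        self.entries[entry.name] = entry

    def derivatives(self, name: str) -> List[Entry]:
        """List the entries that were derived directly from the given dataset."""
        return [e for e in self.entries.values() if e.derived_from == name]


catalogue = Catalogue()
catalogue.add(Entry("addresses", "Regional government"))
catalogue.add(Entry("addresses-geocoded", "Community volunteer",
                    derived_from="addresses"))
print([e.name for e in catalogue.derivatives("addresses")])
```

Here the geocoded version is published by someone outside government, yet anyone looking up the original address dataset can still find it.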

For civil society

Be willing to create a supplementary catalogue for non-official data.

It is very rare for governments to associate with unofficial or non-authoritative sources. Officials have often gone to great expense to ensure that there will not be political embarrassment or other harm caused by misuse of, or overreliance on, data.

Moreover, governments are unlikely to be willing to support activities that mesh their information with information from businesses. Governments are rightfully skeptical of profit motives. Therefore, an independent catalogue for community groups, businesses and others may be warranted.