OGGO Committee Report
CHAPTER THREE: QUANTITY AND QUALITY

The second open data principle – Quantity and Quality – is centred on the release of well-described, high-quality and timely open data. Throughout the Committee’s study, witnesses discussed a variety of ways to increase the quantity and quality of open government data. These suggestions have informed the Committee’s recommendations in relation to standardized, high-quality metadata and enhanced open government data.

A. Well-described open data

Metadata is data that describes the contents of a dataset; it identifies the specific type of information held in each field. According to the CIO of the Government of Canada, metadata is integral to the potential of open data. Many witnesses agreed that well-described open data is central to the quality of open data, and several added that metadata is important to users because it makes open data interoperable and comparable among different jurisdictions.

One barrier to the interoperability and comparability of open data is the lack of standardized metadata to facilitate the integration of datasets released by different jurisdictions. According to Michael Chui, Partner, McKinsey Global Institute, “[s]tandards also have to be developed in order to make data comparable from multiple sources. Then actually releasing metadata, data about data, can make open data more usable.”

Under its commitments to the G8 Open Data Charter, the federal government contributed to the G8 metadata mapping exercise, which involved defining standards for describing datasets internationally. According to a TBS official, work is still ongoing for Canada and other OGP countries to develop a standardized view of the metadata that should be used to describe open data. He added that several non-governmental organizations, such as Schema.org, the World Wide Web Foundation and the Open Knowledge Foundation, are also feeding into the development of these standards.

Developing consistent technical metadata standards may present certain challenges. As the independent researcher responsible for reviewing Canada’s commitments to the OGP, Mary Francoli, Assistant Professor, School of Journalism and Communication at Carleton University, collected feedback from users of the federal government’s open data portal in preparing her first progress report. Ms. Francoli informed the Committee that, according to some stakeholders, metadata has been removed from certain datasets in the name of standardization and consistency.

Several witnesses agreed that further work needs to be done to standardize metadata. A representative from the Government of Ontario informed the Committee that there is currently no “set standard or metadata […] that is being used across all the different levels of government.” Nevertheless, while the work on metadata standards may be tedious, his colleague argued that the benefits are significant and that common metadata standards can contribute to efficiencies.

In addition to technical standards, policy standards related to metadata also need to be considered. For example, a representative from the City of Toronto noted that with “different levels of government, even ministries within governments, releasing data sets called poverty, there needs to be a consistent metadata set that defines what all those indicators mean.” This illustrates that governments should consider whether consistent policies are needed to define the indicators related to a particular theme, so that data is comparable across multiple jurisdictions.
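To illustrate what a well-described dataset might look like, the sketch below presents a metadata record loosely modelled on the schema.org Dataset vocabulary referred to above. The dataset name, field names and values are hypothetical examples for illustration only; they are not drawn from the federal open data portal.

```python
import json

# Illustrative metadata record, loosely modelled on the schema.org "Dataset"
# vocabulary. All names and values below are hypothetical examples.
dataset_metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Low-Income Measures by Census Tract (example)",
    "description": "Annual count of persons below the low-income measure, "
                   "by census tract.",
    "keywords": ["poverty", "low income", "census tract"],
    "publisher": {"@type": "Organization", "name": "Example Department"},
    "license": "https://open.canada.ca/en/open-government-licence-canada",
    "temporalCoverage": "2011/2013",
    "spatialCoverage": "Canada",
    # Field-level descriptions: what each column in the dataset means, so that
    # users can interpret it and compare it with data from other jurisdictions.
    "variableMeasured": [
        {"@type": "PropertyValue", "name": "census_tract_id",
         "description": "Statistics Canada census tract identifier"},
        {"@type": "PropertyValue", "name": "persons_below_lim",
         "description": "Number of persons below the low-income measure"},
    ],
}

print(json.dumps(dataset_metadata, indent=2))
```

Publishing a comparable record alongside every dataset is what would allow, for instance, “poverty” datasets released by different governments to be matched on common indicators.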
B. High-quality open data

According to the CIO of the Government of Canada, there is a wealth of data and datasets within departments that have not yet been published. With respect to the role of individual departments, she added that “the departments have to be in a position to maintain and assure the integrity of the data.”

Most witnesses commented that Canada is doing relatively well in terms of open government data. However, David Eaves, an open data consultant, cautioned that there is a danger in relying on international comparisons too early. According to Lyne Da Sylva, Associate Professor, School of Library and Information Science at the Université de Montréal, when comparing Canada to other jurisdictions, “the results are quite variable both in terms of the number of datasets and degree of real openness.” In addition, Mr. Deslauriers questioned whether the volume of open data necessarily equals usefulness. Some witnesses agreed that the federal government should focus its efforts on releasing higher quality and more timely open data.

Users of open data have a role in defining what constitutes quality data, which can in part be linked to which data is most in demand. Several witnesses shared their views with the Committee on which datasets are most valuable. Colin McKay, Head of Public Policy and Government Relations for Google, commented that the government “needs to focus its efforts on identifying and making available the kind of data that generates the most interest from users and will have an impact.” He added that datasets should be “well structured, machine readable, and regularly updated.” In addition, Mark Gayler, Technology Strategist for Microsoft Canada, commented that “[l]ocal data has a lot more relevance to the average citizen in many cases than […] national trending data.”

John Pineau, Chief Executive Officer of the Canadian Institute of Forestry, mentioned that quality data on natural resources, and in particular forestry, is essential to making good decisions, both for the economy and for the protection of ecosystems. Mr. Mallett identified geospatial data as key information for the members of the Canadian Federation of Independent Business. Currently, the federal government’s open data portal is largely populated with geospatial data. With respect to stakeholder concerns about the diversity of federal datasets, Ms. Francoli informed the Committee that “[t]here are few to no data sets in many other areas, including employment insurance, health and issues related to specific demographics such as seniors or Aboriginal persons.” In addition, some municipal government representatives indicated that they would like open data from the federal government on transportation, the environment, health, government spending and socio-economic trends.

Several witnesses shared the view that there is a demand for open data at a more granular level. Mr. Mallett added that there is also a need for microdata at a small geographical or industrial level. Mr. Deslauriers, from the City of Québec, and a representative from the Government of Newfoundland and Labrador both agreed that there is a demand for open data at a local or neighbourhood level. According to a representative from the City of Toronto, “[w]e need data at smaller levels of geography.” He noted that “[w]hen we look at the data we need, we're going to need unemployment rates [and] we're going to need health care statistics. Those have to be at a level of granularity that makes it useful for cities in their place-based evidence planning.” Mr. Mallett also mentioned that there is a need for forward-looking data or projections, though he acknowledged that such data is harder to obtain. For example, federal departments and agencies possess forward-looking data such as occupational projections and forecasts related to economic growth.

C. Open linked data

The datasets released by the federal government on its open data portal can be characterized as static files. As an example, Renée Miller, Professor, Department of Computer Science at the University of Toronto, explained to the Committee that the federal charities dataset is a static or “dead file,” with no linked data. She added that “[h]ow much federal money [a charity] gets is in [the] file but other information like whether it gets provincial funding, private funding, who those private funding agencies are and information about them, [is] not there.”

Several witnesses suggested that the quality of open government data could be enhanced with open linked data – in particular with data from the public and from non-governmental organizations. According to Ms. Miller, open linked data is “dereferenceable,” meaning that users can retrieve important and interesting information about the data, including its relationships to other data and other relevant information. Ms. Miller recommended that “the open data portal should adopt the principles of open linked data.” She explained that making data linkable is easy to do with existing technology.
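As a rough illustration of what a dereferenceable, linked record could look like, the sketch below uses the rdflib Python library. The charity, the provincial funder, the funding figure and the example.org identifiers are all hypothetical; the example is not drawn from the federal charities dataset, but shows how a record can point to related resources instead of ending in a static text field.

```python
# Minimal sketch of a linked-data record using rdflib.
# All identifiers and values are hypothetical examples.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/charity/")   # hypothetical namespace
FUND = Namespace("http://example.org/funder/")  # hypothetical namespace

g = Graph()
charity = EX["12345"]
provincial_funder = FUND["example-provincial-fund"]

g.add((charity, RDF.type, EX.Charity))
g.add((charity, RDFS.label, Literal("Example Community Charity")))
g.add((charity, EX.federalFunding, Literal(250000)))
# Instead of a dead-end text field, the funder is itself a resource with its
# own identifier, which can carry (and link to) further information.
g.add((charity, EX.fundedBy, provincial_funder))
g.add((provincial_funder, RDFS.label, Literal("Example Provincial Funder")))

print(g.serialize(format="turtle"))
```

Because each entity has its own identifier, a user who encounters the funder in one dataset can follow the link to whatever else has been published about it.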
Crowdsourcing is the practice of obtaining services, ideas or content by soliciting contributions from a large group of people, most often through an online community. Some witnesses suggested that the public should be able to contribute to open government data. According to Ms. Miller, the federal government should “look to activate the power [of] the crowd [and create] datasets where the community itself can contribute to those datasets.” Mr. Gayler noted that, in terms of crowdsourcing, it is important to have a feedback loop to ensure that more accurate data is reflected on a timely basis and that the quality of the data improves over time. A representative from the Government of Ontario noted that “[t]he more sets of eyes on the data the higher the quality becomes.”

Meanwhile, other witnesses raised concerns about allowing users to contribute to the federal open data portal. For example, Mr. Eaves warned that to “crowdsource the creation of data creates an enormous number of methodological problems.” He cautioned that there needs to be accountability, especially around datasets that government uses to make decisions, and highlighted that the federal government already has a large volume of verifiable data that it can leverage.

As mentioned earlier, some witnesses suggested that open government data could be enhanced with data from non-governmental organizations. According to Mr. Gayler, if the federal government established partnerships with third-party commercial sources of data, “transformative things [could] start to happen.” As an example, according to a representative from the City of Toronto, “[t]he federal government is sitting on a gold mine of health data through […] the Canadian Institute for Health Information.” The municipal representative also noted that information from the Canada Mortgage and Housing Corporation would be beneficial in terms of data at a specific level of geography. He also urged the federal government to consider the granularity, frequency and timeliness of the data that it collects and releases. The long-form census, for example, was of particular importance to many witnesses.

With respect to metadata, the Committee recommends that:

RECOMMENDATION 8

The Government of Canada should work with provincial, territorial and municipal governments to standardize metadata for all of the high-value datasets identified in the G8 Open Data Charter.

RECOMMENDATION 9

The Government of Canada should establish working-group-level bodies, organized by sector, to develop metadata standards.

With respect to the quality of open data, the Committee recommends that:

RECOMMENDATION 10

Through federal departments and agencies and through Crown corporations, the Government of Canada should provide additional data by demographic group, geographic region and industry on its open data portal.

RECOMMENDATION 11

The Government of Canada should collaborate with provincial, territorial and municipal governments to provide data at more granular levels of geography and industry.

RECOMMENDATION 12

The Government of Canada should release forward-looking data, such as projections and forecasts, on its open data portal.