BEYOND LOCAL: What is our health data really being used for?

This article, written by P. Alison Paprica, University of Toronto; Kimberlyn McGrail, University of British Columbia; and Michael J. Schull, University of Toronto, originally appeared on The Conversation and is republished here with permission:

With all the negative media coverage and public concern related to data, it isn’t surprising that governments are responding with laws and policy statements that emphasize the need for consent from the data subject. People’s data shouldn’t be used in ways that they don’t support.

Health data has a special place in this discussion given public concerns around the world, including in the United States, United Kingdom, Denmark and Australia. However, while obtaining public and patient consent can be part of how we achieve transparency and build trust, it isn’t the whole story or a complete solution.

As part of a pan-Canadian group of researchers who have worked with health data for more than a decade, our commentary “Notches on the dial: a call to action to develop plain language communication with the public about users and uses of health data” in the International Journal of Population Data Science aims to raise awareness about situations where health data are used in the absence of expressed consent. Our team also wants to increase the involvement of members of the public in decisions about health data uses.

Uses of health data

News coverage of health research tends to focus on study findings rather than research methods, so people may not be aware that health data are used in many different ways. Consider how different the following uses are:

An organization may use health data generated through the services that they deliver to inform decisions about their core business. For example, an insurance company could use client data to develop new products or investigate potential fraud. A hospital could use the data it generates to improve the quality of its services.

An organization may provide the health data that they generate or collect to another organization in exchange for money or some other anticipated financial benefit. An example of this would be a company that provides genetic testing services to the general public selling client data to a pharmaceutical company.

There are health research studies in which all participants provide consent for their data to be used for a particular purpose, such as a clinical trial of a drug, conducted at an academic centre but funded by a pharmaceutical company.

Other health research studies make use of large cohort datasets where tens or hundreds of thousands of people provide consent for their data to be used once it has been anonymized or scrubbed of identifying information. An example of this kind of research would be an academic study of interactions between genomes and the environment that uses data from the Canadian Partnership for Tomorrow Project or the UK Biobank.

Finally, there are research studies that use provincial, state or national health datasets in order to get a complete picture of a health issue. These studies often rely on unconsented data for entire populations or sub-populations. For example, academic researchers used unconsented data without identifying information to study the opioid epidemic.

Why we have unconsented health data

Governments and health organizations routinely collect information about health-care system use so they can manage, administer and pay for health services. This information is referred to as administrative data, and a new piece of administrative data is created every time someone has an interaction with the health-care system, like a doctor’s visit.

Many people don’t give this routinely generated health data much thought, but researchers recognize the power of the data. Datasets that cover the whole population over long periods of time can provide essential information about health-care needs and services. Because of the invaluable information that population-wide health datasets can provide, there are processes through which researchers can obtain access to work with the data without patient consent.

Incomplete data

There are significant differences between data that are gathered with expressed consent and data that are routinely gathered automatically (without consent) by organizations involved in delivering health services. Consented data are less likely to include people who are very young or very old, who are very ill, who don’t speak an official language or are marginalized for other reasons.

In fact, studies have shown that consented data are different from unconsented data in terms of the age, sex, race, income, education and health status of the people included.

Historically, we have seen the negative consequences when health studies focus on biased or incomplete datasets. For example, we now know that cardiovascular clinical trials, which focused mostly on men up until the 1990s, produced data and findings that resulted in women’s heart attacks not being recognized or treated appropriately.

We can and should learn things from properly designed research studies that only include patients who have provided informed consent. But we also need to understand the limits when health research is based on these relatively small datasets, such as clinical trials that may exclude many of the people intended to benefit from them.

Research based on large consented cohort datasets is more likely to have findings that are unbiased and widely applicable, but even those studies can’t present a complete picture for some of the most vulnerable people in our society.

This is a serious matter. If we use research based on incomplete or biased data to plan health services, evaluate interventions or develop predictive models, it can have negative consequences. Just as people have the “right to be counted” in a census so that they are fairly represented in resource allocation decisions, from an ethical perspective, we need health research that includes and benefits the whole population.

With more and more health data being collected, and new methods like artificial intelligence that can turn big data into knowledge, it is essential that we learn more about which uses of unconsented health data have social licence and public support, and which do not.

Failing to do so could result in researchers losing public trust, support and funding for research related to vulnerable sub-populations that are underrepresented in consented datasets.

P. Alison Paprica, Assistant Professor, Institute for Health Policy, Management & Evaluation, University of Toronto; Kimberlyn McGrail, Professor of Health Services and Policy Research, University of British Columbia; and Michael J. Schull, Professor, Department of Medicine, University of Toronto

This article is republished from The Conversation under a Creative Commons license.