Innovation and Big Data in Health Surveillance
Fernanda Dórea, National Veterinary Institute, firstname.lastname@example.org
Crawford Revie, University of Prince Edward Island
This workshop will discuss the priorities and challenges related to building the health surveillance systems of the future.
In an era of ubiquitous electronic collection of data, there is growing opportunity to monitor the health of populations in real time. This will allow the early detection of signs of disease introduction, natural or man-made, as well as the production of information to support prevention and control. Teaching computers to transform data into actionable information allows us to overcome the challenge of data volume and velocity. This is, however, conditional on our ability to overcome greater challenges, such as processing a large variety of (noisy) data, interfacing with medical knowledge, and producing valuable outputs for actors in different contexts. Moreover, effective disease prevention and control needs to be operationalized, taking into consideration all the populations that affect and are affected by disease spread.
To ensure that big data innovation contributes to population health, it needs to be incorporated into governments’ routine decision-making processes, building a framework of data-driven surveillance in a One Health context.
Challenges abound, from technical difficulties in extracting information from Big Data, through privacy and ethical issues, to governmental and organizational barriers and funding constraints. Who pays when everybody benefits, but nobody profits?
This workshop will identify and prioritize the challenges related to the use of innovation to achieve data-driven decision-making in population health. Workshop participants will have a chance to discuss the challenges, and prioritize them using objective measures, such as opportunities for improvement and impact.
How do we build the surveillance systems of the future? What are the priority challenges to address in order to achieve the following aims?
• Develop systems capable of transforming data into actionable information for One Health Surveillance (technical challenges)
• Translate research into practice, and combine new and traditional methods in order to incorporate innovation into the daily practice of disease prevention, detection and control (operational challenges)
• Address cultural and ethical norms and gain public trust to develop data-driven systems (normative challenges)
• Deliver innovation to the right actors in public, animal and environmental health, taking into account funding and organizational barriers in the public sector, including legislation (public good predicament)
Health surveillance in real time
In an era of ubiquitous electronic collection of data, there is growing opportunity to monitor the health of populations in real time.
The systematic collection of data from a defined population, with the specific aim of taking actions to mitigate risks and improve well-being, is the goal of health surveillance (1). Public health surveillance and animal health surveillance are concerned with strategies to improve the health of people and animals, respectively. In a One Health context, surveillance is carried out in a holistic framework that involves animals, humans and the environment.
Enabling computers to transform data into actionable information allows us to overcome the challenge of data volume and velocity that characterizes today’s growing flow of digital data, termed “Big Data”, as defined by the US NIH’s program Big Data to Knowledge, BD2K (6). This will allow early detection of signs of disease introduction, natural or man-made, as well as the production of information to support prevention and control.
Big data, however, is not just about volume. Extracting value from the complexity and diversity of the data sources available is conditional on overcoming greater challenges, such as processing a large variety of (noisy) data, interfacing with medical knowledge, and producing valuable outputs for actors in different contexts. Moreover, effective disease prevention and control needs to be operationalized, taking into consideration all populations that affect and are affected by disease spread. Developing effective systems to convert single data streams into information is only part of the challenge. To support decision-making and action, the ultimate challenge is combining evidence from multiple sources.
Research and development in the field of Big Data have brought innovation to all stages of the surveillance continuum, from data collection, through data analysis, to communication and information sharing. How do we use this innovation to build the surveillance systems of the future?
Capturing data and producing information for health surveillance is difficult!
In going about our daily lives, we leave a great number of digital footprints. Analyzed at the population level, data concerning people’s mobility, preferences and actions can give great insights about the distribution of health and disease, and even create a “riskcape – maps of the distribution of risk in space” (2). But these small pieces of data are usually disconnected, and a large number of them are not directly related to health. Combining these pieces of evidence is, therefore, not a straightforward matter.
Algorithms need to be trained on a wide variety of data, most of which are not collected for surveillance. Relationships and interactions crucial for risk characterization and quantification need to be identified (2), but data are rarely linked and system interoperability remains a big challenge. The technology needed to overcome these issues needs not only to be powerful, but also dynamic (3). The underlying data are constantly changing, as are the goals of surveillance to keep up with evolving populations and pathogens.
Capturing data and producing actionable information for health surveillance is difficult!
Innovation comes fast in big data, and technology is already available to address many of the technical challenges. The issue of access to innovation by governments is discussed below, but incorporation of these solutions into surveillance practice goes beyond technology access.
“Big Data’s strength is in finding associations, not in showing whether these associations have meaning. Finding a signal is only the first step” (4). How do we validate and investigate these signals? Beyond validity, how do we assess utility? What routines need to be established in order to allow the decision-making process to incorporate this information – in order to allow it to trigger action? There seems to be a consensus that modern information systems need to be combined with traditional epidemiological systems, but there is no straightforward recipe for that.
Sustainability is another important issue to consider. Hay et al. discussed the need to demonstrate the feasibility and sustainability of audience engagement for Web-based technologies such as HealthMap, www.healthmap.org (5). Sustainability of the tools themselves should also be considered, as they depend on collaborations across many disciplines and sectors.
Capturing data and producing actionable information for health surveillance is controversial!
Moving away from population averages and developing systems capable of processing personalized information has great societal benefits in terms of health control (3). However, public perception of whether these benefits outweigh privacy concerns will, as Heitmueller et al. stated, “set boundaries on the usage of big data”. The authors describe a ‘crisis of confidence’ in the way that personal information and behaviour data are being used. They point out that, although a very large proportion of customers use store loyalty cards, medical information is a far more sensitive issue, and conclude that “views about personal health information are more complex than views about other data”.
Different levels of access to different types of data may need to be considered. A clear ethical framework needs to be established, with norms that set the boundaries of data usage. Communication is also essential, as such a framework will not be enough unless the public trusts that the confidentiality of their data is protected.
Capturing data and producing actionable information for health surveillance is expensive!
The data privacy concerns discussed above remind us that data are not always seen as a public good. Monetary and non-monetary incentives are probably needed for individuals and organizations to share data (3).
In discussing the sustainability of big data opportunities for surveillance, Hay et al. stated that the “ultimate vision is to democratize the platform by providing the code to all interested authorities”. As argued under the operational challenges, however, operationalizing a system involves more than generating information. To operate data-driven surveillance systems, governments need to employ multi-disciplinary teams capable of using the tools, communicating results to epidemiologists, and supporting them in the process of combining this information with evidence gathered through traditional surveillance methods.
Moreover, to make big data innovation a public good, governments need to balance other population interests. Discussing the development of public policies for the use of big data, Heitmueller et al. summarized the issue as two main trade-offs: “The first trade-off concerns the role of government in simultaneously protecting people’s privacy and taking advantage of the benefits of large data sets […]. The second trade-off is related to the tension between realizing the societal benefits involved in sharing data and safeguarding proprietary rights.”
To add to the complexity of the debate, solutions will probably need to be tailored to individual countries, taking into account the resources and needs of each nation. It is beyond the scope of this workshop to debate resource access in countries at different development stages.
All challenges considered, Heitmueller et al. suggest that governments should focus on a few main tasks, such as: building support for data sharing; building an evidence base, gathering systematic and robust evidence of how data sharing has resulted in benefits in health care; establishing open data commons of anonymized data; creating demand and capability; and creating trust networks. Legislating wisely to address these tasks, however, is not straightforward, as big data evolves fast and policy makers must deal with great uncertainty regarding future risks and benefits.
Interconnecting the perspectives on data-driven surveillance
How do we build the surveillance systems of the future? How can we implement systems capable of transforming data into actionable information for One Health Surveillance?
The workshop aims to connect those facing the problem, mainly governmental surveillance officials, with researchers and developers in academia and the private sector, who are most often the drivers of big data innovation. Different sectors – private, public and academia – may have different perceptions regarding the challenges to achieve data-driven surveillance. They may also have different access to solutions. Engaging actors from these sectors in a joint discussion will allow us to share these solutions, and understand how they can be combined to address existing challenges.
Considering the areas listed previously – technical, operational, normative, and the public good predicament – participants will, in a first phase of the workshop, brainstorm any further challenges not already listed in this report.
Listing and prioritizing the challenges
In a second phase, participants will have a chance to prioritize all challenges based on two criteria: how easy the challenge is to solve, and how large its potential impact on population health is. In the process, we will take notes on the solutions discussed. A full characterization of possible solutions may demand specific discussions and studies that are beyond the goal of this workshop. We expect, however, that the workshop will serve as a networking exercise, opening opportunities for collaboration and new research and development ideas.
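The two-criteria prioritization described above could be tabulated along the following lines. This is only a minimal sketch: the 1 to 5 voting scale, the example votes, and the rule of multiplying the mean ease score by the mean impact score are all assumptions made for illustration, not a procedure defined for the workshop.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Challenge:
    """One challenge with participant votes on the two workshop criteria."""
    name: str
    ease_votes: List[int]    # hypothetical 1-5 scale, 5 = easiest to solve
    impact_votes: List[int]  # hypothetical 1-5 scale, 5 = largest impact

    @property
    def ease(self) -> float:
        return mean(self.ease_votes)

    @property
    def impact(self) -> float:
        return mean(self.impact_votes)

    @property
    def priority(self) -> float:
        # Assumed combination rule: product of the two mean scores,
        # so a challenge ranks high only if it scores well on both.
        return self.ease * self.impact


def prioritize(challenges: List[Challenge]) -> List[Challenge]:
    """Return the challenges sorted from highest to lowest priority."""
    return sorted(challenges, key=lambda c: c.priority, reverse=True)


# Illustrative votes only; these numbers are invented for the example.
challenges = [
    Challenge("technical", [2, 3, 2], [5, 4, 5]),
    Challenge("operational", [3, 4, 3], [4, 4, 5]),
    Challenge("normative", [2, 2, 1], [4, 3, 4]),
    Challenge("public good predicament", [1, 2, 1], [5, 5, 4]),
]

ranked = prioritize(challenges)
for c in ranked:
    print(f"{c.name}: ease={c.ease:.2f}, impact={c.impact:.2f}, "
          f"priority={c.priority:.2f}")
```

Other combination rules, such as ranking by impact first and using ease only to break ties, would be equally defensible; the choice depends on whether the group wants to favour quick wins or high-impact problems.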
The outcome of the workshop will be a prioritized list of challenges. This list will help identify readily solvable issues that can be tackled first, and will guide research, development and implementation of new solutions according to their likely impact on disease prevention, detection and control in a One Health context. It will also help public health officials adopt innovation and operationalize data-driven health surveillance systems.
1. Hoinville, L.J., Alban, L., Drewe, J.A., Gibbens, J.C., Gustafson, L., Häsler, B., Saegerman, C., Salman, M., Stärk, K.D.C. 2013. Proposed terms and concepts for describing and evaluating animal-health surveillance systems. Prev. Vet. Med. 112, 1–12. doi:10.1016/j.prevetmed.2013.06.006
2. Han, B.A., Drake, J.M. 2016. Future directions in analytics for infectious disease intelligence. EMBO Rep. 17, 785–789. doi:10.15252/embr.201642534
3. Heitmueller, A., Henderson, S., Warburton, W., Elmagarmid, A., Pentland, A.S., Darzi, A. 2014. Developing Public Policy To Advance The Use Of Big Data In Health Care. Health Aff. 33, 1523–1530. doi:10.1377/hlthaff.2014.0771
4. Khoury, M.J., Ioannidis, J.P.A. 2014. Medicine. Big data meets public health. Science 346, 1054–1055. doi:10.1126/science.aaa2709
5. Hay, S.I., George, D.B., Moyes, C.L., Brownstein, J.S., Flaxman, A. 2013. Big Data Opportunities for Global Infectious Disease Surveillance. PLoS Med. 10, e1001413. doi:10.1371/journal.pmed.1001413
6. NIH, Big Data to Knowledge Program, https://datascience.nih.gov/bd2k/about/what