The emerging field of computational genomics, which uses statistical analysis to unpack the plethora of information harboured inside the human genome, is complicated. The sheer amount of data that makes up the human genome is massive. Meanwhile, the pressure is high: With more people turning to their genes for answers to medical questions, the genomics community faces the seemingly impossible challenge of cataloguing the world’s genetic information into practical and accessible systems. Enter CanDIG.
The Canadian Distributed Infrastructure for Genomics (CanDIG) is a multi-institutional research cohort that has begun the lengthy process of creating an infrastructure for genomic analysis at the national level. Its mandate is to make genomic data user-friendly through a combination of cutting-edge computational methods and a focus on the public policy of genetic research.
The Canadian medical data sharing system is already stretched to its limits. Hospitals and research centres, which are the current gatekeepers of Canadians’ genetic information, function independently without the ability to share data. The CanDIG team, which brings together geneticists and policy makers from McGill and other institutions across Canada, wants to become a central hub for genomics data collection, using software to relay information in the form of open queries.
“The objective is to connect these vast databases through a patchwork of successful systems across various networks,” Guillaume Bourque, one of the designers of CanDIG and associate professor in McGill’s Department of Human Genetics, said.
The technical challenges of unifying national access to genetic data are matched only by the troubling implications of the social experiment that is big data in the 21st century.
“Politically, it is challenging,” Yann Joly, a member of the CanDIG team and associate professor in the Department of Human Genetics as well as the Bioethics unit, said. “When you have these projects where many hospitals and provinces are collaborating, many of them want to retain control of their patients and their data. There is a real reluctance to release that control.”
The privacy of Canadians who are willing to share their genetic information is a top concern for the CanDIG leadership.
“The point is to make research possible,” Joly said. “At the same time, we must protect data so that there is no identifying information that leads back to the person who supplied it.”
In the wake of large data breaches such as the Facebook-Cambridge Analytica data scandal, public concern over the safety of personal data is growing. Genomic data, which can be linked back to a person through their DNA, presents a sizeable security risk.
“[What] we don’t want is [for] people to lose trust in AI and become unwilling to share their data,” Joly said. “For these things to work and be effective, you need to have thousands of genomes and, with that, the trust of people who rely on the infrastructure.”
Looking forward, CanDIG is partnering with the Common Infrastructure for National Cohorts in Europe, Canada, and Africa (CINECA), an intercontinental data collaboration project attempting to connect systems such as CanDIG to partners in the European Union and Africa.
“We are trying to develop and contribute to global standards of genomic data collection and use,” Bourque said.
The Canadian team faces further challenges in trying to align with European regulations, which have become increasingly stringent in recent years.
“For everything to interoperate, you need some type of standards,” Bourque said. “To create standards, you need everyone to agree.”
New European data laws risk strong-arming Canada and other nations into compliance: Those that do not comply risk losing any prospect of integration.