Over the last few years, interest in genomic research and data has grown significantly. Genomic data is an instrumental source of information that has vastly enhanced biomedical research, personalized medicine and drug development. However, few methods have succeeded in de-identifying genomes so that they can be used and disclosed for secondary purposes. For instance, the Health Insurance Portability and Accountability Act (HIPAA) de-identification standards are difficult to apply in a meaningful manner. This is because the standards treat health information that does not specifically identify an individual as not “identifiable,” whereas a genome is inherently tied to the individual it came from, even after names and other direct identifiers have been removed.
Organizations generally follow a risk-based approach and use administrative, physical and technical safeguards to control identity disclosure when sharing genomic information. This could change as privacy-preserving approaches that allow secure computation on genomic data continue to advance.
Challenges in Genomic Research
The technological requirements of genome sequencing should always be taken into consideration in genomic research. The work divides into four distinct stages, each with its own standards and trajectory: the collection, storage, distribution and analysis of large datasets. These datasets demand substantial, specialized technological solutions, and they arise alongside techniques that most of us are not yet accustomed to, such as gene editing and single-cell biology. Still, beyond the common technical problems come the truly burning questions regarding data privacy and security: Who is authorized to access the data, and how do researchers, businesses or governments guarantee that genomic data is safe from unauthorized entities? These questions remain largely unanswered.
Acquiring, Storing and Sharing Genomic Data
Because a single sequenced human genome takes up roughly 100 gigabytes of storage space, storage needs will grow from gigabytes to petabytes and eventually exabytes as more genomes are sequenced. The human genome sequence is about 3 billion bases long, but roughly 30-fold more data needs to be acquired to compensate for errors in sequencing and alignment. Many researchers expect cloud computing to be the primary storage method, since it can provide the expandable scale required for genomic sequencing. In fact, tech titans Google and Microsoft have launched cloud-computing services that allow organizations to store genomic data.
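The figures in the paragraph above (about 3 billion bases, 30-fold oversampling, roughly 100 gigabytes per genome) can be checked with a back-of-the-envelope calculation. The sketch below is only illustrative: the one-byte-per-base figure, the decimal units and the cohort sizes are assumptions made for the example, not numbers from any particular sequencing platform or cloud service, and real file sizes vary with format and compression.

```python
# Rough storage estimate for raw sequencing data.
# Assumptions (illustrative only): about 1 byte stored per sequenced base,
# decimal units (1 GB = 10**9 bytes), and no compression.

GENOME_BASES = 3_000_000_000   # ~3 billion bases in a human genome (from the text)
COVERAGE = 30                  # 30-fold oversampling (from the text)
BYTES_PER_BASE = 1             # assumption for this sketch

UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def raw_bytes_per_genome(bases=GENOME_BASES, coverage=COVERAGE,
                         bytes_per_base=BYTES_PER_BASE):
    """Bytes needed for one genome sequenced at the given coverage."""
    return bases * coverage * bytes_per_base


def cohort_bytes(n_genomes):
    """Bytes needed to store raw data for n_genomes genomes."""
    return n_genomes * raw_bytes_per_genome()


def human_readable(n_bytes):
    """Format a byte count using decimal (power-of-1000) units."""
    value = float(n_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


if __name__ == "__main__":
    print("one genome:", human_readable(raw_bytes_per_genome()))
    for n in (1_000, 1_000_000, 10_000_000):   # hypothetical cohort sizes
        print(f"{n:>10,} genomes:", human_readable(cohort_bytes(n)))
```

Under these assumptions a single genome comes to about 90 gigabytes, in line with the "roughly 100 gigabytes" quoted above, and a cohort of a million genomes already reaches the tens-of-petabytes range.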
Many organizations use encryption and secure data platforms to guard against data breaches and hacks. These measures seem to have been effective thus far, as most data breaches have affected only passwords and profiles, not genomic data. A breach that did expose genomic data, however, would be disastrous.
Although researchers use anonymized data, once the DNA of a million people has been collected, it will become possible to infer anyone’s identity from the genetic data. The issue is not only data safety, but also privacy and the data management procedures used by entities with access to sizeable genomic datasets, such as direct-to-consumer organizations.
Genomic data represents the most intimate information about the past, present and future of a person. The importance of laying the groundwork now to keep this extremely sensitive and rapidly growing body of data secure is therefore unmistakable.