A rise in the popularity of DNA testing, offered by sites like 23andMe and Ancestry.com, could cause a privacy crisis for Americans.
According to a new study published in the journal Science, in a few years, nearly 90 percent of all Americans of European descent will be identifiable by their DNA and just a few personal details. Alarmingly, you don't even have to take said DNA tests yourself — you share enough DNA with your relatives that you can easily be outed with their info. It is the equivalent of having your privacy compromised just because a stranger or family member chose to compromise theirs.
Researchers for the study analyzed the DNA data of 1.28 million people in the database of MyHeritage, one of the popular direct-to-consumer (DTC) genetic testing services, a group that also includes 23andMe and Ancestry.com. With these services, a customer receives a saliva kit, spits in a tube, and sends the sample back to the company's lab. Using genotyping, the lab is able to assemble a comprehensive report on a person’s genetic makeup, and connect customers with those who share similar DNA. And therein lies the privacy problem.
“Take home message: Your DNA can identify you whether you took or not a DTC test,” Yaniv Erlich, the lead author of the study, said on Twitter. “But with policies and technical measures, the public can continue benefiting from the DTC genomics revolution while reducing the risk of misuse.”
In the study, researchers discovered that 60 percent of Americans of European descent had a "match" with a third cousin or closer relative, which “can allow their identification using demographic identifiers.” In other words, if you’re a white American and your relatives have taken a test and uploaded their DNA data to a third-party genealogy database like GEDmatch, these scientists won’t have a hard time identifying you, and the likelihood of that will only increase as more people take these tests.
Consumers often opt to upload their information to these third-party sites with good intentions, such as finding long-lost relatives, which is part of the attraction of at-home genetic tests.
The ability to identify people through their relatives' genetic data was recently put to use by law enforcement. Decades after the “Golden State Killer” raped and killed dozens of women, law enforcement used genetic data from DTC databases to identify and charge Joseph James DeAngelo Jr., 72, a former police officer, with eight counts of murder. As the study explained:
Law enforcement used a long-range familial search to trace the Golden State Killer. Investigators generated a genome-wide profile of the perpetrator from a crime scene sample and uploaded the profile to GEDmatch ~1 million DNA profiles. The GEDmatch search identified a 3rd-degree cousin. Extensive genealogical data traced the identity of the perpetrator, which was confirmed by a standard DNA test.
Police said it was persistence and “emerging” technology that made the arrest possible.
Yet the study aims to raise awareness of future legal problems rather than humble-brag about the technique’s potential to solve crimes.
“Taken together, we posit that our results warrant a reevaluation of the status quo regarding the identifiability of DNA data, especially of US individuals,” the authors stated. “While policymakers and the general public may be in favor of such enhanced forensic capabilities for solving crimes, it relies on databases and services that are open to everyone. Thus, the same technique could also be exploited for harmful purposes, such as re-identification of research subjects from their genetic data.”
The study also suggested that those with “anonymized” data can be easily identified, thanks to a loophole in the Health Insurance Portability and Accountability Act of 1996 (HIPAA).
“We also considered a scenario of re-identification of anonymized clinical genetic data. The safe harbor provisions of the HIPAA privacy law permits the release of the year of birth,” the authors stated. “An age specified at a single year resolution is as expected a more powerful identifier compared to a 10yr interval (Fig. 2D). Together with geography (<100miles) and sex, it is expected to reduce the search space to just 1-2 individuals.”
Graham Coop, a population genetics researcher at the University of California, Davis, wrote a blog post after the Golden State Killer arrest in April that touched on the same troubling points raised in the Science study.
“It’s striking that uploading one’s information to a matching database potentially opens up a large number of other people to eventual identification, and that most of these people are distant enough relatives that one would likely never have met them,” Coop wrote.
While this is certainly alarming, the authors of the study are optimistic that policies can restore privacy protections.
“Overall, we believe that technical measures, clear policies for law enforcement in using long-range familial searches, and respecting the autonomy of participants in genetic studies are necessary components for long term sustainability of the genomics ecosystem,” the authors concluded.