In 2015, a group of researchers hypothesized that our collective love of Facebook surveys could be harnessed for serious genetic studies. Today, the Genes for Good project (@genesforgood) has engaged more than 80,000 Facebook users, collected 27,000 DNA spit-kits, and amassed a trove of health survey data on a more diverse group of participants than has previously been possible. Researchers say their app could work as a model for studies on an even larger scale. Their work appears June 13 in The American Journal of Human Genetics.
“It’s a very important step to allow participation remotely, because it opens the door to a lot of people who historically couldn’t participate in genetic research, even if they had wanted to,” says Katharine Brieger, a first author and MD/Ph.D. student at the University of Michigan School of Public Health. “And having a more diverse population represented in study samples is critical for moving public health and genetic research forward.”
“When I started doing genetic studies in the 90s, most studies just had a few hundred people,” says senior author Goncalo Abecasis, of the University of Michigan School of Public Health. Typically, people in the area would show up to a university lab to answer health surveys and give a blood sample. After that, researchers had a very costly and difficult time following up with those volunteers.
“You quickly got to a point where you exhausted what you could learn from those participants,” Abecasis says. That experience inspired him and his colleagues to start thinking about how to use social media to expand and improve upon their research. The result was Genes for Good’s approach: in exchange for answering surveys, participants receive a free in-home DNA spit-kit, analysis of their ancestry and DNA results, graphs and comparisons of their data, and (if requested) a file of their raw genotype information.
The researchers explored whether the study participants were a good representation of the U.S. population. Using government statistics as a comparison, they found that the volunteers had disease rates and demographics similar to those of the rest of the country, although they were a little younger, had slightly fewer strokes, and skewed female. The participants also had diverse ancestry and were diverse geographically and economically. Most Genes for Good participants fell into the U.S. middle household income bracket of $35,000 to $100,000 a year. In contrast, most 23andMe users have a household income of more than $100,000 a year, according to a poster 23andMe presented in 2011 describing its research cohort demographics.
The researchers then analyzed the genetic data to assess the quality of the study and its data collection methods. Previous studies have identified genetic variants linked to physical traits, such as eye color or skin tone, and to health conditions such as asthma. When the researchers compared the results of their Genes for Good analyses with those reported in well-cited papers, the findings largely matched.
“We were quite pleased with our ability to replicate the findings of other large studies,” says Brieger. “For example, in our sample, we were able to identify previously reported associations between specific genetic variants and traits such as BMI, as well as conditions such as type 1 and type 2 diabetes.”
Genes for Good launched in 2015, and the number of participants quickly grew from a trickle to a deluge. “We get something like 2 or 3 percent growth every week, which corresponds to tripling every year,” says Abecasis. The growth was so massive that researchers have had to pause the distribution of DNA spit-kits until they find more funding. With the proper funding, the researchers say Genes for Good can scale up to reach millions of users.
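The weekly growth figure quoted above can be checked with simple compound-growth arithmetic. The sketch below is illustrative only: it assumes growth compounds once per week over a 52-week year, and the function names are hypothetical, not part of the Genes for Good project.

```python
WEEKS_PER_YEAR = 52  # assumption: growth compounds weekly over a 52-week year


def annual_multiplier(weekly_rate: float) -> float:
    """Factor by which a quantity grows in one year at a given weekly rate."""
    return (1.0 + weekly_rate) ** WEEKS_PER_YEAR


def weekly_rate_for(multiplier: float) -> float:
    """Weekly growth rate needed to reach a given yearly multiplier."""
    return multiplier ** (1.0 / WEEKS_PER_YEAR) - 1.0


if __name__ == "__main__":
    for rate in (0.02, 0.03):
        print(f"{rate:.0%} per week -> x{annual_multiplier(rate):.2f} per year")
    print(f"tripling needs about {weekly_rate_for(3.0):.2%} per week")
```

Under these assumptions, 2 percent weekly growth comes to a little under a tripling per year, and 3 percent to well over a quadrupling, so "tripling every year" matches the low end of the quoted range.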
Social media dramatically increased Genes for Good recruitment, but it also created new privacy concerns. By design, the app only works as a portal on Facebook to connect users to Genes for Good servers. Genes for Good adheres to the University of Michigan’s privacy standards and is subject to the university’s white hat security tests. To access their raw genome data, participants go through two-factor authentication and must retrieve their data within three days. The researchers went further and asked the National Institutes of Health (NIH) for additional layers of privacy protection.
“We have a certificate of confidentiality, meaning we have a promise from the NIH that our data will not be used by the government,” says Abecasis.
The researchers noticed that the participants were more forthright online in answering personal health questions than volunteers typically are in face-to-face interviews. They think connecting with