A number of important genome-wide association studies (GWASs) have come to my attention in the last few weeks, and I anticipate that the current steady stream of them will very soon become a roaring river. These studies sort through the genomes of large numbers of people looking for systematic differences in gene variants, for example by comparing the genomes of people affected by a disease with the genomes of people who are not affected. “Association analysis: A method of genetic analysis that compares the frequency of alleles between affected and unaffected individuals; a given allele is considered to be associated with the disease if that allele occurs at a significantly higher frequency among affected individuals(ref).” Association studies may also compare the genomes of specific samples of people (such as aged Ashkenazi Jews living in Brooklyn or older women from Tanegashima island), or the genomes of diseased tissues (such as specific kinds of cancers), against the general human genome in order to identify possibly causal correlations between genomic variations and effects such as extended longevity or the presence of a disease.
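To make that definition concrete, here is a minimal Python sketch of the simplest kind of association test: a 2x2 chi-square test on allele counts in cases versus controls. The function name and the counts are hypothetical illustrations, not taken from any study discussed here, and real analyses usually add refinements such as covariate adjustment.

```python
import math


def allelic_association(case_minor, case_major, ctrl_minor, ctrl_major):
    """Allelic (2x2) chi-square test comparing allele counts in cases vs. controls.

    Each person contributes two alleles, so the counts are alleles, not people.
    Returns (chi-square statistic, p-value with 1 degree of freedom, odds ratio).
    No continuity correction is applied.
    """
    table = [[case_minor, case_major], [ctrl_minor, ctrl_major]]
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Allelic odds ratio: odds that a case allele is the minor allele,
    # relative to the same odds among control alleles.
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 1,000 cases (2,000 alleles) with minor-allele frequency 0.30,
# and 1,000 controls (2,000 alleles) with minor-allele frequency 0.25.
chi2, p, odds = allelic_association(600, 1400, 500, 1500)
print(f"chi2={chi2:.2f}  p={p:.2e}  OR={odds:.2f}")
```

A genome-wide study repeats a test like this, or a regression that also adjusts for factors such as ancestry, at hundreds of thousands to millions of SNPs.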
I have already created a number of blog entries reporting on GWAS studies. My focus here is on the general characteristics of GWAS studies, why they are important, why we will see more and more of them, and where they will lead us.
GWAS studies and why they are important
A good example is discussed in the recent blog post New telomerase finding only a small-medium sized deal. The publication Common variants near TERC are associated with mean telomere length relates: “We conducted genome-wide association analyses of mean leukocyte telomere length in 2,917 individuals, with follow-up replication in 9,492 individuals. We identified an association with telomere length on 3q26 (rs12696304, combined P = 3.72 × 10^-14) at a locus that includes TERC, which encodes the telomerase RNA component.” In that post I go on to comment that, according to the study, people who carried one copy of the variant (the minor allele of rs12696304) had shorter telomeres, equivalent to 3.6 years of aging, and people who carried two copies had telomere lengths expected for people 7.2 years older. The implication is that people with the variant age faster. The study required massive efforts to gather the data: mean leukocyte telomere lengths for 2,917 plus 9,492 individuals. It then required a herculean data-processing and pattern-recognition effort to arrive at a correlation-based association between shorter telomeres and the minor allele of rs12696304, out of millions of other possibilities. Finally, from this association an inference was drawn that people who carry the allele will generally age faster and die sooner.
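The “3.6 years for one copy, 7.2 years for two copies” pattern is consistent with an additive (per-allele) model. In such a model each person's genotype is coded as the number of minor-allele copies (0, 1 or 2), and the trait is regressed on that count, so the slope is the effect per allele. The sketch below shows that coding and a plain least-squares slope. The genotypes, the lengths and the choice of "A" as the minor allele are all hypothetical, and I am not claiming this is exactly how the study fitted its model.

```python
def minor_allele_dosage(genotype, minor_allele="A"):
    """Number of copies (0, 1 or 2) of the minor allele in a genotype like 'AG'."""
    return genotype.count(minor_allele)


def least_squares_slope(x, y):
    """Slope b of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical toy data (not from the study): genotypes at one SNP and a
# made-up telomere-length measurement in arbitrary units.
genotypes = ["GG", "GG", "AG", "AG", "AA", "GG", "AG", "AA"]
lengths = [7.10, 7.00, 6.95, 6.85, 6.70, 7.05, 6.90, 6.80]

dosages = [minor_allele_dosage(g) for g in genotypes]
per_allele_effect = least_squares_slope(dosages, lengths)
print(f"change in length per minor allele: {per_allele_effect:.2f}")  # -0.15

# Under an additive model two copies have twice the effect of one,
# which is the 3.6-year / 7.2-year pattern reported for rs12696304.
YEARS_PER_ALLELE = 3.6
for copies in (0, 1, 2):
    print(copies, "copies ->", round(copies * YEARS_PER_ALLELE, 1), "years of apparent aging")
```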
Another representative 2009 GWAS relates gene polymorphisms to Alzheimer’s disease: Genome-wide association study identifies variants at CLU and PICALM. “We undertook a two-stage genome-wide association study (GWAS) of Alzheimer’s disease (AD) involving over 16,000 individuals, the most powerful AD GWAS to date. In stage 1 (3,941 cases and 7,848 controls), we replicated the established association with the apolipoprotein E (APOE) locus (most significant SNP, rs2075650, P = 1.8 × 10^-157) and observed genome-wide significant association with SNPs at two loci not previously associated with the disease…”
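Both studies quoted above report P values vastly smaller than the 0.05 used in many single-hypothesis experiments. The reason is multiple testing. A GWAS tests hundreds of thousands to millions of SNPs, so a Bonferroni-style correction divides 0.05 by the number of independent tests. The widely used genome-wide significance threshold of 5 × 10^-8 corresponds to roughly one million independent common-variant tests. The sketch below assumes that convention and checks the two quoted P values against it. The third entry is a hypothetical SNP, included to show a result that would look impressive in a single experiment but does not pass.

```python
ALPHA = 0.05
# Conventional approximation: about one million independent common-variant
# tests across the genome, giving the familiar 5e-8 genome-wide threshold.
INDEPENDENT_TESTS = 1_000_000
GENOME_WIDE_THRESHOLD = ALPHA / INDEPENDENT_TESTS

# The first two P values are quoted in the studies above; the third is hypothetical.
reported = {
    "rs12696304 (TERC locus, telomere length, combined)": 3.72e-14,
    "rs2075650 (APOE locus, Alzheimer's disease, stage 1)": 1.8e-157,
    "hypothetical SNP": 3.0e-6,
}

for snp, p in reported.items():
    verdict = "passes" if p < GENOME_WIDE_THRESHOLD else "does not pass"
    print(f"{snp}: P = {p:.2e} {verdict} the {GENOME_WIDE_THRESHOLD:.0e} threshold")
```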
The 2008 review Genome-wide association studies for complex traits: consensus, uncertainty and challenges describes progress as of two years ago and highlights problems as seen at that time: “The first wave of large-scale, high-density genome-wide association (GWA) studies has improved our understanding of the genetic basis of many complex traits. For several diseases, including type 1 and type 2 diabetes, inflammatory bowel disease, prostate cancer and breast cancer, there has been rapid expansion in the numbers of loci implicated in predisposition. For others, such as asthma, coronary heart disease and atrial fibrillation, fewer novel loci have been found, although opportunities for mechanistic insights are equally promising. Several common variants influencing important continuous traits, such as lipids, height and fat mass, have also been found. … These findings are providing valuable clues to the allelic architecture of complex traits in general. At the same time, many methodological and technical issues that are relevant to the successful prosecution of large-scale association studies have been addressed. … However, despite understandable celebration of these achievements, sober reflection reveals many challenges ahead. … Much work remains to obtain a complete inventory of the variants at each locus that contribute to disease risk and to define the molecular mechanisms through which these variants operate. The ultimate objectives (full descriptions of the susceptibility architecture of major biomedical traits and translation of the findings into clinical practice) remain distant.” Much distance still remains, but since this was written there has been a significant and steady acceleration in the rate of publication of genome-wide association studies.
There are already hundreds of GWAS studies, each providing its own insights. A few more (ref) listed here for flavor are Genetic Determinants of Bone Fragility in European-American Premenopausal Women, Whole Genome Association Study of Visceral Adiposity in the HABC Study, CIDR: Genome Wide Association Study in Familial Parkinson Disease (PD), Collaborative Association Study of Psoriasis, Genome-Wide Association Study of Schizophrenia, Whole Genome Association Study of Systemic Lupus Erythematosus and Genome-Wide Association Study of Leprosy in Chinese Population.
Because of their importance, the National Human Genome Research Institute has created a Catalog of Published Genome-Wide Association Studies. “The curated, searchable and publicly accessible database contains information on over 350 publications, linking around 1,640 single nucleotide polymorphisms (SNPs) to more than 80 different diseases and traits. … This catalogue allows some of the trends and genomic characteristics of trait or disease associated SNPs to be analysed across multiple different publications [Hindorff LA et al. (2009) PNAS doi/10.1073], leading to a number of important insights(ref).”
What is included in the catalog is selective: “The genome-wide association study (GWAS) publications listed here include only those attempting to assay at least 100,000 single nucleotide polymorphisms (SNPs) in the initial stage. Publications are organized from most to least recent date of publication, indexing from online publication if available. Studies focusing only on candidate genes are excluded from this catalog. … SNP-trait associations listed here are limited to those with p-values < 1.0 × 10^-5 (see full methods for additional details).”
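The inclusion rules quoted above amount to a simple filter-and-sort. The sketch below applies them to a few hypothetical study records. The `Study` fields and the `build_catalog` function are my own illustration, not the catalog's actual schema or code, and the study names, dates, SNP IDs and traits are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

MIN_SNPS_INITIAL_STAGE = 100_000
P_VALUE_CUTOFF = 1.0e-5


@dataclass
class Study:
    name: str
    published: date                 # online publication date if available
    snps_assayed_initial: int       # SNPs assayed in the initial stage
    candidate_genes_only: bool
    associations: List[Tuple[str, str, float]]  # (snp_id, trait, p_value)


def build_catalog(studies):
    """Apply the stated inclusion rules and list the qualifying SNP-trait associations."""
    eligible = [
        s for s in studies
        if s.snps_assayed_initial >= MIN_SNPS_INITIAL_STAGE and not s.candidate_genes_only
    ]
    # Most recent publication first.
    eligible.sort(key=lambda s: s.published, reverse=True)
    rows = []
    for s in eligible:
        for snp_id, trait, p in s.associations:
            if p < P_VALUE_CUTOFF:
                rows.append((s.name, s.published.isoformat(), snp_id, trait, p))
    return rows


# Hypothetical studies: C assays too few SNPs, D is candidate-gene only.
studies = [
    Study("Study B", date(2008, 3, 15), 300_000, False, [("snp_3", "trait_Y", 8e-6)]),
    Study("Study A", date(2009, 6, 1), 500_000, False,
          [("snp_1", "trait_X", 2e-9), ("snp_2", "trait_X", 4e-4)]),
    Study("Study C", date(2009, 9, 1), 1_500, False, [("snp_4", "trait_Z", 1e-12)]),
    Study("Study D", date(2009, 10, 1), 250_000, True, [("snp_5", "trait_Z", 1e-10)]),
]

for row in build_catalog(studies):
    print(row)
```

Note that snp_2 is dropped because its p-value misses the 1.0 × 10^-5 cut-off, even though its study qualifies.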
One implication of the studies in the catalog is the critical importance of mechanisms that regulate gene expression, including epigenetic ones. As stated in a PHG Foundation article on the catalog, “…the vast majority of genetic variation associated with complex diseases or traits lies outside of the coding regions of the genome – 45% of SNPs are located inside introns, which are located within genes but are spliced out prior to translation into functional proteins, and 43% of SNPs lie between genes. Whilst in some ways this result is unsurprising, as coding genes only account for around 1% of the genome, it is still unexpected and suggests that regulation of gene expression plays an important role in determining common traits and diseases.” The catalog shows other interesting patterns: “Interestingly, amongst those associations that have been attributed to specific genes (which are located near the trait or disease associated SNPs), 18 regions have been linked with multiple different diseases, suggesting a common underlying aetiological pathway. For example, the major histocompatibility complex (MHC), which plays an important role in the immune system, has been implicated in 10 different conditions ranging from autoimmune disorders to lung cancer. Discoveries of a shared underlying genetic basis for different diseases are likely to become increasingly common as more gene-disease associations are uncovered, and raise a complex set of ethical implications with regards to genetic testing(ref).”
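Spotting regions that, like the MHC, are linked to several conditions is, at its simplest, a grouping exercise: collect the distinct traits reported for each region and keep the regions with more than one. Here is a minimal sketch with hypothetical region and trait labels. It ignores the genuinely hard parts, such as deciding which gene or region a SNP should be attributed to and whether two reported associations reflect the same underlying signal.

```python
from collections import defaultdict


def shared_regions(associations, min_traits=2):
    """Map each region to its distinct traits, keeping regions linked to >= min_traits traits."""
    traits_by_region = defaultdict(set)
    for region, trait in associations:
        traits_by_region[region].add(trait)
    return {
        region: sorted(traits)
        for region, traits in traits_by_region.items()
        if len(traits) >= min_traits
    }


# Hypothetical (region, trait) pairs, standing in for catalog entries that have
# been attributed to a gene or region near the associated SNP.
associations = [
    ("region_1", "autoimmune_trait_A"),
    ("region_1", "autoimmune_trait_B"),
    ("region_1", "cancer_trait_C"),
    ("region_2", "height"),
    ("region_2", "height"),          # the same association reported twice
    ("region_3", "metabolic_trait_D"),
    ("region_3", "cancer_trait_E"),
]

for region, traits in sorted(shared_regions(associations).items()):
    print(region, "->", ", ".join(traits))
```

Using a set per region means a trait reported by several studies is counted once, so region_2 is not mistaken for a multi-disease region.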
The 2009 publication Potential etiologic and functional implications of genome-wide association loci for human diseases and traits describes additional associations seen in the catalog. “This new online resource, together with bioinformatic predictions of the underlying functionality at trait/disease-associated loci, is well-suited to guide future investigations of the role of common variants in complex disease etiology.”
Related kinds of large-scale genomic analysis have also provided the basis for specialized genomic databases such as RegPrecise: a database of curated genomic inferences of transcriptional regulatory interactions in prokaryotes. “The RegPrecise database — was developed for capturing, visualization and analysis of predicted transcription factor regulons in prokaryotes that were reconstructed and manually curated by utilizing the comparative genomic approach. A significant number of high-quality inferences of transcriptional regulatory interactions have been already accumulated for diverse taxonomic groups of bacteria.”
Along with the development of databases has come the development of research and computational tools. For example, the publication Platform for accurate semi-automatic inference of regulons by comparative genomics approach describes an approach to “providing effective tools to enable high-quality reconstruction of transcriptional regulatory networks (TRN).” The authors write: “We implemented a web-based computational platform for fast and accurate semi-automatic inference of regulons in well-populated groups of closely-related bacterial genomes.”
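One common way to infer a regulon comparatively is to start from known binding sites of a transcription factor and build a position weight matrix (PWM) from them. The PWM is then used to scan the upstream regions of genes for similar sites, and a candidate is trusted more when such sites also appear upstream of the corresponding genes in several related genomes. The sketch below illustrates only those generic steps and is not the cited platform's algorithm. The training sites, sequences, score threshold and gene and genome names are all hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5):
    """Log-odds position weight matrix (bits) against a uniform 0.25 background."""
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / 0.25)
            for base in BASES
        })
    return pwm


def score(pwm, kmer):
    """Sum of per-position log-odds scores; kmer must be uppercase ACGT."""
    return sum(column[base] for column, base in zip(pwm, kmer))


def best_hit(pwm, sequence):
    """Highest-scoring window as (score, start); (-inf, -1) if the sequence is too short."""
    width = len(pwm)
    hits = [(score(pwm, sequence[i:i + width]), i) for i in range(len(sequence) - width + 1)]
    return max(hits) if hits else (-math.inf, -1)


def conserved_candidates(pwm, upstream, threshold, min_genomes):
    """Genes with a site scoring >= threshold upstream in at least min_genomes genomes.

    upstream maps gene -> {genome: upstream DNA sequence}.
    """
    candidates = []
    for gene, by_genome in upstream.items():
        supporting = sum(1 for seq in by_genome.values() if best_hit(pwm, seq)[0] >= threshold)
        if supporting >= min_genomes:
            candidates.append(gene)
    return sorted(candidates)


# Hypothetical training sites for one transcription factor.
pwm = build_pwm(["TGTGAC", "TGTGAC", "TGTGAT", "TCTGAC"])

# Hypothetical upstream regions of two genes in three related genomes.
upstream = {
    "geneX": {"genome1": "AAAATGTGACAAAA", "genome2": "CCCTGTGACCCC", "genome3": "AAAAAAAAAAAA"},
    "geneY": {"genome1": "GGGGTGTGACGGGG", "genome2": "AAAAAAAAAAAA", "genome3": "CCCCCCCCCCCC"},
}

print(conserved_candidates(pwm, upstream, threshold=7.0, min_genomes=2))  # ['geneX']
```

geneY has a strong site in only one genome, so the conservation requirement filters it out. That cross-genome check is what makes comparative inference more reliable than scanning a single genome.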
Why more and more GWAS studies?
There are likely to be more and more GWAS studies, and they are likely to involve larger and larger population samples. Factors driving this growth are: 1. knowledge breeds a quest for more knowledge, and new studies can build on earlier ones; for example, the genome of glioblastoma cells is known(ref), facilitating GWA studies related to glioblastoma; 2. the underlying cost of genome sequencing continues to plummet, making GWA studies ever more economically feasible (see this recent blog post); 3. as more and more studies are added to the catalog and databases like RegPrecise become more complete, new studies can be partially based on them; 4. new and ever-better software tools are becoming available for identifying associations(ref)(ref); and 5. ever more powerful and cheaper computers are making possible association computations that were virtually impossible a few years back, when the human genome was first being sequenced. In other words, the factors that empower Giuliano’s Law are at work here, and the rate of change is exponential, not linear.
Implications of GWAS studies
Going back to my blog post My personal longevity – the race between death-stalker and life-prolonger: watch out, Death Stalker. The men and women doing genome-wide association studies are ultimately working for Life Prolonger, not for you. They are seriously on your case, and what they are turning up is going to help convince you to give us many more years in our life spans.