This project is maintained by dtortosa
After migrating out of Africa, human populations were exposed to a wide range of environmental conditions. One critical factor that differs dramatically among human populations inhabiting different parts of the world is temperature. Over millennia, exposure to varying temperatures could have modified the response to external temperature in these populations. In other words, this exposure could act as a selective pressure, shaping the metabolism of human populations and leading them to adapt to different temperature regimes. For instance, genetic adaptation to cold conditions might occur in a population that migrated to higher latitudes if that population contains genetic variants providing an advantage for coping with cold. In such a scenario, individuals carrying these advantageous variants would experience higher reproductive success, leading to an increased frequency of these cold-beneficial variants (i.e., variants are positively selected). This process leaves detectable signals in the genome, known as signatures of positive selection or adaptation. We can identify these signals using various statistical and machine learning methods. Consequently, the genomes of current populations serve as a record of past selective events experienced by those populations.
Possible migration waves out of Africa along with locations of major ancient human remains and archeological sites. Source: López, S., van Dorp, L., & Hellenthal, G. (2015). Human Dispersal Out of Africa: A Lasting Debate. Evolutionary Bioinformatics Online, 11(Suppl 2), 57–68. CC BY 3.0.
The selective events previously described could also have clinical implications today. Considering that the mechanisms for coping with temperature changes involve energy production, they are closely related to energy balance. Consequently, past adaptive events that altered the ability of human populations to expend energy in heat production might contribute to differential risk of obesity between current populations.
ClimAHealth is dedicated to the search for signals of positive selection (i.e., adaptation) caused by climate across human populations. To achieve this, we employ methods capable of detecting signals left in the genome by recent and ancient selective events (spanning up to 120,000 years of human evolution). These methods are complemented by a diverse range of machine learning approaches, allowing us to rigorously test whether genes related to energy metabolism are enriched in signals of positive selection. Our focus lies on genes implicated in energy production, which are therefore relevant for coping with temperature changes. An excess of adaptation signals within these genes would suggest the existence of past adaptation to temperature conditions. It would also support the functional relevance of the genetic variants within these genes, i.e., their impact on human traits: for a genetic variant to undergo positive selection, it must affect a trait that influences an individual’s ability to cope with external factors and thereby contributes to reproductive success. Considering the connection of these genes with energy balance, they may also play a role in the genetic architecture of obesity. ClimAHealth will assess whether genes implicated in climate adaptation also tend to associate with obesity-related traits.
The specific objectives of the project are the following:
Initially, ClimAHealth has focused on detecting the signatures left by selective events that occurred recently, i.e., in the last 30,000 years of human evolution. We calculated the probability of adaptation across human coding genes using a well-known statistic for recent adaptation (the integrated haplotype score; iHS). We also quantified multiple genomic characteristics around these genes that should associate with adaptation if the latter was common enough in the first place. In other words, if adaptation was relatively frequent during recent human evolution, signals of adaptation should occur more frequently in genomic regions that are more likely to influence human phenotypes. For example, regions with a higher density of coding bases or regulatory elements are more likely to have an impact on the phenotype, and hence are more likely to be under positive selection. In contrast, if signals of adaptation are just false positives, then they should be randomly distributed across the human genome and should not correlate with its characteristics.
We used these genomic features to predict the probability of recent adaptation (iHS) across the genome. To achieve that, we developed a novel machine learning method: Mixture Density Regressions (MDRs). This approach assumes that the distribution of iHS across the genome is more complex than a single Gaussian distribution. Instead, it considers that iHS follows a mixture of two Gaussian distributions: one accounting for the influence of adaptation and the other accounting for non-adaptive processes that can introduce noise in the search for positive selection. In addition, this approach can predict the probability of adaptation (iHS) using multiple predictors as input. In other words, we have been able to assess the influence of multiple genomic factors on adaptation at the same time.
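The idea of a two-component mixture whose shape depends on genomic predictors can be illustrated with a minimal sketch. It is not the project's MDR implementation: it assumes, purely for illustration, that the predictors act only on the mixing weight through a logistic function, and every name and parameter value below (`mixing_weight`, `mean_sel`, `sd_neu`, and so on) is hypothetical.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixing_weight(features, coefs, intercept):
    """Weight of the 'adaptation' component (assumed logistic in the features)."""
    linear = intercept + sum(c * x for c, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-linear))


def mixture_density(y, features, params):
    """Conditional density of the statistic y given the genomic features."""
    w = mixing_weight(features, params["coefs"], params["intercept"])
    adaptive = normal_pdf(y, params["mean_sel"], params["sd_sel"])
    neutral = normal_pdf(y, params["mean_neu"], params["sd_neu"])
    return w * adaptive + (1.0 - w) * neutral


def posterior_adaptation(y, features, params):
    """Probability that an observation comes from the 'adaptation' component."""
    w = mixing_weight(features, params["coefs"], params["intercept"])
    adaptive = w * normal_pdf(y, params["mean_sel"], params["sd_sel"])
    return adaptive / mixture_density(y, features, params)


def negative_log_likelihood(data, params):
    """Objective a fitting routine would minimise; data is a list of (y, features)."""
    return -sum(math.log(mixture_density(y, x, params)) for y, x in data)


# Hypothetical parameter values, chosen only to make the sketch runnable.
params = {
    "coefs": [0.8, -0.3],   # one coefficient per genomic feature
    "intercept": -1.0,
    "mean_sel": 2.0, "sd_sel": 1.0,   # component attributed to adaptation
    "mean_neu": 0.0, "sd_neu": 0.5,   # component attributed to other processes
}

example_features = [1.0, 0.5]  # e.g. standardised coding and regulatory density
print(mixture_density(2.2, example_features, params))
print(posterior_adaptation(2.2, example_features, params))
```

In this sketch, a gene with feature values that raise the mixing weight places more of its density on the adaptation component, which is one simple way for several genomic factors to influence the predicted distribution at the same time.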
The MDR approach modeled the distribution of the recent adaptation statistic, iHS, relatively well across several human populations. Importantly, we found multiple genomic factors associated with the probability of adaptation. In other words, signals of recent adaptation were not randomly distributed across the genome. Instead, they tended to associate with factors that influence the probability that a genomic region impacts phenotypes, i.e., human traits. Therefore, it is unlikely that the selection signals detected are false positives. This was fully supported by the computer simulations we performed. We used computational tools to simulate human genomes in silico, and applied our MDR approach to simulated genomes that resembled the human genome but were not exposed to positive selection. In other words, we reconstructed, through a computer simulation, the history of the genomes of a human population that was not exposed to adaptive events. If our approach is only sensitive to adaptation, then it should not detect signals of positive selection in these simulated human genomes. This was indeed the case, supporting that the MDR approach is not misled by other factors resembling adaptation.
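The logic of this negative control can be shown with a deliberately simplified sketch. It does not simulate genomes: it assumes a single hypothetical genomic feature and a single statistic, and uses an ordinary least-squares slope as a stand-in for the model. The point is only that a method should report no association when none was simulated, and should recover one when it was planted.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n, effect, rng):
    """Draw a feature and a statistic; effect=0.0 means no association at all."""
    feature = [rng.gauss(0.0, 1.0) for _ in range(n)]
    statistic = [effect * f + rng.gauss(0.0, 1.0) for f in feature]
    return feature, statistic


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Negative control: the statistic is generated independently of the feature.
null_x, null_y = simulate(5000, 0.0, rng)
# Positive control: an association of known size (0.5) is planted.
alt_x, alt_y = simulate(5000, 0.5, rng)

null_slope = ols_slope(null_x, null_y)
alt_slope = ols_slope(alt_x, alt_y)
print("slope without planted association:", round(null_slope, 3))
print("slope with planted association:   ", round(alt_slope, 3))
```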
In the next step, we are considering not only recent but also older adaptation events, spanning up to 120,000 years of human evolution. To achieve this, we are leveraging Flex-Sweep, an approach recently developed in the lab hosting the outgoing phase of this project at the University of Arizona. This novel approach, based on deep learning, outperforms previous methods in detecting complex signals of adaptation, including very old signals. We use the probability of adaptation calculated by Flex-Sweep as input in our analyses. The data generated by this approach are more complex than the iHS statistic used in the previous step. Consequently, we are currently developing a new modeling framework to handle them. This framework follows the same rationale as the MDRs, aiming to model the probability of adaptation across the genome. However, it extends this approach to much more complex distributions, making it flexible enough to analyze data generated by Flex-Sweep. By rigorously comparing multiple modeling approaches, we aim to select the one that best explains the distribution of the adaptation probability.
In a preliminary analysis with one population, we have modeled the probability of adaptation predicted by Flex-Sweep with a relatively good fit, i.e., the models predicted the probability of adaptation across the genome well. As in the previous step, our models found multiple factors associated with adaptation (recent and old in this case). This further suggests that adaptation has been relatively frequent in human populations. Importantly, one of the factors used as a predictor in the models was the distance to genes related to brown adipose tissue (BAT). BAT is implicated in the oxidation of glucose and lipids along with the dissipation of heat. Therefore, this tissue may have been involved in the adaptation of human populations to different temperatures. We have generated a list of genes related to BAT by selecting those closely related (in terms of protein-protein interactions) to Uncoupling Protein 1 (UCP1), which is a hallmark of BAT. We performed several validation analyses showing that this list of genes is strongly cohesive and indeed related to BAT. Notably, we found a higher probability of adaptation around BAT genes compared to the rest of the genome. This was independent of other genomic factors that influence the probability of finding signals of adaptation. Given the link between BAT and environmental temperature, the adaptation signals found in BAT genes provide preliminary evidence for the existence of climate adaptation in humans. These preliminary results also suggest that genetic variants in BAT genes are functionally relevant, as signals of adaptation should occur in genomic regions impacting human phenotypes. Finally, the connection between thermoregulation and energy balance suggests that genes implicated in thermal adaptation could also have an impact on obesity.
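Selecting genes that are close to UCP1 in a protein-protein interaction network can be sketched as a breadth-first search from the seed gene. The sketch below is an assumption about how such a selection could be done, not the project's pipeline: the network is a made-up toy with placeholder gene names (`GENE_A` to `GENE_D`), and the `max_steps` cutoff is hypothetical.

```python
from collections import deque


def genes_within(network, seed, max_steps):
    """Map each gene reachable from seed in at most max_steps interactions
    to its distance from the seed (the seed itself is excluded)."""
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        gene = queue.popleft()
        if distance[gene] == max_steps:
            continue  # do not expand beyond the chosen cutoff
        for partner in network.get(gene, ()):
            if partner not in distance:
                distance[partner] = distance[gene] + 1
                queue.append(partner)
    return {g: d for g, d in distance.items() if g != seed}


# Toy interaction network with placeholder names; NOT real interaction data.
toy_network = {
    "UCP1": ["GENE_A", "GENE_B"],
    "GENE_A": ["UCP1", "GENE_C"],
    "GENE_B": ["UCP1"],
    "GENE_C": ["GENE_A", "GENE_D"],
    "GENE_D": ["GENE_C"],
}

direct_partners = genes_within(toy_network, "UCP1", max_steps=1)
two_step_neighbourhood = genes_within(toy_network, "UCP1", max_steps=2)
print(sorted(direct_partners))
print(sorted(two_step_neighbourhood))
```

A stricter cutoff yields a smaller, more cohesive gene set, while a looser one trades cohesion for coverage; the appropriate choice would depend on the interaction database and the validation analyses applied to the resulting list.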
We have already started to test the association between obesity-related traits and genes implicated in energy metabolism, finding instances of such association in some cases.
ClimAHealth is advancing the analysis of adaptation signals beyond the state of the art. Previous work in this field has relied on classical correlation and regression approaches, which do not align well with the distribution of adaptation statistics. Leveraging a novel modeling framework, we analyze complex adaptation statistics, considering the impact of multiple factors on the probability of adaptation.
The heterogeneity of the human genome has greatly complicated the search for recent adaptation signals, making the prevalence of recent adaptation a controversial topic. However, our modeling approaches have provided robust evidence indicating relatively frequent adaptation in recent evolutionary times. As a result, ClimAHealth is contributing to answering critical questions in human evolutionary biology. It is also opening new avenues of research by applying novel modeling techniques to the study of positive selection.
This project is funded by the European Union’s Horizon 2020 research and innovation programme (Marie Skłodowska-Curie Actions; grant agreement number 101030971).