Category Archives: Genetics

UW initiative aims to bring together social sciences and genetics – Wisbusiness.com

Integrating the fields of genetics and social science is putting us on the right track for understanding the world better, a UW-Madison expert says.

The university's La Follette School of Public Affairs yesterday held a panel discussion on its Initiative in Social Genomics, which aims to bring together these disciplines to explore how genetics are connected to behavior, socio-economic outcomes and other factors.

Jason Fletcher, a professor of public affairs with the school, underlined the complexity involved with combining two nuanced areas of study into one discipline. Still, conducting research focused on just one while excluding the other fails to recognize that both genetics and social factors interact with one another, he said.

"By focusing on your one domain, you're not including all of the relevant factors," he said, noting that only in the past two decades or so has information from both fields been combined into the same data structures.

He said the university is making major investments into training more people to wrangle this firehose of data to conduct meaningful social genomics research.

"Because it is so complicated, the solutions so far have not been obvious, they've required a lot of work," he said. "And we're not there. We don't have the solutions yet, but I think that's the enterprise here, is that we need collaborations to build this bridge where both sides are building at the same time, and coming together."

Lauren Schmitz, an assistant professor of public affairs with the school, said that ever since the human genome was first fully mapped in 2003, we have many more questions than answers about what makes us tick. She noted rapid advancements in computing and genome sequencing have led to a flood of new genetic data that scientists are still working to understand.

"In part, sequencing the human genome wasn't the silver bullet humanity hoped for, because we realized that we can't study the human genome in isolation," she said yesterday. "If we want to gain a better understanding of how genetic diversity shapes who we are, we need to understand and get outside the lab, to study genetic diversity and our genes in the wild."

Conditions of work, environmental factors and even economic trends also powerfully shape our life outcomes, she noted.

Her own research, focused on aging and longevity, explores how social conditions and disadvantages affect biological age. She said scientists can now calculate biological age quite accurately based on analysis of epigenetics, or how various factors affect gene expression. With just a blood sample, they can calculate how life circumstances are accelerating or slowing down the aging process.

"This scientific explosion of data is really allowing us to see the impacts of public policy on the cellular level," she said.

In a 2022 study focused on the Great Depression, Schmitz sought to understand how this period of economic hardship affected biological aging.

"What we found is that individuals who were in utero, who were in the womb during the Great Depression, were aging faster decades later," she said. "And so here we see this really important connection between early life conditions and late-life aging."


Women have a higher genetic risk for PTSD, according to study by VCU and Swedish researchers – VCU News

By Olivia Trani

Women are twice as likely as men to develop post-traumatic stress disorder, but the factors contributing to this disparity have largely remained unsettled. A research team led by Virginia Commonwealth University and Lund University in Sweden conducted the largest twin-sibling study of PTSD to date to shed light on how genetics may play a role. Their results, published Tuesday in the American Journal of Psychiatry, are the first to demonstrate that women have a higher genetic risk for the disorder compared with men.

By analyzing health data from over 16,000 twin pairs and 376,000 sibling pairs, the research team discovered that heritability for PTSD was nearly 7 percentage points higher in women (35.4%) than in men (28.6%). They also found evidence that the genes that make up the heritable risk for PTSD vary between the two sexes.

The researchers say their findings could inform strategies for PTSD prevention and intervention following a traumatic event, as well as help address stigmas related to women's mental health.

"Women are at higher risk for developing PTSD than men, even when controlling for the type of trauma, income level, social support and other environmental factors. Some of the theories as to why that is have frankly been unkind to women, such as attributing the sex difference to a weakness or lack of ability to cope," said Ananda B. Amstadter, Ph.D., a professor in the VCU School of Medicine's departments of Psychiatry and Human and Molecular Genetics and lead author of the study. "I think this study can help move the narrative that people can have an inherited biological risk for PTSD, and that this genetic risk is greater in women."

Nearly 70% of the global population is exposed to at least one traumatic event in their lifetime, such as physical or sexual assault, a motor vehicle accident, exposure to combat or a natural disaster. About 6% of those who are exposed to trauma develop PTSD. Amstadter's research focuses on understanding the conditions that might increase or decrease a person's risk of experiencing PTSD, particularly how a person's genes impact their risk.

"If you think of risk for PTSD like a pie chart, we're trying to better understand what factors make up the pieces of this pie," she said. "Some of the risk is influenced by a person's environment, such as the experiences they have while growing up. On the other hand, some of the risk will be influenced by the genes they inherit from their parents."

Previous research has looked into how genes influence the likelihood of developing PTSD, but the study conducted by Amstadter and her colleagues is the first of its kind to investigate how genetic risk varies by sex.

For this project, the research team examined anonymized clinical data from Swedish population-based registries. Their analysis consisted of more than 400,000 pairs of twins or siblings born up to two years apart in Sweden between 1955 and 1980. Studies on twins and siblings, because of their genetic similarities, can help researchers determine how a person's genes influence their risk for mental illnesses.

"Every time a person within this age group interacts with Sweden's health care system, whether that's visiting their primary care doctor, filling a prescription or going to the hospital, that information is recorded in their national registries. This kind of data is a really powerful tool for addressing questions related to genetic risk for medical conditions," Amstadter said. "Prior PTSD studies involving twins and siblings have typically only included a few thousand individuals. Because our sample size was so large in comparison, we were able to make calculations with a higher degree of certainty."

Through statistical modeling, the researchers calculated how much a person's genetic makeup influenced their likelihood of developing PTSD following a traumatic event. In finding that PTSD was 35.4% heritable in women but only 28.6% heritable in men, they demonstrated that women have a higher biological risk for PTSD.

Their models also revealed that the genes associated with PTSD were highly correlated (0.81) but not entirely the same between men and women. This suggests that the genetic underpinnings of sex hormones, like testosterone, estrogen and progesterone, may be involved in the development of PTSD. The research team is collaborating with the Psychiatric Genomics Consortium to identify the molecular genetic variants that may contribute to sex-specific pathways of risk.

Amstadter conducted the research at the Virginia Institute for Psychiatric and Behavioral Genetics at VCU alongside co-authors Shannon Cusack, Ph.D., a postdoctoral scholar; and Kenneth Kendler, M.D., the institute's director, professor of psychiatry and eminent scholar. They collaborated with Lund University co-authors Sara Lönn, Ph.D.; Jan Sundquist, M.D., Ph.D.; and Kristina Sundquist, M.D., Ph.D.


Genetics study points to potential treatments for restless leg syndrome – University of Cambridge news

Restless leg syndrome can cause an unpleasant crawling sensation in the legs and an overwhelming urge to move them. Some people experience the symptoms only occasionally, while others get symptoms every day. Symptoms are usually worse in the evening or at night-time and can severely impair sleep.

Despite the condition being relatively common (up to one in 10 older adults experience symptoms, while 2-3% are severely affected and seek medical help), little is known about its causes. People with restless leg syndrome often have other conditions, such as depression or anxiety, cardiovascular disorders, hypertension, and diabetes, but the reason why is not known.

Previous studies had identified 22 genetic risk loci, that is, regions of our genome that contain changes associated with increased risk of developing the condition. But there are still no known biomarkers, such as genetic signatures, that could be used to objectively diagnose the condition.

To explore the condition further, an international team led by researchers at the Helmholtz Munich Institute of Neurogenomics, Institute of Human Genetics of the Technical University of Munich (TUM) and the University of Cambridge pooled and analysed data from three genome-wide association studies. These studies compared the DNA of patients and healthy controls to look for differences more commonly found in those with restless leg syndrome. By combining the data, the team was able to create a powerful dataset with more than 100,000 patients and over 1.5 million unaffected controls.

The results of the study are published today in Nature Genetics.

Co-author Dr Steven Bell from the University of Cambridge said: "This study is the largest of its kind into this common but poorly understood condition. By understanding the genetic basis of restless leg syndrome, we hope to find better ways to manage and treat it, potentially improving the lives of many millions of people affected worldwide."

The team identified over 140 new genetic risk loci, increasing the number known eight-fold to 164, including three on the X chromosome. The researchers found no strong genetic differences between men and women, despite the condition being twice as common in women as it is in men. This suggests that a complex interaction of genetics and the environment (including hormones) may explain the gender differences we observe in real life.

Two of the genetic differences identified by the team involve genes known as glutamate receptors 1 and 4 respectively, which are important for nerve and brain function. These could potentially be targeted by existing drugs, such as anticonvulsants like perampanel and lamotrigine, or used to develop new drugs. Early trials have already shown positive responses to these drugs in patients with restless leg syndrome.

The researchers say it would be possible to use basic information like age, sex, and genetic markers to accurately rank who is more likely to have severe restless leg syndrome in nine cases out of ten.

To understand how restless leg syndrome might affect overall health, the researchers used a technique called Mendelian randomisation. This uses genetic information to examine cause-and-effect relationships. It revealed that the syndrome increases the risk of developing diabetes.

Although low levels of iron in the blood are thought to trigger restless leg syndrome, because they can lead to a fall in the neurotransmitter dopamine, the researchers did not find strong genetic links to iron metabolism. However, they say they cannot completely rule it out as a risk factor.

Professor Juliane Winkelmann from TUM, one of the senior authors of the study, said: "For the first time, we have achieved the ability to predict restless leg syndrome risk. It has been a long journey, but now we are empowered to not only treat but even prevent the onset of this condition in our patients."

Professor Emanuele Di Angelantonio, a co-author of the study and Director of the NIHR and NHS Blood and Transplant-funded Research Unit in Blood Donor Health and Behaviour, added: "Given that low iron levels are thought to trigger restless leg syndrome, we were surprised to find no strong genetic links to iron metabolism in our study. It may be that the relationship is more complex than we initially thought, and further work is required."

The dataset included the INTERVAL study of England's blood donors in collaboration with NHS Blood and Transplant.

A full list of funders can be found in the study paper.

Reference: Schormair et al. Genome-wide meta-analyses of restless legs syndrome yield insights into genetic architecture, disease biology, and risk prediction. Nature Genetics; 5 June 2024; DOI: 10.1038/s41588-024-01763-1


Genetic association mapping leveraging Gaussian processes | Journal of Human Genetics – Nature.com

Gaussian Process (GP)

A Gaussian process (GP) is a type of stochastic process whose application in machine learning enables us to infer a nonlinear function f(x) over a continuous domain x (e.g., time and space). Precisely, f(x) is a draw from a GP if \(\{f(x_1),\ldots,f(x_N)\}\) follows an N-dimensional multivariate normal distribution for the N input data points \(\{x_i\}_{i=1}^{N}\). Denoting \(X=(x_1,\ldots,x_N)^{\top}\) and \(f=(f(x_1),\ldots,f(x_N))^{\top}\), a GP is formally written as

$$f \sim \mathcal{N}(m(X),k(X,X)),$$

where \(m(\cdot)\) denotes the mean function and \(k(\cdot,\cdot)\) denotes the kernel function [11]. The simplest kernel function would be the linear kernel, such that \(k(X,X)=\sigma^{2}XX^{\top}\), while the automatic relevance determination squared exponential (ARD-SE) kernel is defined as

$$k(x_j,x_k)=\sigma^{2}\exp\left[-\sum_{q=1}^{Q}\frac{(x_{jq}-x_{kq})^{2}}{2\rho_q}\right]$$

for the (j, k) element of k(X, X), where \(x_j,x_k\in\mathbb{R}^{Q}\) are Q-dimensional input vectors. Here \(\sigma^{2}\) is the kernel variance parameter and \(\rho=(\rho_1,\ldots,\rho_Q)^{\top}\) is the vector of characteristic length scales, whose inverse determines the relevance of each element of the input vector. Typically, the mean function is defined as m(X)=0.
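As a minimal sketch of the two kernels just defined, the NumPy functions below evaluate the linear kernel and the ARD-SE kernel for inputs stored as an (N, Q) array. The function names `linear_kernel` and `ard_se_kernel`, the argument names, and the toy inputs are illustrative assumptions, not part of the original text.

```python
import numpy as np

def linear_kernel(X1, X2, sigma2=1.0):
    """Linear kernel: sigma2 * X1 X2^T (names are illustrative)."""
    return sigma2 * (np.asarray(X1, dtype=float) @ np.asarray(X2, dtype=float).T)

def ard_se_kernel(X1, X2, sigma2=1.0, rho=1.0):
    """ARD-SE kernel: sigma2 * exp(-sum_q (x_jq - x_kq)^2 / (2 * rho_q)).

    X1 has shape (N1, Q) and X2 has shape (N2, Q);
    rho is a scalar or a length-Q vector of characteristic length scales.
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    rho = np.broadcast_to(np.asarray(rho, dtype=float), (X1.shape[1],))
    # Squared differences for every pair of points and every input dimension
    diff2 = (X1[:, None, :] - X2[None, :, :]) ** 2   # shape (N1, N2, Q)
    return sigma2 * np.exp(-(diff2 / (2.0 * rho)).sum(axis=-1))

# Toy example: three one-dimensional inputs (Q = 1)
X = np.array([[0.0], [1.0], [2.0]])
K = ard_se_kernel(X, X, sigma2=2.0, rho=[0.5])
```

With these toy values the diagonal of `K` equals the kernel variance, and the off-diagonal entries decay with squared distance scaled by `rho`; a smaller `rho_q` makes the function more sensitive to the q-th input coordinate.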

Because the GP yielding f(x) has various useful properties inherited from the normal distribution, a GP can be used to estimate a nonlinear function f(X) from output data \(y=(y_1,\ldots,y_N)^{\top}\) along a continuous factor X. The extended linear model \(y=f(X)+\varepsilon\) is referred to as GP regression and is widely used in the machine learning framework [12]. This model can be used to map dynamic genetic associations for normalized gene expression or other common complex quantitative traits (e.g., human height) along the continuous factor x (e.g., cellular states or donor's age). Denoting the genotype vector by \(g=(g_1,\ldots,g_N)^{\top}\) and the kinship matrix among N individuals by R, the mapping model, as proposed by us and others [8, 10], can be expressed as follows:

$$y=\alpha+\beta\odot g+\gamma+\varepsilon,\tag{1}$$

where

$$\alpha\sim\mathcal{N}(0,K),\quad \beta\sim\mathcal{N}(0,\delta_g K),\quad \gamma\sim\mathcal{N}(0,\delta_d K\odot R)$$

are all GPs with similar covariance matrices, where \(\odot\) denotes the element-wise product between two vectors or matrices with the same dimensions, K=k(X, X) denotes the covariance matrix with a kernel function, and \(\varepsilon\) denotes the residuals. Intuitively, \(\alpha\) models the average baseline change of y in relation to x, while \(\beta\) represents the dynamic genetic effect along x. The effect size \(\beta\) is multiplied by the genotype vector g, indicating that the output \(y_i\) varies between different genotype groups (\(g_i\in\{0,1,2\}\)). In fact, the effect size \(\beta(x_i)\) is additive to the baseline \(\alpha(x_i)\) at each \(x_i\), which is the same as in standard association mapping. Here statistical hypothesis testing is performed under the null hypothesis of \(\delta_g=0\), as the strength of genetic association is determined by \(\delta_g\).

It is important to note that the model (1) includes a correction term \(\gamma\) that accounts for the between-donor variation of dynamic changes along x, particularly when multiple data points are measured from the same donor or samples are taken from related donors. This term is essential for statistical calibration of the genetic effect \(\beta\), because other genetic associations scattered over the genome (trans effects) can confound the target genotype effect. Therefore, to adjust for the confounding effect, we need to include the extra GP \(\gamma\), which is drawn from a normal distribution with the covariance matrix of K multiplied by the kinship matrix R.

Here, the kinship matrix is estimated by \(\hat{R}=\sum_{l=1}^{L}\tilde{g}_l\tilde{g}_l^{\top}/L\) using genome-wide variants \(g_l\) (l=1,…,L), where \(\tilde{g}_l\) is a standardized genotype vector (centered and scaled) based on the allele frequency at genetic variant l, while L denotes the total number of variants across the genome [6]. The matrix is initially an N×N dense matrix, but it can be simplified if donors are (sufficiently) unrelated. Let us introduce a design matrix of donor configuration, \(Z\in\mathbb{R}^{N\times N_d}\), for the \(N_d\) donors (i.e., \(z_{ij}=1\) if the sample i is taken from the donor j; otherwise \(z_{ij}=0\)); the kinship matrix can then be approximated as \(R=ZZ^{\top}\). Thus, \(\gamma\) can be expressed as a linear combination of \(N_d\) independent GPs \(\{\gamma_j\sim\mathcal{N}(0,\delta_d K);\ j=1,\ldots,N_d\}\), such that \(\gamma=\sum_{j=1}^{N_d}\gamma_j\odot z_j\), where \(z_j\) denotes the jth column vector of Z. This approximation is particularly useful for parameter estimation with large \(N_d\) (as discussed in section 2.4).
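The kinship estimate and its donor-design approximation can be sketched in a few lines of NumPy. The helper names (`estimate_kinship`, `donor_design_matrix`), the toy genotypes, and the specific standardization (mean 2p and variance 2p(1-p) for allele frequency p) are assumptions made for illustration; the text only states that genotypes are centered and scaled based on the allele frequency.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_kinship(genotypes, allele_freq):
    """R_hat = sum_l g~_l g~_l^T / L from an (N, L) matrix of 0/1/2 genotypes."""
    genotypes = np.asarray(genotypes, dtype=float)
    p = np.asarray(allele_freq, dtype=float)
    # Assumed standardization: center by 2p, scale by sqrt(2p(1-p)) per variant
    g_tilde = (genotypes - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return g_tilde @ g_tilde.T / genotypes.shape[1]

def donor_design_matrix(donor_index, n_donors):
    """Z with z_ij = 1 if sample i is taken from donor j, and 0 otherwise."""
    Z = np.zeros((len(donor_index), n_donors))
    Z[np.arange(len(donor_index)), donor_index] = 1.0
    return Z

# Toy data: 3 unrelated donors, 500 variants, 2 samples per donor (N = 6)
n_donors, n_variants = 3, 500
freq = rng.uniform(0.1, 0.9, size=n_variants)
donor_genotypes = rng.binomial(2, freq, size=(n_donors, n_variants))
donor_index = np.array([0, 0, 1, 1, 2, 2])

R_hat = estimate_kinship(donor_genotypes[donor_index], freq)  # dense N x N
Z = donor_design_matrix(donor_index, n_donors)
R_approx = Z @ Z.T                                            # block structure
```

In this toy setting `R_approx` is exactly block diagonal (ones within a donor, zeros between donors), whereas `R_hat` is dense with small non-zero entries between unrelated donors.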

When the sample size N is large, an ordinary GP faces a severe scalability issue due to the dimension of the dense matrix K being N×N, resulting in a total computational cost of \(\mathcal{O}(N^{3})\). As a result, the application of GP in the GWAS field is hindered, as the sample sizes often reach a million these days. However, there are several alternatives to approximate the full GP model, including the Nyström approximation (low-rank approximation), Projected Process approximation [13], Sparse Pseudo-inputs GP [14], Fully Independent Training Conditional approximation and Variational Free Energy approximation [15]. In this section, we introduce a sparse GP approximation proposed by [16].

The sparse GP is a scalable model using the technique of inducing points [14]. Since the computational cost of the sparse GP is \(\mathcal{O}(NM^{2})\) with M inducing points, we can greatly reduce the computational cost, which is essentially linear in N under the assumption of \(M\ll N\). Denoting the M inducing points by \(T=(t_1,\ldots,t_M)^{\top}\) and the corresponding GPs by \(u=(u(t_1),\ldots,u(t_M))^{\top}\), the joint distribution of f and u becomes a multivariate normal distribution. Therefore a lower bound of the log conditional distribution \(\log p(y|u)\) can be written as

$$\begin{aligned}\log p(y|u) &= \log\int p(y|f)\,p(f|u)\,df \ge \int\left[\log p(y|f)\right]p(f|u)\,df\\ &= \log\mathcal{N}(y|\bar{f},\sigma^{2}I)-\frac{1}{2\sigma^{2}}\mathrm{tr}\{\tilde{K}_{NN}\}\equiv\mathcal{L}_{1},\end{aligned}$$

where

$$\bar{f}=K_{NM}K_{MM}^{-1}u,\quad \tilde{K}_{NN}=K_{NN}-K_{NM}K_{MM}^{-1}K_{MN},$$

and

$$K_{NN}=k(X,X),\quad K_{NM}=k(X,T),\quad K_{MM}=k(T,T).$$

Therefore, the marginal distribution of the output y is bounded from below by

$$\begin{aligned}p(y) &= \int p(y|u)\,p(u)\,du \ge \int\exp\{\mathcal{L}_{1}\}\,p(u)\,du\\ &= \exp\left\{\log\mathcal{N}(y|0,V)-\frac{1}{2\sigma^{2}}\mathrm{tr}\{\tilde{K}_{NN}\}\right\}\equiv\exp\{\mathcal{L}_{2}\},\end{aligned}$$

where \(V=\sigma^{2}I+K_{NM}K_{MM}^{-1}K_{MN}\). The lower bound \(\mathcal{L}_{2}\) is referred to as the Titsias bound and can be used for parameter estimation as well as statistical hypothesis testing.
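The bound can be checked numerically on a toy problem. The sketch below evaluates \(\mathcal{L}_{2}=\log\mathcal{N}(y|0,V)-\mathrm{tr}\{\tilde{K}_{NN}\}/(2\sigma^{2})\) with dense linear algebra and compares it with the exact log marginal likelihood of the full GP. It is deliberately the direct \(\mathcal{O}(N^{3})\) form for readability, not a scalable implementation; the function names, the one-dimensional kernel with unit variance, the jitter term, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def se_kernel(x1, x2, sigma2_k=1.0, rho=1.0):
    """Squared exponential kernel for 1-D inputs (ARD-SE with Q = 1)."""
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return sigma2_k * np.exp(-d2 / (2.0 * rho))

def log_normal_density(y, V):
    """log N(y | 0, V) for a dense covariance matrix V."""
    n = len(y)
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + y @ np.linalg.solve(V, y))

def titsias_bound(y, x, t, sigma2, jitter=1e-8):
    """L2 = log N(y | 0, V) - tr(K~_NN) / (2 sigma2), V = sigma2 I + K_NM K_MM^-1 K_MN."""
    K_NM = se_kernel(x, t)
    K_MM = se_kernel(t, t) + jitter * np.eye(len(t))  # jitter for numerical stability
    Q_NN = K_NM @ np.linalg.solve(K_MM, K_NM.T)       # K_NM K_MM^-1 K_MN
    V = sigma2 * np.eye(len(y)) + Q_NN
    trace_K_tilde = np.trace(se_kernel(x, x)) - np.trace(Q_NN)
    return log_normal_density(y, V) - trace_K_tilde / (2.0 * sigma2)

# Toy data: noisy sine curve observed at 12 points
rng = np.random.default_rng(1)
x = np.linspace(0.0, 6.0, 12)
y = np.sin(x) + 0.3 * rng.standard_normal(len(x))
sigma2 = 0.1

exact = log_normal_density(y, sigma2 * np.eye(len(x)) + se_kernel(x, x))
bound_sparse = titsias_bound(y, x, np.linspace(0.0, 6.0, 4), sigma2)  # M = 4
bound_full = titsias_bound(y, x, x, sigma2)                           # T = X
```

With only four inducing points the bound sits below the exact log marginal likelihood, and when the inducing points coincide with the inputs the trace term vanishes (up to the jitter) and the bound matches the exact value.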

Selecting the optimal number of inducing points M and their coordinates is crucial for accurately approximating a GP. Although a larger value of M provides a better approximation of the GP, it is not feasible to increase M when N reaches hundreds of thousands in large-scale genetic association studies. Additionally, the accuracy of the GP is influenced by the complexity of the nonlinearity of y and the dimension Q of the input points x. There are a few approaches for inferring an optimal value of M from data [17], but the size of the example used in the study is too small (48 genes × 437 samples) to be applied to real-world data. However, it is worth noting that the optimal coordinates of the inducing points with a fixed M can be easily learned from data, as described in the next section.

Genetic association mapping involves performing tens of millions of hypothesis tests. Therefore, it is almost impossible to estimate the parameters of GPs for each pair of trait and variant across the genome, even with use of the sparse approximation mentioned in the last subsection. Furthermore, both the baseline \(\alpha\) and the correction term \(\gamma\) share the characteristic length parameter \(\rho=(\rho_1,\ldots,\rho_Q)^{\top}\) and the inducing points T. This can lead to unstable optimization and prolonged parameter estimation times. To address this issue, we have previously proposed a three-step parameter estimation strategy for performing the statistical hypothesis testing [10]. In particular, optimizing \(\rho\) under the baseline term \(\alpha\) alone using a quasi-Newton approach (such as the BFGS method) is sufficient in the first step, because the variance explained by \(\gamma\) is typically much smaller than that explained by \(\alpha\). The three steps are:

1. \(y=\alpha+\varepsilon\) (baseline model: H0) to estimate \(\rho\) and T.

2. \(y=\alpha+\gamma+\varepsilon\) (baseline model: H1) to estimate the variance parameters \(\delta_d\) and \(\sigma^{2}\). Here \(\hat{\rho}\) and \(\hat{T}\) estimated in H0 are plugged into H1.

3. \(y=\alpha+\beta\odot g+\gamma+\varepsilon\) (full model: H2) to test whether \(\delta_g=0\). Here \(\{\hat{\rho},\hat{T},\hat{\delta}_d,\hat{\sigma}^{2}\}\) estimated in H0 and H1 are used.

Here the Titsias bounds for these models are given by

$$\mathcal{L}_{2}^{h}=\left\{\begin{array}{ll}\log\mathcal{N}(y|0,V)-\frac{1}{2\sigma^{2}}\mathrm{tr}\{\tilde{K}_{NN}\}, & h=H_0,\\ \log\mathcal{N}(y|0,V_d)-\frac{1}{2\sigma^{2}}\mathrm{tr}\{(1+\delta_d)\tilde{K}_{NN}\}, & h=H_1,\\ \log\mathcal{N}(y|0,V_g)-\frac{1}{2\sigma^{2}}\mathrm{tr}\{(1+\delta_d)\tilde{K}_{NN}+\delta_g G\tilde{K}_{NN}G\}, & h=H_2,\end{array}\right.$$

where

$$V_d=V+\delta_d(K_{NM}K_{MM}^{-1}K_{MN})\odot R,\quad V_g=V_d+\delta_g GK_{NM}K_{MM}^{-1}K_{MN}G,$$

and G=diag(g) denotes the diagonal matrix whose diagonal elements are given by the elements of g. The estimators \(\hat{\rho}\) and \(\hat{T}\) are obtained by maximizing \(\mathcal{L}_{2}^{H_0}\) with respect to \(\rho\) and T, and \(\hat{\delta}_d\) and \(\hat{\sigma}^{2}\) are obtained by maximizing \(\mathcal{L}_{2}^{H_1}\) with respect to \(\delta_d\) and \(\sigma^{2}\) given \(\hat{\rho}\) and \(\hat{T}\).

It is worth noting that, when the kinship matrix R can be expressed as \(R=ZZ^{\top}\) with a lower rank matrix \(Z=(z_1,\ldots,z_{N_d})\) of \(N_d\) columns, the covariance matrix \(V_d\) can be rewritten as

$$V_d=V+\delta_d\left(K_{NM}K_{MM}^{-1}K_{MN}\right)\odot(ZZ^{\top})=\sigma^{2}I+AB^{-1}A^{\top},$$

where

$$\begin{aligned}A &= (C,\mathrm{diag}(z_1)C,\ldots,\mathrm{diag}(z_{N_d})C),\\ B &= \mathrm{diag}(K_{MM},\delta_d K_{MM},\ldots,\delta_d K_{MM}),\end{aligned}$$

and \(C=K_{NM}K_{MM}^{-1}\), and B becomes an \(M(N_d+1)\times M(N_d+1)\) block diagonal matrix. Since the computational complexity of H1 or H2 is \(\mathcal{O}(N_d^{2}M^{2}N)\), for large \(N_d\) such that \(MN_d>N\), the total complexity is over \(\mathcal{O}(N^{3})\) and we again face the scalability issue.

However, if the donors in the data are unrelated, we can significantly reduce the memory usage and the computational burden to \(\mathcal{O}(N_dM^{2}N)\). This is because the matrix A becomes a sparse matrix, with \(z_i^{\top}z_{i'}=0\) for \(i\ne i'\), resulting in \(NM(N_d-1)\) elements out of \(NMN_d\) being 0. Additionally, the non-zero elements of A are repeated and identical to the elements of C, and the block diagonal element of B is essentially \(K_{MM}^{-1}\).
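Two facts used in this passage can be verified on a toy example: the element-wise product with \(ZZ^{\top}\) decomposes into a sum over donors, \(Q\odot(ZZ^{\top})=\sum_j \mathrm{diag}(z_j)\,Q\,\mathrm{diag}(z_j)\), and the donor blocks of A contain exactly \(NM(N_d-1)\) zeros when each sample belongs to a single donor. The sketch below is only a numerical check under assumed toy dimensions; the random matrix `C` stands in for \(K_{NM}K_{MM}^{-1}\), and `K_MM` is an arbitrary positive definite matrix.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy dimensions: N = 6 samples, M = 2 inducing points, 3 unrelated donors
N, M, n_donors = 6, 2, 3
donor_index = np.array([0, 0, 1, 1, 2, 2])
Z = np.zeros((N, n_donors))
Z[np.arange(N), donor_index] = 1.0

C = rng.standard_normal((N, M))            # stand-in for K_NM K_MM^{-1}
K_MM = np.array([[1.0, 0.3], [0.3, 1.0]])  # arbitrary positive definite matrix
Q_NN = C @ K_MM @ C.T                      # equals K_NM K_MM^{-1} K_MN

# Element-wise product with Z Z^T versus the sum over donors
lhs = Q_NN * (Z @ Z.T)
rhs = sum(np.diag(Z[:, j]) @ Q_NN @ np.diag(Z[:, j]) for j in range(n_donors))

# Donor blocks of A: (diag(z_1) C, ..., diag(z_Nd) C), shape N x (M * Nd)
A_donor = np.hstack([Z[:, [j]] * C for j in range(n_donors)])
n_zero = int(np.sum(A_donor == 0.0))
```

Here `n_zero` is 24 out of the 36 entries of `A_donor`, matching \(NM(N_d-1)\) out of \(NMN_d\), which is why storing only the non-zero rows per donor saves memory.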

To perform GWAS with GP, it is crucial to reduce the computational time required to map a genetic association for each variant. The Score statistic to test \(\delta_g=0\) can be computed from the first derivative of \(\mathcal{L}_{2}^{H_2}\) with respect to \(\delta_g\), and the variance parameters \(\{\hat{\sigma}^{2},\hat{\delta}_d\}\) of \(V_d\) are estimated from \(\mathcal{L}_{2}^{H_1}\) only once and then reused for every variant to be tested. Therefore, it is well suited to testing tens of millions of variants independently. Using the fact that the first derivative of \(V_g^{-1}\) at \(\delta_g=0\) depends only on \(V_d\), such that

$$\left.\frac{\partial V_g^{-1}}{\partial\delta_g}\right\vert_{\delta_g=0}=-V_d^{-1}GK_{NM}K_{MM}^{-1}K_{MN}GV_d^{-1},$$

the Score statistic can be explicitly written as

$$S=y^{\top}\hat{V}_d^{-1}GK_{NM}K_{MM}^{-1}K_{MN}G\hat{V}_d^{-1}y,\tag{2}$$

whose distribution is the generalized \(\chi^{2}\) distribution, that is, the distribution of the weighted sum of M independent \(\chi^{2}\) statistics, \(\sum_{m=1}^{M}\lambda_m\chi_m^{2}\) [8, 10]. It is known that the weights \(\lambda_m\) (m=1,…,M) are given by the non-negative eigenvalues of

$$K_{MM}^{-1/2}K_{MN}G\hat{V}_d^{-1}GK_{NM}K_{MM}^{-\top/2},$$

where \(K_{MM}^{-1/2}\) can be computed using the Cholesky decomposition of \(K_{MM}=K_{MM}^{\top/2}K_{MM}^{1/2}\).

To compute the p-value from S, we can use the Davies exact method, implemented in the CompQuadForm package in R. Note that, if we use a linear kernel, S can be simplified as described in [8]. Although the Score-based approach is an easy and quick solution for genome-wide mapping, to check the asymptotic behavior and the statistical calibration of the Score statistics, we should use a QQ-plot to verify that the p-values obtained from multiple variants follow a uniform distribution under the null hypothesis.
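As a rough sketch of the Score test for one variant, the function below computes S from Eq. (2), the weights \(\lambda_m\), and a tail probability for the weighted sum of \(\chi^{2}_1\) variables. Note the assumptions: the p-value here is a simple Monte Carlo stand-in, not the Davies method; the Cholesky factor is taken as lower triangular (\(K_{MM}=LL^{\top}\), NumPy's convention); and the function name `score_test`, the toy kernel, and the simulated data (one sample per unrelated donor, so R = I) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def se_kernel(x1, x2, rho=1.0):
    """Squared exponential kernel for 1-D inputs with unit variance."""
    return np.exp(-(x1[:, None] - x2[None, :]) ** 2 / (2.0 * rho))

def score_test(y, g, K_NM, K_MM, V_d, n_sim=100_000, rng=rng):
    """Score statistic S, chi-square weights, and a Monte Carlo p-value."""
    GK = g[:, None] * K_NM                 # G K_NM without forming diag(g)
    L_chol = np.linalg.cholesky(K_MM)      # K_MM = L L^T (lower triangular)
    # S = || L^{-1} K_MN G V_d^{-1} y ||^2, which equals Eq. (2)
    w = np.linalg.solve(L_chol, GK.T @ np.linalg.solve(V_d, y))
    S = float(w @ w)
    # Weights: eigenvalues of L^{-1} K_MN G V_d^{-1} G K_NM L^{-T}
    H = GK.T @ np.linalg.solve(V_d, GK)    # M x M, symmetric
    H_white = np.linalg.solve(L_chol, np.linalg.solve(L_chol, H).T)
    lam = np.clip(np.linalg.eigvalsh((H_white + H_white.T) / 2.0), 0.0, None)
    # Monte Carlo tail probability of sum_m lam_m * chi2_1 (not the Davies method)
    draws = rng.chisquare(1, size=(n_sim, len(lam))) @ lam
    return S, lam, float(np.mean(draws >= S))

# Toy data simulated under the null hypothesis (delta_g = 0)
N, M = 60, 5
x = rng.uniform(0.0, 6.0, N)
t = np.linspace(0.0, 6.0, M)
K_NM = se_kernel(x, t)
K_MM = se_kernel(t, t) + 1e-8 * np.eye(M)
Q_NN = K_NM @ np.linalg.solve(K_MM, K_NM.T)
sigma2, delta_d = 1.0, 0.2
V_d = sigma2 * np.eye(N) + Q_NN + delta_d * Q_NN * np.eye(N)  # R = I assumed
g = rng.binomial(2, 0.3, N).astype(float)
y = rng.multivariate_normal(np.zeros(N), V_d)

S, lam, p_value = score_test(y, g, K_NM, K_MM, V_d)
```

A useful sanity check is that the weights sum to \(\mathrm{tr}(V_d^{-1}GK_{NM}K_{MM}^{-1}K_{MN}G)\), the expectation of S under the null. For genome-wide use, `V_d` would be factorized once and reused across variants, and an exact method such as Davies' would replace the Monte Carlo step, since simulation cannot resolve very small p-values.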

If the colocalisation analysis [18] or a Bayesian hierarchical model [19] is considered as a downstream analysis using the test statistics, a Bayes factor can also be computed using the Titsias bounds, such as

$$\log(BF)=\mathcal{L}_{2}^{H_2}-\mathcal{L}_{2}^{H_1}.$$

Here we would use some empirical values \(\delta_g=\{0.01,0.1,0.5\}\) to average the Bayes factor, instead of integrating out \(\delta_g\) from \(\mathcal{L}_{2}^{H_2}\) [20].

In real genetic association mapping, most genetic associations are indeed static and ubiquitous over the factor x. To capture such a static association, we can come up with the following model

$$y=\alpha_0 1_N+\alpha+\beta_0 g+\beta\odot g+\gamma_0+\gamma+\varepsilon,$$

where \(\alpha_0\) denotes the intercept, \(1_N\) denotes the N-dimensional vector of all 1s, \(\beta_0\) denotes the effect size of the static genetic association, and \(\gamma_0\sim\mathcal{N}(0,\sigma^{2}\delta_{d0}R)\) denotes the donor variation which confounds \(\beta_0\). For instance, in [8], the static genetic association \(\beta_0\) is modeled as a fixed effect, and the dynamic effect \(\beta\) is tested using the Score statistic. On the other hand, in [10], the authors modeled both the static and dynamic associations as a random effect to test via a Bayes factor. In this case, the covariance matrix K can be rewritten as

$$K^{*}=\sigma^{2}e^{-\rho_0}1_N 1_N^{\top}+K$$

to estimate the model parameters in (1), and then the null hypothesis \(\delta_g=0\) for \(\beta\) is tested.

Note here that the kernel parameter \(\rho_0\) is not necessarily common and shared across \(\alpha\), \(\beta\) and \(\gamma\). Indeed, in [10], the authors estimated \(\hat{\rho}_0^{\alpha}\) and \(\hat{\rho}_0^{\gamma}\) independently in \(\mathcal{L}_{2}^{H_1}\). To compute the Score statistic, the authors assumed that \(\hat{\rho}_0^{\beta}=\hat{\rho}_0^{\gamma}\) for \(\beta\) and \(\gamma\), because the ratio of the static effect to the dynamic effect can be the same for cis and trans genetic effects.

In longitudinal studies, the factor x is typically observed explicitly (e.g., donor's age or physical locations where samples were taken). This makes it straightforward to perform genetic association mapping along x using the Score statistics or Bayes factors, as described above. However, this is often not the case for molecular studies, and therefore we need to estimate the underlying biological state from the data.

In single-cell biology, the hidden cellular state x is often referred to as "pseudotime", and principal component analysis is normally used to estimate it as part of dimension reduction [21]. The Gaussian process latent variable model (GPLVM) is a strong alternative to extract the pseudotime when the molecular phenotype gradually changes along pseudotime x in a nonlinear fashion [22, 23].

We have also proposed a GPLVM that uses the baseline model H0 to estimate the latent variable X from the single-cell RNA-seq data (see Section 3 for more details). Let \(Y=(y_1,\ldots,y_J)\) be the gene expression matrix of J genes, whose jth column is the vector of gene expression for the gene j. The Titsias lower bound of the GPLVM based on the baseline model H0 can be written as

$$\log p(Y|X)\ge\log\mathcal{MN}(Y|0,\Sigma,I+K_{NM}K_{MM}^{-1}K_{MN})-\frac{J}{2}\mathrm{tr}\{\tilde{K}_{NN}\}=\mathcal{L}_{2}.$$

To obtain the optimal cellular state \(\hat{X}\), this lower bound can be maximized with respect to \(\{\Sigma,X,T,\rho\}\) [10, 24]. Here \(\Sigma=\mathrm{diag}(\sigma_j^{2};\ j=1,\ldots,J)\) denotes the residual variance parameters of J genes, and \(\mathcal{MN}(\cdot)\) denotes the matrix normal distribution. To ensure the uniqueness of the model parameters, the variance parameter in the kernel function is set to be \(\sigma^{2}=1\). In addition, to maintain the uniqueness of the latent variable estimation, a prior probability on X is required. It is quite common to assume independent standard normal distributions for each of the elements of \(X\sim\mathcal{MN}(0,I,I)\) [24], although there are multiple alternatives to consider depending on the nature of the modeled data [10, 23].

In the parameter estimation, the limited-memory BFGS method can be used to implement GPLVM for large N. In addition, the stochastic variational Bayes approach can be used to fit GPLVM to larger data sets, while reducing the fitting time [25,26,27].

For a non-Gaussian output y, the Titsias bound \(\mathcal{L}_2\) is not analytically available. However, for the Poisson case, a lower bound of the conditional probability \(p(y\mid u)\) can be computed as follows:

$$\mathcal{L}_1=\sum_i\left[-\log (y_i!)+y_i\bar{f}_i-\exp \left(\bar{f}_i+\frac{\tilde{k}_{ii}}{2}\right)\right],$$

where \(\tilde{k}_{ii}\) denotes the ith diagonal element of \(\tilde{K}_{NN}\). Let \(\nu_i\) and \(w_i\) be the working response and the iterative weight of the GLM for the ith sample, such that

$$\nu_i=\bar{f}_i+(y_i-w_i)/w_i\quad \mathrm{and}\quad w_i=\exp \left(\bar{f}_i+\frac{\tilde{k}_{ii}}{2}\right)$$

for \(i=1,\ldots ,N\). The optimal \(\hat{u}\), which maximizes \(\exp \{\mathcal{L}_1\}p(u)\), then satisfies

$$\left(K_{MM}^{-1}+K_{MM}^{-1}K_{MN}WK_{NM}K_{MM}^{-1}\right)u=W\nu , \tag{3}$$

where \(W=\mathrm{diag}(w_i;\, i=1,\ldots ,N)\), which suggests

$$\nu \mid u\sim \mathcal{N}(\bar{f},W^{-1})$$

as described elsewhere [28]. Therefore, we can maximize

$$\mathcal{L}_2=\mathcal{N}(\nu \mid 0,W^{-1}+K_{NM}K_{MM}^{-1}K_{MN})$$

with respect to the model parameters (including \(\sigma^2\)), where \(u=\hat{u}\) is iteratively updated as in (3). Thus, to obtain the Score statistic for non-Gaussian y, we set \(y=\nu\) and \(\hat{V}_d=W^{-1}+AB^{-1}A\) in (2).
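As a small numerical illustration of the Poisson quantities above, the sketch below evaluates the bound \(\mathcal{L}_1\) and the working response and weight for each sample. It takes \(\bar{f}_i\) and \(\tilde{k}_{ii}\) as given inputs; the function and argument names are hypothetical, and the iterative update of \(u\) in (3) is not included.

```python
import math

def poisson_l1_bound(y, f_bar, k_tilde_diag):
    """L1 = sum_i [ -log(y_i!) + y_i * f_bar_i - exp(f_bar_i + k_ii / 2) ]."""
    return sum(
        -math.lgamma(yi + 1) + yi * fi - math.exp(fi + ki / 2.0)
        for yi, fi, ki in zip(y, f_bar, k_tilde_diag)
    )

def working_response_and_weights(y, f_bar, k_tilde_diag):
    """Return (nu, w) with w_i = exp(f_bar_i + k_ii / 2) and
    nu_i = f_bar_i + (y_i - w_i) / w_i."""
    w = [math.exp(fi + ki / 2.0) for fi, ki in zip(f_bar, k_tilde_diag)]
    nu = [fi + (yi - wi) / wi for yi, fi, wi in zip(y, f_bar, w)]
    return nu, w

# Toy inputs: three cells with counts 0, 1 and 2.
y = [0, 1, 2]
f_bar = [0.0, 0.0, 0.0]
k_tilde_diag = [0.0, 0.0, 0.0]

l1 = poisson_l1_bound(y, f_bar, k_tilde_diag)
nu, w = working_response_and_weights(y, f_bar, k_tilde_diag)
print(l1, nu, w)
```

Using `math.lgamma(y + 1)` for \(\log (y_i!)\) avoids overflow when counts are large, which matters for highly expressed genes.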

A binary output y is more complicated than the Poisson case, because the \(\mathcal{L}_1\) bound cannot be computed analytically with either the logit or the probit link function. For the logit link function, several useful alternatives to the \(\mathcal{L}_1\) bound have been proposed [29]. For the probit link function, [30] proposed an approximation of \(\mathcal{L}_1\) using Gauss-Hermite quadrature. However, in both cases the computational cost is much higher than in the Poisson case, and it is currently rather impractical to conduct a large genome-wide association mapping.
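To give a feel for what a Gauss-Hermite approximation involves, the sketch below approximates the expected log-likelihood of one sample when \(f_i\sim \mathcal{N}(\bar{f}_i,\tilde{k}_{ii})\). The Poisson expression for \(\mathcal{L}_1\) above equals the sum of exactly such expectations, so the sketch checks the quadrature against that closed form and then applies it to a probit likelihood. Treating the binary bound as the same kind of expectation is an assumption made here for illustration; this is not the specific scheme of [30], and all function names are hypothetical.

```python
import math
import numpy as np

def gauss_hermite_expectation(g, mean, var, degree=20):
    """Approximate E[g(f)] for f ~ N(mean, var) by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(degree)
    # Change of variables f = mean + sqrt(2 * var) * x for the weight exp(-x^2).
    f = mean + math.sqrt(2.0 * var) * nodes
    values = np.array([g(fi) for fi in f])
    return float(np.sum(weights * values) / math.sqrt(math.pi))

def probit_loglik(y, f):
    """log Phi(f) for y == 1 and log Phi(-f) for y == 0."""
    z = f if y == 1 else -f
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))

def poisson_loglik(y, f):
    """Poisson log-likelihood with log link: -log(y!) + y * f - exp(f)."""
    return -math.lgamma(y + 1) + y * f - math.exp(f)

# Sanity check on the Poisson case, where the expectation has a closed form.
mean, var, count = 0.2, 0.3, 3
poisson_quad = gauss_hermite_expectation(lambda f: poisson_loglik(count, f), mean, var)
poisson_exact = -math.lgamma(count + 1) + count * mean - math.exp(mean + var / 2.0)

# Probit case: no closed form is used here, only the quadrature value.
probit_quad = gauss_hermite_expectation(lambda f: probit_loglik(1, f), mean, var)
print(poisson_quad, poisson_exact, probit_quad)
```

Each sample needs `degree` likelihood evaluations per update instead of a single closed-form expression, which is one way to see why the binary case costs more than the Poisson case.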

See the rest here:
Genetic association mapping leveraging Gaussian processes | Journal of Human Genetics - Nature.com

Minimally destructive hDNA extraction method for retrospective genetics of pinned historical Lepidoptera specimens … – Nature.com

Read the original:
Minimally destructive hDNA extraction method for retrospective genetics of pinned historical Lepidoptera specimens ... - Nature.com

Restless legs syndrome tied to 140 ‘hotspots’ in the genome – Livescience.com

Researchers have uncovered more than 140 sections of the human genome tied to restless legs syndrome (RLS), a neurological condition that affects up to 10% of the U.S. population.

These stretches of DNA in the genome are known as genetic risk loci, and prior to the new study, only 22 were known to be tied to RLS. The new research, published Wednesday (June 5) in the journal Nature Genetics, increases that number to 164.

Three of the newfound risk loci are located on the X chromosome, which females typically carry two of in each cell while males carry only one. RLS is more common among women than men, but based on their new results, the researchers don't think this difference is explained by the trio of risk loci on the X chromosome.

"This study is the largest of its kind into this common but poorly understood condition," Steven Bell, co-senior study author and an epidemiologist at the University of Cambridge, said in a statement. "By understanding the genetic basis of restless legs syndrome, we hope to find better ways to manage and treat it, potentially improving the lives of many millions of people affected worldwide."

This discovery could also be used to help predict a person's risk of developing RLS, the study authors wrote in their paper.

RLS, also called Willis-Ekbom disease, causes people to experience an unpleasant crawling or creeping sensation in their legs, as well as the irresistible urge to move them. These sensations are often more intense in the evening or at night, while people are resting. The condition is thought to be underdiagnosed, and when it is diagnosed, its exact cause is often unknown. RLS can arise due to another condition, such as iron deficiency, kidney disease or Parkinson's, and it's likely tied to dysfunction in part of the brain that uses dopamine to control movement.

There is currently no cure for RLS, but certain treatments, such as anti-seizure drugs, can help ease a person's symptoms.

In the new study, the researchers pooled the data from several enormous genome-wide association studies, which compare the DNA of people with a given disease to that of people without it. In all, the new research included data from more than 116,000 people who had RLS and more than 1.5 million people without the condition.

Notably, all those included were of European ancestry, which may limit the relevance of the findings in other demographics.

The researchers found no strong differences in genetic risk factors between the sexes, even though RLS is more common in women. They think this suggests that RLS is governed by a combination of genetic, environmental and hormonal factors, so the genetic risk loci don't dictate a person's risk in isolation.

Among the newfound risk loci, the team hunted for genes that might already be targeted by existing approved drugs; the goal was to find treatments that could potentially be given to patients in the near future.

They found 13 risk loci targeted by existing drugs, including two genes that code for so-called glutamate receptors. These receptors are proteins found on nerve cells that play a vital role in the transmission of signals throughout the nervous system. Preliminary clinical trials suggest that targeting these two receptor genes with anti-epileptic drugs, namely perampanel and lamotrigine, can benefit some patients with RLS.

In addition to identifying potential drugs, the team ran a statistical analysis to see if RLS raises the risk of any other conditions. This suggested that RLS may be a risk factor for developing type 2 diabetes, although past studies on the potential link have found mixed results. As such, "these results should not be overinterpreted," the researchers cautioned; they need to be confirmed in future research.

Despite their limitations, the findings may bring doctors one step closer to being able to predict someone's risk of developing RLS and understanding the wider impacts the condition has on people's health, the team said.

Read the rest here:
Restless legs syndrome tied to 140 'hotspots' in the genome - Livescience.com

Paired tumor-germline testing can enhance patient care, with guidance from genetics specialists – The Cancer Letter

Data from the IMROZ phase III trial demonstrated Sarclisa (isatuximab) in combination with standard-of-care bortezomib, lenalidomide and dexamethasone followed by Sarclisa-Rd (the IMROZ regimen) significantly reduced the risk of disease progression or death by 40%, compared to VRd followed by Rd in patients with newly diagnosed multiple myeloma not eligible for transplant.

Originally posted here:
Paired tumor-germline testing can enhance patient care, with guidance from genetics specialists - The Cancer Letter