Protein-truncating variants in BSN are associated with severe adult-onset obesity, type 2 diabetes and fatty liver disease – Nature.com

Ethics

Our research complies with all relevant ethical regulations. All studies included in this research were approved by the relevant board or committee. UK Biobank has approval from the North West Multi-centre Research Ethics Committee (REC reference 13/NW/0157) as a Research Tissue Bank (RTB) approval, and informed consent was provided by each participant. This approval means that researchers do not require separate ethical clearance and can operate under RTB approval. This RTB approval was granted initially in 2011 and is renewed every 5 years; hence, UK Biobank successfully renewed approval in 2016 and 2021. The MCPS was approved by the Mexican Ministry of Health, the Mexican National Council for Science and Technology and the University of Oxford. The PGR study was approved by the institutional review board at the Center for Non-Communicable Diseases (IRB: 00007048, IORG0005843, FWAS00014490) and all participants provided informed consent. The SCOOP cohort was approved by the Multi-regional Ethics Committee and the Cambridge Local Research Ethics Committee (MREC 97/21 and REC number 03/103). Participants (or parents for individuals <16 years old) provided written informed consent; minors provided oral consent. The INTERVAL study received ethics committee approval from the National Research Ethics Service Committee (11/EE/0538), and all participants provided informed consent before joining the study.

We used the same processing strategies as those outlined in our previous paper to analyze the WES data and perform quality control steps19. We queried WES data from 454,787 individuals in UK Biobank39, excluding those with excess heterozygosity, those with autosomal variant missingness on genotyping arrays of 5%, or those not included in the subset of phased samples as defined by Bycroft et al.13.

WES data were stored as population-level variant call format (VCF) files, aligned to GRCh38 and accessed through the UK Biobank Research Analysis Platform (RAP). In addition to the quality control measures already applied to the released data, as described by Backman et al.39, we conducted several additional quality control procedures. First, we used bcftools v1.14 norm40 to split the multiallelic sites and left-correct and normalize indels. Next, we filtered out variants that failed our quality control criteria, including those with: (1) read depth of <7; (2) genotype quality of <20; and (3) binomial test P value for alternative allele reads versus reference allele reads of 0.001 for heterozygous genotypes. For indel genotypes, we kept only variants with read depth of 10 and genotype quality of 20. Variants that failed quality control criteria were marked as missing (that is, ./.). After filtering, variants where more than 50% of the genotypes were missing were excluded from downstream analyses19.

The remaining variants underwent annotation using Ensembl Variant Effect Predictor (VEP v104)41 with the -everything flag and additional plugins for REVEL14, CADD42 and LOFTEE43. For each variant, a single Ensembl transcript was prioritized on the basis of whether the annotated transcript was protein-coding, MANE select v0.97 (ref. 44) or the VEP canonical transcript. The individual consequence for each variant was then prioritized on the basis of severity as defined by VEP. Stop-gained, splice acceptor and splice donor variants were merged into a combined PTV category, while annotations for missense and synonymous variants were adopted directly from VEP. We included only variants on autosomes and the X chromosome that were within Ensembl protein-coding transcripts and transcripts included in the UK Biobank WES assay in our downstream analysis.

Our analyses focused primarily on individuals of European genetic ancestry, and we excluded those who withdrew consent from the study, resulting in a final cohort of 419,668 individuals.

We used BOLT-LMM v2.3.6 (ref. 15) as our primary analytical tool to conduct the gene-burden test. To run BOLT-LMM, we first queried a set of genotypes with minor allele count (MAC) >100, which was derived from the genotyping arrays for the individuals with the WES data to build the null model. To accommodate BOLT-LMMs requirement for imputed genotyping data rather than per-gene carrier status, we developed dummy genotype files in which each gene was represented by a single variant. We then coded individuals with a qualifying variant within a gene as heterozygous, regardless of the total number of variants they carried in that gene. We then created dummy genotypes for the HC PTVs with MAF < 0.1% as defined by LOFTEE, missense variants with REVEL >0.5 and missense variants with REVEL >0.7. We then used BOLT-LMM to analyze phenotypes using default parameters, except for the inclusion of the lmmInfOnly flag. In addition to the dummy genotypes, we included all individual markers in the WES data to generate association test statistics for individual variants. We used age, age2, sex and the first ten principal components (PCs) as calculated by Bycroft et al.13 and the WES release batch (50k, 200k, 450k) as covariates.

To check whether there was a single variant driving the association, we performed a leave-one-out analysis for BSN and APBA1 using linear regression in R v3.6.3 by dropping the HC PTVs contained in our analysis one by one. In addition, we also checked the geographic distribution of APBA1 and BSN HC PTV carriers.

We sought replication of our findings for the four new genes in two independent predominantly non-European exome-sequenced cohorts: the MCPS and the PGR study.

MCPS is a cohort study of 159,755 adults of predominantly admixed American ancestry. Participants aged 35 years or older were recruited between 1998 and 2004 from two adjacent urban districts of Mexico City. Phenotypic data were recorded during household visits, including height, weight, and waist and hip circumferences. Disease history was self-reported at baseline, and the participants were linked to Mexican national mortality records. The cohort has been described in detail elsewhere17,18.

The PGR study has been recruiting participants aged 15100 years as cases or controls via clinical audits for specific conditions since 2005 from over 40 centers around Pakistan. Participants were recruited from clinics treating patients with cardiometabolic, inflammatory, respiratory or ophthalmological conditions. Information on lifestyle habits, medical and medication history, family history of diseases, exposure to smoking and tobacco consumption, physical activity, dietary habits, anthropometry, basic blood biochemistry and electrocardiogram traits was recorded during clinic visits. DNA, serum, plasma and whole blood samples were also collected from all study participants.

Exome sequencing data for 141,046 MCPS and 37,800 PGR participants were generated at the Regeneron Genetics Center and passed Regenerons initial quality control, which included identifying sex discordance, contamination, unresolved duplicate sequences and discordance with microarray genotype data for MCPS. Genomic DNA was subjected to paired-end 75-bp WES at Regeneron Pharmaceuticals using the IDT xGen v1 capture kit on the NovaSeq 6000 platform. Conversion of sequencing data in BCL format to FASTQ format and the assignments of paired-end sequence reads to samples were based on 10-base barcodes, using bcl2fastq v2.19.0.

These exome sequences were processed at AstraZeneca from their unaligned FASTQ state. A custom-built Amazon Web Services cloud computing platform running Illumina DRAGEN Bio-IT Platform Germline Pipeline v3.0.7 was used to align the reads to the GRCh38 genome reference and perform single-nucleotide variant (SNV) and insertion and deletion (indel) calling. SNVs and indels were annotated using SnpEff v4.3 (ref. 45) against Ensembl Build 38.92. All variants were additionally annotated with their gnomAD MAFs (gnomAD v2.1.1 mapped to GRCh38)43.

To further apply quality control to the sequence data, all MCPS and PGR exomes underwent a second screening using AstraZenecas bioinformatics pipeline, which has been described in detail previously46. Briefly, we excluded from the analysis sequences that had a VerifyBamID freemix (contamination) level of more than 4%, those for which inferred karyotypic sex did not match self-reported gender or those for which less than 94.5% of the consensus coding sequence (CCDS release 22) achieved a minimum tenfold read depth. We further removed one individual from every pair of genetic duplicates or monozygotic twins with a kinship coefficient of >0.45. Kinship coefficients were estimated from exome genotypes using the kinship function from KING v2.2.3 (ref. 47). For the MCPS, we additionally excluded sequences with an average CCDS read depth of at least 2 s.d. below the mean. After the above quality control steps, 139,603 (99.0%) MCPS and 37,727 (99.3%) PGR exomes remained.

For the MCPS, we predicted the genetic ancestry of participants using PEDDY v0.4.2 (ref. 48), with 1000 Genomes Project sequences as population ref. 49, and retained individuals with a predicted probability of admixed American ancestry of 0.95 who were within 4 s.d. of the means for the top four PCs. In the PGR study, we retained individuals with a predicted probability of South Asian ancestry of 0.95 who were within 4 s.d. of the means for the top four PCs. Following ancestry filtering, 137,059 (97.2%) MCPS and 36,280 (95.5%) PGR exomes remained.

We assessed the association of BMI and weight quantitative traits with genotype at the four proposed new genes of interest using a previously described gene-level collapsing analysis framework implementing a PTV collapsing analysis model46. We classified variants as PTVs if they had been annotated by SnpEff as follows: exon_loss_variant, frameshift_variant, start_lost, stop_gained, stop_lost, splice_acceptor_variant, splice_donor_variant, gene_fusion, bidirectional_gene_fusion, rare_amino_acid_variant and transcript_ablation.

We applied MAF filters to target rare variants: MAF <0.001 in gnomAD (overall and every population except OTH) and leave-one-out MAF <0.001 among our combined case and control test cohort. For variants to qualify, they had to also meet the following quality control filters: minimum site coverage of 10; annotation in CCDS transcripts (release 22); at least 80% alternative reads in homozygous genotypes; a percentage of alternative reads for heterozygous variants of 0.25 and 0.8; a binomial test of alternative allele proportion departure from 50% in the heterozygous state result of P >1106; GQ of 20; FS of 200 (indels) or 60 (SNVs); MQ of 40; QUAL of 30; read position rank sum score of 2; MQRS of 8; DRAGEN variant status =PASS; and test cohort carrier quality control failure of <0.5%. If the variant was observed in gnomAD exomes, we also applied the following filters: variant site achieved tenfold coverage in 25% of gnomAD exomes; variant site achieved exome z-score of 2.0; exome MQ of 30; and random forest probability that the given variant is a true SNV or indel of >0.02 and >0.01, respectively50.

For the quantitative traits and for each gene, the difference in mean between the carriers and noncarriers of PTVs was determined by fitting a linear regression model, correcting for age and sex. In addition to calculating individual statistics for the MCPS and the PGR study, we also meta-analyzed the individual study effect sizes to generate a combined replication statistic using an inverse variance-weighted fixed-effect meta-analysis using the rma.uni() function from the metafor package v3.8-1 (ref. 51) in R v3.6.3.

To test whether there was an association between pLOF variants in the BSN gene and severe early-onset obesity, we studied 927 exomes from white British participants with severe early-onset obesity recruited to the Genetics of Obesity Study (GOOS) (SCOOP cohort) and 4,057 control exomes from the INTERVAL cohort of UK blood donors. SCOOP comprises UK patients with severe obesity (BMI more than 3 s.d. above the mean for age and sex) of early onset (<10 years) recruited to the GOOS. Exome sequencing in a subset of people of white British ancestry (the SCOOP cohort) was performed as described previously52,53,54. INTERVAL comprises predominantly healthy blood donors in the UK55 (https://www.intervalstudy.org.uk).

SCOOP and INTERVAL variants were joint-called and filtered for variant-level and sample-level quality control, as previously described52. A total of 927 cases (SCOOP) and 4,057 controls (INTERVAL) passed the quality control filters53. After splitting multiallelic variants and left normalizing, we annotated variants using VEP with Ensembl v96 (GRCh37) and identified high-impact variants (predicted protein-truncating, null or splice-disrupting) in the gene BSN (transcript ENST00000296452) using VEP IMPACT=HIGH. This definition includes stop-gain variants (SNVs resulting in stop codons), frameshifts and splice donor/acceptor variants. We verified that the predicted consequences and stop codon positions were maintained in the latest minor version of the transcript (ENST00000296452.5, NM_003458.4) using VEP v110 after lifting over to GRCh38. Missense variants were detected in almost all BSN exons among SCOOP exomes (7/10 coding exons) and INTERVAL exomes (8/10 coding exons), suggesting that BSN stop-gain detection rates in cases and controls are unlikely to be driven by differential read coverage within the BSN gene.

The one PTV identified in INTERVAL (p.Trp3926*) is located at the final amino acid of the bassoon protein and is therefore unlikely to affect expression levels (note that the LOFTEE in silico stop-gain filter for low-confidence loss of function based on the 50-bp rule does not apply to the BSN gene because the termination codon is itself >55 bp from the final exonexon boundary56). After excluding this variant on the basis of low confidence for loss of function, we performed a nested gene-burden analysis on the remaining three variants: n =3 pLOF carriers in SCOOP and n =0 carriers in INTERVAL controls (OR (95% CI) = inf (1.8inf), P =0.006, Fishers exact test; adding +0.5 to each cell, OR =31). Studies in vitro are required to establish the effect of each stop-gain variant on bassoon protein expression levels and localization.

We included binary and quantitative traits made available in the June 2022 UK Biobank data release, harmonizing the phenotype data as previously described46. This resulted in 11,690 phenotypes for analysis, which are available on https://azphewas.com. On the basis of clinical relevance, we derived three additional phenotypes.

For UK Biobank phenome-wide analyses of the four putatively new genes, the same data generation and quality control processes described for the MCPS and PGR study were applied to UK Biobank exomes. Following the Regeneron and AstraZeneca quality control steps, 445,570 UK Biobank exomes remained. The phenome-wide analysis was performed in UK Biobank participants of predominantly European descent, whom we identified based on a PEDDY-derived predicted probability of European ancestry of 0.95 and were within 4 s.d. of the means for the top four PCs. On the basis of predicted ancestry pruning, 419,391 UK Biobank exomes were included in the phenome-wide analyses of the four prioritized genes.

As described previously, we assessed the association of the 11,693 phenotypes with genotypes at the four genes of interest, using a PTV collapsing analysis model46, and classifying variants as PTVs using the same SnpEff definitions as described for the MCPS and PGR analyses. For variants to qualify for inclusion in the model, we applied the same MAF and quality control filters used in the MCPS and PGR analyses, with the exception that due to the larger sample size of UK Biobank, only <0.01% of the test cohort carriers were permitted to fail quality control.

We ran association tests of APBA1 and BSN HC PTV carriers and carriers of a BMI-associated common variant (rs9843653) at the BSN locus with a list of anthropometric phenotypes available in UK Biobank using R v3.6.3 (Supplementary Table 5), including the same covariates we used in our exome-wide gene-burden tests. We acquired normalized protein expression data generated by the Olink platform from the UK Biobank RAP23,24. The detailed Olink proteomics assay, data processing and quality control were described by Sun et al.23. For the association tests of APBA1 and BSN PTV carriers and BMI-associated common variant (rs9843653) at the BSN locus carriers with expression levels for 1,463 proteins, we added age2, agesex, age2sex, Olink batch, UK Biobank center, UK Biobank genetic array, number of proteins measured and the first 20 genetic PCs as covariates, as suggested by Sun et al.23. We chose the Bonferroni-corrected P value (P <3.42105 (0.05/1,463)) as the threshold for significance.

Identified genes were queried for proximal BMI GWAS signals, using data from UK Biobank, for signals within 500 kb upstream of the genes start site to 500 kb downstream of the genes end site. Such signals were further replicated in an independent BMI GWAS9.

We also performed colocalization tests, using the approximate Bayes factor method in R v4.0.2 using the package coloc v5.1.0 and blood gene expression data from the eQTLGen study16. Genomic regions were defined as the regions 500 kb around each gene, and loci exhibiting an H4 posterior probability of >0.5 were considered to show evidence of colocalization.

Finally, we used the GWAS data to calculate gene-level common variant associations, using MAGMA v1.09 (ref. 57). To do this, we used all common but nonsynonymous (coding) variants within a given gene. Gene-level scores were further collapsed into pathway-level associations where appropriate.

To examine whether there is an interaction effect between PTV carrier status for BSN and APBA1 and the PGS, we included an interaction term between the PGS and the carrier status for BSN and APBA1 PTVs in a linear regression model adjusted for sex, age and age2, and the first 10 PCs.

The PGS was constructed for 419,581 individuals of white European ancestry who had both genotype and exome sequencing data and a BMI record in UK Biobank. We used summary statistics of BMI from Locke et al.9, which included samples not in UK Biobank. Data were downloaded from the GIANT consortium. The summary statistics included 2,113,400 single-nucleotide polymorphisms (SNPs) with at least 500,000 samples in a cohort of 322,154 participants of European ancestry. For the genotype data of UK Biobank participants, a light quality check procedure was applied, where SNPs were removed if they had a MAF of <0.1%, HardyWeinberg equilibrium P <110-6 or more than 10% missingness. In addition, SNPs that were mismatched with those in the summary statistics (with the same rsID but different chromosomes or positions) were excluded. We used the package lassosum v4.0.5 (ref. 58) in R v3.6.0 to construct the PGS. The R2 of the model including the PGS regressed on rank-based inverse normal-transformed BMI and adjusted for sex, age and age2, and the first 10 PCs as covariates was 11%.

A detailed description of the methods used in cellular work and single-cell analyses can be found in the Supplementary Note.

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

The rest is here:
Protein-truncating variants in BSN are associated with severe adult-onset obesity, type 2 diabetes and fatty liver disease - Nature.com

Related Posts